
Modeling and Analysis Principles
for
Chemical and Biological Engineers

Michael D. Graham
James B. Rawlings

Nob Hill Publishing

Modeling and Analysis Principles
for
Chemical and Biological Engineers

Michael D. Graham and James B. Rawlings
Department of Chemical and Biological Engineering
University of Wisconsin-Madison
Madison, Wisconsin

Nob Hill Publishing
Madison, Wisconsin

This book was set in Lucida using LATEX by Michael D. Graham and James B. Rawlings, and printed and bound by Nob Hill Publishing, LLC. Cover design by Cheryl M. Rawlings.


Copyright © 2013 by Nob Hill Publishing, LLC

All rights reserved.

Nob Hill Publishing, LLC
Cheryl M. Rawlings, publisher
Madison, WI 53705
orders@nobhillpublishing.com
http://www.nobhillpublishing.com

No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Library of Congress Control Number: 2012956351


Graham, Michael D.
Modeling and Analysis Principles for Chemical and Biological Engineers /
by Michael D. Graham and James B. Rawlings.
p. cm.
Includes bibliographical references (p.) and index.
ISBN 978-0-9759377-1-6 (cloth)
1. Chemical engineering. 2. Mathematical modeling. I. Rawlings, James B. II. Title.

Printed in the United States of America.
First Printing May 2013


To my father and the memory of my mother.
MDG

To my graduate students, who have been some of my best teachers.
JBR

Preface

Research undertaken by modern chemical and biological engineers incorporates a wide range of mathematical principles and methods. This book came about as the authors struggled to incorporate modern topics into a one- or two-semester course sequence for new graduate students, while not losing the essential aspects of traditional mathematical modeling syllabi. Topics that we decided are particularly important but not represented in traditional texts include: matrix factorizations such as the singular value decomposition, basic qualitative dynamics of nonlinear differential equations, integral representations of partial differential equations, probability and stochastic processes, and state estimation. The reader will find these topics and many more in the book. These topics are generally absent in many texts, which often have a bias toward the mathematics of 19th- through early 20th-century physics. We also believe that the book will be of substantial interest to active researchers, as it is in many respects a survey of the applied mathematics commonly encountered by chemical and biological engineering practitioners, and contains many topics that were almost certainly absent in their chemical engineering graduate coursework.
Due to the wide range of topics that we have incorporated, the level of discussion in the book ranges from very detailed to broadly descriptive, allowing us to focus on important core topics while also introducing the reader to more advanced or specialized ones. Some important but technical subjects such as convergence of power series have been treated only briefly, with references to more detailed sources. We encourage instructors and students to browse the exercises. Many of these illustrate applications of the chapter material, for example, the numerical stability of the Verlet algorithm used in molecular dynamics simulation. Others deepen, complement, and extend the discussion in the text.

During their undergraduate education in chemical and biological engineering, students become very accomplished at numerical examples and problem solving. This is not a book with lots of numerical examples. Engineering graduate students need to make the shift from applying mathematical tools to developing and understanding them. As such, substantial emphasis in this book is on derivations and some short proofs. We believe the text contains a healthy mix of fundamental mathematics, analytical solution techniques, and numerical methods. Researchers need principles, structures, and tools, because these guide analysis and understanding, and they also must be able to produce quantitative answers. We hope this text will enable them to do both.

MDG
Madison, Wisconsin

JBR
Madison, Wisconsin


Acknowledgments

This book grew out of the lecture notes for graduate-level analysis courses taught by the authors in the Department of Chemical and Biological Engineering at the University of Wisconsin-Madison. We have benefited from the feedback of many graduate students taking these classes, and appreciate the enthusiasm with which they received some early and incomplete drafts of the notes. Especially Andres Merchan, Kushal Sinha, and Megan Zagrobelny provided helpful discussion and assistance.

We have had numerous helpful discussions with colleagues on many topics covered in the text. JBR would like to acknowledge especially Dave Anderson, David Mayne, Gabriele Pannocchia, and Joe Qin for their interest and helpful suggestions.

Several colleagues gave us helpful reviews of book chapters. We would like to thank Prodromos Daoutidis, Tunde Ogunnaike, Patrick Underhill, Venkat Ganesan, Dave Anderson, and Jean-Luc Thiffeault for their valuable feedback. We are also grateful to colleagues who responded to a survey that we conducted to gather information on mathematical modeling courses for chemical and biological engineering graduate students. Their valuable feedback had significant impact on the content of this book.
Several members of our research groups also reviewed chapters, and helped us typeset solutions to some of the exercises. Anubhav, Cuyler Bates, Ankur Gupta, Rafael Henriquez, Amit Kumar, Jae Sung Park, and Sung-Ning Wang deserve special mention.

John Eaton generously provided his usual invaluable computing and typesetting expertise. MDG is grateful to his family for their forbearance during the preparation of this book. Our special relationship with the staff at Nob Hill Publishing again made the book production process …

Contents

1 Linear Algebra
  1.1 Vectors and Linear Spaces
    1.1.1 Subspaces
    1.1.2 Length, Distance, and Alignment
    1.1.3 Linear Independence and Bases
  1.2 Linear Operators and Matrices
    1.2.1 Addition and Multiplication of Matrices
    1.2.2 Transpose and Adjoint
    1.2.3 Einstein Summation Convention
    1.2.4 Gram-Schmidt Orthogonalization and the QR Decomposition
    1.2.5 The Outer Product, Dyads, and Projection Operators
    1.2.6 Partitioned Matrices and Matrix Operations
  1.3 Systems of Linear Algebraic Equations
    1.3.1 Introduction to Existence and Uniqueness
    1.3.2 Solving Ax = b: LU Decomposition
    1.3.3 The Determinant
    1.3.4 Rank of a Matrix
    1.3.5 Range Space and Null Space of a Matrix
    1.3.6 Existence and Uniqueness in Terms of Rank and Null Space
    1.3.7 Least-Squares Solution
    1.3.8 Minimum Norm Solution
    1.3.9 Rank, Nullity, and the Buckingham Pi Theorem
    1.3.10 Nonlinear Algebraic Equations: the Newton-Raphson Method
    1.3.11 Linear Coordinate Transformations
  1.4 The Algebraic Eigenvalue Problem
    1.4.1 Introduction
    1.4.2 Self-Adjoint Matrices
    1.4.3 General (Square) Matrices
    1.4.4 Positive Definite Matrices
    1.4.5 Eigenvalues, Eigenvectors, and Coordinate Transformations
    1.4.6 Schur Decomposition
    1.4.7 Singular Value Decomposition
  1.5 Functions of Matrices
    1.5.1 Polynomial and Exponential
    1.5.2 Optimizing Quadratic Functions
    1.5.3 Vec Operator and Kronecker Product of Matrices
  1.6 Exercises

2 Ordinary Differential Equations
  2.1 Introduction
  2.2 First-Order Linear Systems
    2.2.1 Superposition Principle for Linear Differential Equations
    2.2.2 Homogeneous Linear Systems with Constant Coefficients
    2.2.3 Qualitative Dynamics of Planar Systems
    2.2.4 Laplace Transform Methods for Solving the Inhomogeneous Constant-Coefficient Problem
    2.2.5 Delta Function
  2.3 Linear Equations with Variable Coefficients
    2.3.1 Introduction
    2.3.2 The Cauchy-Euler Equation
    2.3.3 Series Solutions and the Method of Frobenius
  2.4 Function Spaces and Differential Operators
    2.4.1 Functions as Vectors
    2.4.2 Self-Adjoint Differential Operators and Sturm-Liouville Equations
    2.4.3 Existence and Uniqueness of Solutions
  2.5 Lyapunov Functions and Stability
    2.5.1 Types of Stability
    2.5.2 Lyapunov Functions
    2.5.3 Application to Linear Systems
    2.5.4 Discrete Time Systems
  2.6 Asymptotic Analysis and Perturbation Methods
    2.6.1 Introduction
    2.6.2 Series Approximations: Convergence, Asymptoticness, Uniformity
    2.6.3 Scaling, and Regular and Singular Perturbations
    2.6.4 Regular Perturbation Analysis of an ODE
    2.6.5 Matched Asymptotic Expansions

    2.6.6 Method of Multiple Scales
  2.7 Qualitative Dynamics of Nonlinear Initial-Value Problems
    2.7.1 Introduction
    2.7.2 Invariant Subspaces and Manifolds
    2.7.3 Some Special Nonlinear Systems
    2.7.4 Long-Time Behavior and Attractors
    2.7.5 The Fundamental Local Bifurcations of Steady States
  2.8 Numerical Solutions of Initial-Value Problems
    2.8.1 Euler Methods: Accuracy and Stability
    2.8.2 Stability, Accuracy, and Stiff Systems
    2.8.3 Higher-Order Methods
  2.9 Numerical Solutions of Boundary-Value Problems
    2.9.1 The Method of Weighted Residuals
  2.10 Exercises

3 Vector Calculus and Partial Differential Equations
  3.1 Vector and Tensor Algebra
    3.1.1 Introduction
    3.1.2 Vectors in Three Physical Dimensions
  3.2 Differential Operators and Integral Theorems
    3.2.1 Divergence, Gradient, and Curl
    3.2.2 The Gradient Operator in Non-Cartesian Coordinates
    3.2.3 The Divergence Theorem
    3.2.4 Further Integral Relations and Adjoints of Multidimensional Differential Operators
  3.3 Linear Partial Differential Equations: Properties and Solution Techniques
    3.3.1 Classification and Canonical Forms for Second-Order Partial Differential Equations
    3.3.2 Separation of Variables and Eigenfunction Expansion with Equations involving ∇²
    3.3.3 Laplace's Equation, Spherical Harmonics, and the Hydrogen Atom
    3.3.4 Applications of the Fourier Transform to PDEs
    3.3.5 Green's Functions and Boundary-Value Problems
    3.3.6 Characteristics and D'Alembert's Solution to the Wave Equation
    3.3.7 Laplace Transform Methods
  3.4 Numerical Solution of Initial-Boundary-Value Problems
    3.4.1 Numerical Stability Analysis for the Diffusion Equation
    3.4.2 Numerical Stability Analysis for the Convection Equation
    3.4.3 Operator Splitting for Convection-Diffusion Problems
  3.5 Exercises

4 Probability, Random Variables, and Estimation
  4.1 Introduction and the Axioms of Probability
  4.2 Random Variables and the Probability Density Function
  4.3 Multivariate Density Functions
    4.3.1 Multivariate normal density
    4.3.2 Functions of random variables
    4.3.3 Statistical Independence and Correlation
  4.4 Sampling
    4.4.1 Linear Transformation
    4.4.2 Sample Mean, Sample Variance, and Standard Error
  4.5 Central Limit Theorems
    4.5.1 Identically distributed random variables
    4.5.2 Random variables with different distributions
    4.5.3 Multidimensional central limit theorems
  4.6 Conditional Density Function and Bayes's Theorem
  4.7 Maximum-Likelihood Estimation
    4.7.1 Scalar Measurement y, Known Measurement Variance
    4.7.2 Scalar Measurement y, Unknown Measurement Variance
    4.7.3 Vector of Measurements y, Different Parameters Corresponding to Different Measurements, Known Measurement Covariance R
    4.7.4 Vector of Measurements y, Different Parameters Corresponding to Different Measurements, Unknown Measurement Covariance R
    4.7.5 Vector of Measurements y, Same Parameters for all Measurements, Known Measurement Covariance R

    4.7.6 Vector of Measurements y, Same Parameters for all Measurements, Unknown Measurement Covariance R
  4.8 PCA and PLS regression
  4.9 Appendix: Proof of the Central Limit Theorem
  4.10 Exercises

5 Stochastic Models and Processes
  5.1 Introduction
  5.2 Stochastic Processes for Continuous Random Variables
    5.2.1 Discrete Time Stochastic Processes
    5.2.2 Wiener Process and Brownian Motion
    5.2.3 Stochastic Differential Equations
    5.2.4 Fokker-Planck Equation
  5.3 Stochastic Kinetics
    5.3.1 Introduction, and Length and Time Scales
    5.3.2 Poisson Process
    5.3.3 Stochastic Simulation
    5.3.4 Master Equation of Chemical Kinetics
    5.3.5 Microscopic, Mesoscopic, and Macroscopic Kinetic Models
  5.4 Optimal Linear State Estimation
    5.4.1 Introduction
    5.4.2 Optimal Dynamic Estimator
    5.4.3 Optimal Steady-State Estimator
    5.4.4 Observability of a Linear System
    5.4.5 Stability of an Optimal Estimator
  5.5 Exercises

A Mathematical Tables
  A.1 Laplace Transform Table
  A.2 Statistical Distributions
  A.3 Vector and Matrix Derivatives
    A.3.1 Derivatives: Other Conventions
  A.4 Exercises

Author Index
Citation Index
Subject Index

List of Figures

1.1 The four fundamental subspaces of matrix A.
1.2 Least-squares solution of Ax = b; projection of b into R(A) and residual r = Ax₀ − b in N(Aᵀ).
1.3 An iteration of the Newton-Raphson method for solving …
1.4 The four fundamental subspaces of matrix A = USVᵀ.
1.5 Convex function. The straight line connecting two points on the function curve lies above the function; αf(x) + (1 − α)f(y) ≥ f(αx + (1 − α)y) for all x, y.
1.6 Contours of constant f(x) = xᵀAx.
1.7 Two vectors in R² and the angle between them.
1.8 Experimental measurements of variable y versus x.
1.9 Measured rate constant at several temperatures.
1.10 Plot of Ax as x moves around a unit circle.
1.11 Manipulated input u and disturbance d combine to affect output y.
2.1 … regimes for the planar system dx/dt = Ax, A ∈ R²ˣ².
2.2 Dynamical behavior on the region boundaries for the planar system dx/dt = Ax, A ∈ R²ˣ².
2.3 Particle of mass m at position y experiences spring force −Ky and applied force F(t).
2.4 Function f(x) = exp(−8x²) and truncated trigonometric Fourier series approximations with K = 2, 5, 10. The approximations with K = 5 and K = 10 are visually indistinguishable from the exact function.
2.5 Truncated trigonometric Fourier series approximation to f(x) = x, using K = 5, 10, 50. The wiggles get finer as K increases.
2.6 Function f(x) = exp(−8x²) and truncated Legendre-Fourier series approximations with n = 2, 5, 10.
2.7 Function f(x) = H(x) and truncated Legendre-Fourier series approximations with n = 10, 50, 100.

2.8 Solution to the initial-value problem with nonhomogeneous boundary conditions.
2.9 Solution behavior; stability (left) and asymptotic stability (right).
2.10 A simple mechanical system with total energy E, internal energy U, kinetic energy T = (1/2)mv², and potential energy K = mgh.
2.11 The origin and sets D, Bᵣ, V (shaded), and B.
2.12 Leading-order inner U₀, outer u₀, and composite solutions u₀c, for Example 2.30 with ε = 0.2, K = 1, and … = 1.
2.13 Examples of invariant subspaces for linear systems.
2.14 Invariant subspaces of the linearized system (a) and invariant manifolds of the nonlinear system (b).
2.15 Contours of an energy function V(x₁, x₂) or H(x₁, x₂).
2.16 Energy landscape for a pendulum; H = p²/2 − K cos q; K = 2.
2.17 Landscape for H = p²/2 + q⁴/4.
2.18 A limit cycle (thick dashed curve) and a trajectory (thin solid curve) approaching it.
2.19 Periodic (left) and quasiperiodic (right) orbits on the surface of a torus. The orbit on the right eventually passes through every point in the domain.
2.20 A limit cycle for the Rössler system, a = b = 0.2, c = 1.
2.21 A strange attractor for the Rössler system, a = b = 0.2, c = 5.7.
2.22 Bifurcation diagram for the saddle-node bifurcation.
2.23 Bifurcation diagram for the transcritical bifurcation.
2.24 Bifurcation diagrams for the pitchfork bifurcation.
2.25 Approximate solutions to dx/dt = −x using explicit and implicit Euler methods with Δt = 2.1, along with the exact solution x(t) = e⁻ᵗ.
2.26 Stability regions for Adams-Bashforth methods; dx/dt = λx.
2.27 Stability regions for Adams predictor-corrector methods; dx/dt = λx.
2.28 Stability regions for Runge-Kutta methods; dx/dt = λx.
2.29 Hat functions for N = 2.
2.30 Approximate solutions to (2.91) using the finite element method with hat functions for N = 6 and N = 12. The exact solution also is shown.

2.31 … for the Legendre-Galerkin approximation …
2.32 … uses nth-order predictor and nth-order corrector …
3.1 … step size … around a point x₀ … eᵣ and e₀ …
3.2 … unit vectors …
3.3 …
3.4 A … divergence …
3.5 … Laplace's equation in a square domain …
3.6 … (a) Original domain …
3.7 From left to right, real parts of the surface spherical harmonics Y40, Y41, Y42, Y43, Y44.
3.8 A … source and sink in the physical domain … right-traveling wave … in the domain x < 0 …
3.9 An "image" … with opposite sign … left-traveling …
3.10 Concentration versus position … penetration distance … of a membrane for different reaction rate constants.
3.11 Transient heating of slab, cylinder, and sphere.
3.12 Wavy-walled domain.
4.1 Normal distribution, with probability density p(x) = (1/√(2π)) exp(−(x − m)²/2).
4.2 Multivariate normal for n = 2. The contour lines show ellipses containing 95, 75, and 50 percent probability.
4.3 The geometry of the quadratic form xᵀAx = b.
4.4 The region X(c) for y ≤ c.
4.5 A joint density function for the two uncorrelated random variables in Example 4.8.
4.6 A nearly singular normal density in two dimensions.
4.7 The singular normal resulting from y = Ax with rank-deficient A.

4.8 Histogram of 10,000 samples of uniformly distributed x.
4.9 Histogram of 10,000 samples of y = …
4.10 The multivariate normal, marginals, marginal box, and bounding box.
4.11 The sum of squares fitting error (top) and validation error (bottom) for PCR versus the number of principal components; cross validation indicates that four principal components are best.
4.12 The sum of squares validation error for PCR and PLSR versus the number of principal components/latent variables; note that only two latent variables are required versus four principal components.
4.13 Predicted versus measured outputs for the validation dataset. Top: PCR using four principal components. Bottom: PLSR using two latent variables. Left: first output. Right: second output.
4.14 Effect of undermodeling. Top: PCR using three principal components. Bottom: PLSR using one latent variable.
4.15 The indicator (step) function f₁(w; x) and its smooth approximation, f(w; x).
4.16 Typical strain versus time data from a molecular dynamics simulation, from data file rohit.dat on the website www.che.wisc.edu/~jbraw/principles.
4.17 Plot of y versus x from data file errvbls.dat on the website www.che.wisc.edu/~jbraw/principles.
4.18 Smooth approximation to a unit step function, H(z − 1).
5.1 A simulation of the Wiener process with fixed sample time Δt = 10⁻⁶ and D = 5 × 10⁵.

5.2 Sampling faster on the last plot in Figure 5.1; the sample time is decreased to Δt = 10⁻⁹ and the roughness is restored on this time scale.
5.3 A representative trajectory of the discretely sampled Brownian motion; D = 2, v = 0, n = 500.
5.4 The mean square displacement versus time; D = 2, v = 0, n = 500.
5.5 Two first-order reactions in series in a batch reactor, c_A0 = 1, c_B0 = c_C0 = 0, k₁ = 2, …
5.6 A sample path of the unit Poisson process.
5.7 A unit Poisson process with more events; sample path (top) and frequency distribution of event times …
5.8 Randomly choosing a reaction with appropriate probability. The interval is partitioned according to the relative …
5.9 Stochastic simulation of first-order series reaction … starting with 100 A molecules.
5.10 Master equation for chemical reaction A + B ⇌ C. The probability density at state ε changes due to forward …
5.11 Solution to master equation for A + B ⇌ C starting with 20 A molecules, 100 B molecules and 0 C molecules, k₁ = 1/20, k₋₁ = …
5.12 Solution to master equation for A + B ⇌ C starting with 200 A molecules, 1000 B molecules and 0 C molecules.
5.13 The equilibrium reaction extent's probability density for Reactions 5.52 at system volume Ω = 20 (top) and Ω = 200 (bottom). Notice the decrease in variance in the reaction extent as system volume increases.
5.14 Simulation of 2A ⇌ B for n₀ = 500, Ω = 500. Top: discrete simulation; bottom: SDE simulation.
5.15 Cumulative distribution for 2A ⇌ B at t = 1 with n₀ = 500, Ω = 500. Discrete master equation (steps) versus omega expansion (smooth).
5.16 The change in 95% confidence intervals for R(k|k) versus time for a stable, optimal estimator. We start at k = 0 with a noninformative prior, which has an infinite confidence interval.
5.17 Deterministic simulation of reaction A + … compared to stochastic simulation.
5.18 Species A and B in a well-mixed volume element. Continuum and molecular settings.
5.19 Molecular system of volume V containing molecules of mass m_A with velocity v_Ai.

List of Tables

1.1 Quadratic function of scalar and vector argument.
2.1 A small table of Laplace transform pairs. A more extensive table is found in Appendix A.
2.2 Laplace transform pairs involving δ and its derivatives.
2.3 The linear differential equations arising from the radial part of ∇²y + λy = 0 in rectangular, cylindrical, and spherical coordinates.
3.1 Gradient and Laplacian operators in Cartesian, cylindrical, and spherical coordinates.
A.1 Larger table of Laplace transforms.
A.2 Statistical distributions defined and used in the text and exercises.
A.3 Summary of vector and matrix derivatives defined and used in the text and exercises.

List of Examples and Statements

1.1 Definition: Linear space
1.2 Definition: Subspace
1.3 Definition: Norm
1.4 Example: Common transformations do not commute
1.5 Example: Matrix identities derived with index notation
1.7 Theorem: Existence and uniqueness of solutions for square systems
1.8 Example: Linearly independent columns, rows of a matrix
1.10 Example: The geometry of least squares
1.11 Theorem: Self-adjoint matrix decomposition
1.12 Example: A nonsymmetric matrix
1.13 Example: A defective matrix
1.14 Example: Vibrational modes of a molecule
1.15 Theorem: Schur decomposition
1.16 Theorem: Symmetric Schur decomposition
1.17 Theorem: Real Schur decomposition
1.18 Definition: Convex function
1.19 Proposition: Full rank of AᵀA
2.1 Example: Particle motion
2.2 Example: A forced first-order differential equation
2.3 Example: Sets of coupled first-order differential equations
2.4 Example: Power series solution for a constant-coefficient equation
2.5 Example: Frobenius solution for Bessel's equation of order zero
2.6 Example: Fourier series of a nonperiodic function
2.7 Example: Generating trigonometric basis functions
2.8 Example: Bessel's equation revisited
2.9 Example: Legendre's differential equation and Legendre polynomials
2.10 Theorem: Alternative theorem
2.11 Example: Steady-state temperature profile with fixed end …
2.12 Example: Steady-state temperature profile with insulated …
2.13 Example: Steady-state temperature profile with fixed flux
2.14 Example: Fixed flux revisited
2.15 Example: Nonhomogeneous boundary-value problem and the Green's function
2.16 Definition: (Lyapunov) Stability
2.17 Definition: Attractivity
2.18 Definition: Asymptotic stability
2.19 Definition: Exponential stability
2.20 Definition: Lyapunov function
2.21 Theorem: Lyapunov stability
2.22 Theorem: Asymptotic stability
2.23 Theorem: Exponential stability
2.24 Theorem: Lyapunov function for linear systems
2.25 Definition: Exponential stability (discrete time)
2.26 Definition: Lyapunov function (discrete time)
2.27 Theorem: Lyapunov stability (discrete time)
2.28 Theorem: Asymptotic stability (discrete time)
2.29 Theorem: Exponential stability (discrete time)
2.30 Example: Matched asymptotic expansion analysis of the reaction equilibrium assumption
2.31 Example: Oscillatory dynamics of a nonlinear system
2.32 Theorem: Poincaré-Bendixson
3.1 Example: Gradient (del) and Laplacian operators in polar (cylindrical) coordinates
3.2 Example: The divergence theorem and conservation laws
3.3 Example: Steady-state temperature distribution in a circular cylinder
3.4 Example: Transient diffusion in a slab
3.5 Example: Steady-state diffusion in a square domain
3.6 Example: Eigenfunction expansion for an inhomogeneous problem
3.7 Example: Steady diffusion in a cylinder: eigenfunction expansion and multiple solution approaches
3.8 Example: Transient diffusion from a sphere
3.9 Example: Temperature field around a sphere in a linear …
3.10 Example: Domain perturbation: heat conduction around …
3.11 Example: Derivation of a Fourier transform formula
3.12 Example: Transient diffusion in an unbounded domain, one and multiple dimensions
3.13 Example: Steady diffusion from a wall with an imposed concentration profile
3.14 Example: Reaction and diffusion in a membrane
3.15 Example: … the wave equation
4.1 Example: Characteristic function of the normal density
4.2 Example: The mean and covariance of the multivariate normal
4.3 Example: Characteristic function of the multivariate normal
4.4 Example: Marginal normal density
4.5 Example: Nonlinear transformation
4.6 Example: Maximum of two random variables
4.7 Example: Independent implies uncorrelated
4.8 Example: Does uncorrelated imply independent?
4.9 Example: Independent and uncorrelated are equivalent for normals
4.10 Definition: Density of a singular normal
4.11 Example: Computing a singular density
4.12 Theorem: Normal distributions under linear transformation
4.13 Example: Sum of 10 uniformly distributed random variables
4.14 Theorem: De Moivre-Laplace central limit theorem
4.15 Assumption: Lindeberg conditions
4.16 Theorem: Lindeberg-Feller central limit theorem
4.17 Theorem: Multivariate CLT IID
4.18 Theorem: Multivariate CLT Lindeberg-Feller
4.19 Example: Conditional normal density
4.20 Example: More normal conditional densities
4.21 Example: The confidence region, bounding box, and marginal box
4.22 Theorem: Mean and variance of samples from a normal
4.23 Example: Comparing PCR and PLSR
4.24 Theorem: Taylor's theorem with bound on remainder
5.1 Example: Diffusion on a plane in Cartesian and polar coordinate systems
5.2 Example: Average properties from sampling
5.3 Example: Transport of many particles suspended in a fluid
5.4 Example: Fokker-Planck equations for diffusion on a plane
5.5 Algorithm: First reaction method
5.6 Algorithm: Gillespie's direct method or SSA
5.7 Example: Observability of a chemical reactor
5.8 Theorem: Riccati iteration and estimator stability
5.9 Definition: Continuity (with probability one)

1 Linear Algebra

1.1 Vectors and Linear Spaces


A vector is defined in introductory physics courses as a quantity having magnitude and direction. For example, the position vector of an
objectin three dimensions is the triple of Cartesian coordinates that
determine the position of the object relative to a chosen origin. Another
wayof thinking of the position vector is as a point in three-dimensional
space, generally denoted R3. This view leads us to the more general and

abstract definition of a vector: A VECTOR IS AN ELEMENT OF A LINEAR SPACE:

Definition 1.1 (Linear space). A linear space is a set V whose elements (vectors) satisfy the following properties: For all x, y, and z in V and for all scalars α and β:

x + y ∈ V    closure under addition
αx ∈ V    closure under multiplication
x + 0 = x    definition of the origin
x − y = x + (−1)y    definition of subtraction
α(βx) = (αβ)x
(α + β)x = αx + βx
α(x + y) = αx + αy
1x = x, 0x = 0

Naturally, these properties apply to vectors in normal 3-D space; but they also apply to vectors in any finite number of dimensions as

well as to sets whose elements are, for example, 3 by 3 matrices


or
trigonometric functions. This latter case is an example of a function
space; we will encounter these in Chapter 2. Not every set of vectors
forms a linear space, however. For example, consider vectors pointing

from the origin to a point on the unit sphere. The sum of two such
vectors will no longer lie on the unit sphere; vectors defining points
on the sphere do not form a linear space. Regarding notation, many

readers will be familiar with vectors expressed in boldface type, x, v,


etc. This notation is especially common in physics-based problems
where these are vectors in three-dimensional physical space. In the
applied mathematics literature, where a vector takes on a more general
definition, one more commonly finds vectors written in italic type as
we have done above and will do for most of the book.
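The unit-sphere example is easy to check numerically. The following brief sketch (in Python with NumPy, which the text itself does not use; the particular vectors are our own illustrative choice) shows that the sum of two unit vectors need not lie on the unit sphere:

```python
import numpy as np

# Two points on the unit sphere in R^3.
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])

# Each vector has unit length ...
assert np.isclose(np.linalg.norm(x), 1.0)
assert np.isclose(np.linalg.norm(y), 1.0)

# ... but their sum has length sqrt(2), so it leaves the sphere:
# the set is not closed under addition and hence is not a linear space.
s = x + y
print(np.linalg.norm(s))  # 1.4142...
```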

1.1.1 Subspaces

Definition 1.2 (Subspace). A subspace S is a subset of a linear space V whose elements satisfy the following properties: For every x, y ∈ S and for all scalars α:

x + y ∈ S    closure under addition
αx ∈ S    closure under multiplication    (1.1)

For example, if V is the plane (R2), then any line through the origin
on that plane is a subspace.

1.1.2 Length, Distance, and Alignment


The idea of a norm generalizes the concept of length.

Definition 1.3 (Norm). A norm of a vector x, denoted ‖x‖, is a real number that satisfies

‖αx‖ = |α| ‖x‖
‖x‖ > 0 ∀x ≠ 0, and ‖x‖ = 0 if x = 0
‖x + y‖ ≤ ‖x‖ + ‖y‖    triangle inequality

The Euclidean norm in Rⁿ is our usual concept of length

‖x‖₂ = ( Σ_{i=1}^n |x_i|² )^{1/2}


in which x_i is the ith component of the vector. Unless otherwise noted,
this is the norm that will be used throughout this book, and will generally be denoted simply as ||x|| rather than ||x||_2. It should be noted,
however, that this is not the only definition of a norm, nor is it always
the most useful. For example, the so-called l_p norms for vectors in R^n
are defined by the equation

    ||x||_p = ( Σ_{i=1}^n |x_i|^p )^(1/p)

Particularly useful are the cases p = 1, sometimes called the "taxicab
norm" (why?), and p = ∞: ||x||_∞ = max_i |x_i|.
The INNER PRODUCT generalizes the dot product of elementary algebra and measures the alignment of a pair of vectors: an inner product
of two vectors, denoted (x, y), is a scalar that satisfies

    (αx, y) = ᾱ(x, y)
    (x, x) > 0, if x ≠ 0

The overbar denotes complex conjugate. Notice that the square root of
the inner product, (x, x)^(1/2), satisfies all the properties of a norm, so it
is a measure of the length of x. The usual inner product in R^n is

    (x, y) = Σ_{i=1}^n x_i y_i

in which case (x, x) = ||x||². This is a straightforward generalization
of the formula for the dot product x · y in R² or R³ and has the same
geometric meaning

    (x, y) = ||x|| ||y|| cos θ

where θ is the angle between the vectors. See Exercise 1.1 for a derivation. If we are considering a space of complex numbers rather than real
numbers, the usual inner product becomes

    (x, y) = Σ_{i=1}^n x̄_i y_i

If (x, y) = 0, then x and y are said to be ORTHOGONAL.
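The geometric meaning of the inner product can be checked numerically; in this NumPy sketch (our own, with arbitrarily chosen vectors) the angle between x and y comes out as expected, and a vector along the third axis is orthogonal to both:

```python
import numpy as np

x = np.array([1.0, 0.0, 0.0])
y = np.array([1.0, 1.0, 0.0])

ip = np.dot(x, y)                        # (x, y) = sum_i x_i y_i
cos_theta = ip / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_theta)             # angle between x and y: pi/4 here

z = np.array([0.0, 0.0, 2.0])
assert np.dot(x, z) == 0.0               # (x, z) = 0: x and z are orthogonal
```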

Finally, we can represent a vector x in R^n as a single column of
elements, a COLUMN VECTOR, and define its TRANSPOSE x^T as a ROW
VECTOR.

Now the inner product (x, y) can be written x^T y if x and y are
real, and x̄^T y if they are complex.
1.1.3 Linear Independence and Bases

If we have a set of vectors, say {x_1, x_2, x_3}, in a space V, this set is said
to be LINEARLY INDEPENDENT (LI) if the only solution to the equation

    α_1 x_1 + α_2 x_2 + α_3 x_3 = 0

is α_i = 0 for all i. Otherwise the set is LINEARLY DEPENDENT. A
space V is n-DIMENSIONAL if it contains a set of n linearly independent
vectors, but no set of n + 1 linearly independent vectors. If n LI vectors
can be found for any n, no matter how large, then the space is INFINITE-DIMENSIONAL.

Everything said above holds independent of our choice of coordinate
system for a space. To actually compute anything, however, we need
a convenient way to represent vectors in a space. We define a BASIS
{e_1, e_2, e_3, ...} as a set of LI vectors that SPAN the space of interest, i.e.,
every vector x in the space can be represented

    x = α_1 e_1 + α_2 e_2 + α_3 e_3 + ···

If a space is n-dimensional, then a basis for it has exactly n vectors
and vice versa. For example, in R³ the unit vectors in the x, y, and z
directions form a basis. But more generally, any three LI vectors form
a basis for R³.

Although any set of LI vectors that span a space form a basis, some
bases are more convenient than others. The elements of an ORTHONORMAL (ON) basis satisfy these properties

    (e_i, e_i) = 1             each basis vector has unit length
    (e_i, e_j) = 0, i ≠ j      the vectors are mutually orthogonal

These properties may be displayed more succinctly

    (e_i, e_j) = δ_ij

The symbol δ_ij is called the KRONECKER DELTA. In an orthonormal
basis, any vector can be expressed

    x = Σ_i (e_i, x) e_i
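In an orthonormal basis, the expansion coefficients of a vector are simply its inner products with the basis vectors. This is easy to verify numerically; the basis below (a NumPy illustration of our own, not from the text) is the standard basis of R² rotated by 45 degrees:

```python
import numpy as np

# An orthonormal basis for R^2
e1 = np.array([1.0, 1.0]) / np.sqrt(2)
e2 = np.array([-1.0, 1.0]) / np.sqrt(2)

# Orthonormality: (e_i, e_j) = delta_ij
assert np.isclose(np.dot(e1, e1), 1.0) and np.isclose(np.dot(e1, e2), 0.0)

# Expansion coefficients are inner products with the basis vectors
x = np.array([3.0, 5.0])
alpha1, alpha2 = np.dot(e1, x), np.dot(e2, x)
x_reconstructed = alpha1 * e1 + alpha2 * e2
assert np.allclose(x_reconstructed, x)
```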

1.2 Linear Operators and Matrices

An OPERATOR transforms one vector into another. Operators appear
everywhere in applied mathematics. For example, the operator d/dx
transforms a function f(x) into its derivative. More abstractly, an operator A is a mapping that takes elements of one set (the DOMAIN of A)
and converts them into elements of another (the RANGE of A). LINEAR
operators satisfy the following properties for all vectors u and v in
their domain and all scalars α:

    A(u + v) = Au + Av
    A(αu) = α(Au)        (1.2)

We focus here on operators on finite-dimensional vector spaces R^n;
operators on spaces of complex numbers are similar. (In Chapter 2 we
will look at an important class of operators in function spaces.) In these
spaces, and having chosen a coordinate system in which to represent
vectors, any linear operator can be expressed as multiplication by a
MATRIX. A matrix is an array of numbers

    A = [ A_11  A_12  ...  A_1n
          A_21  A_22  ...  A_2n
          ...
          A_m1  A_m2  ...  A_mn ]

The first subscript of each element denotes its row, while the second
denotes its column. The transformation of a vector x = (x_1, ..., x_n)^T into another
vector y then occurs through matrix-vector multiplication. That is, y = Ax,
which means

    y_i = Σ_{j=1}^n A_ij x_j,    i = 1, ..., m

In this example, the matrix A is m by n (rows by columns); it is an
element of the linear space R^(m×n), and multiplication by A maps vectors
in R^n into vectors in R^m. That is, for the function defined by matrix
multiplication, f(x) = Ax, f: R^n → R^m. Some readers will be familiar
with matrices written in bold and the matrix-vector product between
matrix A and vector x written as either Ax or A · x.

One can also think of each row of A as a vector. In this case the ith
component of y can be thought of as the dot product between the ith
row of A and the vector x. This is probably the best way to remember
the actual algebra of the matrix-vector multiplication formula. A more
intuitive and general geometric interpretation (which will be
used extensively as we proceed through the chapter) is allowed by considering
each column of A as a vector, and thinking of the vector y as a linear
combination of these vectors. That is, y is in the space spanned by the
columns of A. If we let the ith column of A be the vector c_i, then

    y = x_1 c_1 + x_2 c_2 + x_3 c_3 + ··· + x_n c_n

(Note that in this equation x_j is a scalar component of the vector x,
while c_j is a vector.) This equation implies that the number of columns
of A must equal the length of x. That is, matrix-vector multiplication
only makes sense if the vector x is in the domain of the operator A.
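The column-combination view of y = Ax is easy to confirm numerically; in this NumPy sketch (our illustration, with an arbitrary 3 × 2 matrix) the product agrees with the explicit linear combination of columns:

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])   # m = 3 rows, n = 2 columns
x = np.array([10.0, -1.0])

y = A @ x                    # matrix-vector product

# The same y as a linear combination of the columns c_i of A
y_cols = x[0] * A[:, 0] + x[1] * A[:, 1]
assert np.allclose(y, y_cols)
```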

1.2.1 Addition and Multiplication of Matrices

The following terminology is used to describe important classes of matrices.

1. A is SQUARE if m = n.

2. A is DIAGONAL if A is square and A_ij = 0 for i ≠ j. This is
sometimes written as A = diag(a_1, a_2, ..., a_n) in which a_i is the
element A_ii for i = 1, 2, ..., n.

3. A is UPPER (LOWER) TRIANGULAR if A is square and A_ij = 0 for
i > j (i < j).

4. A is UPPER (LOWER) HESSENBERG if A is square and A_ij = 0 for
i > j + 1 (i < j − 1).

5. A is TRIDIAGONAL if it is both upper and lower Hessenberg.

6. A is SYMMETRIC if A is square and A_ij = A_ji, i, j = 1, 2, ..., n.

ADDITION of two matrices is defined if A and B both have the
same domain and range; then

    (A + B)_ij = A_ij + B_ij

Otherwise, the matrices cannot be added. SCALAR MULTIPLICATION is simple

    (αA)_ij = α A_ij

MATRIX-MATRIX MULTIPLICATION is not as simple. The product AB
is the matrix of dot products of the rows of A with the columns of B. If
A ∈ R^(m×n) and B ∈ R^(p×q), then AB only exists if n = p. Otherwise, the
lengths of the rows of A are incompatible with the columns of B. If u_i
represents the ith row of A, and v_j the jth column of B, then

    (AB)_ij = u_i · v_j,    i = 1, ..., m,  j = 1, ..., q

Equivalently,

    (AB)_ij = Σ_{k=1}^n A_ik B_kj,    i = 1, ..., m,  j = 1, ..., q

So AB is an m by q matrix. Note that the existence of AB does not
imply the existence of BA. Both exist if and only if n = p and m = q.
Even when both products exist, AB is not generally equal to BA. In
other words, the final result of a sequence of operations on a vector
generally depends on the order of the operations, i.e., A(Bx) ≠ B(Ax).
One important exception to this rule is when one of the matrices is the
IDENTITY MATRIX I. The elements of I are given by I_ij = δ_ij, so for
example, in R^(3×3)

    I = [ 1 0 0
          0 1 0
          0 0 1 ]

For any vector x and matrix A, Ix = x and AI = IA = A.


Example 1.4: Common transformations do not commute

Let A be a matrix that rotates a vector, and let B be a matrix that
stretches a vector in the "2" direction. Show that the operations of
stretching and rotating a vector do not commute.

Solution

Forming the two products, one finds that AB and BA are not equal.
Since these are not equal, we conclude that the two vector operations
do not commute.
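A quick numerical check makes the point concrete. The matrices below are illustrative stand-ins of our own choosing (a 90-degree rotation for A and a stretch by a factor of 2 in the second direction for B), not necessarily those printed in Example 1.4; any rotation/stretch pair leads to the same conclusion:

```python
import numpy as np

# Stand-in transformations (our assumption, for illustration):
# A rotates a vector by 90 degrees; B stretches by 2 in the "2" direction.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])

AB = A @ B    # stretch first, then rotate
BA = B @ A    # rotate first, then stretch

assert not np.allclose(AB, BA)   # the operations do not commute
```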

1.2.2 Transpose and Adjoint

For every matrix A there exists another matrix, called the TRANSPOSE
of A and denoted A^T, such that (A^T)_ij = A_ji. The rows of A become
the columns of A^T and vice versa. (We already saw this notion in the
context of vectors: viewing x as a matrix with one column, then x^T
is a matrix with one row.) A matrix that equals its transpose satisfies
A_ji = A_ij and is said to be SYMMETRIC; this can occur only for square
matrices. Some properties of the transpose of a matrix are

    (AB)^T = B^T A^T        (ABC)^T = C^T B^T A^T

Properties involving matrix-vector products follow from the treatment
of a vector x as a matrix with only one column. For example

    (Ax)^T = x^T A^T

If A, x, and y are real, then the inner product between the vector Ax
and the vector y is given by

    (Ax)^T y = x^T A^T y        (1.3)

One can generalize the idea of a transpose to more general operators. The ADJOINT of an operator L (not necessarily a matrix) is denoted
L* and is defined by this equation

    (Lx, y) = (x, L*y)        (1.4)

If L is a real matrix A, then (Lx, y) becomes (Ax)^T y, and comparison
of (1.3) and (1.4) shows that

    L* = A^T

Similarly, if L is a complex matrix A, then we show in the following
section that

    L* = Ā^T

the conjugate transpose of A. By analogy with this expression for matrices, we will use the notation
x* = x̄^T for vectors as well. Some general properties of the adjoint of
an operator are

    (L*)* = L        (LM)* = M* L*

If L = L*, then L is said to be SELF-ADJOINT or HERMITIAN. Self-adjoint

operators have special properties, as we shall see shortly, and show up


in many applications.

1.2.3 Einstein Summation Convention

Notice that when performing matrix-matrix or matrix-vector multiplications, the index over which the sum is taken appears twice in the
formula, while the unsummed indices appear only once. For example,
in the formula

    (ABC)_ij = Σ_k Σ_l A_ik B_kl C_lj

the indices k and l appear twice in the summations, while the indices i
and j only appear once. This observation suggests a simplified notation
for products, in which the presence of the repeated indices implies
summation, so that the explicit summation symbols do not need to
be written. Using this EINSTEIN SUMMATION CONVENTION, the inner
product x^T y is simply x_i y_i and the matrix-vector product y = Ax is
y_i = A_ij x_j. This convention allows us to concisely derive many key
results.
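NumPy's einsum function implements exactly this convention: repeated indices in its subscript string are summed. A short sketch (our illustration, with arbitrary arrays) checks two of the formulas above:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
C = np.arange(8.0).reshape(4, 2)
x = np.array([1.0, -1.0, 2.0])

# Repeated indices are summed, exactly as in the convention:
y = np.einsum("ij,j->i", A, x)           # y_i = A_ij x_j
D = np.einsum("ik,kl,lj->ij", A, B, C)   # (ABC)_ij = A_ik B_kl C_lj

assert np.allclose(y, A @ x)
assert np.allclose(D, A @ B @ C)
```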

Example 1.5: Matrix identities derived with index notation

Establish the following matrix identities using index notation

(a) (Ax, y) = (x, A^T y)    (b) (AB)^T = B^T A^T    (c) AA^T = (AA^T)^T

(d) A + A^T = (A + A^T)^T    (e) A^T A = (A^T A)^T

Solution

(a) (Ax, y) = (x, A^T y):

    (Ax, y) = (Ax)_i y_i = A_ij x_j y_i = x_j A_ij y_i = x_j (A^T)_ji y_i = (x, A^T y)

(b) (AB)^T = B^T A^T:

    ((AB)^T)_ij = (AB)_ji = A_jk B_ki = B_ki A_jk = (B^T)_ik (A^T)_kj = (B^T A^T)_ij

(c) AA^T = (AA^T)^T:

    (AA^T)_ij = A_ik (A^T)_kj = A_ik A_jk = A_jk A_ik = (AA^T)_ji = ((AA^T)^T)_ij

(d) A + A^T = (A + A^T)^T:

    (A + A^T)_ij = A_ij + A_ji = A_ji + A_ij = (A + A^T)_ji = ((A + A^T)^T)_ij

(e) A^T A = (A^T A)^T:

    (A^T A)_ij = (A^T)_ik A_kj = A_ki A_kj = A_kj A_ki = (A^T A)_ji = ((A^T A)^T)_ij

1.2.4 Gram-Schmidt Orthogonalization and the QR Decomposition

We will encounter a number of situations where a linearly independent
set of vectors is available and it will be useful to construct from them a
set of orthogonal vectors. The classical approach to doing this is called
GRAM-SCHMIDT orthogonalization. As a simple example, consider LI
vectors v_1 and v_2, from which we wish to find an orthogonal pair u_1
and u_2. Without loss of generality we can set

    u_1 = v_1

It is straightforward to find the component of v_2 that is orthogonal to
u_1: we just subtract from v_2 the component that is parallel to u_1 (the
PROJECTION of v_2 onto u_1)

    u_2 = v_2 − ((v_2, u_1) / ||u_1||²) u_1

In higher dimensions, where we have v_3, v_4, etc., we continue the process, subtracting off the components parallel to the previously determined orthogonal vectors

    u_3 = v_3 − ((v_3, u_1) / ||u_1||²) u_1 − ((v_3, u_2) / ||u_2||²) u_2

and so on.

We can apply Gram-Schmidt orthogonalization to the columns of
any m × n matrix A whose columns are linearly independent (which
implies that m ≥ n). Specifically, we can write

    A = QR

where Q is an m × n matrix of orthonormal vectors formed from the
columns of A and R is an n × n upper triangular matrix. This result is
known as the QR DECOMPOSITION. We have the following theorem.

Theorem 1.6 (QR decomposition). If A ∈ R^(m×n) has linearly independent columns, then there exists Q ∈ R^(m×n) with orthonormal columns,
and upper triangular R ∈ R^(n×n), such that

    A = QR

See Exercise 1.38 for the proof. Because the columns of Q are orthonormal, Q^T Q = I.
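The decomposition is available directly in NumPy; the sketch below (our illustration, with an arbitrary full-column-rank matrix) verifies the three properties stated in Theorem 1.6:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])    # linearly independent columns, m > n

Q, R = np.linalg.qr(A)        # "reduced" QR: Q is m x n, R is n x n

assert np.allclose(Q @ R, A)             # A = QR
assert np.allclose(Q.T @ Q, np.eye(2))   # columns of Q are orthonormal
assert np.allclose(R, np.triu(R))        # R is upper triangular
```

In practice np.linalg.qr uses Householder reflections rather than classical Gram-Schmidt, which is more robust to rounding error, but the factorization it produces is the same object.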

1.2.5 The Outer Product, Dyads, and Projection Operators

Given two LI vectors v_1 and v_2 in R^n, Gram-Schmidt uses projection to
construct an orthogonal pair

    u_1 = v_1
    u_2 = v_2 − (u_1^T v_2 / u_1^T u_1) u_1

where we have used the inner product definition (u, v) = u^T v. Observe that the
right-hand side of the second equation is linear in v_2, so we should be
able to put this equation in the form u_2 = Av_2, where A is a matrix.
The form of A illustrates some important concepts so we explicitly
construct it here. We can write A = I − P, where

    P v_2 = (u_1^T v_2 / u_1^T u_1) u_1

Noting that a^T b = b^T a for vectors a and b, this rearranges to

    P v_2 = u_1 (u_1^T v_2) / (u_1^T u_1)

which has the form we seek if we move the parentheses to have

    P v_2 = (u_1 u_1^T / u_1^T u_1) v_2

That is, P is given by what we will call the OUTER PRODUCT between
u_1 and itself, u_1 u_1^T, divided by the scalar u_1^T u_1. More generally, the
outer product u v^T between vectors u and v is a matrix, called a DYAD,
that satisfies the following properties

    (u v^T)_ij = u_i v_j
    (u v^T) w = u (v^T w)
    w^T (u v^T) = (w^T u) v^T

where w is any vector. The outer product is sometimes denoted u ⊗ v.
When the notation u · v is used to represent the inner product, u ⊗ v
or uv is used to represent the outer.

Finally, returning to the specific case P = u_1 u_1^T / (u_1^T u_1), we can observe that
Pw = u_1 (u_1^T w) / (u_1^T u_1); the operation of P on w results in a vector that is the
projection of w in the u_1 direction: P is a PROJECTION OPERATOR. The
operator I − P is also a projection: it takes a vector and produces the
projection of that vector in the direction(s) orthogonal to u_1. We can
check that both u_1 u_1^T / (u_1^T u_1) and I − u_1 u_1^T / (u_1^T u_1) satisfy the general definition of a
projection operator

    P P = P
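These properties are easy to verify numerically. The NumPy sketch below (our illustration, with an arbitrary direction u) builds the projection operator from the outer product and checks that both P and I − P are idempotent:

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])

# Projection onto the u direction, built from the outer product u u^T
P = np.outer(u, u) / np.dot(u, u)
Q = np.eye(3) - P                # projection orthogonal to u

assert np.allclose(P @ P, P)     # P is a projection: PP = P
assert np.allclose(Q @ Q, Q)     # so is I - P

w = np.array([3.0, 0.0, 0.0])
w_par = P @ w                    # component of w along u
w_perp = w - w_par               # component orthogonal to u
assert np.isclose(np.dot(w_perp, u), 0.0)
```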
1.2.6 Partitioned Matrices and Matrix Operations

It is often convenient to consider a large matrix to be composed of other
matrices, rather than its scalar elements. We say the matrix is partitioned into other smaller dimensional matrices. To make this explicit,
first we define a submatrix as follows. Let matrix A ∈ R^(m×n), and define
indices 1 ≤ i_1 < i_2 < ··· < i_k ≤ m, and 1 ≤ j_1 < j_2 < ··· < j_l ≤ n;
then the k × l matrix S, whose (a, b) element is

    S_ab = A_{i_a, j_b}

is called a submatrix of A.
A matrix A ∈ R^(m×n) is partitioned when it is written as

    A = [ A_11  A_12  ...  A_1l
          A_21  A_22  ...  A_2l
          ...
          A_k1  A_k2  ...  A_kl ]

where each A_ij is an m_i × n_j submatrix of A. Note that Σ_{i=1}^k m_i = m
and Σ_{j=1}^l n_j = n. Two of the more useful matrix partitions are column partitioning and row partitioning. If we let the m-vectors a_i, i =
1, 2, ..., n denote the n column vectors of A, then the column partitioning of A is

    A = [ a_1  a_2  ...  a_n ]

If we let the row vectors (1 × n matrices) ā_j, j = 1, 2, ..., m denote the
m row vectors of A, then the row partitioning of A is

    A = [ ā_1
          ā_2
          ...
          ā_m ]

The operations of matrix transpose, addition, and multiplication
become even more useful when we apply them to partitioned matrices.
Consider the two partitioned matrices

    A = [ A_11  ...  A_1l        B = [ B_11  ...  B_1n
          ...                          ...
          A_k1  ...  A_kl ]            B_m1  ...  B_mn ]

in which A_ij has dimension p_i × q_j and B_ij has dimension r_i × s_j. We
then have the following formulas for scalar multiplication, transpose,
matrix addition, and matrix multiplication of partitioned matrices.

1. Scalar multiplication.

    αA = [ αA_11  ...  αA_1l
           ...
           αA_k1  ...  αA_kl ]

2. Transpose.

    A^T = [ A_11^T  A_21^T  ...  A_k1^T
            A_12^T  A_22^T  ...  A_k2^T
            ...
            A_1l^T  A_2l^T  ...  A_kl^T ]

3. Matrix addition. If p_i = r_i and q_j = s_j for i = 1, ..., k and j =
1, ..., l, and k = m and l = n, then the partitioned matrices can
be added

    A + B = [ C_11  ...  C_1l
              ...
              C_k1  ...  C_kl ]        C_ij = A_ij + B_ij

4. Matrix multiplication. If q_i = r_i for i = 1, ..., l, then we say the
partitioned matrices conform, and the matrices can be multiplied

    AB = [ C_11  ...  C_1n
           ...
           C_k1  ...  C_kn ]        C_ij = Σ_{s=1}^l A_is B_sj

These formulas are all easily verified by reducing all the partitioned
matrices back to their scalar elements. Notice that we do not have to
remember any new formulas. These are the same formulas that we
learned for matrix operations when the submatrices A_ij and B_ij were
scalar elements (except we normally do not write the transpose for
scalars in the transpose formula). The conclusion is that all the usual
rules apply provided that the matrices are partitioned so that all the
implied operations are defined.
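The blockwise multiplication formula can be checked directly. In this NumPy sketch (our illustration, with random conforming blocks), the product assembled block by block agrees with the ordinary product of the assembled matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A11, A12 = rng.normal(size=(2, 2)), rng.normal(size=(2, 3))
A21, A22 = rng.normal(size=(3, 2)), rng.normal(size=(3, 3))
B11, B12 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))
B21, B22 = rng.normal(size=(3, 4)), rng.normal(size=(3, 1))

A = np.block([[A11, A12], [A21, A22]])   # 5 x 5
B = np.block([[B11, B12], [B21, B22]])   # 5 x 5

# Blockwise product: the same formula as for scalar elements
C11 = A11 @ B11 + A12 @ B21
C12 = A11 @ B12 + A12 @ B22
C21 = A21 @ B11 + A22 @ B21
C22 = A21 @ B12 + A22 @ B22

assert np.allclose(np.block([[C11, C12], [C21, C22]]), A @ B)
```

The column partitions of A must conform with the row partitions of B (2 + 3 on each side here), exactly as formula 4 requires.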

1.3 Systems of Linear Algebraic Equations


1.3.1 Introduction to Existence and Uniqueness

Any set of m linear algebraic equations for n unknowns can be written

in the form

Ax = b

where A ∈ R^(m×n), b ∈ R^m, and x (∈ R^n) is the vector of unknowns.
Consider the vectors c_i that form the columns of A. The solution x (if
it exists) is the linear combination of these columns that equals b

    b = x_1 c_1 + x_2 c_2 + x_3 c_3 + ··· + x_n c_n

This view of Ax = b leads naturally to the following result. The system
of equations

    Ax = b,    A ∈ R^(m×n),  x ∈ R^n,  b ∈ R^m

has at least one solution x if and only if the columns of A are not
linearly independent from b.

For example, if m = n = 3 and the columns of A form an LIset, then


theyspan R3. Therefore, no vector b e R3 can be linearly independent

fromthe columns of A and therefore Ax = b has a solution for all

b e R3. Conversely, if the column vectors of A are not LI, then they do

not span R3 so there will be some vectors b for which no solution x


exists.

Consider the case where there are the same number of equations
as unknowns: n = m. Here the above result leads to this general
theorem.

Theorem 1.7 (Existence and uniqueness of solutions for square systems). If A ∈ R^(n×n), then

(a) If the columns of A are LI, then the matrix is INVERTIBLE. The problem Ax = b has the following properties:

    (1) Ax = 0 (the homogeneous problem) has only the trivial solution
        x = 0.

    (2) Ax = b (the inhomogeneous problem) has a unique nonzero
        solution for all b ≠ 0.

(b) If the columns of A are NOT LI, then the matrix is SINGULAR or
NONINVERTIBLE. In this case:

    (1) Ax = 0 has an infinite number of nonzero solutions. These
        solutions comprise the NULL SPACE of A.

    (2) For b ≠ 0, Ax = b has either:

        (i) No solution, if b is LI of the columns of A. That is, b is not
            in the RANGE of A, or

        (ii) An infinite number of solutions, each the sum of a particular solution to Ax = b and any combination of the solutions of Ax = 0, i.e., x = x_H + x_P, where Ax_P = b and
            Ax_H = 0.

1.3.2 LU Decomposition: Solving Ax = b

We now turn our attention to the issue of explicitly constructing solutions. For the present, we restrict to n = m and to case (a) of the above
theorem. In this case, we can define the INVERSE of A, denoted A^(-1).
This is a matrix operator that satisfies

    1. A^(-1) A = I        (definition of A^(-1))

    2. A A^(-1) = I

    3. (AB)^(-1) = B^(-1) A^(-1)

The first property implies that A^(-1) A x = A^(-1) b reduces to x = A^(-1) b,
so Ax = b can be solved by finding A^(-1). Finding A^(-1) is not necessary,
however, to solve Ax = b, nor is it particularly efficient. We describe a
widely used approach called LU decomposition.

LU decomposition is essentially a modification of Gaussian elimination, with which everyone should be familiar. It is based on the fact
that triangular systems of equations are easy to solve. For example,
this matrix is upper triangular

    [ 1 2 3
      0 4 8
      0 0 7 ]

All the elements below the diagonal are zero. Since the third row has
only one nonzero element, it corresponds to a single equation with a
single unknown. Once this equation is solved, the equation above it
has only a single unknown and is therefore easy to solve, and so on.
LU decomposition depends on the fact that a square matrix A can be
written A = LU, where L is lower triangular and U is upper triangular.
Using this fact, solving Ax = b consists of three steps, the first of which
takes the most computation:

1. Find L and U from A: LU factorization.

2. Solve Lc = b for c: forward substitution.

3. Solve Ux = c for x: back substitution.

The latter two steps are simple operations, because L and U are triangular. Note that L and U are independent of b; to solve Ax = b
for many different values of b, once A is factored only the inexpensive steps 2 and 3 of the above process need be repeated. The LU
decomposition procedure (first step above) is illustrated on the matrix

    A = [ 3 5 2
          0 8 2
          6 2 8 ]

Step a. Replace row 2 with a linear combination of row 1 and row
2 that makes the first element zero. That is, r_2 is replaced
by r_2 − L_21 r_1, where L_21 = A_21/A_11. For this example, A_21 is
already zero, so L_21 = 0 and r_2 is unchanged.

Step b. Replace row 3 with a linear combination of row 1 and row
3 that makes the first element zero. That is, r_3 is replaced
by r_3 − L_31 r_1, where L_31 = A_31/A_11. So L_31 = 6/3 = 2 and A
is modified to

    [ 3  5  2
      0  8  2
      0 −8  4 ]

Step c. Now the first column of the matrix is zero below the diagonal. We move to the second column. Replace row 3 with a
linear combination of row 2 and row 3 that makes the second element zero. That is, r_3 is replaced by r_3 − L_32 r_2,
where L_32 = A_32/A_22. So L_32 = −8/8 = −1 and A is modified to

    [ 3 5 2
      0 8 2
      0 0 6 ]  = U

This matrix is now the upper triangular matrix U. For a matrix in higher dimensions, the procedure would be continued
until all of the elements below the diagonal were zero. The
matrix L is simply composed of the multipliers L_ij that were
computed at each step

    L = [ 1    0    0        [ 1  0  0
          L_21 1    0    =     0  1  0
          L_31 L_32 1 ]        2 −1  1 ]

Note that all the diagonal elements of L are 1 and all above-diagonal elements are zero. The elements on the diagonal
of U are called the PIVOTS.

Lc = b and then Ux
Now,for any vector b the simple systems

as written, the method willfail


can be solved to yield x. Notice that
if
Modern
procedure.
computational
the
of
step
any
at
routines
Aii = 0
actually compute a slightly different factorization PA = LU wherep
is a permutation matrix that exchanges rows to avoid the case Aii= 0

(see Exercise 1.9). With this modification, known as PARTIALPIVOTING,

even singular or nonsquare (m > n) matrices canbe factored. However,


the substitution steps will fail except for values of b in the range ofA.
To see this, try to perform the back substitution step with a matrixU
that has a zero pivot.
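The three-step procedure can be sketched in NumPy (our illustration; the text itself uses no code). The small factorization routine below reproduces the no-pivoting elimination of the worked example, then solves the two triangular systems:

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU factorization without pivoting (fails if a pivot is zero)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for j in range(n - 1):
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]    # the multiplier L_ij
            U[i, :] -= L[i, j] * U[j, :]   # eliminate below the pivot
    return L, U

A = np.array([[3.0, 5.0, 2.0],
              [0.0, 8.0, 2.0],
              [6.0, 2.0, 8.0]])
L, U = lu_nopivot(A)           # step 1: LU factorization

b = np.array([1.0, 2.0, 3.0])  # an arbitrary right-hand side
c = np.linalg.solve(L, b)      # step 2: solve Lc = b (forward substitution)
x = np.linalg.solve(U, c)      # step 3: solve Ux = c (back substitution)

assert np.allclose(L @ U, A)
assert np.allclose(A @ x, b)
```

For the example matrix this reproduces U with pivots 3, 8, 6 and L with multipliers L_31 = 2 and L_32 = −1. Library routines (e.g., LAPACK, used internally by NumPy) instead compute PA = LU with partial pivoting, as discussed above.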

1.3.3 The Determinant

In elementary discussions of the solution to Ax = b that are based
on CRAMER'S RULE, the DETERMINANT of the matrix A, denoted det A,
arises. One often finds a complicated definition based on submatrices, but having the LU decomposition in hand a much simpler formula
emerges (Strang, 1980). For a square matrix A that can be decomposed
into LU, the determinant is the product of the pivots

    det A = Π_{i=1}^n U_ii

If m permutations of rows must be performed to complete the decomposition, then the decomposition has the form PA = LU, and

    det A = (−1)^m Π_{i=1}^n U_ii

The matrix A^(-1) exists if and only if det A ≠ 0, in which case det A^(-1) =
(det A)^(-1). Another key property of the determinant is that

    det AB = det A det B

The most important use of the determinant that we will encounter in
this book is its use in the ALGEBRAIC EIGENVALUE PROBLEM that appears
in Section 1.4.
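For the matrix factored in the worked example above, the pivots are 3, 8, and 6, so the product-of-pivots formula gives det A = 144. A NumPy check (our illustration):

```python
import numpy as np

# Pivots from the worked LU example: diag(U) = (3, 8, 6)
det_from_pivots = 3.0 * 8.0 * 6.0   # 144

A = np.array([[3.0, 5.0, 2.0],
              [0.0, 8.0, 2.0],
              [6.0, 2.0, 8.0]])
assert np.isclose(np.linalg.det(A), det_from_pivots)

# det AB = det A det B, checked with B = A
assert np.isclose(np.linalg.det(A @ A), np.linalg.det(A) ** 2)
```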

1.3.4 Rank of a Matrix

Before we define the rank of a matrix, it is useful to establish the following property of matrices: the number of linearly independent columns
of a matrix is equal to the number of linearly independent rows.

Example 1.8: Linearly independent columns, rows of a matrix

Given A ∈ R^(m×n). Assume A has c linearly independent columns and r
linearly independent rows. Show c = r.

Solution

Let {v_1, ..., v_c} be the set of A's linearly independent column vectors. Let
the a_i be all of A's column vectors and ā_i be all of A's row vectors, so
the A matrix can be partitioned by its columns or rows as

    A = [ a_1  a_2  ...  a_n ] = [ ā_1
                                   ā_2
                                   ...
                                   ā_m ]

Each column of the A matrix can be expressed as a linear combination
of the c linearly independent v_i vectors. We denote this statement as
follows

    a_j = V λ_j,  j = 1, ..., n,    or    A = V Λ
    (A: m×n,   V: m×c,   Λ: c×n)

in which the column vector λ_j ∈ R^c contains the coefficients of the
linear combination of the v_i representing the jth column vector of matrix A. If we place all the λ_j, j = 1, ..., n, next to each other, we have
matrix Λ. Next comes the key step. Repartition the relationship above
as follows

    ā_i = v̄_i Λ,  i = 1, ..., m

in which v̄_i is the ith row of V,

and we see that the rows of A can be expressed as linear combinations
of the rows of Λ. The multipliers of the ith row of A are the
elements of the ith row of V, written as the row vector v̄_i. Since Λ has
only c rows, every row of A is expressible as a linear combination of c
row vectors that span the rows of A, but we do not know if the rows of Λ
are linearly independent; hence r ≤ c. Applying the same argument to
A^T (the linearly independent row (column) vectors of A are also the
linearly independent column (row) vectors of A^T) gives c ≤ r. Combining c ≤ r
with r ≤ c, we conclude

    c = r

and the result is established. The number of linearly independent
columns of a matrix is equal to the number of linearly independent
rows, and this number is called the rank of the matrix.

Definition 1.9 (Rank of a matrix). The rank of a matrix is the number
of linearly independent rows, equivalently, columns, of the matrix.

We also see clearly why partitioned matrices are so useful. The proof
that the number of linearly independent rows of a matrix is equal to the
number of linearly independent columns consisted of little more than
partitioning a matrix by its columns and then repartitioning the same
matrix by its rows. For another example of why partitioned matrices are
useful, see Exercise 1.17 on deriving the partitioned matrix inversion
formula, which often arises in applications.
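Numerically, the rank is usually computed from a singular value decomposition rather than by counting independent rows by hand; NumPy provides this directly (our illustration, with a matrix whose second row is a multiple of the first):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # = 2 x (row 1): linearly dependent
              [0.0, 1.0, 1.0]])

# Rank = number of LI rows = number of LI columns,
# so A and A^T have the same rank
assert np.linalg.matrix_rank(A) == 2
assert np.linalg.matrix_rank(A.T) == 2
```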

1.3.5 Range Space and Null Space of a Matrix

Given A ∈ R^(m×n), we define the range of A as

    R(A) = {y ∈ R^m | y = Ax, x ∈ R^n}

The range of a matrix is the set of all vectors that can be generated with
the product Ax for all x ∈ R^n. Equivalently, if the v_i ∈ R^m are the linearly
independent columns of A, then the range of A is the span of the
v_i; the v_i are a basis for the range of A. Given A ∈ R^(m×n), we define the
null space of A as

    N(A) = {x ∈ R^n | Ax = 0}

Similarly the range and null spaces of A^T are defined to be

    R(A^T) = {x ∈ R^n | x = A^T y,  y ∈ R^m}

    N(A^T) = {y ∈ R^m | A^T y = 0}

A basis for R(A^T) is the set of linearly independent rows of A, transposed to make column vectors. We can show that these four sets also
satisfy the two properties of a subspace, so they are also subspaces
(see Exercise 1.14).
Let r be the rank of matrix A. We know from the previous example that r is equal to the number of linearly independent rows of A
and is also equal to the number of linearly independent columns of A.
Equivalently, the dimension of R(A) and R(A^T) is also r

    dim(R(A)) = dim(R(A^T)) = r = rank(A)

We also can demonstrate the following pair of orthogonality relations
among these four fundamental subspaces

    R(A) ⊥ N(A^T)        R(A^T) ⊥ N(A)

Consider the first orthogonality relationship. Let y be any element of
N(A^T). We know N(A^T) = {y ∈ R^m | A^T y = 0}. Transposing this
relation and using column partitioning for A gives

    y^T A = 0
    y^T [ a_1  a_2  ...  a_n ] = [ y^T a_1   y^T a_2   ...   y^T a_n ] = 0

The last equation gives y^T a_i = 0, i = 1, ..., n, or y is orthogonal to
every column of A. Since every element of the range of A is a linear
combination of the columns of A, y is orthogonal to every element of
R(A), which gives N(A^T) ⊥ R(A). The second orthogonality relationship follows by switching the roles of A and A^T in the preceding argument (see Exercise 1.15). Note that the range of a matrix is sometimes
called the image, and the null space is sometimes called the kernel.
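The four subspaces and their orthogonality relations can be explored numerically via the singular value decomposition, which supplies orthonormal bases for all four (a NumPy sketch of our own, using a rank-1 matrix so the null spaces are nontrivial):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])    # rank 1: the columns are parallel

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))    # rank = number of nonzero singular values
null_A = Vt[r:].T             # basis for N(A), a subspace of R^2
null_AT = U[:, r:]            # basis for N(A^T), a subspace of R^3

# Orthogonality relations: R(A) ⊥ N(A^T) and R(A^T) ⊥ N(A)
assert np.allclose(A.T @ null_AT, 0)   # every column of A ⊥ N(A^T)
assert np.allclose(A @ null_A, 0)      # every row of A ⊥ N(A)

# dim R(A^T) + dim N(A) = n
assert r + null_A.shape[1] == A.shape[1]
```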

1.3.6 Existence and Uniqueness in Terms of Rank and Null Space

We return now to the general case where A ∈ R^(m×n). The FUNDAMENTAL THEOREM OF LINEAR ALGEBRA gives a complete characterization
of the existence and uniqueness of solutions to Ax = b (Strang, 1980):
every matrix A decomposes the spaces R^n and R^m into the four fundamental subspaces depicted in Figure 1.1. The answer to the question of
existence and uniqueness of solutions to Ax = b can be summarized
as follows.

1. Existence. Solutions to Ax = b exist for all b if and only if the
rows of A are linearly independent (m = r).

2. Uniqueness. A solution to Ax = b is unique if and only if the
columns of A are linearly independent (n = r).

We can also state this result in terms of the null spaces. A solution
to Ax = b exists for all b if and only if N(A^T) = {0}, and a solution
to Ax = b is unique if and only if N(A) = {0}. More generally, a
solution to Ax = b exists for a particular b if and only if b ∈ R(A), by
the definition of the range of A. From the fundamental theorem, that
means y^T b = 0 for all y ∈ N(A^T). And if N(A^T) = {0} we recover
the existence condition 1 stated above. These statements provide a
succinct generalization of the results described in Section 1.3.1.
13.7 Least-Squares Solution for Overdetermined Systems
Nowconsider the OVERDETERMINED
problem, Ax = b where A e

with m > n. In general, this problem has no exact solution,because


the n columns of A cannot span (Rtn, the space where b exists. This
problem arises naturally in fitting models to data. In general,the best
we can hope for is an approximate solution x that minimizes the resid-

ual (or error) r = Ax b. In particular, the "least squares"method


attempts to minimize the square of the Euclidean norm of the residual,
llr112= rTr. Replacingr by Ax b, this quantity (dividedby 2)reduces

to the function

P(x) = x TA TAx x TA T b + b T b

P is a scalar function of x and the value of the vector x that minimizes


P is the solution we seek. That is, we now want to solve P/x1 = 0, l,-

Figure 1.1: The four fundamental subspaces of matrix A (after
Strang (1980), p. 88). The dimension of the range of A
and A^T is r, the rank of matrix A. The null space of A
and range of A^T are orthogonal, as are the null space of
A^T and range of A. Solutions to Ax = b exist for all b
if and only if m = r (rows independent). A solution to
Ax = b is unique if and only if n = r (columns independent).

1, ..., n, or, in different notation, ∇P(x) = 0. Performing the gradient
operation yields

    ∂P/∂x_l = A_jl A_jk x_k − A_jl b_j

or in matrix form

    dP/dx = A^T A x − A^T b

Therefore, the condition that P be minimized is equivalent to solving

    A^T A x = A^T b

These are called the NORMAL EQUATIONS. Notice that this square system is just as
easy to solve as LUx = A^T b. In Exercise 1.41 you are asked to show
that the solution is unique if and only if the columns of A are linearly
independent.^1

If A^T A has full rank, the inverse is uniquely defined and we can
write the least-squares solution to the normal equations as

    x_ls = (A^T A)^(-1) A^T b

The matrix on the right-hand side is ubiquitous in least-squares problems; it is known as the pseudoinverse of A (or the Moore-Penrose pseudoinverse in honor of mathematician E. H. Moore and
physicist Roger Penrose) and given the symbol A†. Thus

    x_ls = A† b,        A† = (A^T A)^(-1) A^T
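The normal equations, the pseudoinverse, and library least-squares routines all give the same answer when the columns of A are LI. A NumPy sketch (our illustration; the data are arbitrary, roughly on the line b ≈ 2t):

```python
import numpy as np

# Overdetermined fit: more equations (rows) than unknowns (columns)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.1, 1.9, 4.1, 5.9])

# Solve the normal equations A^T A x = A^T b directly
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Equivalent: the pseudoinverse, and NumPy's least-squares routine
x_pinv = np.linalg.pinv(A) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_normal, x_pinv) and np.allclose(x_normal, x_lstsq)

# The residual is orthogonal (normal) to the range of A: A^T r = 0
r = A @ x_normal - b
assert np.allclose(A.T @ r, 0)
```

The final assertion is the geometric content of the normal equations discussed next: the residual lies in N(A^T).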

The normal equations have a compelling geometric interpretation
that illustrates the origin of their name. Substituting r into the normal
equations gives the condition A^T r = 0. That is, the residual r = Ax − b
is an element of the null space of A^T, N(A^T), which means r is orthogonal, i.e., normal, to the range of A, R(A) (right side of Figure 1.1). This
is just a generalization of the fact that the shortest path (minimum
||r||) connecting a point b not on a plane to that plane is perpendicular
to the plane. Note that this geometric insight is our second use of the
fundamental theorem of linear algebra. This geometric interpretation
is perhaps best reinforced by a simple example.

Example 1.10: The geometry of least squares


Weare interested in solving Ax = b for the following A and b.
1

A = 21
I

1
1

^1 Putting proof aside for a moment, the condition is at least easy to remember. The
matrix A in the overdetermined system for which we apply least squares has more rows
than columns, so the rank of A is at most the number of columns. The least-squares
solution is unique if and only if the rank is equal to this largest value, i.e., the
number of columns.

(a) What is the rank of A? Justify your answer.

(b) Draw a sketch of the subspace R(A).

(c) Draw a sketch of the subspace R(A^T).

(d) Draw a sketch of the subspace N(A).

(e) Draw a sketch of the subspace N(A^T).

(f) Is there a solution to Ax = b for all b? Justify your answer.

(g) Is there a solution for the particular b given above? Justify your
answer.

(h) Assume we give up on solving Ax = b and decide to solve instead
the least-squares problem

    min_x (Ax − b)^T (Ax − b)

What is the solution to this problem, x_0?

(i) Is this solution unique? Justify your answer.

(j) Sketch the location of the b_0 for which this x_0 does solve Ax = b.
In particular, sketch the relationship between this b_0 and one of
the subspaces you sketched previously. Also on this same drawing, sketch the residual r = Ax_0 − b.
Solution

(a) The rank of A is 2. The two columns are linearly independent.


(b) R(A) is the xy plane in R^3.

(c) R(A^T) is R^2. Notice these are not the same subspaces, even though
they have the same dimension 2.

(d) N(A) is the zero element in R^2.

(e) N(A^T) is the z axis in R^3.

(f) No. The rows are not independent.

Figure 1.2: Least-squares solution of Ax = b; projection of b into
R(A) and residual r = Ax₀ − b in N(A^T).

(g) No. The range of A does not have a nonzero third element and
this b does.
(h) The solution is x₀ = (A^T A)⁻¹A^T b = [−2; 3].
(i) Yes, the least-squares solution is unique because the columns of
A are linearly independent.

(j) The vector b is decomposed into b₀ ∈ R(A) and r = Ax₀ − b ∈
N(A^T). We want Ax₀ = b₀, so b₀ = Ax₀ = A(A^T A)⁻¹A^T b = Pb,
and the projection operator is P = A(A^T A)⁻¹A^T. The residual is
r = Ax₀ − b = (P − I)b, and we have for this problem

P = [1 0 0; 0 1 0; 0 0 0]        P − I = [0 0 0; 0 0 0; 0 0 −1]


Substituting in the value for b gives

b₀ = [1; −1; 0]        r = [0; 0; −1]
The spaces R(A) and N(A^T) are orthogonal, and therefore so are
b₀ and r. The method of least squares projects b into the range
of A, giving b₀, and then solves exactly Ax = b₀ to obtain x₀.
These relationships are shown in Figure 1.2.
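This geometry is easy to check numerically. The following is a minimal sketch, assuming NumPy; the 3×2 system used here is an illustrative choice of the same kind as the example above (a full-column-rank A whose columns span the xy plane of R^3).

```python
import numpy as np

# Illustrative overdetermined system: columns of A span the xy plane
# of R^3, while b has a nonzero third component.
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [0.0, 0.0]])
b = np.array([1.0, -1.0, 1.0])

# Least-squares solution from the normal equations A^T A x = A^T b.
x0 = np.linalg.solve(A.T @ A, A.T @ b)

# Projection of b into R(A), and the residual r = A x0 - b.
P = A @ np.linalg.solve(A.T @ A, A.T)   # P = A (A^T A)^{-1} A^T
b0 = P @ b
r = A @ x0 - b

print(x0)          # the least-squares solution
print(b0)          # b projected into R(A)
print(A.T @ r)     # ~ [0 0]: the residual is normal to R(A)
```

The last line is exactly the normal-equations condition A^T r = 0: the residual lies in N(A^T).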

The above analysis is only the beginning of the story for parameter
estimation. We have not dealt with important issues such as errors in
the measurements, quantifying the uncertainty in parameters, choice
of model form, etc. Many of these issues will be studied in Chapter 4
as part of maximum-likelihood estimation.

1.3.8 Minimum-Norm Solution of the Underdetermined Problem


Consider the case of solving Ax = b with fewer equations than unknowns, the so-called UNDERDETERMINED problem. Assume that the
rows of A are linearly independent, so a solution exists for all b. But
we also know immediately that N(A) ≠ {0}, and there are infinitely
many solutions. One natural way to choose a specific solution from
the infinite number of possibilities is to seek the MINIMUM-NORM solution. That is, we minimize ‖x‖² subject to the constraint that Ax = b.
By analogy with the approach taken above in constructing the least-squares solution, we define an objective function

Φ(x) = x^T x − z^T (Ax − b) = x_i x_i − z_i (A_ij x_j − b_i)

where now z is a vector of Lagrange multipliers. The minimization
condition ∂Φ/∂x_k = 0 is thus

2x_k = z_j A_jk

or, absorbing the factor of 2 into the arbitrary multiplier z, x = A^T z.
Inserting this into the equation Ax = b yields

AA^T z = b

Since the rows of A are linearly independent, AA^T is full rank.² We can
solve this equation for z and insert into the equation x = A^T z that we
found above to deduce the minimum-norm solution

x = A^T (AA^T)⁻¹ b        (1.6)

²Transpose the result of Exercise 1.41.

Note the similarity in the solution structure of the underdetermined,
minimum-norm problem to the overdetermined, least-squares problem
given in (1.5). The singular value decomposition, which we introduce
in Section 1.4.7, allows for a unified and general treatment of both the
underdetermined and overdetermined problems.
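Formula (1.6) can be sketched numerically as follows (assuming NumPy; the 2×3 system below is an illustrative choice, not one from the text).

```python
import numpy as np

# Illustrative underdetermined system: 2 equations, 3 unknowns,
# with linearly independent rows.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([3.0, 2.0])

# Minimum-norm solution (1.6): x = A^T (A A^T)^{-1} b
x = A.T @ np.linalg.solve(A @ A.T, b)

print(np.allclose(A @ x, b))                   # True: x satisfies Ax = b
print(np.allclose(x, np.linalg.pinv(A) @ b))   # True: matches the pseudoinverse
```

For full-row-rank A, the Moore-Penrose pseudoinverse reduces to A^T(AA^T)⁻¹, so `pinv` returns the same minimum-norm solution.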
1.3.9 Rank, Nullity, and the Buckingham Pi Theorem

As engineers, we often encounter situations where we have a number
of measurements or other quantities d_i and we expect there to be a
functional relationship between them

f(d₁, d₂, ..., d_n) = 0

In general, we would like to have a dimensionless representation of this
relation, one that does not depend on the units of measurement, i.e.,

F(Π₁, Π₂, ..., Π_l) = 0

where each Π has the form

Π = d₁^a₁ d₂^a₂ ··· d_n^a_n

and the exponents a_i are chosen so that each Π_i is dimensionless. If the
set of n quantities d_i depend on m units (kilograms, meters, seconds,
amperes, ...), the key question is: what is the relationship between n,
m, and the number l of dimensionless variables Π_i that is required to
characterize the relationship between the variables?
We address this issue with a specific example. Consider fluid
flow through a tube. The fluid has density ρ and viscosity η, and flows
with average velocity U through a tube with radius R and length L,
driven by a pressure drop ΔP. Defining [=] to mean "has dimensions
of," we seek dimensionless quantities of the form

Π = ρ^a₁ U^a₂ ΔP^a₃ η^a₄ R^a₅ L^a₆

Π [=] (kg/m³)^a₁ (m/s)^a₂ (kg/(m s²))^a₃ (kg/(m s))^a₄ (m)^a₅ (m)^a₆


All the units must cancel, so we require that

kg:    a₁ + a₃ + a₄ = 0
m:    −3a₁ + a₂ − a₃ − a₄ + a₅ + a₆ = 0
s:    −a₂ − 2a₃ − a₄ = 0

This is a system of three equations with six unknowns and has the form
Ax = 0, where A ∈ R^{3×6}, m = 3, n = 6, and x = [a₁ a₂ a₃ a₄ a₅ a₆]^T.

We know that A has at most three LI columns, so in six dimensions there
must be at least three dimensions that cannot be spanned by these
three columns. In this case it is easy to show that A does have three LI
columns, which means that there are 6 − 3 = 3 families of solutions
a_i that will yield proper dimensionless quantities. By inspection, we
can find the solutions x = (1, 1, 0, −1, 1, 0)^T, (−1, −2, 1, 0, 0, 0)^T, and
(0, 0, 0, 0, 1, −1)^T, yielding the three dimensionless groups

Π₁ = ρUR/η        Π₂ = ΔP/(ρU²)        Π₃ = R/L

Readers with a background in fluid mechanics will recognize Π₁ as the
REYNOLDS NUMBER (Bird, Stewart, and Lightfoot, 2002).
Because the solution to Ax = 0 is not unique, this choice of dimensionless groups is not unique: each Π_i can be replaced by any nonzero
power of it, and the Π_i s can be multiplied by one another and by any
constant to yield other equally valid dimensionless groups. For example, Π₂ can be replaced in this set by Π₂Π₃ = ΔP R/(ρU²L); fluid
mechanicians recognize this quantity as the FRICTION FACTOR.
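The null-space computation behind these groups can be sketched numerically (assuming NumPy; the matrix written out below encodes the kg, m, s exponents of (ρ, U, ΔP, η, R, L) as derived above).

```python
import numpy as np

# Dimensional matrix: rows are kg, m, s exponents;
# columns are rho, U, DeltaP, eta, R, L.
A = np.array([[ 1,  0,  1,  1, 0, 0],   # kg
              [-3,  1, -1, -1, 1, 1],   # m
              [ 0, -1, -2, -1, 0, 0]])  # s

# rank(A) = m = 3, so the nullity is n - m = 6 - 3 = 3.
print(np.linalg.matrix_rank(A))   # 3

# The exponent vectors for Pi1 = rho*U*R/eta, Pi2 = DeltaP/(rho*U^2),
# and Pi3 = R/L all lie in N(A).
for x in ([1, 1, 0, -1, 1, 0],
          [-1, -2, 1, 0, 0, 0],
          [0, 0, 0, 0, 1, -1]):
    print(A @ np.array(x))        # [0 0 0] each time
```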
Now we return to the general case where we have n quantities and m
units. Because A has m LI rows (and thus m LI columns; see Example
1.8), it has a null space of n − m dimensions, and therefore there is an
n − m dimensional subspace of vectors x that will solve Ax = 0. This
result gives us the BUCKINGHAM PI THEOREM: given a problem with n
dimensional parameters containing m units, the problem can be recast
in terms of l = n − m dimensionless groups (Lin and Segel, 1974). This
theorem holds under the condition that rank(A) = m; in principle it is
possible for the rank of A to be less than m. One somewhat artificial
example where this issue arises is the following: if all units of length are
represented as hectares per meter, then the equations corresponding
to those two units would differ only by a sign. They would thus be
redundant and the rank of A would be one less than the number of
units. If m were replaced by rank(A), then the Pi theorem would still
hold.


A less trivial example in which the Buckingham Pi theorem can cause
confusion is the case of problems involving mixtures. One might expect
that moles (or masses) of chemical species A and moles of chemical
species B (or mole or mass fractions of these species) would be independent units, but they are not. Unlike kilograms and meters, which
cannot be added to one another, moles of A and moles of B can be added
to one another so they do not yield separate equations for exponents
the way that kilograms and meters do.
1.3.10 Nonlinear Algebraic Equations: the Newton-Raphson Method

Many if not most of the mathematical problems encountered by engineers are nonlinear: second-order reactions, fluid dynamics at finite
Reynolds number, and phase equilibrium are a few examples. We will
write a general nonlinear system of n equations and n unknowns as

f(x) = 0        (1.7)

where x ∈ R^n and f ∈ R^n. In contrast to the case with linear equations,
where LU decomposition will lead to an exact and unique solution (if
the problem is not singular), there is no general theory of existence and
uniqueness for nonlinear equations. In general, many solutions can
exist and there is no way of knowing a priori where they are or how
many there are. To find solutions to nonlinear equations, one almost
always needs to make an initial guess and use an iterative method to
find a solution. A powerful and general method for doing this is called
NEWTON-RAPHSON iteration.

Consider an initial guess x and assume for the moment that the
exact solution x_e is given by x + d, where d is as yet unknown, but is
assumed to be small, i.e., the initial guess is good. In this case

f(x_e) = f(x + d) = 0

We next expand the right-hand side in a Taylor series around x.
It is now convenient to switch to component notation to express the
second-order Taylor series approximation for vector f

f_i(x + d) = f_i(x) + (∂f_i/∂x_j)|_x d_j + (1/2)(∂²f_i/∂x_j∂x_l)|_x d_j d_l + O(‖d‖³)

where the notation O(P) denotes terms that are "of order P," which
means that they decay to zero at least as fast as P in the limit d → 0.


Figure 1.3: An iteration of the Newton-Raphson method for solving
f(x) = 0 in the scalar case.

An approximate solution to this equation can be found if the terms
that are quadratic and higher degree in d are neglected, yielding the
linearized problem

f_i(x + d) ≈ f_i(x) + (∂f_i/∂x_j)|_x d_j

Setting f(x + d) = 0 and defining the JACOBIAN matrix J_ij(x) = ∂f_i/∂x_j,
this can be rearranged into the linear system

J(x) d = −f(x)

This equation can be solved for d (e.g., by LU decomposition) to yield a
new guess for the solution x⁺ = x + d, in which we use the superscript
+ to denote the variable x at the next iterate. Denoting the solution
by d = −J⁻¹(x)f(x), the process can be summarized as

x⁺ = x − J⁻¹(x)f(x)        (1.8)


This equation is iterated until ‖x⁺ − x‖ or ‖f(x⁺)‖ reaches a prescribed
error tolerance. One iteration of (1.8) is depicted for a scalar function
in Figure 1.3.

An important question for any iterative method is how rapidly it
converges. To address this issue for the Newton-Raphson method, let
ε = x − x_e be the difference between the approximate solution and the
exact solution. Similarly, ε⁺ = x⁺ − x_e, and therefore ε⁺ − ε = x⁺ − x.
Using this result and (1.8), the evolution equation for the error is

ε⁺ = ε − J⁻¹(x)f(x)

Taylor expanding this equation around x_e yields, again in index notation
due to the Taylor series,

ε_i⁺ = ε_i − [J⁻¹_ij + (∂J⁻¹_ij/∂x_l) ε_l + O(‖ε‖²)][J_jk ε_k + (1/2)(∂J_jk/∂x_l) ε_k ε_l + O(‖ε‖³)]

    = (δ_ik − J⁻¹_ij J_jk) ε_k − [(∂J⁻¹_ij/∂x_l) J_jk + (1/2) J⁻¹_ij (∂J_jk/∂x_l)] ε_l ε_k + O(‖ε‖³)

where the Jacobian and its derivatives are evaluated at x_e, and we have
used f(x_e) = 0 to write f_j(x_e + ε) = J_jk ε_k + (1/2)(∂J_jk/∂x_l) ε_k ε_l + O(‖ε‖³).
Since J⁻¹_ij J_jk = δ_ik, the first term vanishes. Differentiating this identity
with respect to x_l gives (∂J⁻¹_ij/∂x_l) J_jk = −J⁻¹_ij (∂J_jk/∂x_l), so the
bracketed term reduces to −(1/2) J⁻¹_ij (∂J_jk/∂x_l). Therefore

ε_i⁺ = (1/2) J⁻¹_ij (∂J_jk/∂x_l)|_{x_e} ε_l ε_k + O(‖ε‖³)

This result, which we can summarize as ‖ε⁺‖ = O(‖ε‖²), illustrates
that given a sufficiently good guess, the Newton-Raphson iteration
converges rapidly, specifically quadratically, to the exact solution.
For example, if the error in iteration (1.8) after step k is 10⁻², the
error after step k + 1 is ~10⁻⁴ and after step k + 2 is ~10⁻⁸. Indeed,


a good check of whether a code for implementing Newton-Raphson is
correct is to verify this quadratic convergence. Quadratic convergence
only holds if a sufficiently good guess is given. If the initial guess is
poor, the iteration may not converge, or alternately may converge to a
solution far from the initial guess.
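As a concrete sketch of iteration (1.8), assuming NumPy; the two-equation system solved here is an illustrative choice, not one of the text's examples.

```python
import numpy as np

def newton_raphson(f, jac, x0, tol=1e-12, maxiter=50):
    """Iterate x+ = x - J^{-1}(x) f(x), eq. (1.8), until ||f(x)|| < tol."""
    x = np.array(x0, dtype=float)
    for _ in range(maxiter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        d = np.linalg.solve(jac(x), -fx)   # solve J(x) d = -f(x)
        x = x + d
    return x

# Illustrative system: x1^2 + x2^2 - 4 = 0, x1*x2 - 1 = 0
f = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0]*x[1] - 1.0])
jac = lambda x: np.array([[2.0*x[0], 2.0*x[1]],
                          [x[1],     x[0]]])

x = newton_raphson(f, jac, [2.0, 0.5])
print(x, f(x))    # f(x) ~ 0 at the converged root
```

Printing ‖x⁺ − x‖ at each pass shows the error dropping quadratically once the iterate is close, which is exactly the convergence check described above.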
1.3.11 Linear Coordinate Transformations

As noted above, the components of a matrix operator depend on the
coordinate system in which it is expressed. Here we illustrate how the
components of a matrix operator change upon a change in coordinate
system. Consider two vectors x and y and a matrix operator A, where
y = Ax. For example, we can take x and y to be two-dimensional, in
which case

[y₁; y₂] = [A₁₁ A₁₂; A₂₁ A₂₂][x₁; x₂]

Now consider new variables x′₁ and x′₂, where

x′₁ = T₁₁x₁ + T₁₂x₂
x′₂ = T₂₁x₁ + T₂₂x₂

This can be written x′ = Tx. Here x and x′ are the same vector, but
represented in the original (unprimed) and new (primed) coordinate
systems, and T is the operator that generates the new coordinate values from the original ones. It must be invertible; otherwise there is
not a unique mapping between the coordinate systems. Therefore, we
can write x = T⁻¹x′ and y = AT⁻¹x′; the matrix AT⁻¹ yields the
mapping between x′ and y. If we also consider a coordinate transformation of the vector y of the form y′ = Wy, then y′ = WAT⁻¹x′.
The matrix WAT⁻¹ provides the mapping from x′ to y′. Some important coordinate transformations that take advantage of the properties
of the operator A are described in Section 1.4.
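A quick numerical sketch of these mappings (assuming NumPy; random matrices are used only for illustration, and are invertible with probability one):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))   # the operator: y = A x
T = rng.standard_normal((2, 2))   # coordinate change for x
W = rng.standard_normal((2, 2))   # coordinate change for y

x = rng.standard_normal(2)
y = A @ x

xp = T @ x   # x' = T x
yp = W @ y   # y' = W y

# W A T^{-1} maps the primed x to the primed y.
print(np.allclose(W @ A @ np.linalg.inv(T) @ xp, yp))   # True
```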

1.4 The Algebraic Eigenvalue Problem


1.4.1 Introduction

Eigenvalue problems arise in a variety of contexts. One of the most
important is in the solution of systems of linear ordinary differential
equations. Consider the system of two ordinary differential equations

dz/dt = Az        (1.9)

Here z ∈ R^2 and A ∈ R^{2×2}. If we guess, based on what we know about
scalar linear differential equations, a solution of the form z(t) = xe^{λt},
then we have that

Ax = λx        (1.10)

If we can find a solution to this equation, then we have a solution to
(1.9). (To obtain the general solution to (1.9) we must find two solutions to this problem.) This is the algebraic version of the EIGENVALUE
PROBLEM.

The eigenvalue problem can be rewritten as the homogeneous system of equations

(A − λI)x = 0
As with any homogeneous system, this generally has only the trivial
solution x = 0. For special values of λ, known as the EIGENVALUES
of A, the equation has a nontrivial solution, however. The solutions
corresponding to these eigenvalues, which can be real or complex, are
the EIGENVECTORS of A. Geometrically, the eigenvectors of A are those
vectors that change only by a scalar multiple when operated on by A. This
property is of great importance because for the eigenvectors, matrix
multiplication reduces to simple scalar multiplication; Ax can be replaced by λx. Because of this property, the eigenvectors of a matrix
provide a natural coordinate system for working with that matrix. This
fact is used extensively in applied mathematics.

From the existence and uniqueness results for linear systems of
equations that we saw in Section 1.3.1, we know that the above homogeneous problem has a nontrivial solution if and only if A − λI is
noninvertible: that is, when

det(A − λI) = 0

This equation is called the CHARACTERISTIC EQUATION for A, and
det(A − λI) is the CHARACTERISTIC POLYNOMIAL. For an n × n matrix,
this polynomial is always nth degree in λ; this can be seen by performing LU
decomposition on A − λI. Therefore, the characteristic polynomial has n
roots (not necessarily all real or distinct). Each root is an eigenvalue, so
an n × n matrix has exactly n eigenvalues. Each distinct eigenvalue has
a distinct (i.e., linearly independent) eigenvector. Each set of repeated
roots will have at least one distinct eigenvector, but may have fewer
than the multiplicity of the root. So a matrix may have fewer than


n linearly independent eigenvectors. The nature of the eigenvectors
depends on the structure of the matrix.

In principle, the eigenvalues of a matrix may be found by finding
the roots of its characteristic polynomial. Since polynomials of degree
greater than four cannot be factored analytically, approximate numerical methods must be used for virtually all matrix eigenvalue problems.
There are numerical methods for finding the roots of a polynomial, but
in practice, this procedure is difficult and inefficient. An extremely
robust iterative method, based on the QR factorization of a matrix (Exercise 1.38), is the most commonly used technique for general matrices.
In some cases, only the "dominant" eigenvalue (the eigenvalue with the
largest magnitude) needs to be found. The POWER METHOD (Exercise
1.57) is a rapid iterative technique for this problem. Generalizations of
the idea behind the power method form the basis of powerful KRYLOV
SUBSPACE methods for iterative solutions of many computational linear
algebra problems (Trefethen and Bau III, 1997).
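A minimal sketch of the power method (assuming NumPy; the symmetric matrix used is an illustrative choice):

```python
import numpy as np

def power_method(A, x0, niter=100):
    """Estimate the dominant eigenpair of A by repeated multiplication."""
    x = np.array(x0, dtype=float)
    for _ in range(niter):
        x = A @ x
        x = x / np.linalg.norm(x)   # renormalize to prevent overflow
    lam = x @ (A @ x)               # Rayleigh quotient for the eigenvalue
    return lam, x

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, v = power_method(A, [1.0, 0.0])
print(lam)   # converges to the eigenvalue of largest magnitude
```

Each multiplication by A amplifies the component of x along the dominant eigenvector relative to the others, which is why the iteration converges to that eigenvector.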
1.4.2 Self-Adjoint Matrices

Consider the real symmetric (thus self-adjoint) matrix

A = [2 1; 1 2]

The characteristic equation for A is λ² − 4λ + 3 = 0 and its solutions
are λ₁ = 1, λ₂ = 3. The corresponding eigenvectors x = v₁ and x = v₂
are solutions to

(A − λ₁I)v₁ = 0        (A − λ₂I)v₂ = 0

These solutions are (to within an arbitrary multiplicative constant)

v₁ = [1; −1]        v₂ = [1; 1]

Note that these vectors, when normalized to have unit length, form an
ON basis for R^2. Now let

Q = (1/√2)[1 1; −1 1]

A vector x in R^2 can now be represented in two coordinate systems,
either the original basis or the eigenvector basis. A representation in
the eigenvector basis will be indicated by a ′, so x′ = [x′₁ x′₂]^T is the
vector containing the coordinates of x expressed in the eigenvector
basis. It can be shown that the coordinate transformation between these
bases is defined by Q, so that x = Qx′ and x′ = Q⁻¹x. Remember
that A is defined in the original basis so Ax makes sense, but Ax′ does
not. However, we can write

Ax = A(x′₁v₁ + x′₂v₂)
   = x′₁Av₁ + x′₂Av₂
   = x′₁λ₁v₁ + x′₂λ₂v₂
   = QΛx′

where

Λ = [1 0; 0 3]

Therefore, Ax = QΛx′. Using the transformation x′ = Q⁻¹x gives
that Ax = QΛQ⁻¹x, or A = QΛQ⁻¹. This expression can be reduced
further by noting that since the columns of Q form an orthonormal
basis, Q_ki Q_kj = δ_ij, or Q^TQ = I. Since Q⁻¹Q = I by definition, it
follows that Q⁻¹ = Q^T. Matrices for which this property holds are
called ORTHOGONAL. In the complex case, the property becomes Q⁻¹ =
Q* and Q is said to be UNITARY. Returning to the example, the property
means that A can be expressed

A = QΛQ^T
As an example of the usefulness of this result, consider the system
of equations

dx/dt = Ax = QΛQ^T x

By multiplying both sides of the equation by Q^T and using the facts
that Q^TQ = I and x′ = Q^Tx, the equation can be rewritten

dx′/dt = Λx′

or dx′₁/dt = x′₁, dx′₂/dt = 3x′₂. In the eigenvector basis, the differential equations are decoupled. They can be solved separately.
The above representation of A can be found for any matrix A that
satisfies the self-adjointness condition A = A*. We have the following
theorem.

Theorem 1.11 (Self-adjoint matrix decomposition). If A ∈ C^{n×n} is self-adjoint, then there exists a unitary Q ∈ C^{n×n} and real, diagonal Λ ∈
R^{n×n} such that

A = QΛQ*

The diagonal elements of Λ, Λ_ii, are the eigenvalues of A. The
eigenvalues are all real, even if A is not. The columns of the matrix Q
are the (normalized) eigenvectors v_i corresponding to the eigenvalues.
The eigenvectors are orthonormal and form a basis for C^n.

This result shows that for every self-adjoint matrix operator, there
is a natural orthogonal basis, in which the matrix becomes diagonal.
That is, the transformation DIAGONALIZES the matrix. Since the eigenvalues are all real, matrix multiplication reduces to simple contraction
or stretching along the (eigenvector) coordinate axes. In this basis, any
linear systems of algebraic or differential equations containing Ax reduce to n decoupled equations.

That the eigenvalues are real can be established as follows. We have
Av = λv and, by taking adjoints, v*A = λ̄v*, after noting that A* = A.
Multiply the first on the left by v* and the second on the right by v
and subtract to obtain 0 = (λ − λ̄)v*v. We have that v*v is not zero
since v ≠ 0 is an eigenvector, and therefore λ = λ̄ and λ is real.

If A has distinct eigenvalues, the eigenvectors are orthogonal, which
is also readily established. Given an eigenvalue λ_i and corresponding
eigenvector v_i, we have that Av_i = λ_i v_i. Let (λ_j, v_j) be another eigenpair so that Av_j = λ_j v_j. Multiplying Av_i = λ_i v_i on the left by v_j*,
and Av_j = λ_j v_j on the left by v_i*, and subtracting gives
(λ_i − λ_j)(v_i* v_j) = 0. If the eigenvalues are distinct, λ_i ≠ λ_j, this
equation can hold only if v_i* v_j = 0, and therefore v_i and v_j are
orthogonal.

For the case of repeated eigenvalues, since orthogonality holds for
eigenvalues that are arbitrarily close together but unequal, we might
expect intuitively that it continues to hold when the eigenvalues become equal. This turns out to be true, and we delay the proof until we
have introduced the Schur decomposition in Section 1.4.6.
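Theorem 1.11 is easy to verify numerically for a small symmetric matrix (a sketch assuming NumPy, whose `eigh` routine is specialized for self-adjoint matrices; the matrix is an illustrative choice):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])     # real symmetric, hence self-adjoint

# eigh returns real eigenvalues (ascending) and orthonormal
# eigenvectors as the columns of Q.
lam, Q = np.linalg.eigh(A)

print(lam)                                      # [1. 3.]
print(np.allclose(Q.T @ Q, np.eye(2)))          # True: Q^T Q = I
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))   # True: A = Q Lambda Q^T
```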

1.4.3 General (Square) Matrices

Although many matrices arising in applications are self-adjoint, many
others are not, so it is important to include the results for these cases.
Now the eigenvectors do not necessarily form an ON basis, nor can the
matrix always be diagonalized. But it is possible to come fairly close.
There are three cases:

1. If A is not self-adjoint, but has distinct eigenvalues (λ_i ≠ λ_j, i ≠ j),
then A can be diagonalized

A = SΛS⁻¹        (1.11)

As before, Λ = S⁻¹AS is diagonal, and contains the eigenvalues
(not necessarily real) of A. The columns of S contain the corresponding eigenvectors. The eigenvectors are LI, so they form a
basis, but are not orthogonal.
2. If A is not self-adjoint and has repeated eigenvalues, it may still be
the case that the repeated eigenvalues have distinct eigenvectors,
e.g., a root with multiplicity two that has two linearly independent
eigenvectors. Here A can be diagonalized as above.
3. If A is not self-adjoint and has repeated eigenvalues that do not
yield distinct eigenvectors, it cannot be completely diagonalized;
a matrix of this type is called DEFECTIVE. Nevertheless, it can
always be put into Jordan form J

A = MJM⁻¹        (1.12)

where J = M⁻¹AM is organized as follows: each distinct eigenvalue appears on the diagonal with the nondiagonal elements of
the corresponding row and column being zero, just as above.
However, repeated eigenvalues appear in JORDAN BLOCKS with
this structure (shown here for an eigenvalue λ of multiplicity three)

[λ 1 0; 0 λ 1; 0 0 λ]
In the case of repeated eigenvalues, we can distinguish between
ALGEBRAIC multiplicity and GEOMETRIC multiplicity. Algebraic
multiplicity of an eigenvalue is simply its multiplicity as a root
of the characteristic equation. Geometric multiplicity is the number of distinct eigenvectors that correspond to the repeated eigenvalue. In case 2 above, the geometric multiplicity of each repeated
eigenvalue is equal to its algebraic multiplicity. In case 3, the algebraic multiplicity exceeds the geometric multiplicity.
For a non-self-adjoint 5 by 5 matrix with repeated eigenvalues, J contains a Jordan block of this type for each repeated eigenvalue.



The eigenvectors corresponding to the distinct eigenvalues are
the corresponding columns of M. A distinct eigenvector does not
exist for each of the repeated eigenvalues, but a GENERALIZED
EIGENVECTOR can be found for each occurrence of the eigenvalue.
These vectors, along with the eigenvectors, form a basis for R^n.
Example 1.12: A nonsymmetric matrix

Find the eigenvalues and eigenvectors of the nonsymmetric matrix

A = [1 2; 0 3]

and show that it can be put in the form of (1.11).

Solution

This matrix has characteristic equation (1 − λ)(3 − λ) = 0 and thus has
eigenvalues λ = 1, λ = 3. For λ = 1, the eigenvector solves

[0 2; 0 2][x₁; x₂] = [0; 0]

and it is straightforward to see that this is satisfied by [x₁, x₂]^T = v₁ =
[1, 0]^T. For λ = 3 we have

[−2 2; 0 0][x₁; x₂] = [0; 0]

which has solution v₂ = [1, 1]^T. Here v₁ and v₂ are not orthogonal,


but they are LI, so they still form a basis. Letting

S = [1 1; 0 1]

one can determine that

S⁻¹ = [1 −1; 0 1]

Since the columns of S are not orthogonal, they cannot be normalized
to form a matrix that satisfies S⁻¹ = S^T. Nevertheless, A can be diagonalized

S⁻¹AS = Λ = [1 0; 0 3]
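The same computation can be sketched with NumPy's general eigensolver (illustrative; note that `eig`, unlike `eigh`, makes no symmetry assumption):

```python
import numpy as np

# An illustrative nonsymmetric matrix with distinct eigenvalues 1 and 3.
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

lam, S = np.linalg.eig(A)          # columns of S are eigenvectors

Lam = np.linalg.solve(S, A @ S)    # S^{-1} A S
print(np.allclose(Lam, np.diag(lam)))    # True: A is diagonalized
print(np.allclose(S.T @ S, np.eye(2)))   # False: S is not orthogonal
```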

Example 1.13: A defective matrix

Find the eigenvalues and eigenvectors of the nonsymmetric matrix

A = [3 2; 0 3]

and show that it cannot be put in the form of (1.11), but can be put in
the form of (1.12).

Solution

The characteristic equation for A is (3 − λ)² = 0, so A has the repeated
eigenvalue λ = 3. The eigenvector is determined from

[0 2; 0 0][x₁; x₂] = [0; 0]

which has solution x = v₁ = [1, 0]^T. There is not another nontrivial
solution to this equation so the repeated eigenvalue λ = 3 has only one
eigenvector. We cannot diagonalize this A.
Nevertheless, we will seek to nearly diagonalize it, by finding a generalized eigenvector v₂ that allows us to construct a matrix M = [v₁ v₂]
satisfying

M⁻¹AM = J = [3 1; 0 3]

Multiplying both sides of this equation by M yields that

AM = MJ

which can be rearranged to

(A − 3I)[v₁ v₂] = [0 v₁]

This equation can be rewritten as the pair of equations

(A − 3I)v₁ = 0
(A − 3I)v₂ = v₁

The first of these is simply the equation determining the true eigenvector v₁, while the second will give us the generalized eigenvector v₂. For
the present problem this equation is

[0 2; 0 0][x₁; x₂] = [1; 0]

A solution to this equation is v₂ = [0, 1/2]^T. (Any solution v₂ must be
LI from v₁. Why?) Constructing the matrix

M = [1 0; 0 1/2]

one can show that

M⁻¹ = [1 0; 0 2]

and that

J = M⁻¹AM = [3 1; 0 3]

Note that we can replace v₂ by v₂ + αv₁ for any α and still obtain this
result.
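These relations are easy to confirm numerically (a sketch assuming NumPy; the 2×2 defective matrix and vectors below are an illustrative choice consistent with the discussion above):

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [0.0, 3.0]])     # repeated eigenvalue 3, one true eigenvector
v1 = np.array([1.0, 0.0])      # eigenvector
v2 = np.array([0.0, 0.5])      # generalized eigenvector

B = A - 3.0 * np.eye(2)
print(np.allclose(B @ v1, 0.0))  # True: (A - 3I) v1 = 0
print(np.allclose(B @ v2, v1))   # True: (A - 3I) v2 = v1

M = np.column_stack([v1, v2])
J = np.linalg.solve(M, A @ M)    # J = M^{-1} A M
print(J)                         # the 2x2 Jordan block with eigenvalue 3
```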

1.4.4 Positive Definite Matrices

Positive definite and positive semidefinite matrices show up often in
applications. Here are some basic facts about them. In the following, A
is real and symmetric and B is real. The matrix A is POSITIVE DEFINITE
(denoted A > 0), if

x^T Ax > 0,    ∀ nonzero x ∈ R^n

The matrix A is POSITIVE SEMIDEFINITE (denoted A ≥ 0), if

x^T Ax ≥ 0,    ∀ x ∈ R^n

You should be able to prove the following facts.

1. A > 0 ⇔ λ > 0, ∀ λ ∈ eig(A)

2. A ≥ 0 ⇔ λ ≥ 0, ∀ λ ∈ eig(A)

3. A ≥ 0 ⇒ B^T AB ≥ 0, ∀ B

4. A > 0 and B nonsingular ⇒ B^T AB > 0

5. A > 0 and B full column rank ⇒ B^T AB > 0

6. A₁ > 0, A₂ ≥ 0 ⇒ A = A₁ + A₂ > 0

7. A > 0 ⇒ z^T Az > 0, ∀ nonzero z ∈ R^n

8. For A ≥ 0, x^T Ax = 0 ⇒ Ax = 0

If symmetric matrix A is not positive semidefinite nor negative semidefinite, then it is termed indefinite. In this case A has both positive and
negative eigenvalues.
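Facts 1 and 2 suggest a simple numerical classification by eigenvalue signs (a sketch assuming NumPy; the tolerance and the test matrices are illustrative choices):

```python
import numpy as np

def definiteness(A, tol=1e-12):
    """Classify a real symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix, ascending
    if lam[0] > tol:
        return "positive definite"
    if lam[0] > -tol:
        return "positive semidefinite"
    if lam[-1] < -tol:
        return "negative definite"
    if lam[-1] < tol:
        return "negative semidefinite"
    return "indefinite"

print(definiteness(np.array([[2.0, 1.0], [1.0, 2.0]])))   # positive definite
print(definiteness(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```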

1.4.5 Eigenvalues, Eigenvectors, and Coordinate Transformations


Under the general linear transformation

y = Ax        (1.13)

all the components of the vector y are coupled to all the components
of the vector x via the elements of A, all of which are generally nonzero.
We can always rewrite this transformation using the eigenvalue decomposition as

y = MJM⁻¹x

Now consider the coordinate transformation x′ = M⁻¹x and y′ =
M⁻¹y. In this new coordinate system, the linear transformation (1.13)
becomes

y′ = Jx′

In the "worst case scenario," J has eigenvalues on the diagonal, some
values of 1 just above the diagonal, and is otherwise zero. In the more
usual scenario J = Λ, and each component of y′ is coupled only to one
component of x′: the coordinate transformation associated with the
eigenvectors of A provides a coordinate system in which the different
components are decoupled. This result is powerful and is used in a
wide variety of applications.

Further considering the idea of coordinate transformations leads
naturally to the question of the dependence of the eigenvalue problem
on the coordinate system that is used to set up the problem. Given that

Ax = λx

let us take x′ = Tx, where T is invertible but otherwise arbitrary; this
expression represents a coordinate transformation between unprimed
and primed coordinates, as we have already described in Section 1.3.11.

Now x = T⁻¹x′, and thus

AT⁻¹x′ = λT⁻¹x′

Multiplying both sides by T to eliminate T⁻¹ on the right-hand side
yields

TAT⁻¹x′ = λx′

are the
Recallthat we have done nothing to the eigenvaluesthey eigenthe
samein the last equation of this sequence as the first. Thus

if two
valuesof TAT-I are the same as the eigenvalues of A. Therefore,
a
matricesare related by a transformation B = TAT-I, which is called
their eigenvalues are the same. In other
TRANSFORMATION,
SIMILARITY
under similarity transwords,eigenvaluesof a matrix are INVARIANT
formations.
In many situations, invariants other than the eigenvalues are used.
These can be expressed in terms of the eigenvalues. The two most
common are the TRACE of a matrix A

tr A = Σ_i A_ii = Σ_i λ_i

and the determinant

det A = (−1)^m Π_{i=1}^{n} U_ii = Π_{i=1}^{n} λ_i

where U is the upper triangular factor in the LU decomposition of A
and m is the number of row exchanges used in computing it.
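A numerical sketch of these invariants (assuming NumPy; the matrices below are illustrative choices, with A triangular so its eigenvalues are visible on the diagonal):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, -2.0]])   # triangular: eigenvalues 1, 3, -2
T = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])    # invertible (det = 2)

B = T @ A @ np.linalg.inv(T)       # similarity transformation

lamA = np.sort(np.linalg.eigvals(A).real)
lamB = np.sort(np.linalg.eigvals(B).real)
print(np.allclose(lamA, lamB))                    # True: eigenvalues invariant

print(np.isclose(np.trace(A), lamA.sum()))        # True: tr A = sum of eigenvalues
print(np.isclose(np.linalg.det(A), lamA.prod()))  # True: det A = product of eigenvalues
```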

Example 1.14: Vibrational modes of a molecule

The individual atoms that make up a molecule vibrate around their
equilibrium positions and orientations. These vibrations can be used
to characterize the molecule by spectroscopy and are important in determining many of its properties, such as heat capacity and reactivity.
We examine here a simple model of a molecule to illustrate the origin
and nature of these vibrations.

Let the ath atom of a molecule be at position x_a = [x_a, y_a, z_a]^T and
have mass m_a. The bond energy of the molecule is U(x₁, x₂, x₃, ..., x_N),
where N is the number of atoms in the molecule. Newton's second law
for each atom is

m_a d²x_a/dt² = −∂U(x₁, ..., x_N)/∂x_a

Let X = [x₁^T, x₂^T, ..., x_N^T]^T and M be a 3N × 3N diagonal matrix
with the masses of each atom on the diagonals. That is, M₁₁ = M₂₂ =
M₃₃ = m₁, M₄₄ = M₅₅ = M₆₆ = m₂, ..., M_{3N−2,3N−2} = M_{3N−1,3N−1} =
M_{3N,3N} = m_N. Now the equations of motion for the coordinates of
the atoms become

M_ij d²X_j/dt² = −∂U(X)/∂X_i

An equilibrium shape X_eq of the molecule is a minimum of the bond
energy U, and can be found by Newton-Raphson iteration on the problem ∂U/∂X = 0. Assume X_eq is known and characterize small-amplitude
vibrations around that shape.
Solution

Let R = X − X_eq be a small perturbation away from the equilibrium
shape. Because X_eq is constant, this perturbation satisfies the equation

M_ij d²R_j/dt² = −∂U(X_eq + R)/∂X_i

Taylor expanding the right-hand side of this equation, using the fact
that ∂U/∂X_i|_{X_eq} = 0, and neglecting terms of O(‖R‖²) yields

∂U(X_eq + R)/∂X_i ≈ H_ik R_k

where

H_ik = ∂²U/∂X_i∂X_k|_{X_eq}

is called the HESSIAN matrix for the function U. Thus the governing equation for the vibrations is given by

M_ij d²R_j/dt² = −H_ik R_k

By definition, H is symmetric. Furthermore, rigidly translating the entire molecule does not change its bond energy, so H has three zero eigenvalues, with eigenvectors

[1, 0, 0, 1, 0, 0, …, 1, 0, 0]ᵀ   [0, 1, 0, 0, 1, 0, …, 0, 1, 0]ᵀ   [0, 0, 1, 0, 0, 1, …, 0, 0, 1]ᵀ

1.4 The Algebraic Eigenvalue Problem


These correspond to moving the whole molecule in the x, y, and z directions, respectively. Furthermore, because X_eq is a minimum of the bond energy, H is also positive semidefinite.

We expect the molecule to vibrate, so we will seek oscillatory solutions. A convenient way to do so is to let

R(t) = Z e^{iωt} + Z̄ e^{−iωt}

recalling that for real ω, e^{iωt} = cos ωt + i sin ωt. Substituting into the governing equation yields

−ω² M_ij (Z_j e^{iωt} + Z̄_j e^{−iωt}) = −H_ik (Z_k e^{iωt} + Z̄_k e^{−iωt})

Gathering terms proportional to e^{iωt} and e^{−iωt}, we can see that this equation will be satisfied at all times if and only if

ω² M_ij Z_j = H_ik Z_k     (1.14)

This looks similar to the linear eigenvalue problem, (1.10), and reduces exactly to one if all atoms have the same mass m (in which case M = mI).

We can learn more about this problem by considering the properties of M and H. Since M is diagonal and the atomic masses are positive, M is clearly positive definite. Also recall that H is symmetric positive semidefinite. Writing M = L², where L is diagonal and its diagonal entries are the square roots of the masses, we can write ω²L²Z = HZ. Multiplying by L⁻¹ on the left yields ω²LZ = L⁻¹HZ, and letting Ẑ = LZ results in ω²Ẑ = L⁻¹HL⁻¹Ẑ. This has the form of an eigenvalue problem ĤẐ = ω²Ẑ, where Ĥ = L⁻¹HL⁻¹. Solving this eigenvalue problem gives the frequencies ω at which the molecule vibrates. The corresponding eigenvectors Ẑ, when transformed back into the original coordinates via Z = L⁻¹Ẑ, give the so-called "normal modes." Each frequency is associated with a mode of vibration that in general involves different atoms of the molecule in different ways. Because Ĥ is symmetric, these modes form an orthogonal basis in which to describe the motions of the molecule. A further result can be obtained by multiplying (1.14) on the left by Zᵀ, yielding

ω² ZᵀMZ = ZᵀHZ

Because ZᵀMZ > 0 and ZᵀHZ ≥ 0, we can conclude that ω² ≥ 0 with equality only when Z is a zero eigenvector of H. This result shows


that the frequencies ω are real and thus that the dynamics are purely oscillatory.

Observe that the quantity ZᵀMZ arises naturally in this problem: via the transformation Ẑ = LZ it is equivalent to the inner product ẐᵀẐ. It is straightforward to show that for any symmetric positive definite W, the quantity xᵀWy satisfies all the conditions of an inner product between real vectors x and y; it is called a WEIGHTED inner product. In the current case, the eigenvectors Ẑ are orthogonal under the usual "unweighted" inner product, in which case the vectors Z = L⁻¹Ẑ are orthogonal under the weighted inner product with W = M.
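The procedure in this example is easy to carry out numerically. The following is a minimal sketch for a hypothetical 1D chain of three atoms joined by two identical unit springs (this specific molecule model is an assumption, not from the text); it forms Ĥ = L⁻¹HL⁻¹ and solves the symmetric eigenvalue problem as described above.

```python
import numpy as np

# Hypothetical example (not from the text): three atoms on a line joined by
# two identical springs, U = (k/2)[(x2 - x1)^2 + (x3 - x2)^2].
k = 1.0
H = k * np.array([[ 1.0, -1.0,  0.0],    # Hessian of U at equilibrium
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  1.0]])
M = np.diag([1.0, 2.0, 1.0])             # atomic masses on the diagonal

# Write M = L^2 and transform to the standard symmetric eigenvalue problem
# Hhat Zhat = w^2 Zhat with Hhat = L^-1 H L^-1.
Linv = np.diag(1.0 / np.sqrt(np.diag(M)))
Hhat = Linv @ H @ Linv
w2, Zhat = np.linalg.eigh(Hhat)          # squared vibrational frequencies
Z = Linv @ Zhat                          # normal modes in original coordinates

print("squared frequencies:", w2)        # one zero mode: rigid translation
```

The zero eigenvalue corresponds to translating the whole chain (its mode Z is proportional to [1, 1, 1]ᵀ), and since ZᵀMZ = ẐᵀẐ = I, the computed modes are orthonormal in the M-weighted inner product.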
1.4.6 Schur Decomposition
A major problem with using the Jordan form when doing calculations on matrices that have repeated eigenvalues is that the Jordan form is numerically unstable. For matrices with repeated eigenvalues, if diagonalization is not possible, it is usually better computationally to use the Schur form instead of the Jordan form. The Schur form only triangularizes the matrix. Triangularizing a matrix, even one with repeated eigenvalues, is numerically well conditioned. Golub and Van Loan (1996, p. 313) provide the following theorem.
Theorem 1.15 (Schur decomposition). If A ∈ ℂ^{n×n}, then there exists a unitary Q ∈ ℂ^{n×n} such that

Q*AQ = T

in which T is upper triangular.

The proof of this theorem is discussed in Exercise 1.43. Note that even though T is upper triangular instead of diagonal, its diagonal elements are still its eigenvalues. The eigenvalues of T are also equal to the eigenvalues of A because T is the result of a similarity transformation of A. Even if A is a real matrix, T can be complex because the eigenvalues of a real matrix may come in complex conjugate pairs. Recall a matrix Q is unitary if Q*Q = I. You should also be able to prove the following facts (Horn and Johnson, 1985).

1. If A ∈ ℂ^{n×n} and BA = I for some B ∈ ℂ^{n×n}, then

(a) A is nonsingular
(b) B is unique


2. The matrix Q ∈ ℂ^{n×n} is unitary if and only if

(a) Q is nonsingular and Q* = Q⁻¹
(b) QQ* = I
(c) Q* is unitary
(d) The rows of Q form an orthonormal set
(e) The columns of Q form an orthonormal set
If A is self-adjoint, then by taking adjoints of both sides of the Schur decomposition equality, we have that T is real and diagonal, and the columns of Q are the eigenvectors of A, which is one way to show that the eigenvectors of a self-adjoint matrix are orthogonal, regardless of whether the eigenvalues are distinct. Recall that we delayed the proof of this assertion in Section 1.4.2 until we had introduced the Schur decomposition.

If A is real and symmetric, then not only is T real and diagonal, but Q can be chosen real and orthogonal. This fact can be established by noting that if complex-valued q = a + bi is an eigenvector of A, then so are both real-valued vectors a and b. And if complex eigenvector q_j is orthogonal to q_k, then real eigenvectors a_j and b_j are orthogonal to real eigenvectors a_k and b_k, respectively. The theorem summarizing this case is the following (Golub and Van Loan, 1996, p. 393), where, again, it does not matter if the eigenvalues of A are repeated.
Theorem 1.16 (Symmetric Schur decomposition). If A ∈ ℝ^{n×n} is symmetric, then there exists a real, orthogonal Q and a real, diagonal Λ such that

QᵀAQ = Λ = diag(λ_1, …, λ_n)

where diag(a, b, c, …) denotes a diagonal matrix with elements a, b, c, … on the diagonal.

Note that the {λ_i} are the eigenvalues of A and the columns of Q, {q_i}, are the corresponding eigenvectors.

For real but not necessarily symmetric A, you can restrict yourself to real matrices by using the real Schur decomposition (Golub and Van Loan, 1996, p. 341). But the price you pay is that you can achieve only block upper triangular T, rather than strictly upper triangular T.


Theorem 1.17 (Real Schur decomposition). If A ∈ ℝ^{n×n}, then there exists a real, orthogonal Q such that

QᵀAQ = [ R_11  R_12  ⋯  R_1m ]
       [       R_22  ⋯  R_2m ]
       [              ⋱   ⋮  ]
       [             R_mm    ]

in which each R_ii is either a real scalar or a 2×2 real matrix having complex conjugate eigenvalues; the eigenvalues of the R_ii are the eigenvalues of A.
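Both Schur forms are available in standard numerical libraries. The sketch below (an illustration, not from the text) uses scipy.linalg.schur on a matrix with a complex conjugate eigenvalue pair, so its real Schur form contains one 2×2 block.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, -2.0, 1.0],
              [1.0,  0.0, 3.0],
              [0.0,  0.0, 0.5]])   # eigenvalues: +/- i*sqrt(2) and 0.5

# Real Schur form: Q orthogonal, T quasi-triangular (one 2x2 block).
T, Q = schur(A, output='real')
assert np.allclose(Q @ T @ Q.T, A)

# Complex Schur form: Q unitary, T truly upper triangular,
# with the eigenvalues of A on its diagonal.
Tc, Qc = schur(A, output='complex')
assert np.allclose(Qc @ Tc @ Qc.conj().T, A)
print(np.diag(Tc))
```

Note that even though A is real, the complex Schur factor Tc is complex, exactly as the discussion above anticipates for conjugate eigenvalue pairs.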

1.4.7 Singular Value Decomposition


Another highly useful matrix decomposition, which can be applied to nonsquare in addition to square matrices, is the singular value decomposition (SVD). Any matrix A ∈ ℂ^{m×n} has an SVD

A = USV*

in which U ∈ ℂ^{m×m} and V ∈ ℂ^{n×n} are square and unitary

U*U = UU* = I_m     V*V = VV* = I_n

and S ∈ ℝ^{m×n} is partitioned as

S = [ Σ            0_{r×(n−r)}      ]
    [ 0_{(m−r)×r}  0_{(m−r)×(n−r)} ]

in which r is the rank of the A matrix. The matrix Σ is diagonal and real

Σ = diag(σ_1, σ_2, …, σ_r)     σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0

in which the diagonal elements σ_i are known as the singular values of matrix A. The singular values are real and positive and can be ordered from largest to smallest as indicated above.
Connection of SVD and eigenvalue decomposition. Given A ∈ ℂ^{m×n} with rank r, consider the Hermitian matrix AA* ∈ ℂ^{m×m}, also of rank r. We can deduce that the eigenvalues of AA* are real and nonnegative as follows. Given that (λ, v) is an eigenpair of AA*, we have AA*v = λv, v ≠ 0. Taking inner products of both sides with respect to v and solving for λ gives λ = v*AA*v / v*v. We know v*v is a real, positive scalar since v ≠ 0. Let y = A*v and we have that λ = y*y / v*v, and we know that y*y is a real scalar and y*y ≥ 0. Therefore λ is real and λ ≥ 0. And we can connect the eigenvalues and eigenvectors of AA* to the singular values and vectors of A. The r nonzero eigenvalues of AA* (λ_i) are the squares of the singular values (σ_i), and the eigenvectors of AA* (q_i) are the columns of U (u_i)

λ_i(AA*) = σ_i²(A)     i = 1, …, r

Next consider the Hermitian matrix A*A ∈ ℂ^{n×n}, also of rank r. The r nonzero eigenvalues of A*A (λ_i) are also the squares of the singular values (σ_i), and the eigenvectors of A*A (q_i) are the columns of V (v_i)

λ_i(A*A) = σ_i²(A)     i = 1, …, r

These results follow from substituting the SVD into both products and comparing with the eigenvalue decomposition

AA* = (USV*)(VSᵀU*) = US²U* = QΛQ⁻¹
A*A = (VSᵀU*)(USV*) = VS²V* = QΛQ⁻¹
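This connection is easy to confirm numerically; a small illustrative sketch with a random real matrix (so A* = Aᵀ):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # real, wide matrix

U, s, Vt = np.linalg.svd(A)
lam = np.linalg.eigvalsh(A @ A.T)        # eigenvalues of the Hermitian AA^T

# The nonzero eigenvalues of AA^T are the squared singular values of A.
print(np.sort(lam)[::-1])                # equals s**2
```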

Real matrix with full row rank. Consider a real matrix A with more columns than rows (wide matrix, m < n) and full row rank, r = m. In this case both U and V are real and orthogonal, and the SVD takes the form

A = U [ Σ  0 ] [ V_1  V_2 ]ᵀ

in which V_1 contains the first m columns of V, and V_2 contains the remaining n − m columns. Multiplying the partitioned matrices gives

A = U Σ V_1ᵀ

and notice that we do not need to store the V_2 matrix if we wish to represent A. This fact is handy if A has many more columns than rows, n ≫ m, because V_2 ∈ ℝ^{n×(n−m)} requires a large amount of storage compared to A.


Figure 1.4: The four fundamental subspaces of matrix A = USVᵀ. The range of A is spanned by the first r columns of U, {u_1, …, u_r}. The range of Aᵀ is spanned by the first r columns of V, {v_1, …, v_r}. The null space of A is spanned by {v_{r+1}, …, v_n}, and the null space of Aᵀ is spanned by {u_{r+1}, …, u_m}.

Real matrix with full column rank. Next consider the case in which real matrix A has more rows than columns (tall matrix, m > n) and full column rank. In this case the SVD takes the form

A = [ U_1  U_2 ] [ Σ ] Vᵀ
                 [ 0 ]

in which U_1 contains the first n columns of U, and U_2 contains the remaining m − n columns. Multiplying the partitioned matrices gives

A = U_1 Σ Vᵀ

and notice that we do not need to store the U_2 matrix if we wish to represent A.


SVD and fundamental theorem of linear algebra. The SVD provides an orthogonal decomposition of all four of the fundamental subspaces of matrix A. Consider first the partitioned SVD for real-valued A

A = [ U_1  U_2 ] [ Σ  0 ] [ V_1  V_2 ]ᵀ
                 [ 0  0 ]

Now consider Av_k in which k ≥ r + 1. Because v_k is orthogonal to V_1 = {v_1, …, v_r}, we have Av_k = 0, and these n − r orthogonal v_k span the null space of A. Because the columns of V_1 are orthogonal to this set, they span the range of Aᵀ. Transposing the previous equation gives

Aᵀ = V_1 Σ U_1ᵀ

and we have that {u_{r+1}, …, u_m} span the null space of Aᵀ. Because the columns of U_1 are orthogonal to this set, they span the range of A. These results are summarized in Figure 1.4.


SVD and least-squares problems. We already have shown that if A has independent columns, the unique least-squares solution to the overdetermined problem

min_x ‖Ax − b‖₂

is given by

x_ls = (AᵀA)⁻¹Aᵀb     x_ls = A†b

The SVD also provides a means to compute x_ls. For real A, the SVD satisfies

A = U_1 Σ Vᵀ
Aᵀ = V Σ U_1ᵀ
AᵀA = V Σ U_1ᵀ U_1 Σ Vᵀ = V Σ² Vᵀ

The pseudoinverse is therefore given by

A† = (AᵀA)⁻¹Aᵀ = V Σ⁻² Vᵀ V Σ U_1ᵀ
A† = V Σ⁻¹ U_1ᵀ

and the least-squares solution is

x_ls = V Σ⁻¹ U_1ᵀ b


SVD and underdetermined problems. We already have shown that if A has independent rows, the unique minimum-norm solution to the problem

min_x ‖x‖₂  subject to  Ax = b

is given by

x_mn = Aᵀ(AAᵀ)⁻¹b

The SVD also provides a means to compute x_mn. In this case we have A = U Σ V_1ᵀ, and substituting this into the minimum-norm solution gives

x_mn = V_1 Σ⁻¹ Uᵀ b

Note the similarity to the least-squares solution above.
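Both SVD formulas above can be sketched with numpy (illustrative only; the matrices and variable names are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)

# Overdetermined: tall A with independent columns; x_ls = V S^-1 U1^T b.
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
U1, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: A = U1 S V^T
x_ls = Vt.T @ ((U1.T @ b) / s)
assert np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0])

# Underdetermined: wide A with independent rows; x_mn = V1 S^-1 U^T b.
A2 = rng.standard_normal((3, 6))
b2 = rng.standard_normal(3)
U, s2, V1t = np.linalg.svd(A2, full_matrices=False)  # thin SVD: A2 = U S V1^T
x_mn = V1t.T @ ((U.T @ b2) / s2)
assert np.allclose(A2 @ x_mn, b2)                    # satisfies the constraint
assert np.allclose(x_mn, np.linalg.pinv(A2) @ b2)    # and has minimum norm
```

The full_matrices=False option returns exactly the "thin" factors U_1 and V_1 discussed above, so the unused blocks U_2 and V_2 are never formed or stored.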

1.5 Functions of Matrices


1.5.1 Polynomial and Exponential
We have already defined some functions of square matrices using matrix multiplication and addition. These operations create the class of polynomial functions

p(A) = a_n Aⁿ + a_{n−1} Aⁿ⁻¹ + ⋯ + a_1 A + a_0 I

with A ∈ ℂ^{n×n} and a_i ∈ ℂ, i = 0, 1, …, n. We wish to expand this set of functions so that we have convenient ways to express solutions to coupled sets of differential equations, for example. Probably the most important function for use in applications is the matrix exponential. The standard exponential of a scalar can be defined in terms of its Taylor series

e^a = 1 + a + a²/2! + a³/3! + ⋯

This series converges for all a ∈ ℂ. Notice that this expression is an infinite-order series and therefore not a polynomial function. We can proceed to define the matrix exponential analogously

e^A = I + A + A²/2! + A³/3! + ⋯

and this series converges for all A ∈ ℂ^{n×n}. Let's see why the matrix exponential is so useful. Consider first the scalar first-order linear differential equation

dx/dt = ax     x(0) = x_0     x ∈ ℝ, a ∈ ℝ


which arises in the simplest chemical kinetics models. The solution is given by

x(t) = x_0 e^{at}

and this is probably the first and most important differential equation that is discussed in the introductory differential equations course. By defining the matrix exponential we have the solution to all coupled sets of linear first-order differential equations. Consider the coupled set of linear first-order differential equations

d/dt [ x_1 ]   [ a_11  a_12  ⋯  a_1n ] [ x_1 ]
     [ x_2 ] = [ a_21  a_22  ⋯  a_2n ] [ x_2 ]
     [  ⋮  ]   [  ⋮     ⋮          ⋮  ] [  ⋮  ]
     [ x_n ]   [ a_n1  a_n2  ⋯  a_nn ] [ x_n ]

with initial condition

[ x_1(0) ]   [ x_10 ]
[ x_2(0) ] = [ x_20 ]
[   ⋮    ]   [  ⋮   ]
[ x_n(0) ]   [ x_n0 ]

which we express compactly as

dx/dt = Ax     x(0) = x_0     x ∈ ℝⁿ, A ∈ ℝ^{n×n}     (1.15)

The payoff for knowing the solution to the scalar version is that we also know the solution to the matrix version. We propose as the solution

x(t) = e^{At} x_0     (1.16)

Notice that we must put the x_0 after the e^{At} so that the matrix multiplication on the right-hand side is defined and gives the required n × 1 column vector for x(t). Let's establish that this proposed solution is indeed the solution to (1.15). Substituting t = 0 to check the initial condition gives

x(0) = e^{A·0} x_0 = e^0 x_0 = I x_0 = x_0

and the initial condition is satisfied. Next, differentiating the matrix exponential with respect to scalar time gives

d/dt e^{At} = d/dt ( I + tA + t²/2! A² + t³/3! A³ + ⋯ )
            = A + tA² + t²/2! A³ + ⋯
            = A ( I + tA + t²/2! A² + ⋯ )
            = A e^{At}

We have shown that the scalar derivative formula d/dt(e^{at}) = a e^{at} also holds for the matrix case, d/dt(e^{At}) = A e^{At}. We also could have factored the A to the right instead of the left side in the derivation above to obtain d/dt(e^{At}) = e^{At} A. Note that although matrix multiplication does not commute in general, it does commute for certain matrices, such as e^{At} and powers of A. Finally, substituting the derivative result into (1.15) gives

dx/dt = d/dt ( e^{At} x_0 ) = ( A e^{At} ) x_0 = A ( e^{At} x_0 ) = Ax

and we see that the differential equation also is satisfied.
Another insight into functions of matrices is obtained when we consider their eigenvalue decomposition. Let A = SΛS⁻¹, in which we assume for simplicity that the eigenvalues of A are not repeated so that Λ is diagonal. First we see that powers of A can be written as follows for p ≥ 1

Aᵖ = (SΛS⁻¹)(SΛS⁻¹) ⋯ (SΛS⁻¹)     (p times)
   = SΛᵖS⁻¹

Substituting the eigenvalue decomposition into the definition of the


matrix exponential gives

e^{At} = I + tA + t²/2! A² + t³/3! A³ + ⋯
       = SS⁻¹ + tSΛS⁻¹ + t²/2! SΛ²S⁻¹ + ⋯
       = S ( I + tΛ + t²/2! Λ² + ⋯ ) S⁻¹
e^{At} = S e^{Λt} S⁻¹

Therefore, we can determine the time behavior of e^{At} by examining the behavior of e^{Λt}

e^{Λt} = diag( e^{λ_1 t}, e^{λ_2 t}, …, e^{λ_n t} )
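The diagonalization route can be checked against a library matrix exponential; a minimal sketch (the particular matrix is an arbitrary stable example, not from the text):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0,  2.0],
              [ 0.0, -3.0]])          # distinct eigenvalues -1 and -3
t = 0.7

lam, S = np.linalg.eig(A)             # A = S diag(lam) S^-1
eAt_eig = (S * np.exp(lam * t)) @ np.linalg.inv(S)   # S e^{Lambda t} S^-1
eAt = expm(A * t)                     # series-based matrix exponential

print(np.max(np.abs(eAt - eAt_eig)))  # agreement to machine precision
```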

and we deduce that e^{At} asymptotically approaches zero as t → ∞ if and only if Re(λ_i) < 0, i = 1, …, n. We also know that e^{At} is oscillatory if any eigenvalue has a nonzero imaginary part, and so on.

The matrix exponential is just one example of expanding scalar functions to matrix functions. Any of the transcendental functions (trigonometric functions, hyperbolic trigonometric functions, logarithm, square root, etc.) can be extended to matrix arguments as was shown here for the matrix exponential (Higham, 2008). For example, a square root of a matrix A is any matrix B that satisfies B² = A. If A = SΛS⁻¹, then one solution is B = SΛ^{1/2}S⁻¹ where Λ^{1/2} = diag(λ_1^{1/2}, λ_2^{1/2}, …). More generally, Λ^{1/2} can be replaced by Q*Λ^{1/2}Q for any unitary matrix Q. Moreover, for any linear scalar differential equation having solutions consisting of these scalar functions, coupled sets of the corresponding linear differential equations are solved by the matrix version of the function.

Bound on e^{At}. When analyzing solutions to dynamic models, we often wish to bound the asymptotic behavior as time increases to infinity. For linear differential equations, this means we wish to bound the asymptotic behavior of e^{At} as t → ∞. We build up to a convenient bound in a few steps. First, for scalar z ∈ ℂ, we know that

|e^z| = |e^{Re(z) + Im(z)i}| = |e^{Re(z)}| |e^{Im(z)i}| = e^{Re(z)}


Similarly, if we have a diagonal matrix D ∈ ℂ^{n×n}, D = diag(d_1, d_2, …, d_n), then the matrix norm of e^D is

‖e^D‖ = max_x ‖e^D x‖ / ‖x‖ = e^α

in which α = max_i (Re(d_i)). In fact, if this max over the real parts of the eigenvalues occurs for index i*, then x = e_{i*} achieves the maximum in ‖e^D x‖ / ‖x‖. Given a real, nonnegative time argument t ≥ 0, we also have that

‖e^{Dt}‖ = e^{αt}     t ≥ 0

Next, if the matrix A is diagonalizable, we can use A = SΛS⁻¹ to obtain

e^{At} = S e^{Λt} S⁻¹

and we can obtain a bound by taking norms of both sides

‖e^{At}‖ = ‖S e^{Λt} S⁻¹‖ ≤ ‖S‖ ‖e^{Λt}‖ ‖S⁻¹‖

For any nonsingular S, the product ‖S‖ ‖S⁻¹‖ is defined as the condition number of S, denoted κ(S). A bound on the norm of e^{At} is therefore

‖e^{At}‖ ≤ κ(S) e^{αt}     A diagonalizable

in which α = max_i Re(λ_i) = max(Re(eig(A))). So this leaves only the


case in which A is not diagonalizable. In the general case we use the Schur form A = QTQ*, with T upper triangular. Van Loan (1977) shows that³

‖e^{At}‖ ≤ e^{αt} Σ_{k=0}^{n−1} ‖Nt‖ᵏ / k!     t ≥ 0     (1.17)

in which N = T − Λ, where Λ is the diagonal matrix of eigenvalues and N is strictly upper triangular, i.e., has zeros on as well as below the diagonal. Note that this bound holds for any A ∈ ℂ^{n×n}. Van Loan (1977) also shows that this is a fairly tight bound compared to some popular alternatives. If we increase the value of α by an arbitrarily small amount, we can obtain a looser bound, but one that is more convenient for analysis. For any α' satisfying

α' > max(Re(eig(A)))


³Note that there is a typo in (2.11) in Van Loan (1977), which is corrected here.


there is a constant c > 0 such that

‖e^{At}‖ ≤ c e^{α't}     t ≥ 0     (1.18)

This result holds also for any A ∈ ℂ^{n×n}. Note that the constant c depends on the matrix A. Establishing this result is discussed in Exercise 1.71. To demonstrate one useful consequence of this bound, consider the case in which all eigenvalues of A have strictly negative real parts. Then there exists α' such that Re(eig(A)) < α' < 0, and (1.18) tells us that e^{At} → 0 exponentially fast as t → ∞ for the entire class of "stable" matrices A. We do not need to assume that A has distinct eigenvalues or is diagonalizable, for example, to reach this conclusion.
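The diagonalizable-case bound ‖e^{At}‖ ≤ κ(S)e^{αt} is easy to probe numerically; a sketch with an arbitrary stable example matrix (chosen for illustration only):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 5.0],
              [ 0.0, -2.0]])               # eigenvalues -1 and -2
lam, S = np.linalg.eig(A)
alpha = np.max(lam.real)
kappa = np.linalg.cond(S)                  # ||S|| ||S^-1|| in the 2-norm

# Check ||e^{At}|| <= kappa(S) e^{alpha t} at several times t >= 0.
for t in np.linspace(0.0, 5.0, 11):
    assert np.linalg.norm(expm(A * t), 2) <= kappa * np.exp(alpha * t) + 1e-12
print("bound verified on [0, 5]")
```

The off-diagonal entry 5.0 makes S poorly conditioned, so the prefactor κ(S) is noticeably larger than 1; this is the transient growth the bound must accommodate before the e^{αt} decay takes over.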
1.5.2 Optimizing Quadratic Functions

Optimization is a large topic of fundamental importance in many engineering activities such as process design, process control, and process operations. Here we would like to introduce some of the important concepts of optimization in the simple setting of quadratic functions. You now have the required linear algebra tools to make this discussion accessible.

Scalar argument. The reader is undoubtedly familiar with finding the maximum and minimum of scalar functions by taking the first derivative and setting it to zero. For conciseness, we restrict attention to (unconstrained) minimization, and we are interested in the problem⁴

min_x f(x)

What do we expect of a solution to this problem? A point x⁰ is termed a minimizer if f(x⁰) ≤ f(p) for all p. A minimizer x⁰ is unique if no other point has this property. In other words, the minimizer x⁰ is unique provided f(x⁰) < f(p) for all p ≠ x⁰. We call x⁰ the minimizer and f⁰ = f(x⁰) the (optimal) value function. Note that to avoid confusion, f⁰ = min_x f(x) is called the "solution" to the problem, even though x⁰ is usually the item of most interest, and x⁰ = arg min_x f(x) is called the "argument of the solution."

We wish to know when the minimizer exists and is unique, and how to compute it. We consider first the real, scalar-valued quadratic function of the real, scalar argument x, f(x) = (1/2)ax² + bx + c, with

⁴We do not lose generality with this choice; if the problem of interest is instead maximization, use the following identity to translate: max_x f(x) = −min_x (−f(x)).


a, b, c, x ∈ ℝ. Putting the factor of 1/2 in front of the quadratic term is a convention to simplify various formulas to be derived next. If we take the derivative and set it to zero we obtain

f(x) = (1/2)ax² + bx + c
df/dx = ax + b = 0
x⁰ = −b/a

This last result for x⁰ is at least well defined provided a ≠ 0. But if we are interested in minimization, we require more: a > 0 is required for a unique solution to the problem min_x f(x). Indeed, taking a second derivative of f(x) gives d²f/dx² = a. The condition a > 0 is usually stated in beginning calculus courses as: the function is concave upward. This idea is generalized to the condition that the function is strictly convex, which we define next. Evaluating f(x) at the proposed minimizer gives f⁰ = f(x⁰) = −(1/2)b²/a + c.
Convex functions. Generalizing the simple notion of a function having positive curvature (or being concave upward) to obtain existence and uniqueness of the minimizer leads to the concept of a convex function, which is defined as follows (Rockafellar and Wets, 1998, p. 38).

Definition 1.18 (Convex function). Let function f(·) be defined on all reals. Consider two points x, y and a scalar α that satisfy 0 ≤ α ≤ 1. The function f is convex if the following inequality holds for all x, y and α ∈ [0, 1]

f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y)

Figure 1.5 shows a convex function. Notice that if you draw a straight line connecting any two points on the function curve, if the function is convex the straight line lies above the function. We say the function f(·) is STRICTLY CONVEX if the inequality is strict for all x ≠ y and α ∈ (0, 1)

f(αx + (1 − α)y) < αf(x) + (1 − α)f(y)

Notice that x and y are restricted to be nonequal and α is restricted to lie in the open interval (0, 1) in the definition of strict convexity (or no function would be strictly convex).

That strict convexity is sufficient for uniqueness of the minimizer is established readily as follows. Assume that one has found a (possibly nonunique) minimizer of f(·), denoted x⁰, and consider another point


Figure 1.5: Convex function. The straight line connecting two points on the function curve lies above the function; αf(x) + (1 − α)f(y) ≥ f(αx + (1 − α)y) for all x, y.

p ≠ x⁰. We know that f(p) cannot be less than f(x⁰) or we contradict optimality of x⁰. We wish to rule out f(p) = f(x⁰), also, because equality implies that the minimizer is not unique. If f(·) is strictly convex and f(p) = f(x⁰), we have that

f(αx⁰ + (1 − α)p) < αf(x⁰) + (1 − α)f(p) = f(x⁰)

So for all z = αx⁰ + (1 − α)p with α ∈ (0, 1), z ≠ x⁰ and f(z) < f(x⁰), which also contradicts optimality of x⁰. Therefore f(x⁰) < f(p) for all p ≠ x⁰, and the minimizer is unique. Notice that the definition of convexity does not require f(·) to have even a first derivative, let alone a second derivative as required when using a curvature condition for uniqueness. But if f(·) is quadratic, then strict convexity is equivalent to positive curvature as discussed in Exercise 1.72.
Vector argument. We next take real-valued vector x ∈ ℝⁿ, and the general real, scalar-valued quadratic function is f(x) = (1/2)xᵀAx + bᵀx + c, with A ∈ ℝ^{n×n}, b ∈ ℝⁿ, and c ∈ ℝ. Without loss of generality, we can assume A is symmetric.⁵ We know that the eigenvalues of a

⁵If A is not symmetric, show that replacing A with the symmetric Ā = (1/2)(A + Aᵀ) does not change the function f(·).


Figure 1.6: Contours of constant f(x) = xᵀAx; (a) A > 0 (or A < 0), ellipses; (b) A ≥ 0 (or A ≤ 0), straight lines; (c) A indefinite, hyperbolas. The coordinate axes are aligned with the contours if and only if A is diagonal.

symmetric matrix are real (see Theorem 1.16); the following cases are of interest and cover all possibilities for symmetric A: (a) A > 0 (or A < 0), (b) A ≥ 0 (or A ≤ 0), and (c) A indefinite. Figure 1.6 shows contours of the quadratic functions that these A generate for x ∈ ℝ². Since A is the parameter of interest here, we set b = 0, c = 0.⁶ The positive cost contours are concentric ellipses for the case A > 0. The contour for f = 0 is the origin, and minimizing f for A > 0 has the origin as a unique minimizer. This problem corresponds to finding the bottom of a bowl. If A < 0, contours remain ellipses, but the sign of the contour value changes. The case A < 0 has the origin as a unique maximizer. This problem corresponds to finding the top of a mountain.

For the case A ≥ 0, the positive contours are straight lines. The line through the origin corresponds to f = 0. All points on this line

⁶Note that c merely shifts the function f(·) up and down, and b merely shifts the origin, so they are not important to the shape of the contours of the quadratic function.


are minimizers for f(·) in this case, and the minimizer is nonunique. The quadratic function is convex but not strictly convex. The function corresponds to a long valley. As before, if A ≤ 0, contours remain straight lines, but the sign of the contour value changes. For A ≤ 0, the maximizer exists but is not unique. The function is now a ridge. And some specialized techniques for numerically finding optima with badly conditioned functions approaching this case are known as "ridge regression."

For indefinite A, Figure 1.6 shows that the contours are hyperbolas. The origin is termed a saddle point in this case, because the function resembles a horse's saddle, or a mountain pass if one prefers to maintain the topography metaphor. Note that f(·) increases without bound in the northeast and southwest directions, but decreases without bound in the southeast and northwest directions. So neither a minimizer nor a maximizer exists for the indefinite case. But there is an important class of problems for which the origin is the solution. These are the minmax or maxmin problems. Consider the two problems

max_{x_1} min_{x_2} f(x)     min_{x_2} max_{x_1} f(x)     (1.19)

These kinds of problems are called noncooperative games, and players one and two are optimizing over decision variables x_1 and x_2, respectively. In this type of noncooperative problem, player one strives to maximize function f while player two strives to minimize it. Noncooperative games arise in many fields, especially as models of economic behavior. In fact, von Neumann and Morgenstern (1944) originally developed game theory for understanding economic behavior in addition to other features of classical games such as bluffing in poker. These kinds of problems also are useful in worst-case analysis and design. For example, the outer problem can represent a standard design optimization while the inner problem finds the worst-case scenario over some set of uncertain model parameters.

Another important engineering application of noncooperative games arises in the introduction of Lagrange multipliers and the Lagrangian function when solving constrained optimization problems. For the quadratic function shown in Figure 1.6 (c), Exercise 1.74 asks you to establish that the origin is the unique solution to both problems in (1.19). The solution to a noncooperative game is known as a Nash equilibrium or Nash point in honor of the mathematician John Nash, who established some of the early fundamental results of game theory (Nash, 1951).


              Scalar                      Vector

f(x)          (1/2)ax² + bx + c           (1/2)xᵀAx + bᵀx + c
df/dx         ax + b                      Ax + b
x⁰            −b/a                        −A⁻¹b
f⁰            −(1/2)b²/a + c              −(1/2)bᵀA⁻¹b + c
f(x)          (1/2)a(x − x⁰)² + f⁰        (1/2)(x − x⁰)ᵀA(x − x⁰) + f⁰

Table 1.1: Quadratic function of scalar and vector argument; a > 0, A positive definite.

Finally, to complete the vector minimization problem, we restrict attention to the case A > 0. Taking two derivatives in this case produces

f(x) = (1/2)xᵀAx + bᵀx + c
df/dx = (1/2)(Ax + Aᵀx) + b = Ax + b
d²f/dx² = A

Setting df/dx = 0 and solving for x⁰, and then evaluating f(x⁰) gives

x⁰ = −A⁻¹b     f⁰ = −(1/2)bᵀA⁻¹b + c
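These formulas can be sketched numerically (the particular A, b, c are an arbitrary positive definite example, not from the text):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # symmetric, positive definite
b = np.array([1.0, -1.0])
c = 0.5

x0 = np.linalg.solve(A, -b)                    # x0 = -A^{-1} b
f0 = -0.5 * b @ np.linalg.solve(A, b) + c      # f0 = -(1/2) b^T A^{-1} b + c

f = lambda x: 0.5 * x @ A @ x + b @ x + c
assert np.isclose(f(x0), f0)                   # formula matches the function
assert f(x0 + np.array([0.1, -0.2])) > f0      # x0 is indeed the minimizer
print(x0, f0)
```

Note the use of np.linalg.solve rather than forming A⁻¹ explicitly; this is the standard numerically preferred way to evaluate −A⁻¹b.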


These results for the scalar and vector cases are summarized in Table 1.1. Notice also in the last line of the table that one can reparameterize the function f(·) in terms of x⁰ and f⁰, in place of b and c, which is often useful.

Revisiting linear least squares. Consider again the linear least-squares problem of Section 1.3.7

min_x (1/2)‖Āx − b̄‖²

where we have changed Ax − b to Āx − b̄ to not conflict with the notation of this section. We see that least squares is the special case of a quadratic function with the parameters

A = ĀᵀĀ     b = −Āᵀb̄     c = (1/2)b̄ᵀb̄

Obviously A is symmetric in the least-squares problem. We have already derived the fact that ĀᵀĀ > 0 if the columns of Ā are independent in the discussion of the SVD in Section 1.4.7. So independent columns of Ā correspond to case (a) in Figure 1.6. If the columns of Ā are not independent, then A ≥ 0, and we are in case (b) and lose uniqueness of the least-squares solution. See Exercise 1.64 for a discussion of this case. It is not possible for a least-squares problem to be in case (c), which is good, because we are posing a minimization problem in least squares. So the solution to a least-squares problem always exists.
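The correspondence between the least-squares objective and the quadratic-form parameters can be verified directly; a sketch in which Ā and b̄ are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
Abar = rng.standard_normal((5, 3))    # independent columns (generically)
bbar = rng.standard_normal(5)

# Quadratic-form parameters from expanding (1/2)||Abar x - bbar||^2
A = Abar.T @ Abar
b = -Abar.T @ bbar
c = 0.5 * bbar @ bbar

x = rng.standard_normal(3)            # arbitrary test point
lhs = 0.5 * np.sum((Abar @ x - bbar) ** 2)
rhs = 0.5 * x @ A @ x + b @ x + c
assert np.isclose(lhs, rhs)

# The quadratic minimizer -A^{-1} b is the least-squares solution.
assert np.allclose(np.linalg.solve(A, -b),
                   np.linalg.lstsq(Abar, bbar, rcond=None)[0])
```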

1.5.3 Vec Operator and Kronecker Product of Matrices

We introduce two final matrix operations that prove highly useful in applications, but often are neglected in an introductory linear algebra course. These are the vec operator and the Kronecker product of two matrices.

The vec operator. For A ∈ ℝ^{m×n}, the vec operator is defined as the restacking of the matrix by its columns into a single large column vector

vec A = vec [ A_11  A_12  ⋯  A_1n ] = [ A_11  A_21  ⋯  A_m1  A_12  A_22  ⋯  A_m2  ⋯  A_1n  A_2n  ⋯  A_mn ]ᵀ
            [ A_21  A_22  ⋯  A_2n ]
            [  ⋮     ⋮          ⋮ ]
            [ A_m1  A_m2  ⋯  A_mn ]

Note that vec A is a column vector in ℝ^{mn}. If we denote the n column vectors of A as a_i, we can express the vec operator more compactly using column partitioning as

vec A = vec [ a_1  a_2  ⋯  a_n ] = [ a_1ᵀ  a_2ᵀ  ⋯  a_nᵀ ]ᵀ
Matrix Kronecker product. For A ∈ ℝ^{m×n} and B ∈ ℝ^{p×q}, the Kronecker product of A and B, denoted A ⊗ B, is defined as

A ⊗ B = [ A_11 B  A_12 B  ⋯  A_1n B ]
        [ A_21 B  A_22 B  ⋯  A_2n B ]     (1.20)
        [   ⋮       ⋮           ⋮   ]
        [ A_m1 B  A_m2 B  ⋯  A_mn B ]

Note that the Kronecker product is defined for all matrices A and B, and the matrices do not have to conform as in normal matrix multiplication. By counting the number of rows and columns in the definition above, we see that matrix A ⊗ B ∈ ℝ^{mp×nq}. Notice also that the vector outer product, defined in Section 1.2.5, is a special case of this more general matrix Kronecker product.
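In numpy these two operations look as follows; note that numpy's default reshape is row-major, so the column-stacking vec requires order='F' (Fortran order):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])            # 2 x 3
B = np.array([[0, 1],
              [1, 0]])               # 2 x 2

vecA = A.reshape(-1, order='F')      # stack columns: [1 4 2 5 3 6]
K = np.kron(A, B)                    # (2*2) x (3*2) block matrix of A_ij * B

print(vecA)
print(K.shape)                       # (4, 6)
```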


Some useful identities. We next establish four useful identities involving the vec operator and Kronecker product.

vec(ABC) = (Cᵀ ⊗ A) vec B                                     (1.21)
(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)    A, C conform; B, D conform    (1.22)
(A ⊗ B)ᵀ = Aᵀ ⊗ Bᵀ                                            (1.23)
(A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹           A and B invertible            (1.24)
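All four identities can be spot-checked numerically before working through the proofs; a sketch with random conforming matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
vec = lambda M: M.reshape(-1, order='F')   # column-stacking vec operator

A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# (1.21): vec(ABC) = (C^T kron A) vec B
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# (1.22): (A kron B)(C2 kron D) = (A C2) kron (B D), with conforming shapes
C2 = rng.standard_normal((3, 2))
D = rng.standard_normal((4, 5))
assert np.allclose(np.kron(A, B) @ np.kron(C2, D), np.kron(A @ C2, B @ D))

# (1.23): (A kron B)^T = A^T kron B^T
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))

# (1.24): (A kron B)^-1 = A^-1 kron B^-1 for invertible A, B
Ai = rng.standard_normal((3, 3))
Bi = rng.standard_normal((2, 2))
assert np.allclose(np.linalg.inv(np.kron(Ai, Bi)),
                   np.kron(np.linalg.inv(Ai), np.linalg.inv(Bi)))
print("identities (1.21)-(1.24) verified")
```

(Random square Gaussian matrices are invertible with probability one, so the inverse check in (1.24) is safe for generic draws.)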

Establishing (1.21). Let A ∈ ℝ^{m×n}, B ∈ ℝ^{n×p}, and C ∈ ℝ^{p×q}. Let the column partitions of matrices B and C be given by

B = [ b_1  b_2  ⋯  b_p ]     C = [ c_1  c_2  ⋯  c_q ]

We know from the rules of matrix multiplication that the jth column of the product ABC = (AB)C is given by ABc_j. So when we stack these columns we obtain

vec(ABC) = [ ABc_1 ]
           [ ABc_2 ]
           [   ⋮   ]
           [ ABc_q ]

Now we examine the right-hand side of (1.21). We have from the definitions of vec operator and Kronecker product

(Cᵀ ⊗ A) vec B = [ c_11 A  c_21 A  ⋯  c_p1 A ] [ b_1 ]
                 [ c_12 A  c_22 A  ⋯  c_p2 A ] [ b_2 ]
                 [   ⋮       ⋮           ⋮   ] [  ⋮  ]
                 [ c_1q A  c_2q A  ⋯  c_pq A ] [ b_p ]

The jth row of this partitioned matrix multiplication can be rearranged as follows

c_1j A b_1 + c_2j A b_2 + ⋯ + c_pj A b_p = A [ b_1  b_2  ⋯  b_p ] [ c_1j ] = ABc_j
                                                                  [ c_2j ]
                                                                  [  ⋮   ]
                                                                  [ c_pj ]


Inserting this result into the previous equation gives

(Cᵀ ⊗ A) vec B = [ ABc_1 ]
                 [ ABc_2 ]
                 [   ⋮   ]
                 [ ABc_q ]

which agrees with the expression for vec(ABC).

Establishing (1.22). Here we let A ∈ ℝ^{m×n}, B ∈ ℝ^{p×q}, C ∈ ℝ^{n×r}, and D ∈ ℝ^{q×s} so that A, C conform and B, D conform. Let c_1, …, c_r be the column vectors of matrix C. We know from the rules of matrix multiplication that the jth (block) column of the product (A ⊗ B)(C ⊗ D) is given by A ⊗ B times the jth (block) column of C ⊗ D, which is

[ A_11 B  ⋯  A_1n B ] [ c_1j D ]   [ (Σ_k A_1k c_kj) BD ]
[   ⋮           ⋮   ] [   ⋮    ] = [         ⋮          ] = (Ac_j) ⊗ (BD)
[ A_m1 B  ⋯  A_mn B ] [ c_nj D ]   [ (Σ_k A_mk c_kj) BD ]

Since this is the jth (block) column of (A ⊗ B)(C ⊗ D), the entire matrix is

(A ⊗ B)(C ⊗ D) = [ (Ac_1) ⊗ (BD)   (Ac_2) ⊗ (BD)   ⋯   (Ac_r) ⊗ (BD) ] = (AC) ⊗ (BD)

and the result is established.


and B e v xq, from the definiGivenA e


Establishing(1.23).
and cross product we have that
transpose
of
tion
AlnB
AllB A12B
A21B

A2nB

A22B

AmnB
A21B T
A22B T

Am2BT

AlnB T A2nB T

AmnBT

AllB T
A12BT

Establishing (1.24).  Apply (1.22) to the following product and obtain

    (A ⊗ B)(A^{-1} ⊗ B^{-1}) = (AA^{-1}) ⊗ (BB^{-1}) = I ⊗ I = I

and therefore

    (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
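As with (1.21), identities (1.22)-(1.24) can be checked on random matrices. This NumPy sketch is illustrative (not from the text); random square A and B are invertible with probability one, as (1.24) requires.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))
C = rng.standard_normal((3, 4))
D = rng.standard_normal((2, 5))

# (1.22): mixed-product property; A,C conform and B,D conform.
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
# (1.23): transpose distributes over the Kronecker product.
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
# (1.24): inverse distributes when A and B are invertible.
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))
```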

Eigenvalues, singular values, and rank of the Kronecker product.
When solving matrix equations, we will want to know about the rank
of the Kronecker product A ⊗ B. Since rank is closely tied to the singu-
lar values, and these are closely tied to the eigenvalues, the following
identities prove highly useful.

    eig(A ⊗ B) = eig(A) eig(B)         A and B square             (1.25)
    σ(A ⊗ B) = σ(A) σ(B)               nonzero singular values    (1.26)
    rank(A ⊗ B) = rank(A) rank(B)                                 (1.27)

Given our previous identities, these three properties are readily estab-
lished. Let A and B be square of order m and n, respectively. Let
(λ, v) and (μ, w) be eigenpairs of A and B, respectively. We have that
Av ⊗ Bw = (λv) ⊗ (μw) = λμ(v ⊗ w). Using (1.22) on Av ⊗ Bw then
gives

    (A ⊗ B)(v ⊗ w) = Av ⊗ Bw = λμ(v ⊗ w)

and we conclude that (nonzero) mn-vector (v ⊗ w) is an eigenvector of
A ⊗ B with product λμ as the corresponding eigenvalue. This establishes
(1.25). For the nonzero singular values, recall that the squares of the
nonzero singular values of a real matrix A are the nonzero eigenvalues of AA^T


(and A^T A). We then have, for σ(A) and λ(A) denoting nonzero singular
values and eigenvalues, respectively,

    σ(A ⊗ B) = [λ((A ⊗ B)(A ⊗ B)^T)]^{1/2} = [λ((AA^T) ⊗ (BB^T))]^{1/2}
             = [λ(AA^T) λ(BB^T)]^{1/2} = σ(A) σ(B)

which establishes (1.26). Since the number of nonzero singular values
of a matrix is equal to the rank of the matrix, we then also have (1.27).
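Properties (1.25) and (1.27) can also be checked numerically; this NumPy sketch is illustrative (not from the text). It verifies that every product of eigenvalues λᵢμⱼ appears in the spectrum of A ⊗ B, and that rank is multiplicative.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))

# (1.25): each eigenvalue of A kron B is a product lambda_i(A) * mu_j(B).
eigAB = np.linalg.eigvals(np.kron(A, B))
products = np.outer(np.linalg.eigvals(A), np.linalg.eigvals(B)).ravel()
for p in products:
    assert np.min(np.abs(eigAB - p)) < 1e-8   # p occurs in eig(A kron B)

# (1.27): rank is multiplicative over the Kronecker product.
assert (np.linalg.matrix_rank(np.kron(A, B))
        == np.linalg.matrix_rank(A) * np.linalg.matrix_rank(B))
```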

Properties (1.21)-(1.27) are all that we require for the material in
this text, but the interested reader may wish to consult Magnus and
Neudecker (1999) for a more detailed discussion of Kronecker products.

Solving linear matrix equations.  We shall find the properties (1.21)-
(1.27) highly useful when dealing with complex maximum-likelihood
estimation problems in Chapter 4. But to provide here a small illustra-
tion of their utility, consider the following linear matrix equation for
the unknown matrix X

    AXB = C

in which neither A nor B is invertible. The equations are linear in X, so
they should be solvable as some form of linear algebra problem. But
since we cannot operate with A^{-1} from the left, nor B^{-1} from the right,
it seems difficult to isolate X and solve for it. This is an example where
the linear equations are simply not packaged in a convenient form for
solving them. But if we apply the vec operator and use (1.21) we have

    (B^T ⊗ A) vec X = vec C

Note that this is now a standard linear algebra problem for the unknown
vector vec X. We can examine the rank, and linear independence of the
rows and columns of B^T ⊗ A, to determine the existence and unique-
ness of the solution vec X, and whether we should solve a least-squares
problem or minimum-norm problem. After solution, the vec X column
vector can then be restacked into its original matrix form X if desired.
Exercise 1.77 provides further discussion of solving AXB = C.
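A small NumPy illustration of this vec trick (not from the text; the rank-2 factors below are chosen only to make A and B non-invertible while keeping the problem consistent):

```python
import numpy as np

rng = np.random.default_rng(3)
# Build a consistent problem: pick X first, then set C = A X B, with
# rank-deficient (hence non-invertible) A and B.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 4))  # 4x4, rank 2
B = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 3))  # 3x3, rank 2
Xtrue = rng.standard_normal((4, 3))
C = A @ Xtrue @ B

# Apply the vec operator and (1.21): (B^T kron A) vec X = vec C.
M = np.kron(B.T, A)
x, *_ = np.linalg.lstsq(M, C.reshape(-1, order="F"), rcond=None)
X = x.reshape(4, 3, order="F")   # restack vec X into matrix form

assert np.allclose(A @ X @ B, C)   # a (non-unique) solution of AXB = C
```

Because B^T ⊗ A is rank deficient here, `lstsq` returns the minimum-norm solution; the residual is zero because the problem was constructed to be consistent.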
As a final example, in Chapter 2 we will derive the matrix Lyapunov
equation, which tells us about the stability of a linear dynamic system,

    A^T S + S A = -Q

in which matrices A and Q are given, and S is the unknown. One way
to think about solving the matrix Lyapunov equation is to apply the vec
operator to obtain

    (I ⊗ A^T + A^T ⊗ I) vec S = -vec Q

and then solve this linear algebra problem for vec S. Although this
approach is useful for characterizing the solution, given the special
structure of the Lyapunov equation, more efficient numerical solution
methods are available and coded in standard software. See the function
lyap(A',Q) in Octave or MATLAB, for example. Exercise 1.78 asks you
to solve a numerical example using the Kronecker product approach
and compare your result to the lyap function.
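The Kronecker product approach can be sketched in a few lines of NumPy (an illustration, not from the text; the stable 2 × 2 matrix A below is a made-up test case):

```python
import numpy as np

# Stable test system: eigenvalues of A have negative real parts.
A = np.array([[-1.0,  2.0],
              [ 0.0, -3.0]])
Q = np.eye(2)

# vec(A^T S + S A) = (I kron A^T + A^T kron I) vec S = -vec Q.
n = A.shape[0]
I = np.eye(n)
M = np.kron(I, A.T) + np.kron(A.T, I)
s = np.linalg.solve(M, -Q.reshape(-1, order="F"))
S = s.reshape(n, n, order="F")   # restack vec S into matrix form

assert np.allclose(A.T @ S + S @ A, -Q)   # S solves the Lyapunov equation
```

The coefficient matrix M is nonsingular here because its eigenvalues are the sums λᵢ(A) + λⱼ(A), which are all negative for a stable A.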

1.6 Exercises
Exercise 1.1: Inner product and angle in R^2

Consider the two vectors a, b ∈ R^2 shown in Figure 1.7 and let θ denote
the angle between them. Show the usual inner product and norm formulas

    (a, b) = Σ_i ai bi        ||a|| = (a, a)^{1/2}

satisfy the following relationship with the angle

    cos θ = (a, b) / (||a|| ||b||)

This relationship allows us to generalize the concept of an angle between
vectors to any inner product space.

Figure 1.7: Two vectors in R^2 and the angle between them.

Exercise 1.2: Scaling and vector norm

Consider the vector x ∈ R^2, whose elements are the temperature (in K) and pressure
(in Pa) in a reactor. A typical value of x would be

    x = [ 300
          1.0 × 10^6 ]

(a) Let

    y1 = [ 310            y2 = [ 300
           1.0 × 10^6 ]          1.2 × 10^6 ]

be two measurements of the state of the reactor. Use the Euclidean norm to
calculate the error ||y − x|| for the two values of y. Do you think that the
calculated errors give a meaningful idea of the difference between y1 and y2?

(b) Consider the formula

    ||x|| = ( Σ_{i=1}^{n} wi xi^2 )^{1/2}

where xi is the ith component of the vector x. Show that this formula is a norm
if and only if wi > 0 for all i. This is known as a weighted norm, with weight
vector w.

(c) Propose a weight vector that is appropriate for the example in part (a). Justify
your answer.

Exercise 1.3: Linear independence

Verify that the following sets are LI.

(a) e1 = [0, ···]^T, e2 = [2, 0, 1]^T, e3 = [1, 1, 1]^T

(b) e1 = [1 + i, 1 ···], e2 = [1 + 2i, 1 ···]

Hint: express α1 e1 + α2 e2 + α3 e3 = 0. Taking inner products of this equation with
e1, e2, and e3 yields three linear equations for the αi.

Exercise 1.4: Gram-Schmidtprocedure


Using Gram-Schmidt orthogonalization, obtain ON sets from the LI sets given in the
previous problem.
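For reference when attempting Exercises 1.4 and 1.5, here is a minimal NumPy sketch of the Gram-Schmidt procedure (illustrative, not from the text; the three LI vectors below are made-up examples, not the sets from Exercise 1.3):

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: return an orthonormal (ON) set spanning
    the same space as the given linearly independent vectors."""
    basis = []
    for v in vectors:
        w = v.astype(float).copy()
        for q in basis:
            w -= (q @ v) * q          # remove the component of v along q
        basis.append(w / np.linalg.norm(w))
    return basis

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
Q = np.column_stack(gram_schmidt(vs))
assert np.allclose(Q.T @ Q, np.eye(3))   # columns are orthonormal
```

If the input set is not LI, the norm of some residual w is zero and the division fails, which is exactly the failure mode Exercise 1.5 asks you to pinpoint.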

Exercise 1.5: Failure of Gram-Schmidt


The Gram-Schmidt process will fail if the initial set of vectors is not LI.
(a) Construct a set of three vectors in R3 that are not LI and apply Gram-Schmidt,
pinpointing where it fails.

(b) Similarly,in an n-dimensional space, no more than n LI vectors can be found.


Construct a set of four vectors in R3 and use Gram-Schmidt to show that if three
of the vectors are LI, then a fourth orthogonal vector cannot be found.

Exercise 1.6: Linear independence and expressing one vector as a linear
combination of others

We often hear that a set of vectors is linearly independent if none of the vectors in the
set can be expressed as a linear combination of the remaining vectors. Although the
statement is correct, as a definition of linear independence, this idea is a bit unwieldy
because we do not know a priori which vector(s) in a linearly dependent set is (are)
expressible as a linear combination of the others.

The following statement is a more precise variation on this theme. Given that the
vectors xi ∈ R^n, i = 1, ..., k, are linearly independent and the vectors {xi, a} are linearly
dependent, show that a can be expressed as a linear combination of the xi.

Using the definition of linear independence provided in the text, prove this state-
ment.

Exercise 1.7: Some properties of subspaces

Establish the following properties.

(a) The zero element is an element of every subspace.

(b) The span of any set of j elements of R^n is a subspace (of the linear space R^n).

(c) Except for the zero subspace, a subspace cannot have a finite, largest element.
Hence, every subspace, except the zero subspace, is unbounded.

Exercise 1.8: Some subspaces in 2-D and 3-D

(a) Consider the line in R^2

    S = { y | y = α [1, 1]^T, α ∈ R }

Draw a sketch of S. Show that S is a subspace.

(b) Next consider the shifted line

    S' = { y | y = y0 + α [1, 1]^T, α ∈ R }

in which y0 is a fixed vector not on S. Draw a sketch of S'. Show that S' is not
a subspace.

(c) Describe all of the subspaces of R^3.

Exercise 1.9: Permutation matrices

(a) Given the matrix

    P = [ 1 0 0
          0 0 1
          0 1 0 ]

show that PA interchanges the second and third rows of A for any 3 × 3 matrix.
What does AP do?

(b) A general permutation matrix involving p row exchanges can be written P =
Pp P_{p−1} ··· P2 P1 where Pi corresponds to a simple row exchange as above. Show
that P is orthogonal.

Exercise1.10: Special matrices


Consideroperations on vectors in R2.

(a) Constructa matrix operator that multiplies the horizontal (Xl) component of
a vectorby 2, but leaves its vertical component (x2) unchanged.
(b) Constructa matrix operator B that rotates a vector counterclockwise by an angle
of 2TT/3.

(c) Computeand draw ABx and BAX for x = 1


2

(d) Showthat B3 = I. With drawings, show how this makes geometric sense.


Exercise 1.11: Integral operators and matrices

Consider the integral operator K acting on a function x:

    K{x}(t) = ∫ k(t, s) x(s) ds

where k(t, s) is a known function called the KERNEL of the operator.

(a) Show that K is a linear operator.

(b) Read Section 2.4.1. Use the usual (i.e., unweighted) inner product on
the interval.

(c) An integral can be approximated as a sum, so the above integral operator can
be approximated like this:

    Ka{x(iΔt)} = Σ_j k(iΔt, jΔt) x(jΔt) Δt,    i = 1, ..., N

where Δt = 1/N. Show how this approximation can be rewritten as a standard
matrix-vector product. What is the matrix approximation to the integral oper-
ator?

Exercise 1.12: Projections and matrices

Given a unit vector n, use index notation (and the summation convention) to simplify
the following expressions:

(a) (nn^T)(nn^T)u for any vector u. Recalling that nn^T is the projection operator,
what is the geometric interpretation of this result?

(b) (I − 2nn^T)^2. What is the geometric interpretation of this result?

Exercise 1.13: Use the source, Luke

Someone in your research group wrote a computer program that takes an n-vector
input, x ∈ R^n, and returns an m-vector output, y ∈ R^m. All we know about the
function f is that it is linear. The code was compiled and now the source code has
been lost; the code's author has graduated and won't respond to our email. We need
to create the source code for function f so we can compile it for our newly purchased
hardware, which no longer runs the old compiled code. To help us accomplish this
task, all we can do is execute the function on the old hardware.

(a) How many function calls do you need to make before you can write the source
code for this function?

(b) What inputs do you choose, and how do you construct the linear function f from
the resulting outputs?


(c) To make matters worse, your advisor has a hot new project idea that requires
you to write a program to evaluate the inverse of this linear function, and has
asked you if this is possible. How do you respond? Give a complete answer
about the existence and uniqueness of x given y.

Exercise 1.14: The range and null space of a matrix are subspaces

Given A ∈ R^{m×n}, show that the sets R(A) and N(A) satisfy the properties of a sub-
space, (1.1), and therefore R(A) and N(A) are subspaces.

Exercise 1.15: Null space of A is orthogonal to range of A^T

Given A ∈ R^{m×n}, show that N(A) ⊥ R(A^T).

Exercise 1.16: Rank of a dyad

What is the rank of the n × n dyad uv^T?

Exercise 1.17: Partitioned matrix inversion formula

(a) Let the matrix A be partitioned as

    A = [ B  C
          D  E ]

in which B, C, D, E are suitably dimensioned matrices and B and E are square.
Derive a formula for A^{-1} in terms of B, C, D, E by block Gaussian elimination.
Check your answer with the CRC Handbook formula (Selby, 1973).

(b) What if B^{-1} does not exist? What if E^{-1} does not exist? What if both B^{-1} and
E^{-1} do not exist?

Exercise 1.18: The four fundamental subspaces

Find bases for the four fundamental subspaces associated with the following matrices

    [ 1 1 0 0
      0 1 0 1 ]

Exercise 1.19: Zero is orthogonal to many vectors

Prove that if

    (x, z) = (y, z)    for all z ∈ R^n

then x = y, or, equivalently, prove that if

    (u, v) = 0    for all v ∈ R^n

then u = 0.

Figure 1.8: Experimental measurements of variable y versus x.

Exercise 1.20: Existence and uniqueness


Find matrices A for which the number of solutions to Ax = b is
(a) 0 or 1, depending on b.
(b) ∞, independent of b.
(c) 0 or ∞, depending on b.

(d) 1, independent of b.

Exercise 1.21: Fitting and overfitting functions with least squares

One of your friends has been spending endless hours in the laboratory collecting data
on some obscure process, and now wants to find a function to describe the variable
y's dependence on the independent variable, x.

    x    0.00  0.22  0.44  0.67  0.89  1.11  1.33  1.56  1.78   2.00
    y    2.36  2.49  2.67  3.82  4.87  6.28  8.23  9.47  12.01  15.26

Not having a good theory to determine the form of this expression, your friend has
chosen a polynomial to fit the data.

(a) Consider the polynomial model

    y(x) = a0 + a1 x + a2 x^2 + ··· + an x^n

Express the normal equations for finding the coefficients ai that minimize the
sum of squares of errors in y.

(b) Using the x- and y-data shown above and plotted in Figure 1.8, solve the least-
squares problem and find the a that minimize

    Φ = Σ_{i=1}^{na} (y(xi) − yi)^2

in which na is the number of measurements and n is the order of the polynomial.
Do this calculation for all polynomials of order 0 ≤ n ≤ 9.

(c) For each n, also calculate the least-squares objective Φ and plot Φ versus n.

(d) Plot the data along with your fitted polynomial curves for each value of n. In
particular, plot the data and fits for n = 2 and n = 9 on one plot. Use the range
−0.25 ≤ x ≤ 2.25 to get an idea about how well the models extrapolate.

(e) Based on the values of Φ and the appearance of your plots, what degree poly-
nomial would you choose to fit these data? Why not choose n = 9 so that the
polynomial can pass through every point and Φ = 0?
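One way to attack part (b) is to form the Vandermonde matrix and solve the normal equations directly; this NumPy sketch (illustrative, shown for n = 2 only) uses the data tabulated in the exercise:

```python
import numpy as np

# Data from Exercise 1.21.
x = np.array([0.00, 0.22, 0.44, 0.67, 0.89, 1.11, 1.33, 1.56, 1.78, 2.00])
y = np.array([2.36, 2.49, 2.67, 3.82, 4.87, 6.28, 8.23, 9.47, 12.01, 15.26])

n = 2                                       # polynomial order
A = np.vander(x, n + 1, increasing=True)    # columns: 1, x, x^2
a = np.linalg.solve(A.T @ A, A.T @ y)       # normal equations (A^T A) a = A^T y
phi = np.sum((A @ a - y) ** 2)              # least-squares objective

assert phi < np.sum((y - y.mean()) ** 2)    # quadratic beats a constant fit
```

Looping this over 0 ≤ n ≤ 9 gives the Φ-versus-n plot requested in part (c); for large n, solving the normal equations becomes ill conditioned, which is part of the point of the exercise.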

Exercise 1.22: Least-squares estimation of activation energy

Assume you have measured a rate constant, k, at several different temperatures, T,
and wish to find the activation energy (divided by the gas constant), E/R, and the
preexponential factor, k0, in the Arrhenius model

    k = k0 e^{−E/RT}                                    (1.28)

The data are shown in Figure 1.9 and listed here.

    T (K)  300   325   350   375   400   425   450   475   500
    k      1.82  1.89  2.02  2.14  2.12  2.17  2.15  2.21  2.26

(a) Take logarithms of (1.28) and write a model that is linear in the parameters
ln(k0) and E/R. Summarize the data and model with the linear algebra problem

    Ax = b

in which x = [ln(k0), E/R]^T contains the parameters of the least-squares problem.
What are A and b for this problem?

(b) Find the least-squares fit to the data. What are your least-squares estimates of
ln(k0) and E/R?

(c) Is your answer unique? How do you know?

(d) Plot the data and least-squares fit in the original variables k versus T. Do you
have a good fit to the data?
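The linearization in part (a) can be sketched as follows in NumPy (illustrative, not from the text), using the data tabulated in the exercise:

```python
import numpy as np

# Data from Exercise 1.22.
T = np.array([300., 325., 350., 375., 400., 425., 450., 475., 500.])
k = np.array([1.82, 1.89, 2.02, 2.14, 2.12, 2.17, 2.15, 2.21, 2.26])

# Taking logarithms of (1.28): ln k = ln(k0) - (E/R)(1/T),
# which is linear in the parameters ln(k0) and E/R.
A = np.column_stack([np.ones_like(T), -1.0 / T])
b = np.log(k)
x, *_ = np.linalg.lstsq(A, b, rcond=None)
ln_k0, E_over_R = x

kfit = np.exp(ln_k0 - E_over_R / T)
assert np.max(np.abs(kfit - k) / k) < 0.08   # fit within a few percent
```

The columns of A (a constant column and −1/T) are linearly independent for distinct temperatures, which is what makes the answer in part (c) unique.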


Figure 1.9: Measured rate constant at several temperatures.

Exercise 1.23: Existence and uniqueness of linear equations

Consider the following partitioned A matrix, A ∈ R^{m×n}

    A = [ A1  0
          0   0 ]

in which A1 ∈ R^{p×p} is of rank p and p < min(m, n).

(a) What are the dimensions of the three zero matrices?

(b) What is the rank of A?

(c) What is the dimension of the null space of A? Compute a basis for the null space
of A.

(d) Repeat for A^T.

(e) For what b can you solve Ax = b?

(f) Is the solution for these b unique? If not, given one solution x1, such that
Ax1 = b, specify all solutions.

Exercise 1.24: Reaction rates from production rates

Consider the following set of reactions.

    CO + (1/2) O2 ⇌ CO2
    H2 + (1/2) O2 ⇌ H2O
    CH4 + 2 O2 ⇌ CO2 + 2 H2O
    CH4 + (3/2) O2 ⇌ CO + 2 H2O

(a) Given the species list, A = [CO  O2  CO2  H2  H2O  CH4], write out the
stoichiometric matrix, ν, for the reactions relating the four reaction rates to the
six production rates

    R = ν^T r                                    (1.29)

(b) How many of the reactions are linearly independent?

(c) In a laboratory experiment, you measured the production rates for all the species
and found

    Rmeas = [ ···  −1 ]^T

Is there a set of reaction rates rex that satisfies (1.29) exactly? If not, how do you
know? If so, find an rex that satisfies Rmeas = ν^T rex.

(d) If there is an rex, is it unique? If so, how do you know? If not, characterize all
solutions.

Exercise 1.25: Least-squares estimation

A colleague has modeled the same system as only the following three reactions

    CO + (1/2) O2 ⇌ CO2
    H2 + (1/2) O2 ⇌ H2O
    CH4 + 2 O2 ⇌ CO2 + 2 H2O

(a) How many of these reactions are linearly independent?

(b) In another laboratory experiment, you measured the production rates for all the
species and found

    Rmeas = [ ···  −4.5  ··· ]^T

Is there a set of reaction rates rex in this second model that satisfies (1.29) ex-
actly? If so, find an rex that satisfies Rmeas = ν^T rex. If not, how do you know?

(c) If there is not an exact solution, find the least-squares solution, rest. What is the
least-squares objective value?

(d) Is this solution unique? If so, how do you know? If not, characterize all solutions
that achieve this value of the objective function.

Exercise 1.26: Controllability

Consider a linear discrete-time system governed by the difference equation

    x(k + 1) = Ax(k) + Bu(k)                             (1.30)

in which x(k), an n-vector, is the state of the system, and u(k), an m-vector, is the
manipulatable input at time k. The goal of the controller is to choose a sequence of
inputs that force the state to follow some desirable trajectory.

(a) What are the dimensions of the A and B matrices?

(b) ... should redesign the system before trying to design a controller for it. This is
an ... A system is said to be CONTROLLABLE if n input values exist that can move
the system from any initial condition, x0, to any final state x(n). By using (1.30),
show that x(n) can be expressed as

    x(n) = A^n x0 + A^{n−1} B u(0) + A^{n−2} B u(1) + ··· + A B u(n−2) + B u(n−1)

Stack all of the u(k) on top of each other and rewrite this expression in partitioned-
matrix form,

    x(n) = A^n x0 + C [ u(n−1) ; u(n−2) ; ··· ; u(0) ]           (1.31)

What is the C matrix and what are its dimensions?

(c) What must be true of the rank of C for a system to be controllable, i.e., for there
to be a solution to (1.31) for every x(0) and x(n)?

(d) Consider the following two systems with 2 states (n = 2) and 1 input (m = 1)

    x(k + 1) = [ ··· ] x(k) + [ ··· ] u(k)        x(k + 1) = [ ··· ] x(k) + [ ··· ] u(k)

Notice that the input only directly affects one of the states in both of these
systems. Are either of these two systems controllable? If not, show which x(n)
cannot be reached with n input moves starting from x(0) = 0.
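The rank test from part (c) can be sketched in NumPy. The matrices A1, B1, A2, B2 below are hypothetical stand-ins, not the matrices printed in the exercise; they simply illustrate one controllable and one uncontrollable 2-state, 1-input pair:

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack the blocks B, AB, A^2 B, ..., A^(n-1) B column-wise.
    (Block ordering does not affect the rank test.)"""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.column_stack(blocks)

# Hypothetical examples: the input enters only the second state in both.
A1 = np.array([[1.0, 1.0], [0.0, 1.0]]); B1 = np.array([[0.0], [1.0]])
A2 = np.array([[1.0, 0.0], [0.0, 1.0]]); B2 = np.array([[0.0], [1.0]])

assert np.linalg.matrix_rank(controllability_matrix(A1, B1)) == 2  # controllable
assert np.linalg.matrix_rank(controllability_matrix(A2, B2)) == 1  # not controllable
```

In the second system A2 never couples the input into the first state, so the columns of C stay on a line and rank(C) < n.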

Exercise 1.27: A vector/matrix derivative

Consider the following derivative for A, C ∈ R^{n×n}, x, b ∈ R^n

    C = d(A x x^T b)/dx

or expressed in component form

    Cij = d(A x x^T b)i / dxj        i = 1, ..., n,  j = 1, ..., n

Find an expression for this derivative (C) in terms of A, x, b.


Exercise 1.28: Rank equality with matrix products

Given arbitrary B ∈ R^{m×n}, and full-rank A ∈ R^{m×m} and C ∈ R^{n×n}, establish the
following two facts

    rank(AB) = rank(B)        rank(BC) = rank(B)

Use these to show

    rank(ABC) = rank(B)

Exercise 1.29: More matrix products

Find examples of 2 by 2 matrices such that

(a) ···

(b) A^2 = −I, with A a real matrix,

(c) B^2 = 0, with no zeros in B,

(d) CD = −DC, not allowing CD = 0.

Exercise 1.30: Programming LU decomposition

Write a program to solve Ax = b using LU decomposition. It should be able to handle
matrices up to n = 10, read in A and b from data files, and write the solution x to a
file. Using this program, solve the problem where
1

-1

1 -1

Exercise1.31: Normal equations


Writethe linear system of equations whose solution x = (Xl,
2

+ 2XIX2 + 2X3)

)T minimizes

+ X2

Findthe solution x and the corresponding value of P (x).

Exercise 1.32: Cholesky decomposition

A symmetric matrix A can be factorized into LDL^T where L is lower triangular and D
is diagonal, i.e., only its diagonal elements are nonzero.

(a) Perform this factorization for the matrix

    A = [  2 −1  0
          −1  2 −1
           0 −1  2 ]

(b) If all the diagonal elements of D are positive, the matrix can be further factorized
into LL^T; this is called the CHOLESKY DECOMPOSITION of A. Find L for the matrix
of part (a).
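The exercise asks you to do this by hand, but for checking your work, here is an illustrative NumPy sketch of LDL^T by symmetric elimination (no pivoting; valid here because the tridiagonal matrix is positive definite), and its relation to the Cholesky factor:

```python
import numpy as np

A = np.array([[ 2., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  2.]])

# LDL^T by symmetric Gaussian elimination on a working copy S.
n = A.shape[0]
L = np.eye(n)
D = np.zeros(n)
S = A.copy()
for j in range(n):
    D[j] = S[j, j]
    L[j+1:, j] = S[j+1:, j] / D[j]                      # multipliers
    S[j+1:, j+1:] -= np.outer(L[j+1:, j], S[j, j+1:])   # eliminate column j

assert np.allclose(L @ np.diag(D) @ L.T, A)
# Cholesky factor: scale the columns of L by sqrt(D).
assert np.allclose(L * np.sqrt(D), np.linalg.cholesky(A))
```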


Exercise 1.33: A singular matrix

For the system

    A x = [ ···  4 ]^T

(a) Find the value of q for which elimination fails (i.e., no solution to Ax = b exists).
If you are thoughtfulful, you won't need to perform the elimination to find out.

(b) For this value of q, what happens to the first geometrical interpretation of the
equations?

(c) What happens to the second (superpositions of column vectors)?

(d) What value should replace 4 in b to make the problem solvable for this q?

Exercise 1.34: LU factorization of nonsquare matrices

(a) Find the LU factorization of A.

(b) If b = (1, p, q)^T, find a necessary and sufficient condition on p and q so that
Ax = b has a solution.

(c) Given values of p and q for which a solution exists, will the algorithm from
Section 1.3.2 solve it? If not, pinpoint the difficulty.

(d) Find the LU factorization of A^T.

(e) Use this factorization to find two LI solutions of A^T x = b, where b = (2, 5)^T.
Since there are fewer equations than unknowns in this case, there are infinitely
many solutions, forming a line in R^3. Are there any values of b for which this
problem has no solution?

Exercise 1.35: An inverse

Under what conditions on u and v does (I − αuv^T) = (I + αuv^T)^{-1}? Here α is an
arbitrary nonzero scalar.

Exercise 1.36: LU decomposition

Write the first step of the LU decomposition process of a matrix A as

    A' = (I − αuv^T) A

In other words, what are α, u, and v so that A'21 = 0?

Exercise 1.37: Newton-Raphson method

Write a program that uses the Newton-Raphson method to solve this pair of equations

    y (x ···) = 0
    (y + ···) tan x = 0

Do not reduce the pair of equations to a single equation. With this program, find at
least one solution.

Exercise 1.38: The QR decomposition

In this exercise, we construct the QR decomposition introduced in Section 1.2.4. Con-
sider an m × n matrix A with columns ai. Observe that if A = BC, with B an m × n
matrix and C an n × n matrix, where bi are the columns of B, then we can express each
column of A as a linear combination of the columns of B, as follows

    ai = [ b1  b2  ···  bn ] [ c1i ; c2i ; ··· ; cni ]

The ith column of A is a linear combination of all the columns of B, and the coefficients
in the linear combination are the elements of the ith column of matrix C. This result
will be helpful in solving the following problem. Let A be an m × n matrix whose
columns ai are linearly independent (thus m ≥ n). We know that using the Gram-
Schmidt procedure allows us to construct an ON set of vectors from the ai. Define a
matrix Q whose columns are these basis vectors, qi, where qi · qj = δij.

(a) Express each ai in the basis formed by the qi. Hint: because the set of qi are
constructed from the set of ai by Gram-Schmidt, a1 has a component only in
the q1 direction, a2 has components only in the q1 and q2 directions, etc.

(b) Use the above result to write A = QR, i.e., find a square matrix R such that each
column of A is expressed in terms of the columns of Q. You should find that R is
upper triangular.
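The construction in parts (a) and (b) can be sketched as code (an illustrative NumPy version, not from the text; the small test matrix is made up):

```python
import numpy as np

def qr_gram_schmidt(A):
    """Thin QR of A (assumed to have LI columns) by Gram-Schmidt:
    A = Q R with orthonormal columns in Q and upper-triangular R."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        w = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # coefficient of a_j along q_i
            w -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(w)
        Q[:, j] = w / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q, R = qr_gram_schmidt(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.allclose(R, np.triu(R))   # R is upper triangular, as part (b) predicts
```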

Exercise 1.39: Orthogonal subspace decomposition

Let S be an r ≤ n dimensional subspace of R^n with a basis {a1, a2, ..., ar}. Consider
the subspace S⊥, the orthogonal complement to S.

(a) Prove that S⊥ has dimension n − r. Do not use the fundamental theorem of
linear algebra in this proof because this result is used to prove the fundamental
theorem.

(b) Show that any vector x ∈ R^n can be uniquely expressed as x = a + b in which
a ∈ S and b ∈ S⊥.

Exercise 1.40: The QR and thin QR decompositions

For A ∈ R^{m×n} with independent columns we have used in the text what is sometimes
called the "thin" QR with Q1 ∈ R^{m×n} and R1 ∈ R^{n×n} satisfying

    A = Q1 R1

It is possible to "fill out" Q1 by adding the remaining m − n columns that span R^m. In
this case A = QR and Q ∈ R^{m×m} is orthonormal, and R ∈ R^{m×n}. In the "thin" QR, Q1
is the shape of A and R1 is square (of the smaller dimension n), and in the full QR, Q
is square (of the larger dimension m) and R is the shape of A.

(a) Is the "thin" QR unique?

(b) Show how to construct the QR from the thin QR. Is the full QR unique?


Exercise 1.41: Uniqueness of solutions to least-squares problems

Prove the following proposition

Proposition 1.19 (Full rank of A^T A). Given matrix A ∈ R^{m×n}, the n × n matrix A^T A
has full rank if and only if A has linearly independent columns.

Note that this proof requires our first use of the fundamental theorem of linear
algebra. Since most undergraduate engineers have limited experience doing proofs, we
provide a few hints.

1. The "if and only if" statement requires proof of two statements: (i) A^T A having
full rank implies A has linearly independent columns and (ii) A having linearly
independent columns implies A^T A has full rank.

2. The statement that S implies T is logically equivalent to the statement that not
T implies not S. So one could prove this proposition by showing (ii) and then
showing: (i') A not having linearly independent columns implies that A^T A is not
full rank.

3. The fundamental theorem of linear algebra is the starting point. It tells us
(among other things) that square matrix B has full rank if and only if B has
linearly independent rows and columns. Think about what that tells you about
the null space of B and B^T. See also Figure 1.1.

Exercise 1.42: A useful decomposition

Let A ∈ C^{n×n}, B ∈ C^{p×p}, and X ∈ C^{n×p} satisfy

    AX = XB        rank(X) = p

Show that A can be decomposed as

    A = Q [ T11  T12 ] Q^{-1}
          [  0   T22 ]

with T11 of order p and T22 of order n − p, in which eig(T11) = eig(B), and eig(T22) =
eig(A) \ eig(B), i.e., the eigenvalues of T22 are the eigenvalues of A that are not
eigenvalues of B. Also show that eig(B) ⊆ eig(A).
Hint: use the QR decomposition of X.

Exercise 1.43: The Schur decomposition


Prove that the Schur decomposition has the properties stated in Theorem 1.15.
Hint: the result is obviously true for n = 1. Use induction and the result of Exercise 1.42.

Exercise 1.44: Norm and matrix rotation

Given the following A matrix

    A = [ 0.46287  0.11526
          0.53244  0.34359 ]

invoking [u,s,v] = svd(A) in MATLAB or Octave produces

    u = [ -0.59540  -0.80343      s = [ 0.78328  0.00000      v = [ -0.89798  -0.44004
          -0.80343   0.59540 ]          0.00000  0.12469 ]          -0.44004   0.89798 ]


(a) What vector x of unit norm maximizes ||Ax||? How large is ||Ax|| for this x?

(b) What vector x of unit norm minimizes ||Ax||? How large is ||Ax|| for this x?

(c) What is the definition of ||A||? What is the value of ||A|| for this A?

(d) Denote the columns of v by v1 and v2. Draw a sketch of the unit circle traced
by x as it travels from x = v1 to x = v2 and the corresponding curve traced by
Ax.

(e) Let's find an A, if one exists, that rotates all x ∈ R^2 counterclockwise by θ
radians. What do you choose for the singular values σ1 and σ2? Choose v1 = e1
and v2 = e2 for the V matrix, in which ei, i = 1, 2 is the ith unit vector. What
do you want u1 and u2 to be for this rotation by θ radians? Form the product
USV^T and determine the A matrix that performs this rotation.

Exercise 1.45: Linear difference equation model

Consider the following discrete-time model

    x(k + 1) = Ax(k)

in which

    A = [  0.798  0.051        x0 = [ 1
          -0.715  1.088 ]             0 ]

(a) Compute the eigenvalues and singular values of A. See the Octave or MATLAB
commands eig and svd. Are the magnitudes of the eigenvalues of A less than
one? Are the singular values less than one?

(b) What is the steady state of this system? Is the steady state asymptotically stable?

(c) Make a two-dimensional plot of the two components of x(k) (phase portrait) as
you increase k from k = 0 to k = 200, starting from the x(0) given above. Is
x(1) bigger than x(0)? Why or why not?

(d) When the largest eigenvalue of A is less than one but the largest singular value
of A is greater than one, what happens to the evolution of x(k)?

(e) Now plot the values of x for 50 points uniformly distributed on a unit circle and
the corresponding Ax for these points. For the SVD corresponding to Octave
and MATLAB convention

    A = USV*

mark u1, u2, v1, v2, s1, and s2 on your plot. Figure 1.10 gives you an idea of the
appearance of the set of points for x and Ax to make sure you are on track.

Figure 1.10: Plot of Ax as x moves around a unit circle.

Exercise 1.46: Is the SVD too good to be true?

Given A ∈ R^{m×n} with rank(A) = r and the SVD of A = UΣV*, if we partition the first
r columns of U and V and call them U1 and V1 we have

    A = U1 Σ1 V1*

Then to solve (possibly in the least-squares sense) Ax = b we have

    U1 Σ1 V1* x = b

which motivates the pseudoinverse formula

    A+ = V1 Σ1^{-1} U1*

and the "solution"

    x = A+ b

If we form the residual for this "solution" we have

    r = Ax − b = A A+ b − b = U1 Σ1 V1* V1 Σ1^{-1} U1* b − b = U1 U1* b − b = Im b − b = 0

which seems to show that r = 0. We know that we cannot solve Ax = b for every
b and every A matrix, so something must have gone wrong. What is wrong with this
argument leading to r = 0?

Exercise 1.47: SVD and worst-case analysis

Consider the process depicted in Figure 1.11 in which u is a manipulatable input and
d is a disturbance. At steady state, the effects of these two variables combine at the
measurement y in a linear relationship

    y = Gu + Dd

The steady-state goal of the control system is to minimize the effect of d at the mea-
surement y by adjusting u. For this problem we have 3 inputs, u ∈ R^3, 2 disturbances,
d ∈ R^2, and 2 measurements, y ∈ R^2, and G and D are matrices of appropriate dimen-
sions. We have the following two singular value decompositions available

    G = USV^T        D = XEZ^T

075 -0.66

-0.66

-0.98
-0.19

0.75

-0.19
0.98

1.57
0.00
0.71
0.00

0.00
0.21
0.00
0.13

-0.89

0.37

-0.085

0.46

045 -0.81

094 -0.33

-0.33

0.94

(a) Can you exactly cancel the effect of d on y using u for all d? Why or why not?

(b) In terms of U,S, VI,X, E, Z, what input u minimizes the effect of d on y? In


other words, if you decide the answer is linear

u = Kd
What is K in terms of U, S, VI , X, E, Z? Give the symbolic and numerical results.

(c) What is the worst d of unit norm, i.e., what d requires the largest response in u?
What is the response u to this worst d?

Exercise 1.48: Worst-case disturbance

Consider the system depicted in Figure 1.11 in which we can manipulate an input u ∈
R^2 to cancel the effect of a disturbance d ∈ R^2 on an output y ∈ R^2 of interest. The
steady-state relationship between the variables is modeled as a linear relationship

    y = Gu + d

and y, u, d are in deviation variables from the steady state at which the system was
linearized. Experimental tests on the system have produced the following model pa-
rameters

    G = [ 2.857  3.125
          0.991  2.134 ]

If we have measurements of the disturbance d available, we would like to find the input
u that exactly cancels d's effect on y, and we would like to know ahead of time what
is the worst-case disturbance that can hit the system.


Figure 1.11: Manipulated input u and disturbance d combine to affect output y.

(a) Find the u that cancels d's effect on y.


(b) For d on the unit circle,plot the corresponding value of u.
(c) What d of norm one requires the largest control action u? What d of norm one
requires the smallest control action u? Give the exact values of dmax and dmin,
and the corresponding umax and umin.

(d) Assume the input is constrained to be in the box

    [ −1 ]  ≤  u  ≤  [ 1 ]
    [ −1 ]           [ 1 ]                               (1.32)

What is the size of the disturbance so that all disturbances less than this size
can be rejected by the input without violating these constraints? In other words,
find the largest scalar α such that

    if ||d|| ≤ α    then u satisfies (1.32)

Use your plot from the previous part to estimate α.

Exercise 1.49: Determinant, trace, and eigenvalues

Use the Schur decomposition of matrix A ∈ C^{n×n} to prove the following facts

    det A = ∏_{i=1}^{n} λi                               (1.33)

    tr A = Σ_{i=1}^{n} λi                                (1.34)

in which λi ∈ eig(A), i = 1, 2, ..., n.

Exercise 1.50: Repeated eigenvalues

The self-adjoint matrix

    A = [ 0 1 1
          1 0 1
          1 1 0 ]

has a repeated eigenvalue. Find the eigenvalues of the system and show that despite
the repeated eigenvalue the system has a complete orthogonal set of eigenvectors.

Exercise 1.51: More repeated eigenvalues

The non-self-adjoint matrix

0
013
002
000 21
0

also has repeated eigenvalues.

(a) Find the eigenvalues and eigenvectors (there are only two) of A.

(b) Denote the eigenvector corresponding to the repeated eigenvalue as v1 and the
other eigenvector as v4. The GENERALIZED EIGENVECTORS v2 and v3 can be
found by solving

    (A − λ1 I) v2 = v1        (A − λ1 I) v3 = v2

where λ1 is the repeated eigenvalue. Show that {v1, ..., v4} is necessarily an LI
set.

(c) Determine the set, construct the transformation matrix M, and show that J =
M^{-1} A M is indeed in Jordan form.

Exercise 1.52: Solution to a singular linear system

Consider a square matrix A that has a complete set of LI eigenvectors and a single zero
eigenvalue.

(a) Write the solution to Ax = 0 in terms of the eigenvectors of A.

(b) In the problem Ax = b, use the eigenvectors to determine necessary and suffi-
cient conditions on b for existence of a solution.

Exercise 1.53: Example of a singular problem

Consider the problem Ax = b, where

    A = [ 1 2 3
          1 2 3
          1 2 3 ]

(a) Perform LU decomposition on this matrix. Give L and U.

(b) Find two linearly independent vectors in the nullspace of A.

(c) Use the LU decomposition to find a solution when

    b = [ 4
          4
          4 ]

(d) This solution is not unique. Find another.

(e) Find the eigenvalues and eigenvectors of A. How are these related to your an-
swers?

Exercise 1.54: Linearly independent eigenvectors

Show that if A has n distinct eigenvalues, its eigenvectors are linearly independent.
This result is required to ensure the existence of Q^{-1} in A = QΛQ^{-1} in (1.11).
Hint: set

    Σ_i αi qi = 0

and multiply by (A − λ1 I)(A − λ2 I) ··· (A − λ_{n−1} I) to establish that αn = 0. With
αn = 0, what can you do next to show that α_{n−1} = 0? Continue this process.

Exercise 1.55: General results for eigenvalue problems

Prove the following statements:

(a) If A is nonsingular and has eigenvalues λi, the eigenvalues of A^(-1) are 1/λi.

(b) Let S be a matrix whose columns form a set of linearly independent but nonorthogonal basis vectors; the mth column is the vector um. Find a matrix S' whose columns u'n satisfy um' u'n = δmn. A pair of basis sets whose vectors satisfy this condition are said to be BIORTHOGONAL.

(c) Assume that A has a complete set of eigenvectors. Show that the eigenvectors of A and A' are biorthogonal.

(d) Show that if the eigenvectors of A are orthogonal, then AA' = A'A. Such matrices are called NORMAL. (The converse is also true (Horn and Johnson, 1985).)

(e) Show that the eigenvalues of A = -A' are imaginary and that its eigenvectors are orthogonal.

Exercise 1.56: Eigenvalues of a dyad

Let u and v be unit vectors in R^n, with u'v ≠ 0. What are the eigenvalues and eigenvectors of uv'?

Exercise 1.57: The power method for finding largest eigenvalues

Consider the matrix

A = | 0 0 1 |
    | 0 0 1 |
    | 1 1 1 |

(a) Let x0 = (1, 0, 0)' and consider the iteration procedure x(i+1) = A x(i). Perform several steps of this procedure by hand and observe the result.


(b) Can you understand what is happening here by writing x in the eigenvector basis? In particular, show that for a self-adjoint matrix with distinct eigenvalues, this iteration procedure yields the eigenvalue of largest absolute value and the corresponding eigenvector.

(c) Write an Octave or MATLAB function to perform this process on a real symmetric matrix, outputting the largest eigenvalue of A (to within a specified tolerance) and the corresponding eigenvector, scaled so that its largest component is 1. Present results for a test case. This is the POWER METHOD. It is much faster than finding all of the eigenvalues and can be generalized to other types of matrices. Google's "PageRank" algorithm is built around this method.
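Although the exercise asks for an Octave or MATLAB function, the normalized iteration can be sketched in Python with NumPy; the tolerance, iteration cap, and reuse of the 3 × 3 matrix from part (a) are illustrative choices, not prescribed by the text:

```python
import numpy as np

def power_method(A, tol=1e-10, maxit=1000):
    """Estimate the largest-magnitude eigenvalue of symmetric A by repeated
    multiplication, scaling the iterate so its largest component is 1."""
    x = np.ones(A.shape[0])
    lam = 0.0
    for _ in range(maxit):
        y = A @ x
        lam_new = y[np.argmax(np.abs(y))]   # scale factor estimates the eigenvalue
        x = y / lam_new
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam, x

A = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
lam, v = power_method(A)
```

For this matrix the iteration converges to the dominant eigenvalue 2 with eigenvector proportional to (1, 1, 2)'.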

Exercise 1.58: Markov chain models

Imagine that there are three kinds of weather: sunny, rainy, and snowy. Thus a vector w0 ∈ R^3 defines today's weather: w0 = [1, 0, 0]' is sunny, w0 = [0, 1, 0]' is rainy, and w0 = [0, 0, 1]' is snowy. Imagine that tomorrow's weather w1 is determined only by today's and, more generally, the weather on day n + 1 is determined by the weather on day n. A probabilistic model for the weather then takes the form

w(n+1) = T wn

where T is called a transition matrix and the elements of wn are the probabilities of having a certain type of weather on that day. For example, if w5 = [0.2, 0.1, 0.7]', then the probability of snow five days from now is 70%. The sequence of probability vectors on subsequent days, {w0, w1, w2, ...}, is called a MARKOV CHAIN. Because w is a vector of probabilities, its elements must sum to one, i.e., Σi wn,i = 1 for all n.

(a) Given that Σi wn,i = 1, what condition must the elements of T satisfy such that Σi w(n+1),i is also 1?

(b) Assume that T is a constant matrix, i.e., it is independent of n. What conditions on the eigenvalues of T must hold so that the Markov chain will reach a constant state w∞ as n → ∞? How is w∞ related to the eigenvectors of T?
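As a numerical illustration of parts (a) and (b), here is a Python sketch with a hypothetical column-stochastic transition matrix; the probabilities are invented for the example and are not taken from the text:

```python
import numpy as np

# Hypothetical transition matrix: each column sums to 1 (part (a)'s condition).
T = np.array([[0.7, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.5]])

w = np.array([1.0, 0.0, 0.0])     # start sunny
for _ in range(200):
    w = T @ w                     # w_{n+1} = T w_n

# The limit is the eigenvector of T with eigenvalue 1, normalized to sum to 1.
lams, V = np.linalg.eig(T)
v1 = np.real(V[:, np.argmin(np.abs(lams - 1.0))])
w_inf = v1 / v1.sum()
```

The iterated chain and the eigenvalue-1 eigenvector agree, consistent with part (b).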

Exercise 1.59: Real Jordan form for a real matrix with complex conjugate eigenvalues

For a 2 × 2 real matrix A with a complex conjugate pair of eigenvalues λ = σ ± iω, with eigenvectors v1 and v2:

(a) Derive the result that v1 = conj(v2).

(b) Write the general solution to dx/dt = Ax in terms of the real and imaginary parts of v1 and sines and cosines, so that the only complex numbers in the solution are the arbitrary constants.

(c) For the specific matrix

A = | -2 -2 |
    |  2 -2 |

show that the similarity transformation S^(-1) A S, where the columns of S are the real and imaginary parts of v1, has the form

S^(-1) A S = |  σ  ω |
             | -ω  σ |

This result can be generalized, showing how a real matrix with complex conjugate eigenvalues can be brought to a real block-diagonal form.

Exercise 1.60: Solving a boundary-value problem by eigenvalue decomposition

Consider the reaction

A ⇌ B ⇌ C

occurring in a membrane. At steady state the appropriate reaction-diffusion equations are

D_A d²c_A/dx² - k1 c_A + k-1 c_B = 0

D_B d²c_B/dx² + k1 c_A - k-1 c_B - k2 c_B + k-2 c_C = 0

D_C d²c_C/dx² + k2 c_B - k-2 c_C = 0

where the ki, k-i, i = 1, 2, are rate constants and the Dj, j = A, B, C, are the species diffusivities. The boundary conditions are

c_A = 1 at x = 0

dc_A/dx = dc_B/dx = dc_C/dx = 0 at x = 1

Convert this set of second-order equations into a set of first-order differential equations. Write a MATLAB or Octave code to find the solution to this problem in terms of eigenvalues and eigenvectors of the relevant matrix for a given set of parameters. Have the program plot the concentrations as functions of position. Show results for parameter values D_A = D_B = D_C = 20, k1 = k2 = 10, k-1 = k-2 = 0.1, and also for the same rate constants but with the diffusivities set to 0.05.

Exercise 1.61: Nullspaces of nonsquare matrices

Consider a nonsquare m × n matrix A. Show that A'A is symmetric positive semidefinite. If A were square we could determine its nullspace from the eigenvectors corresponding to zero eigenvalues. How can we determine the nullspace of a nonsquare matrix A? What about the nullspace of A'?

Exercise 1.62: Stability of an iteration

Consider the iteration procedure x(i + 1) = Ax(i), where A is diagonalizable.

(a) What conditions must the eigenvalues of A satisfy so that x(i) → 0 as i → ∞?

(b) What conditions must the eigenvalues satisfy for this iteration to converge to a steady state, i.e., so that x(i) → x(i + 1) as i → ∞?

Exercise 1.63: Cayley-Hamilton theorem

Suppose that A is an n × n diagonalizable matrix with characteristic equation

det(A - λI) = λ^n + a(n-1) λ^(n-1) + ··· + a1 λ + a0 = 0

(a) Show that

A^n + a(n-1) A^(n-1) + ··· + a1 A + a0 I = 0

This result shows that A satisfies its own characteristic equation; it is known as the Cayley-Hamilton theorem.

(b) Use the theorem to express A², A³, and A^(-1) as linear combinations of A and I.

Exercise 1.64: Solving the nonunique least-squares problem

We have established that the least-squares solution to Ax = b is unique if and only if A has linearly independent columns. Let's treat the case in which the columns are not linearly independent and the least-squares solution is not unique. Consider again the SVD for real-valued A.

(a) Show that all solutions to the least-squares problem are given by

x_ls = V1 Σ^(-1) U1' b + V2 z

in which z is an arbitrary vector.

(b) Show that the unique, minimum-norm solution to the least-squares problem is given by

x_ls = V1 Σ^(-1) U1' b

This minimum-norm solution is the one returned by many standard linear algebra packages. For example, this is the solution returned by Octave and MATLAB when invoking the shorthand command x = A \ b.
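The minimum-norm claim can be checked numerically; in this Python sketch the rank-deficient matrix is an invented example, and `numpy.linalg.lstsq` (which is SVD-based and returns the minimum-norm solution) supplies the reference answer:

```python
import numpy as np

# Rank-deficient example: third column = first + second (illustrative choice).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = np.sum(s > 1e-12 * s[0])                  # numerical rank
# Minimum-norm least-squares solution x = V1 inv(Sigma1) U1' b
x_min = Vt[:r].T @ ((U[:, :r].T @ b) / s[:r])

# Same solution from the library routine
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
```

Both vectors satisfy the normal equations A'A x = A'b and coincide with each other.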

Exercise 1.65: Propagating zeros in triangular matrices

When multiplying two partitioned (upper) triangular matrices, if the first one has k leading columns of zeros, and the second one has a 0_(p×p) matrix as the second element of the diagonal, show that the product is a triangular matrix with k + p leading columns of zeros. In pictures

| 0  *  * | | T3  *    * |   | 0 0  * |
| 0 T1  * | | 0  0pxp  * | = | 0 0  * |
| 0  0 T2 | | 0  0    T4 |   | 0 0 T5 |

in which Ti, i = 1, ..., 4, are arbitrary triangular matrices, T5 is triangular, and * represents arbitrary (full) matrices. This result is useful in proving the Cayley-Hamilton theorem in the next exercise.


Exercise 1.66: Cayley-Hamilton theorem holds for all matrices

Revisit Exercise 1.63 and establish that all matrices A ∈ C^(n×n) satisfy their characteristic equation. We are removing the assumption that A is diagonalizable, generalizing the Cayley-Hamilton theorem so that it holds also for defective matrices.

Hint: use the Schur form to represent A and the result of Exercise 1.65.

Exercise 1.67: Small matrix approximation

For x a scalar, consider the Taylor series for 1/(1 + x)

1/(1 + x) = 1 - x + x² - x³ + ···

which converges for |x| < 1.

(a) Using this scalar Taylor series, establish the analogous series for matrix X ∈ R^(n×n)

(I + X)^(-1) = I - X + X² - X³ + ···

You may assume the eigenvalues of X are unique. For what matrix X does this series converge?

(b) What is the corresponding series for (R + X)^(-1), in which R ∈ R^(n×n) is a full-rank matrix? What conditions on X and R are required for the series to converge?

Exercise 1.68: Matrix exponential, determinant, and trace

Use the Schur decomposition of matrix A ∈ C^(n×n) to prove the following fact

det e^A = e^(tr A)
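This identity is also easy to spot-check numerically, using a truncated Taylor series for the matrix exponential; the 3 × 3 test matrix below is an arbitrary choice with small norm so the series converges quickly:

```python
import numpy as np

def expm_taylor(A, nterms=40):
    """Matrix exponential by truncated Taylor series (adequate for small ||A||)."""
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, nterms):
        term = term @ A / k
        E = E + term
    return E

A = np.array([[0.2, 1.0, 0.0],
              [0.0, -0.5, 0.3],
              [0.1, 0.0, 0.4]])
lhs = np.linalg.det(expm_taylor(A))   # det e^A
rhs = np.exp(np.trace(A))             # e^(tr A)
```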

Exercise 1.69: Logarithm of a matrix

If A ∈ C^(n×n) is nonsingular, there exists a B ∈ C^(n×n) such that A = e^B, and B is known as the logarithm of A

B = ln A

If A is positive definite, B can be uniquely defined (the principal branch of the logarithm). Given this definition of the logarithm, if A ∈ C^(n×n) is nonsingular, show that

det A = e^(tr(ln A))    (1.35)

Exercise 1.70: Some differential equations, sines, cosines, and exponentials

(a) Solve the following vector, second-order ordinary differential equation with the given initial conditions for y ∈ R²

d²y/dt² = Ay,    y(0) = y0,    dy/dt(0) = y'0

Use the solution of the scalar version of this differential equation as your guide.

(b) We can always reduce a high-order differential equation to a set of first-order differential equations. Define x = dy/dt, let z = (y, x), and show that the above equation can be written as a single first-order differential equation

dz/dt = Bz

with z ∈ R⁴. What are B and the appropriate initial conditions z(0)? What is the solution to this problem?

(c) Plot, on a single graph, the trajectories of the two y components versus time for the given initial conditions.

(d) Show that the result of (a) is the same as the result of (b), even though the functions exp and cos are different.

Exercise 1.71: Bounding the matrix exponential

Given the bound for ||e^(At)|| in (1.17), establish the validity of the bound in (1.18).

Hints: first, for any k ≥ 0 and ε > 0, show that there exists a c > 0 such that for all t ≥ 0

t^k ≤ c e^(εt)

Use this result to show that for any ε > 0, N ∈ C^(n×n), there exists c > 0 such that for all t ≥ 0

||Nt||^k / k! ≤ c e^(εt)

Exercise 1.72: Strictly convex quadratic function and positive curvature

Consider the quadratic function

f(x) = (1/2)x'Ax + b'x + c

(a) Show that f(·) is strictly convex if and only if A > 0.

(b) For the quadratic function, show that if a minimizer of f(·) exists, it is unique if and only if A > 0. The text shows the "if" part for any strictly convex function. So you are required to show the "only if" part with the additional restriction that f(·) is quadratic.

(c) Show that f(·) is convex if and only if A ≥ 0.

Exercise 1.73: Concave functions and maximization

A function f(·) is defined to be (STRICTLY) CONCAVE (concave downward) if -f(·) is (strictly) convex (Rockafellar and Wets, 1998, p. 39). Show that a solution to max_x f(x) is unique if f(·) is strictly concave.


Exercise 1.74: Solutions to minmax and maxmin problems

Consider again the quadratic function f(x) = (1/2)x'Ax and the two games given in (1.19). Confirm that Figure 1.6 (c) corresponds to the A matrix.

(a) Show that x1 = x2 = 0 is the unique solution to both games in (1.19). Hint: with the outer variable fixed, solve the inner optimization problem and note that its solution exists and is unique. Then substitute the solution for the inner problem, solve the outer optimization problem, and note that its solution also exists and is unique.

(b) Show that neither of the following problems has a solution

max_{x2} min_{x1} f(x)        min_{x1} max_{x2} f(x)

in which we have interchanged the goals of the two players. So obviously the goals of the players matter a great deal in the existence of solutions to the game.

Exercise 1.75: Games with nonunique solutions and different solution sets

Sketch the contours for f(x) = (1/2)x'Ax with the following A matrix. What are the eigenvalues of A?

Show that x1 = x2 = 0 is still a solution to both games in (1.19), but that it is not unique. Find the complete solution sets for both games in (1.19). Establish that the solution sets are not the same for the two games.

Exercise 1.76: Who plays first?

When the solutions to all optimizations exist, show that

max_y min_x f(x, y) ≤ min_x max_y f(x, y)

This inequality verifies that the player who goes first, i.e., the inner optimizer, has the advantage in this noncooperative game. Note that the function f(·) is arbitrary, so long as the indicated optimizations all have solutions.

Exercise 1.77: Solving linear matrix equations

Consider the linear matrix equation

AXB = C    (1.36)

in which A ∈ R^(m×n), X ∈ R^(n×p), B ∈ R^(p×q), and C ∈ R^(m×q); we consider A, B, and C fixed matrices and X the unknown matrix. The number of equations is the number of elements in C. The number of unknowns is the number of elements of X. Taking the vec of both sides gives

(B' ⊗ A) vec X = vec C    (1.37)

We wish to explore how to solve this equation for vec X.

(a) For the solution to exist for all vec C, and be unique, we require that (B' ⊗ A) has linearly independent rows and columns, i.e., it is square and full rank. Using the rank result (1.27) show that this is equivalent to A and B being square and full rank.

(b) For this case show that the solution

vec X = (B' ⊗ A)^(-1) vec C

is equivalent to that obtained by multiplying (1.36) by A^(-1) on the left and B^(-1) on the right,

X = A^(-1) C B^(-1)

(c) If we have more equations than unknowns, we can solve (1.37) for vec X as a least-squares problem. The least-squares solution is unique if and only if B' ⊗ A has linearly independent columns. Again, use the rank result to show that this is equivalent to: (i) A has linearly independent columns, and (ii) B has linearly independent rows.

(d) We know that A has linearly independent columns if and only if A'A has full rank, and B has linearly independent rows if and only if BB' has full rank (see Proposition 1.19 in Exercise 1.41). In this case, show that the least-squares solution of (1.37)

vec X_ls = (B' ⊗ A)† vec C

is equivalent to that obtained by multiplying (1.36) by A† on the left and B† on the right,

X_ls = A† C B†

Note that the superscript † denotes the Moore-Penrose pseudoinverse discussed in Section 1.3.7.
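Part (b) can be verified numerically with `numpy.kron`; the random matrices below are illustrative, and the `vec` helper follows the column-stacking convention used in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
X_true = rng.standard_normal((3, 4))
C = A @ X_true @ B

# vec stacks columns; NumPy stores rows, so vec(M) = M.T.ravel()
vec = lambda M: M.T.ravel()

K = np.kron(B.T, A)                        # (B' ⊗ A) vec X = vec C
x = np.linalg.solve(K, vec(C))
X = x.reshape(B.shape[0], A.shape[1]).T    # un-vec: rebuild X column by column
```

Solving the 12 × 12 Kronecker system recovers X_true, matching X = A^(-1) C B^(-1).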

Exercise 1.78: Solving the matrix Lyapunov equation

Write a function S = your_lyap(A, Q) using the Kronecker product to solve the matrix Lyapunov equation

A'S + SA = -Q

Test your function with some A with negative eigenvalues and positive definite Q by comparing to the function lyap in Octave or MATLAB.

Bibliography

R. B. Bird, W. E. Stewart, and E. N. Lightfoot. Transport Phenomena. John Wiley & Sons, New York, second edition, 2002.

G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, Maryland, third edition, 1996.

N. J. Higham. Functions of Matrices: Theory and Computation. SIAM, Philadelphia, 2008.

R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.

C. C. Lin and L. A. Segel. Mathematics Applied to Deterministic Problems in the Natural Sciences. Macmillan, New York, 1974.

J. R. Magnus and H. Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley, New York, 1999.

J. Nash. Noncooperative games. Ann. Math., 54:286-295, 1951.

W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 1992.

R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, 1998.

S. M. Selby. CRC Standard Mathematical Tables. CRC Press, twenty-first edition, 1973.

G. Strang. Linear Algebra and its Applications. Academic Press, New York, second edition, 1980.

L. N. Trefethen and D. Bau III. Numerical Linear Algebra. Society for Industrial and Applied Mathematics, 1997.

C. F. Van Loan. The sensitivity of the matrix exponential. SIAM J. Numer. Anal., 14:971-981, 1977.

J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, Princeton and Oxford, 1944.

Ordinary Differential Equations

2.1 Introduction
Differential equations arise in all areas of chemical engineering. In this chapter we consider ORDINARY differential equations (ODEs), that is, equations that have only one independent variable. For example, for reactions in a stirred-tank reactor the independent variable is time, while in a simple steady-state model of a plug-flow reactor, the independent variable is position along the reactor. Typically, ODEs appear in one of two forms

dx/dt = f(x, t),    x ∈ R^n    (2.1)

or

d^n y/dx^n + a(n-1)(x) d^(n-1) y/dx^(n-1) + ··· + a1(x) dy/dx + a0(x) y = g(x),    y ∈ R    (2.2)

We have intentionally written the two forms in different notation, as the first form typically (but not always) appears when the independent variable is time, and the second form often appears when the independent variable is spatial position. These two forms usually have different boundary conditions. When t is the independent variable, we normally know the conditions at t = 0 (e.g., initial reactant concentration) and must solve for the behavior for all t > 0. This is called an INITIAL-VALUE PROBLEM (IVP). In a transport problem, on the other hand, we know the temperature, for example, at the boundaries and must find it in the interior. This is a BOUNDARY-VALUE PROBLEM (BVP).

2.2 First-order Linear Systems

2.2.1 Superposition Principle for Linear Differential Equations

An arbitrary linear differential equation can be written

Lu = g

where L is a linear differential operator (e.g., L = d/dt - A, where A is a matrix), u is the solution to be determined, and g is a given function. Section 1.2 introduced linear operators, the following general properties of which we now write in terms of L

L(u + v) = Lu + Lv

L(αu) = α(Lu)

Leaving aside for the moment the issue of boundary conditions, the following two properties follow directly from linearity (SUPERPOSITION).

1. Homogeneous problem. Let g = 0. If u1 and u2 are both solutions to Lu = 0, then αu1 + βu2 is also a solution, for any scalars α and β.

2. Inhomogeneous problem. Let u1 be a solution to Lu = g1 and u2 be a solution to Lu = g2. Then αu1 + βu2 is a solution to Lu = αg1 + βg2.

With regard to boundary conditions, linearity also implies the following.

3. Let u1 be a solution to Lu = g1 with boundary condition Bu = h1 on a particular boundary, where B is an appropriate operator, e.g., multiplication by a constant for a DIRICHLET boundary condition, a first derivative B = d/dx for a NEUMANN boundary condition, or a combination B = γ + δ d/dx for a ROBIN boundary condition. Let u2 solve Lu = g2 with boundary condition Bu = h2. Then αu1 + βu2 satisfies Lu = αg1 + βg2 with boundary condition Bu = αh1 + βh2.

These simple results are very powerful and will be implicitly and explicitly used throughout the book, as they allow complex solutions to be constructed as sums (or integrals) of simple ones.


2.2.2 Homogeneous Linear Systems with Constant Coefficients

General Results for the Initial-Value Problem

Consider (2.1), where t denotes time. The function f is often called a VECTOR FIELD; for each point x in the PHASE SPACE or STATE SPACE of the system, f(x) defines a vector giving the rate of change of x at that point. The system is called AUTONOMOUS if f is not an explicit function of t. The trajectory x(t) traces out a curve in the state space, starting from the initial condition x(0) = x0.

The most general linear first-order system can be written

dx/dt = A(t)x + g(t)    (2.3)

In the present section we further narrow the focus and consider only the linear, autonomous, homogeneous system

dx/dt = Ax,    x ∈ R^n, A ∈ R^(n×n)    (2.4)

where A is a constant matrix. Note that many dynamics problems are posed as second-order problems: if x is a position variable then Newton's second law takes the form d²x/dt² = F(x). Letting u1 = x, u2 = dx/dt, we recover a first-order system

du1/dt = u2

du2/dt = F(u1)
More generally, a single high-order differential equation can always be written as a system of first-order equations.

Unless A is diagonal, all of the individual scalar equations in the system (2.4) are coupled. The only practical way to find a solution to the system is to try to decouple it. But we already know how to do this: we use the eigenvector decomposition A = MJM^(-1), where J is the Jordan form for A (Section 1.4). Letting y = M^(-1)x be the solution vector in the eigenvector coordinate system, we write

dy/dt = Jy

If A can be completely diagonalized, then J = Λ = diag(λ1, λ2, ..., λn) and the equations in the y coordinates are completely decoupled. The solution is

yi(t) = e^(λi t) ci    or    y = e^(Λt) c


where c is a vector of arbitrary constants. For an initial-value problem where x(0) = x0 is a known vector, c = y(0) = M^(-1) x0. Recall from Section 1.5 that the matrix e^(At) is called the MATRIX EXPONENTIAL. It is defined for a general matrix A as

e^(At) = I + At + (1/2!) A²t² + (1/3!) A³t³ + ···

For a diagonal matrix Λ, e^(Λt) is simply a diagonal matrix with entries e^(λi t). Since yi(t) = e^(λi t) ci, we see that the eigenvalues of A determine the rates at which growth or decay occurs, and the eigenvectors (columns of M) determine the directions along which this growth or decay occurs. Converting back to the original coordinates, we have the general solution

x(t) = Σi ci e^(λi t) vi

where vi is the eigenvector corresponding to λi. This expression shows explicitly that the solution when A has a complete LI set of eigenvectors is a simple combination of exponential growth and decay in the directions defined by the eigenvectors.
An important general consequence of this result is that an initial condition x0 that lies on the line defined by the kth eigenvector leads to ci = α δik and thus to a solution x(t) = α e^(λk t) vk. This solution will never leave the line defined by the eigenvector vk. This line is thus an INVARIANT SUBSPACE for the dynamics: an initial condition that starts in an invariant subspace never leaves it. Similarly, each pair of eigenvectors defines a plane that is invariant, each triple defines a three-dimensional space that is invariant and so on.
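The eigenvector form of the solution is easy to confirm numerically; this Python/NumPy sketch (with an arbitrarily chosen diagonalizable A) compares x(t) = M e^(Λt) M^(-1) x0 against a truncated Taylor series for e^(At):

```python
import numpy as np

A = np.array([[-1.0, 1.0],
              [0.0, -2.0]])    # diagonalizable, eigenvalues -1 and -2
x0 = np.array([1.0, 1.0])
t = 0.7

# Eigenvector construction: x(t) = M exp(Lambda t) M^{-1} x0
lam, M = np.linalg.eig(A)
x_eig = (M * np.exp(lam * t)) @ np.linalg.solve(M, x0)

# Reference: e^{At} x0 from a truncated Taylor series of the matrix exponential
E = np.eye(2)
term = np.eye(2)
for j in range(1, 30):
    term = term @ (A * t) / j
    E = E + term
x_ref = E @ x0
```

The two constructions agree to rounding error, as superposition of the eigenmodes requires.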
A particularly relevant special case of an invariant plane arises when A has a complex conjugate pair of eigenvalues σ ± iω with corresponding eigenvectors v and conj(v); see Exercise 1.59. A solution with initial conditions in this subspace has the form

x(t) = c1 e^(σt) e^(iωt) v + c2 e^(σt) e^(-iωt) conj(v)

If the initial conditions are real, then c2 = conj(c1) (to cancel out the imaginary parts of the two terms in this equation). Equivalently, we can write

x(t) = 2 Re(c1 e^(σt) e^(iωt) v)

where Re denotes the real part of an expression. Now writing c1 = cr + i ci, v = vr + i vi, and e^(iωt) = cos ωt + i sin ωt, this can be written in real form as

x(t) = 2 e^(σt) [(cr cos ωt - ci sin ωt) vr - (ci cos ωt + cr sin ωt) vi]

Thus for real initial conditions, the invariant subspace corresponding to a pair of complex conjugate eigenvalues is the plane spanned by vr and vi.

If A cannot be diagonalized the situation is not as simple, but is still not really very complicated. We still have that dy/dt = Jy, but J is triangular rather than diagonal. Triangular systems have one-way coupling, so we can solve from the bottom up, back substituting as we go. To illustrate, we consider the case

J = | λ 1 |
    | 0 λ |

We can solve the equation dy2/dt = λ y2 first and then back substitute, getting an inhomogeneous problem for y1. The inhomogeneous term prevents the behavior from being purely exponential, and the general solution becomes (after converting back to the original coordinates)
x(t) = c1 e^(λt) v1 + c2 e^(λt) (v2 + t v1)    (2.5)

where v1 is the eigenvector corresponding to λ and v2 is the generalized eigenvector; compare with Example 1.13. The line defined by the eigenvector v1 is an invariant subspace, as is the plane defined by v1 and v2. However, the line defined by the generalized eigenvector is not invariant.

Note the t e^(λt) term that appears in (2.5). In initial-value problems, this term allows solutions to grow initially even when all of the eigenvalues have negative real parts. As t → ∞, though, the exponential factor dominates. Thus even when A is defective, its eigenvalues determine the long-time dynamics and, in particular, the stability. The issue of stability is addressed at length in Section 2.5; for the present we note that the steady state x = 0 of (2.4) is ASYMPTOTICALLY STABLE (initial conditions approach it as t → ∞) if and only if all the eigenvalues of A have negative real parts.
To summarize, the above results show that every homogeneous constant-coefficient problem dx/dt = Ax can be rewritten as dy/dt = Jy, where J has a block diagonal structure exemplified by the following template

J = | λ1  1   0   0  |
    | 0   λ1  0   0  |
    | 0   0   λ2  0  |
    | 0   0   0   λ3 |

The dynamics corresponding to each block are decoupled from those of all the others. The dynamics and the associated eigenvectors define invariant subspaces; the dynamics in each invariant subspace are decoupled from those in all the others.

others.

Dynamics of Planar Systems


Qualitative
22.3
system, there is a large range of Possible
Il-dimensional
In a general
real and complex, with positive or negeigenvalues,
of
combinations
simple and general classification of the
a
2,
=
n
For
parts.
ativereal
Such systems are called PLANAR,
bepossible.
is
dynamics
possible
occur on a simple plane (sometimes called
cause all of the dynamics

two eigenvectors (or an eigenvector and


by
defined
PLANE)
PHASE
the
Writing
generalized eigenvector,if A is defective).

= Ax =

thecharacteristicequationfor A is

Noticethat a + d = trA and ad bc = detA, which we call T and D,


respectively.Recallthat T = Al + and D = Al,2.In two dimensions,
the eigenvaluesare determined only by the trace and determinant of
the matrix. When Re(A1) < 0 and Re(2) < 0, any initialcondition de-

cays exponentially to the originthe origin is ASYMPTOTICALLY


STABLE.

Theseconditionsare equivalent to T < 0, D > 0.


Figure2.1 shows the dynamical regimes that are possible for the planar system as characterized by T and D; asymptotically stable steadystate solutionsoccupy the second quadrant,
excluding the axes. Each
regimeon Figure2.1 shows a small
plot of the dynamics on the phase

2.2

First-Order Linear Systems

103

[Figure 2.1: small phase-plane plots arranged in the trace-determinant plane, with regions labeled stable spiral, unstable spiral, stable node, unstable node, and unstable saddle.]

Figure 2.1: Dynamical regimes for the planar system dx/dt = Ax, A ∈ R^(2×2), parametrized in the determinant and trace of A; see also Strang (1986, Fig. 6.7).

plane in that regime; the axes correspond to the eigenvectors (or real and imaginary parts of the eigenvectors in the case of complex conjugates) and trajectories x(t) on this plane are shown with time as the parameter. The arrows on the trajectories indicate the direction of time. An important curve on this diagram is T² - 4D = 0, where the two eigenvalues are equal. This parabola is also the boundary between oscillatory solutions (SPIRALS on the phase plane) and exponential ones (NODES); a spiral arises from a complex conjugate pair of eigenvalues while a node arises from the case of two real eigenvalues with the same sign. In the lower half of the figure, D < 0, the eigenvalues are real and with opposite signs. The steady states in this regime are called SADDLE POINTS, because they have one stable direction and one unstable. Figure 2.2 shows the dynamic behavior that occurs on the boundaries between the different regions.
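The trace-determinant classification of Figure 2.1 can be encoded directly; this Python sketch is an illustration, and the example matrices are invented test cases:

```python
import numpy as np

def classify_planar(A, tol=1e-12):
    """Classify the origin of dx/dt = A x for a 2x2 real A by trace and determinant."""
    T, D = np.trace(A), np.linalg.det(A)
    if D < -tol:
        return "saddle"                     # real eigenvalues of opposite sign
    disc = T * T - 4.0 * D                  # eigenvalues are equal when disc = 0
    kind = "spiral" if disc < -tol else "node"
    if T < -tol:
        return "stable " + kind
    if T > tol:
        return "unstable " + kind
    return "center"                         # T ~ 0, D > 0: purely imaginary pair

examples = {
    "stable node": np.array([[-2.0, 0.0], [0.0, -1.0]]),
    "saddle": np.array([[1.0, 0.0], [0.0, -1.0]]),
    "stable spiral": np.array([[-1.0, 2.0], [-2.0, -1.0]]),
    "center": np.array([[0.0, 1.0], [-1.0, 0.0]]),
}
```

Each test matrix lands in the expected region of the trace-determinant plane.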

[Figure 2.2: boundary cases in the trace-determinant plane, including the neutral center (λ = ±iω), stable and unstable stars, and degenerate nodes.]

Figure 2.2: Dynamical behavior on the region boundaries for the planar system dx/dt = Ax, A ∈ R^(2×2); see also Strang (1986, Fig. 6.10).

2.2.4 Laplace Transform Methods for Solving the Inhomogeneous Constant-Coefficient Problem

Inhomogeneous constant-coefficient systems also can be decoupled by transformation into Jordan form: dx/dt = Ax + g(t) becomes dy/dt = Jy + h(t), where h(t) = M^(-1) g(t). Accordingly, once we understand how to solve the scalar inhomogeneous problem, we will have learned what we need to know to address the vector case. A powerful approach to solving inhomogeneous problems relies on the LAPLACE TRANSFORM.
Definition

Consider functions of time f(t) that vanish for t < 0. If there exists a real constant c > 0 such that f(t)e^(-ct) → 0 sufficiently fast as t → ∞, we can define the Laplace transform of f(t), denoted f̄(s), for all complex-valued s such that Re(s) ≥ c

f̄(s) = L(f(t)) = ∫₀^∞ e^(-st) f(t) dt,    Re(s) ≥ c    (2.6)

The inverse transform formula is given by

f(t) = (1/2πi) ∫ from c-i∞ to c+i∞ of e^(st) f̄(s) ds    (2.7)

Properties

1. The Laplace transform operator is linear. For every scalar α, and functions f(t), g(t), the following holds

L{α f(t) + g(t)} = α f̄(s) + ḡ(s)

The inverse transform is also linear

L^(-1){α f̄(s) + ḡ(s)} = α f(t) + g(t)

2. Transform of derivatives

L(df/dt) = s f̄(s) - f(0)

L(d²f/dt²) = s² f̄(s) - s f(0) - f'(0)

L(d^n f/dt^n) = s^n f̄(s) - s^(n-1) f(0) - s^(n-2) f'(0) - ··· - f^(n-1)(0)

3. Transform of integral

L(∫₀^t f(t') dt') = (1/s) f̄(s)

4. Derivative of transform with respect to s

d^n f̄(s)/ds^n = L((-t)^n f(t))


5. Time delay

L(f(t - a) H(t - a)) = e^(-as) f̄(s)

where the Heaviside or unit step function is defined as

H(t) = 0 for t < 0,  1 for t ≥ 0

6. Laplace convolution theorem

L(∫₀^t f(t - t') g(t') dt') = f̄(s) ḡ(s)

7. Final value theorem

lim as s → 0 of s f̄(s) = lim as t → ∞ of f(t)

if and only if s f̄(s) is bounded for all Re(s) ≥ 0

8. Initial-value theorem

lim as s → ∞ of s f̄(s) = lim as t → 0 of f(t)

We can readily compute the Laplace transform of many simple $f(t)$ by using the definition and performing the integral. In this fashion we can construct Table 2.1 of Laplace transform pairs. Such tables prove useful in solving differential equations. We next solve a few examples using the Laplace transform.
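As a quick numerical check of one such table entry, the defining integral (2.6) can be evaluated by quadrature and compared with the closed-form transform of $\sin\omega t$. The particular values of $s$ and $\omega$ below are arbitrary illustrative choices.

```python
# Numerically check the pair L{sin(omega*t)} = omega/(s^2 + omega^2)
# by evaluating the defining integral (2.6) at a real value of s.
import numpy as np
from scipy.integrate import quad

def laplace_numeric(f, s, T=200.0):
    """Approximate the Laplace integral from 0 to infinity (truncated at T)."""
    val, _ = quad(lambda t: np.exp(-s * t) * f(t), 0.0, T, limit=500)
    return val

omega, s = 3.0, 1.5
numeric = laplace_numeric(lambda t: np.sin(omega * t), s)
exact = omega / (s**2 + omega**2)
assert abs(numeric - exact) < 1e-6
```

The exponential factor $e^{-st}$ makes the truncation at $T = 200$ harmless here, since the integrand has decayed far below the quadrature tolerance by then.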

Example 2.1: Particle motion

Consider the motion of a particle of mass $m$ connected to a spring with spring constant $K$ and experiencing an applied force $F(t)$ as depicted in Figure 2.3.

Let $y$ denote the displacement from the origin and model the spring as applying force $F_s = -Ky$. Newton's equation of motion for this system is then

    m\frac{d^2 y}{dt^2} = F - Ky


    f(t)                \bar{f}(s)
    1                   1/s
    t                   1/s^2
    t^n                 n!/s^{n+1}
    cos(omega t)        s/(s^2 + omega^2)
    sin(omega t)        omega/(s^2 + omega^2)
    cosh(omega t)       s/(s^2 - omega^2)
    sinh(omega t)       omega/(s^2 - omega^2)
    e^{at}              1/(s - a)
    t e^{at}            1/(s - a)^2
    e^{at} cos(omega t) (s - a)/((s - a)^2 + omega^2)
    e^{at} sin(omega t) omega/((s - a)^2 + omega^2)

Table 2.1: Small table of Laplace transform pairs. A more extensive table is found in Appendix A.

Figure 2.3: Particle of mass m at position y experiences spring force -Ky and applied force F(t).


We require two boundary conditions for this second-order equation. If we assume the particle is initially at rest at the origin, then both $y$ and $dy/dt$ are specified at $t = 0$ and the boundary conditions are

    y(0) = 0,    \frac{dy}{dt}(0) = 0

If we divide by the mass of the particle we can express the model as

    \frac{d^2 y}{dt^2} + k^2 y = f

in which $k^2 = K/m$ and $f = F/m$. Take the Laplace transform of the model and find the position of the particle versus time, $y(t)$, for arbitrary applied force $f(t)$.

Solution

Taking the Laplace transform of the equation of motion and substituting in the two initial conditions gives

    s^2\bar{y}(s) - sy(0) - y'(0) + k^2\bar{y}(s) = \bar{f}(s)
    s^2\bar{y}(s) + k^2\bar{y}(s) = \bar{f}(s)

Solving this equation for $\bar{y}(s)$ gives

    \bar{y}(s) = \bar{f}(s)\,\frac{1}{s^2 + k^2}

We see the transform is the product of two functions of $s$. The inverse of each of these is available

    \mathcal{L}^{-1}(\bar{f}(s)) = f(t),    \mathcal{L}^{-1}\left(\frac{1}{s^2 + k^2}\right) = \frac{1}{k}\sin kt

The first follows by the definition of $\bar{f}(s)$ and the second follows from Table 2.1. Using the convolution theorem then gives

    y(t) = \frac{1}{k}\int_0^t f(t')\sin k(t - t')\,dt'


and we have the complete solution. We see that the particle position is a convolution of the forcing with the unforced oscillatory response. The reader may wish to check that this solution indeed satisfies the differential equation and both initial conditions as claimed.

Example 2.2: A forced first-order differential equation

Consider the first-order differential equation with forcing term

    \frac{dx}{dt} = ax + bu(t),    x(0) = x_0

Use the Laplace transform to find $x(t)$ for any forcing $u(t)$.
Solution

Taking the Laplace transform, substituting the initial condition, and solving for $\bar{x}(s)$, give

    s\bar{x}(s) - x_0 = a\bar{x}(s) + b\bar{u}(s)

    \bar{x}(s) = \frac{x_0}{s - a} + \frac{b\,\bar{u}(s)}{s - a}

We can invert the first term directly using Table 2.1, and the second term using the table and the convolution theorem giving

    x(t) = x_0 e^{at} + b\int_0^t e^{a(t-t')} u(t')\,dt'
We see the effect of the initial condition $x_0$ and the forcing term $u(t)$. If $a < 0$ so the system is asymptotically stable, the effect of the initial condition decays exponentially with time. The forcing term affects the solution through the convolution of $u$ with the time-shifted exponential.
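The closed-form solution above can be verified numerically for a particular forcing; the values of $a$, $b$, $x_0$, and $u(t) = \sin t$ below are arbitrary test choices.

```python
# Verify x(t) = x0 * e^{a t} + b * int_0^t e^{a(t-t')} u(t') dt'
# against direct integration of dx/dt = a x + b u(t).
import numpy as np
from scipy.integrate import quad, solve_ivp

a, b, x0 = -1.5, 2.0, 1.0
u = lambda t: np.sin(t)

def x_formula(t):
    integral, _ = quad(lambda tp: np.exp(a * (t - tp)) * u(tp), 0.0, t)
    return x0 * np.exp(a * t) + b * integral

sol = solve_ivp(lambda t, x: [a * x[0] + b * u(t)], (0.0, 4.0), [x0],
                rtol=1e-10, atol=1e-12, dense_output=True)

for t in [1.0, 4.0]:
    assert abs(x_formula(t) - sol.sol(t)[0]) < 1e-6
```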

Example 2.3: Sets of coupled first-order differential equations

Consider next the inhomogeneous constant-coefficient system (2.3), with $g(t) = Bu(t)$

    \frac{dx}{dt} = Ax + Bu(t),    x(0) = x_0

in which $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$. In systems applications, $x$ is known as the state vector and $u$ as the input variable vector. Use Laplace transforms to find $x(t)$.

Solution

Again taking the Laplace transform, substituting the initial condition, and solving for $\bar{x}(s)$ gives

    s\bar{x}(s) - x_0 = A\bar{x}(s) + B\bar{u}(s)
    (sI - A)\bar{x}(s) = x_0 + B\bar{u}(s)
    \bar{x}(s) = (sI - A)^{-1} x_0 + (sI - A)^{-1} B\bar{u}(s)

We next require the matrix version of the Laplace transform pair

    f(t) = e^{At},  A \in \mathbb{R}^{n\times n}    \bar{f}(s) = (sI - A)^{-1}

which can be checked by applying the definition of the Laplace transform. Using this result and the convolution theorem gives

    x(t) = e^{At}x_0 + \int_0^t e^{A(t-t')} B\, u(t')\,dt'

Notice we cannot move the constant matrix $B$ outside the integral as we did in the scalar case because the indices in the matrix multiplications must conform as shown below

    x(t)  =  e^{At} x_0  +  \int_0^t e^{A(t-t')}  B   u(t')  dt'
    (n x 1)  (n x n)(n x 1)      (n x n)      (n x m)(m x 1)
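The matrix-exponential solution can be checked numerically. The state matrix $A$, input matrix $B$, initial condition, and input $u(t) = \cos t$ below are arbitrary illustrative choices, and the convolution integral is evaluated by the trapezoid rule.

```python
# Verify x(t) = e^{At} x0 + int_0^t e^{A(t-t')} B u(t') dt' for a 2x2 example.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # arbitrary stable state matrix
B = np.array([[0.0], [1.0]])               # n x m with n = 2, m = 1
x0 = np.array([1.0, 0.0])
u = lambda t: np.array([np.cos(t)])        # scalar input

def x_formula(t, n=4001):
    """Evaluate the convolution solution by trapezoid-rule quadrature."""
    tp = np.linspace(0.0, t, n)
    vals = np.array([expm(A * (t - s)) @ (B @ u(s)) for s in tp])
    w = np.ones(n); w[0] = w[-1] = 0.5
    return expm(A * t) @ x0 + (tp[1] - tp[0]) * np.sum(w[:, None] * vals, axis=0)

sol = solve_ivp(lambda t, x: A @ x + B @ u(t), (0.0, 3.0), x0,
                rtol=1e-10, atol=1e-12, dense_output=True)
assert np.allclose(x_formula(3.0), sol.sol(3.0), atol=1e-5)
```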

2.2.5 Delta Function

The DELTA FUNCTION, also known as the Dirac delta function (Dirac, 1958, pp. 58-61) or the unit impulse, is an idealization of a narrow and tall "spike." Two examples of such functions are


    g_\alpha(x) = \frac{1}{\sqrt{4\pi\alpha}}\, e^{-x^2/4\alpha}        (2.8)

    g_\alpha(x) = \frac{\alpha}{\pi(\alpha^2 + x^2)}        (2.9)

where $\alpha > 0$. Setting $x = 0$ and then taking the limit $\alpha \to 0$ shows that $g_\alpha(0) \to \infty$, while setting $x = x_0 \neq 0$ and taking the same limit shows that for any nonzero $x_0$, $g_\alpha(x_0) \to 0$. These functions become infinitely high and infinitely narrow. Furthermore, they both have unit area

    \int_{-\infty}^{\infty} g_\alpha(x)\,dx = 1

A set of functions depending on a parameter and obeying the above properties is called a DELTA FAMILY. The delta function $\delta(x)$ is the limiting case of a delta family as $\alpha \to 0$. It has infinite height, zero width, and unit area. It is most properly thought of as a GENERALIZED FUNCTION or DISTRIBUTION; the mathematical theory of these objects is described in Stakgold (1998).

Operationally, the key feature of the delta function is that when integrated against a "normal" function $f(x)$ the delta function extracts the value of $f$ at the $x$ value where the delta function has its singularity

    \int_{-\infty}^{\infty} f(x)\delta(x)\,dx = \lim_{\alpha\to 0}\int_{-\infty}^{\infty} f(x)g_\alpha(x)\,dx = f(0)        (2.10)

The delta function also can be viewed as the generalized derivative of the discontinuous unit step or Heaviside function $H(x)$

    \delta(x) = \frac{dH(x)}{dx}

Also note that the interval of integration in (2.10) does not have to be $(-\infty, \infty)$. The integral over any interval containing the point of singularity of the delta function produces the value of $f(x)$ at the point of singularity. For example

    \int f(x)\delta(x - a)\,dx = f(a)    for all a \in \mathbb{R}

Finally, by changing the variable of integration we can show that the delta function is an even function

    \delta(-x) = \delta(x)
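The sifting behavior of a delta family is easy to see numerically: integrating the Gaussian family (2.8) against a smooth test function gives values approaching $f(0)$ as $\alpha$ shrinks. The test function $f(x) = \cos x + x^2$ below is an arbitrary choice.

```python
# Sifting property (2.10): integrals of f against the Gaussian delta
# family (2.8) approach f(0) = 1 as alpha -> 0.
import numpy as np
from scipy.integrate import quad

def g(x, alpha):
    return np.exp(-x**2 / (4 * alpha)) / np.sqrt(4 * np.pi * alpha)

f = lambda x: np.cos(x) + x**2

vals = []
for alpha in [1e-1, 1e-2, 1e-3]:
    # points=[0.0] tells quad where the narrow spike sits
    v, _ = quad(lambda x: f(x) * g(x, alpha), -10, 10, points=[0.0], limit=200)
    vals.append(v)

errors = [abs(v - f(0.0)) for v in vals]
assert errors[0] > errors[1] > errors[2]   # monotone improvement
assert errors[-1] < 1e-2
```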


Derivatives of the Delta Function

Doublet. An interesting property of the delta function is that it is also differentiable. The first derivative is termed the doublet or dipole, usually denoted $\delta'(x)$

    \delta'(x) = \frac{d\delta(x)}{dx}

Sometimes we see the dot notation $\dot{\delta}(x)$ to denote the doublet instead of $\delta'(x)$. If we perform integration by parts on the integral $\int f(x)\delta'(x)\,dx$, we find that the doublet selects the negative of the first derivative of $f$ evaluated at the location of the doublet's singularity

    \int_{-\infty}^{\infty} f(x)\delta'(x)\,dx = -f'(0)        (2.11)

Note the sign in this equation. We also find by changing the variable of integration that, unlike the delta function, or singlet, which is an even function, the doublet is odd

    \delta'(-x) = -\delta'(x)

Higher-order derivatives. Repeated integration by parts produces the following higher-order formulas for triplets, quadruplets, etc.

    \int f(x)\delta^{(n)}(x)\,dx = (-1)^n f^{(n)}(0),    n \geq 0

As with the singlet and doublet, we can change the variable of integration and shift the location of the singularity to obtain the general formula

    \int f(x)\delta^{(n)}(x - a)\,dx = (-1)^n f^{(n)}(a),    a \in \mathbb{R}

Finally, we can use the definition of the Laplace transform to take the transform of the delta function and its derivatives to obtain the transform pairs listed in Table 2.2.

2.3 Linear Equations with Variable Coefficients

2.3.1 Introduction

In many chemical engineering applications, equations like this one are encountered

    x^2\frac{d^2 y}{dx^2} + x\frac{dy}{dx} + (x^2 - \nu^2)y = 0        (2.12)

    f(t)              \bar{f}(s)
    \delta(t)         1
    \delta'(t)        s
    \delta^{(n)}(t)   s^n

Table 2.2: Laplace transform pairs involving $\delta$ and its derivatives.
This is called BESSEL'S EQUATION OF ORDER $\nu$, and arises in the study of diffusion and wave propagation via the Laplacian operator in cylindrical coordinates. Since the coefficients in front of the derivative terms are not constant, the exponential functions that solved constant-coefficient problems do not work here. Typically, variable-coefficient problems must be solved by power series methods or by numerical methods, as they have no simple closed-form solution. We focus here on second-order equations, as they arise most commonly in applications.

2.3.2 The Cauchy-Euler Equation

The CAUCHY-EULER equation, also called the EQUIDIMENSIONAL equation, has a simple exact solution that illustrates many important features of variable-coefficient problems and arises during the solution of many problems. The second-order Cauchy-Euler equation has the form

    a_0 x^2 y'' + a_1 x y' + a_2 y = 0        (2.13)

where $y' = dy/dx$. Its defining feature is that the term containing the $n$th derivative is multiplied by the $n$th power of $x$. Because of this, guessing that the form of the solution is $y = x^\alpha$ yields the quadratic equation

    a_0\alpha(\alpha - 1) + a_1\alpha + a_2 = 0

If this equation has distinct roots $\alpha_1$ and $\alpha_2$, then each root leads to a solution and thus the general solution is found

    y = c_1 x^{\alpha_1} + c_2 x^{\alpha_2}        (2.14)

For example, let $a_0 = 1$, $a_1 = 1$, $a_2 = -9$, yielding the equation $\alpha^2 - 9 = 0$, which has solutions $\alpha = \pm 3$. Thus the equation has two solutions of the form $y = x^\alpha$; the general solution is $y = c_1 x^3 + c_2 x^{-3}$. Note that this solution can blow up at $x = 0$; this singular behavior does not arise in constant-coefficient (linear) problems, but is frequently found in variable-coefficient problems.

In the case of a repeated root, the general solution does not take this form. However, given one solution $y_1(x)$ to a second-order linear problem, a second can be found in the form $y_2(x) = A(x)y_1(x)$. For example, let $a_0 = 1$, $a_1 = -1$, $a_2 = 1$, yielding $(\alpha - 1)^2 = 0$, with the repeated root $\alpha = 1$. Thus $y_1 = x$. We seek a second solution $y_2 = A(x)x$, which, upon substitution into the differential equation, yields

    A''x^3 + 2A'x^2 - A'x^2 - Ax + Ax = 0

which simplifies to

    A''x + A' = 0

Letting $A' = w$ leads to a simple first-order equation for $w$

    xw' + w = 0

so that $w = c/x$ and thus $A = c\ln x + d$, where $c$ and $d$ are arbitrary constants. Thus the general solution for this problem can be written

    y(x) = c_1 x + c_2 x\ln x = x(c_1 + c_2\ln x)

It can be shown in general that (second-order) Cauchy-Euler equations with repeated roots have the general solution

    y(x) = x^\alpha(c_1 + c_2\ln x)        (2.15)

2.3.3 Series Solutions and the Method of Frobenius

A general linear second-order problem can be written

    p(x)y'' + q(x)y' + r(x)y = 0        (2.16)

or

    y'' + \frac{q(x)}{p(x)}y' + \frac{r(x)}{p(x)}y = 0        (2.17)

If $q(x)/p(x)$ and $r(x)/p(x)$ are ANALYTIC, i.e., they have a convergent Taylor series expansion, at some point $x = a$, then $a$ is an ORDINARY POINT. Otherwise, $x = a$ is a SINGULAR POINT.

If $x = a$ is an ordinary point, there exist solutions in the form of power series

    y(x) = \sum_{n=0}^{\infty} c_n(x - a)^n        (2.18)

Two such solutions can be found, thus yielding the general solution. Letting $\rho$ be the distance between $a$ and the nearest singular point of the differential equation, which might be at a complex rather than a real value of $x$, the series converges¹ for $|x - a| < \rho$. Accordingly $\rho$ is called the RADIUS OF CONVERGENCE of the series. The exception to this is when a series solution truncates after a finite number of terms, i.e., $c_M = 0$ for $M > M_0$; in this case the sum is always finite for finite $x$.
Example 2.4: Power series solution for a constant-coefficient equation

Let $p(x) = 1$, $q(x) = 0$ and $r(x) = k^2$, resulting in the equation $y'' + k^2 y = 0$. Solve this by power series expansion.
Solution

We seek a solution by expanding around the ordinary point $a = 0$. For this simple example, every point is an ordinary point. Inserting the solution form, (2.18), into this equation yields

    \sum_{n=2}^{\infty} n(n - 1)c_n x^{n-2} + k^2\sum_{n=0}^{\infty} c_n x^n = 0

The two sums can be combined if we can make their lower limits the same. Thus we set $n = m + 2$ in the first series and $n = m$ in the second, obtaining

    \sum_{m=0}^{\infty}\left[(m + 2)(m + 1)c_{m+2} + k^2 c_m\right]x^m = 0

This can only hold if the term inside the square brackets is zero for all $m$, requiring that

    c_{n+2} = -\frac{c_n k^2}{(n + 2)(n + 1)}

¹A full understanding of convergence of power series requires knowledge of functions of complex variables; see, e.g., Ablowitz and Fokas (2003).

(where we have now reverted to using $n$ as the index). Leaving $c_0$ and $c_1$ arbitrary, we find that

    c_2 = -\frac{c_0 k^2}{2},    c_3 = -\frac{c_1 k^2}{3!}

    c_4 = -\frac{c_2 k^2}{4\cdot 3} = \frac{c_0 k^4}{4!},    c_5 = -\frac{c_3 k^2}{5\cdot 4} = \frac{c_1 k^4}{5!}

Absorbing a factor of $1/k$ into $c_1$ (recall that it is arbitrary), the series solution becomes

    y(x) = c_0\left(1 - \frac{k^2 x^2}{2!} + \frac{k^4 x^4}{4!} - \cdots\right) + c_1\left(kx - \frac{k^3 x^3}{3!} + \frac{k^5 x^5}{5!} - \cdots\right)

Note that this has two arbitrary constants $c_0$ and $c_1$, so it is the general solution. The two infinite series can be recognized as the Taylor expansions of two familiar functions, and we can thus rewrite the general solution as

    y(x) = c_0\cos kx + c_1\sin kx
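The recursion can be checked directly on a computer: with $c_0 = 1$, $c_1 = 0$ the partial sums of the series should reproduce $\cos kx$. The values of $k$ and $x$ below are arbitrary.

```python
# Example 2.4's recursion c_{n+2} = -k^2 c_n / ((n+2)(n+1)):
# with c0 = 1, c1 = 0 the partial sum reproduces cos(k x).
import numpy as np

k, x, N = 2.0, 0.7, 30
c = np.zeros(N)
c[0], c[1] = 1.0, 0.0
for n in range(N - 2):
    c[n + 2] = -k**2 * c[n] / ((n + 2) * (n + 1))

series = sum(c[n] * x**n for n in range(N))
assert abs(series - np.cos(k * x)) < 1e-12
```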

If $p(x) \to 0$ at some point $x = a$, the situation is more complex. We set $a = 0$ from now on for convenience. Now $q(x)/p(x)$ and $r(x)/p(x)$ are not analytic and $x = 0$ is called a SINGULAR POINT. If $x\,(q(x)/p(x))$ and $x^2\,(r(x)/p(x))$ are analytic, i.e., the singularity in $p(x)$ is not very strong, then the point is a REGULAR SINGULAR POINT. Observe that $x = 0$ is a regular singular point for the Cauchy-Euler equation. In fact, by multiplying (2.17) by $x^2$ and Taylor-expanding the coefficients, one can see that when the conditions for a regular singular point are satisfied, this general case reduces precisely to a Cauchy-Euler equation as $x \to 0$. This observation motivates the METHOD OF FROBENIUS, which seeks solutions of the form

    y(x) = x^\alpha\sum_{n=0}^{\infty} c_n x^n        (2.19)

The power series has the same convergence properties as described above for ordinary points.

Example 2.5: Frobenius solution for Bessel's equation of order zero

Bessel's equation (2.12) with $\nu = 0$ is

    xy'' + y' + xy = 0        (2.20)

Here $x = 0$ is a regular singular point. Solve by the method of Frobenius.

Solution

Observe that this equation can be written $x^2 y'' + xy' + (0 + x^2)y = 0$, so the corresponding Cauchy-Euler equation is thus $x^2 y'' + xy' + 0y = 0$. Seeking a solution $y = x^\alpha$ yields the repeated root $\alpha = 0$ and thus a general solution $y(x) = c_1 + c_2\ln x$. As we will see, this structure is reflected in the form of the solution to Bessel's equation.

Inserting the Frobenius solution form, (2.19), into (2.20) yields

    \sum_{n=0}^{\infty}(n + \alpha)(n + \alpha - 1)c_n x^{n+\alpha-1} + \sum_{n=0}^{\infty}(n + \alpha)c_n x^{n+\alpha-1} + \sum_{n=0}^{\infty} c_n x^{n+\alpha+1} = 0

To simplify this series, set $n = m + 2$ in the first two sums and $m = n$ in the third. Then set all the $m$'s back to $n$. This yields a summation starting at $n = -2$, which is fine as long as we make $c_{-2} = c_{-1} = 0$. The formula becomes

    \sum_{n=-2}^{\infty}\left[(n + \alpha + 2)^2 c_{n+2} + c_n\right]x^{n+\alpha+1} = 0

Since $x$ can vary, the equality can only hold if the terms in the brackets are all zero. This is the recursion formula for the coefficients $c_n$. The first term ($n = -2$) picks out the Cauchy-Euler behavior and is called the INDICIAL EQUATION. Since $c_{-2} = 0$, it reduces to $\alpha^2 c_0 = 0$. As we anticipated above with the corresponding Cauchy-Euler equation, this has the repeated root $\alpha = 0$. The general recursion relation for the coefficients reads

    c_{n+2} = -\frac{c_n}{(n + 2)^2}

Since $c_{-1} = 0$, all the coefficients with $n$ odd are zero. Therefore, only one of the two solutions to the problem has the form of (2.19), again, in parallel with the Cauchy-Euler analysis. With some rearrangements, this solution becomes

    y_1(x) = \sum_{n=0}^{\infty}\frac{(-1)^n}{(n!)^2}\left(\frac{x}{2}\right)^{2n}

This function has the special symbol $J_0(x)$ and is called the "Bessel function of the first kind and order zero." For general $\nu$, the solutions are denoted $J_\nu(x)$. A second solution can be found for this problem by variation of parameters; see Exercise 2.31. It is not of Frobenius form, having a logarithmic singularity as $x \to 0$ (again as anticipated from the solution to the corresponding Cauchy-Euler equation). It is called $Y_0(x)$ and is the "Bessel function of the second kind and order zero." Singular solutions for general $\nu$ are denoted $Y_\nu(x)$. The general solution is

    y(x) = c_1 J_0(x) + c_2 Y_0(x)        (2.21)

See Table 2.3 for a graph of functions $J_0$ and $Y_0$. Note that for comparison purposes, the table also shows the solution for the radial part of $\nabla^2 y \pm y = 0$ in rectangular, cylindrical, and spherical coordinates.
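The Frobenius series derived above can be summed numerically and compared with a library implementation of $J_0$; scipy provides this function as `scipy.special.j0`.

```python
# The Frobenius series y1(x) = sum_n (-1)^n (x/2)^{2n} / (n!)^2
# should agree with scipy's Bessel function J0.
import numpy as np
from scipy.special import j0
from math import factorial

def j0_series(x, N=25):
    return sum((-1)**n * (x / 2)**(2 * n) / factorial(n)**2 for n in range(N))

for x in [0.5, 2.4, 5.0]:
    assert abs(j0_series(x) - j0(x)) < 1e-10
```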
In the previous example, the indicial equation yielded a single repeated root and one solution of Frobenius form. Other cases are possible. Here are the possibilities and their consequences.

1. If the indicial roots are equal, only one Frobenius solution is obtained. This is what occurred in the above example.

2. If the roots differ by a noninteger constant, then each root leads to a solution and the general solution is obtained.

3. If the roots differ by an integer, then the (algebraically) larger root leads to a Frobenius solution and either

   (a) the smaller root also leads to a Frobenius solution and the general solution is obtained, or

   (b) the smaller root does not lead to a second solution of Frobenius form. A second solution can be found by reduction of order and has a logarithmic singularity just as in the Cauchy-Euler case.

2.4 Function Spaces and Differential Operators

2.4.1 Functions as Vectors

One of the main tasks of mathematical modeling is the exact or approximate representation of functions. Here we extend the ideas of vectors and bases into the regime where each vector is a function, so the space the vectors live in is a FUNCTION SPACE.


    Rectangular coordinates:  d^2y/dx^2 \pm y = 0;  solutions cos x, sin x
    Cylindrical coordinates:  (1/r) d/dr (r dy/dr) \pm y = 0;  solutions J_0(r), Y_0(r) (plus sign) and I_0(r), K_0(r) (minus sign)
    Spherical coordinates:    (1/r^2) d/dr (r^2 dy/dr) \pm y = 0;  solutions cos r/r, sin r/r

Table 2.3: The linear differential equations arising from the radial part of $\nabla^2 y \pm y = 0$ in rectangular, cylindrical, and spherical coordinates. Bessel functions ($J_0$, $Y_0$) and modified Bessel functions ($I_0$, $K_0$) are two linearly independent solutions in cylindrical coordinates for the plus and minus signs, respectively. The solutions in spherical coordinates are called spherical Bessel functions.


In the finite-dimensional space $\mathbb{C}^n$, the usual inner product of vectors $u$ and $v$ is simply the $n$-dimensional version of the dot product

    (u, v) = \sum_{i=1}^{n} \bar{u}_i v_i

For functions $u(x)$ and $v(x)$ in a domain $a \leq x \leq b$, a natural analog to this relation is

    (u(x), v(x)) = \int_a^b \bar{u}(x)v(x)\,dx

This is the usual inner product for functions defined on the interval $[a, b]$. From this inner product, we can obtain a norm

    \|u\| = \left(\int_a^b \bar{u}(x)u(x)\,dx\right)^{1/2}

Another inner product, which plays an important role shortly, is given by the formula

    (u, v)_w = \int_a^b \bar{u}(x)v(x)w(x)\,dx

where $w(x)$ is a so-called weight function and must be positive in $(a, b)$. Finally, with these definitions, a bounded function is one that satisfies

    \int_a^b \bar{u}(x)u(x)w(x)\,dx = \|u\|_w^2 < \infty

With these definitions in hand, we can define an important function space. The set of functions $u(x)$ that satisfy $(u, u) = \|u\|^2 < \infty$ with the usual inner product ($w = 1$) is the LEBESGUE SPACE $L_2(a, b)$. If we had used a nonunit weight function $w(x)$ in the inner product, we would have $L_{2,w}(a, b)$. Lebesgue spaces are examples of HILBERT SPACES. A Hilbert space is essentially identical to a space of vectors with infinitely many components, so that all of our intuition about directions, lengths and angles carries over from two dimensions into an infinite number of dimensions!

Basis Sets and Fourier Series

In a finite-dimensional space, any vector can be represented in an orthogonal basis $\{e_1, e_2, \ldots, e_n\}$ as

    v = \sum_i \frac{(e_i, v)}{(e_i, e_i)}\, e_i

The same is true in a Hilbert space, except that each basis vector is now a function $\phi_i(x)$ and the sum is infinite², e.g.,

    f(x) = \sum_i \frac{(\phi_i, f)}{(\phi_i, \phi_i)}\, \phi_i(x)

Two of the most important basis sets for $L_2$ are the trigonometric functions and the Legendre polynomials. Consider the space $L_2(-\pi, \pi)$, i.e., the Lebesgue space defined as above, except on the interval³ from $-\pi$ to $\pi$. The functions

    e^{ikx} = \cos kx + i\sin kx,    k = 0, \pm 1, \pm 2, \ldots

are in this space. In addition, they satisfy

    (e^{ikx}, e^{ilx}) = 2\pi\delta_{kl}

That is, they are orthogonal. A natural question, then, is whether this set can be used as a basis for $L_2(-\pi, \pi)$. Specifically, we examine the proposition that every function in $L_2(-\pi, \pi)$ can be represented as

    f(x) = \sum_{k=-\infty}^{\infty} c_k e^{ikx}        (2.22)

This is the trigonometric FOURIER SERIES representation of $f(x)$. The $c_k$ are the FOURIER COEFFICIENTS and are given by the standard formula for expansion of a vector in an orthogonal basis

    c_k = \frac{(e_k, f)}{(e_k, e_k)} = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx        (2.23)

The equality (2.22) cannot possibly hold at every point $x$ for every function $f(x) \in L_2(-\pi, \pi)$, simply because trigonometric functions are continuous and smooth, and functions in $L_2(-\pi, \pi)$ are allowed to have discontinuities. Distance in $L_2(-\pi, \pi)$ is not measured pointwise, however, but rather via the $L_2$ norm. To address the issue of the distance between a function and its Fourier series representation, consider the finite trigonometric series expansion

    P_K(x) = \sum_{k=-K}^{K} a_k e^{ikx}

²Depending on the specific situation, the sum's lower limit might be 0, 1, or $-\infty$.
³The interval $(0, 2\pi)$ might also be used.

and recall that in $L_2$, the distance between $f$ and $P_K$ is given by

    \|f - P_K\|^2 = \int_{-\pi}^{\pi}\left|f(x) - \sum_{k=-K}^{K} a_k e^{ikx}\right|^2 dx

We can now ask the question: given integer $K$, what coefficients $a_k$ minimize the $L_2$ distance between $f$ and $P_K$? It can be shown that the solution to this minimization problem is

    a_k = c_k

for $k = 0, \pm 1, \pm 2, \ldots, \pm K$, with the $c_k$ given by (2.23) (Gasquet and Witomski, 1999). Because the $c_k$ do not depend on the number of terms, $K$, if we decide to increase the order of the approximation, we do not need to recalculate the lower-order coefficients. We can now consider the truncated Fourier series

    f_K(x) = \sum_{k=-K}^{K} c_k e^{ikx}

The question of convergence of this series to the function $f$ is nontrivial; we state without proof that for functions in $L_2(-\pi, \pi)$

    \|f(x) - f_K(x)\| \to 0    as K \to \infty

The rate of convergence of $f_K$ to $f$ depends on the behavior of the Fourier coefficients $c_k$ as $|k| \to \infty$. Returning to (2.23) and integrating by parts

    2\pi c_k = (f, e^{ikx})        (2.24)
    = \int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx        (2.25)
    = \left[-\frac{1}{ik}f(x)e^{-ikx}\right]_{-\pi}^{\pi} + \frac{1}{ik}\int_{-\pi}^{\pi} f'(x)e^{-ikx}\,dx        (2.26)

Therefore $|c_k|$ decays at least as fast as $k^{-1}$ as $k \to \infty$. This is often written as $c_k = O(k^{-1})$: "$c_k$ is order $k^{-1}$." If, additionally, $f(\pi) = f(-\pi)$ and $f'(x)$ is differentiable, then the first term in (2.26) vanishes and we can repeat the integration by parts procedure on the remaining integral to conclude that $c_k = O(k^{-2})$. Iterating this argument, we can conclude that if $f(x)$ is $m$-times continuously differentiable in $(-\pi, \pi)$, i.e., the $m$th derivative $f^{(m)}$ is continuous, and that $f^{(j)}$ is periodic for all $j \leq m - 2$, then

    c_k = O(k^{-m})

Figure 2.4: Function $f(x) = \exp(-8(x/\pi)^2)$ and truncated trigonometric Fourier series approximations with K = 2, 5, 10. The approximations with K = 5 and K = 10 are visually indistinguishable from the exact function.

The case just discussed, in which $c_k = O(k^{-2})$, corresponds to $m = 2$. For infinitely smooth periodic functions, this argument implies that the Fourier coefficients decay faster than any finite negative power of $k$. This is called exponential or SPECTRAL convergence. Figure 2.4 shows truncated Fourier series approximations to the function $f(x) = \exp(-8(x/\pi)^2)$ with several values of $K$. Although this function is not exactly periodic, its function values and derivatives at $x = \pm\pi$ are extremely small, so convergence is rapid.

If $f(x)$ is discontinuous or $f(\pi) \neq f(-\pi)$, then $c_k = O(k^{-1})$ and convergence is very slow. The most obvious characteristic of Fourier series representations of discontinuous functions is the GIBBS PHENOMENON, the rapid oscillation of the truncated series $f_K$ in the vicinity of the discontinuity. For further discussion of the convergence of Fourier series see Gasquet and Witomski (1999) and Canuto, Hussaini, Quarteroni, and Zang (2006).
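These decay rates can be observed directly by computing the coefficients (2.23) by quadrature: for the nonperiodic function $f(x) = x$ the magnitudes fall off only like $1/k$, while for a smooth periodic function (here $e^{\cos x}$, an arbitrary choice) they fall off far faster.

```python
# Fourier coefficients (2.23) by quadrature: O(1/k) decay for f(x) = x
# versus rapid decay for the smooth 2*pi-periodic function exp(cos x).
import numpy as np
from scipy.integrate import quad

def c_k(f, k):
    re, _ = quad(lambda x: f(x) * np.cos(k * x), -np.pi, np.pi, limit=400)
    im, _ = quad(lambda x: -f(x) * np.sin(k * x), -np.pi, np.pi, limit=400)
    return (re + 1j * im) / (2 * np.pi)

# For f(x) = x, |c_k| = 1/k exactly
for k in [5, 20]:
    assert abs(abs(c_k(lambda x: x, k)) - 1.0 / k) < 1e-6

# For the smooth periodic function, c_20 is far below 1/20
ck_smooth = abs(c_k(lambda x: np.exp(np.cos(x)), 20))
assert ck_smooth < 1e-6
```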


Example 2.6: Fourier series of a nonperiodic function

What is the Fourier series expansion of $f(x) = x$?

Solution

Application of (2.26) immediately yields that

    c_k = \frac{i(-1)^k}{k},    k \neq 0

and $c_0 = 0$. Observe that $c_{-k} = \bar{c}_k$ (see Exercise 2.5), so we can write the series as

    f_K(x) = c_0 + 2\sum_{k=1}^{K}\left(\mathrm{Re}(c_k)\cos kx - \mathrm{Im}(c_k)\sin kx\right)

which in the present case reduces to

    f_K(x) = \sum_{k=1}^{K}\frac{2(-1)^{k+1}}{k}\sin kx

This series contains only sines, not cosines, reflecting the fact that the function $f(x) = x$ is odd. Figure 2.5 shows the approximation for K = 5, 10, and 50, which exhibits Gibbs phenomenon as expected for a nonperiodic function.
The plot remains essentially the same if the discontinuity is in the interior rather than on the boundary. For example, the function

    f(x) = \begin{cases} x + \pi, & -\pi \leq x < 0 \\ x - \pi, & 0 < x \leq \pi \end{cases}

is periodic (along with all its derivatives) but has a discontinuity at the origin. The Fourier series of this function is the same as that for the previous, except shifted by $\pi$

    f_K(x) = \sum_{k=1}^{K}\frac{2(-1)^{k+1}}{k}\sin k(x + \pi) = \sum_{k=1}^{K}\frac{-2}{k}\sin kx

For trigonometric Fourier series, Gibbs phenomenon occurs whether the discontinuity occurs on the boundary or in the interior of the domain.


Figure 2.5: Truncated trigonometric Fourier series approximation to f(x) = x, using K = 5, 10, 50. The wiggles get finer as K increases.

Implicitly, the trigonometric basis assumes that the function is periodic, with the period being the length of the interval. This is why the Gibbs phenomenon occurs if the boundary values of the function are not the same. Another basis that does not make this implicit assumption is given by the so-called LEGENDRE POLYNOMIALS. This basis can be constructed by performing Gram-Schmidt orthogonalization on the set $\{1, x, x^2, x^3, \ldots\}$. The first several of these polynomials, now in the space $L_2(-1, 1)$, the usual setting for polynomial basis functions, are

    P_0(x) = 1        (2.27)
    P_1(x) = x        (2.28)
    P_2(x) = (3x^2 - 1)/2        (2.29)

and subsequent polynomials follow from the recursion

    P_{j+1}(x) = \frac{2j + 1}{j + 1}\,x P_j(x) - \frac{j}{j + 1}\,P_{j-1}(x)        (2.30)

and the Legendre-Fourier series representation of a function is

    f_n(x) = \sum_{i=0}^{n}\frac{(P_i, f)}{(P_i, P_i)}\,P_i(x)

Note that the sum starts with the index $i = 0$, which is conventional for polynomial bases. As written, this basis is not orthonormal; instead each polynomial has been scaled so that its value is 1 at $x = 1$. The function $f(x) = x$ can be represented exactly, since $P_1(x) = x$. Convergence for Fourier series based on Legendre polynomials is analogous to that for trigonometric functions; in particular, spectral convergence is found for functions that have infinitely many derivatives, whether they are periodic or not. We refer the interested reader to Canuto et al. (2006) for detailed analysis.
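The three-term recursion and the orthogonality of these polynomials on $L_2(-1, 1)$ can be checked numerically; recall the standard normalization $(P_j, P_j) = 2/(2j + 1)$.

```python
# Build Legendre polynomials by the three-term recursion
# (j+1) P_{j+1} = (2j+1) x P_j - j P_{j-1}, then check (2.27)-(2.29)
# and orthogonality on (-1, 1) by trapezoid-rule integration.
import numpy as np

def legendre_eval(j, x):
    p_prev, p = np.ones_like(x), np.asarray(x, dtype=float)
    if j == 0:
        return p_prev
    for n in range(1, j):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p

x = np.linspace(-1, 1, 5)
assert np.allclose(legendre_eval(2, x), (3 * x**2 - 1) / 2)

xs = np.linspace(-1, 1, 20001)
dx = xs[1] - xs[0]
w = np.ones_like(xs); w[0] = w[-1] = 0.5   # trapezoid weights

def inner(f, g):
    return np.sum(w * f * g) * dx

inner23 = inner(legendre_eval(2, xs), legendre_eval(3, xs))
inner33 = inner(legendre_eval(3, xs), legendre_eval(3, xs))
assert abs(inner23) < 1e-8                  # (P2, P3) = 0
assert abs(inner33 - 2.0 / 7.0) < 1e-6      # (P3, P3) = 2/(2*3+1)
```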
Figure 2.6 shows Legendre-Fourier series approximations to the function $f(x) = \exp(-8x^2)$ truncated at $n + 1$ terms, i.e., including polynomials up to degree $n$. As with the trigonometric Fourier series approximation of this function, convergence is rapid. Figure 2.7 shows Legendre-Fourier series approximations to the unit step function $f(x) = H(x)$; because this function is discontinuous, the Legendre-Fourier series also displays Gibbs phenomenon.
The trigonometric and Legendre basis sets are very important, but there are many others that also are important and widely seen in applications. The following section introduces an entire class of equations, each of whose members generates a basis set.
2.4.2 Self-Adjoint Differential Operators and Sturm-Liouville Equations

When we studied linear algebra, we learned that self-adjoint matrix operators in $\mathbb{R}^n$ have special properties, namely that their eigenvalues are real and their eigenvectors form an orthogonal basis for $\mathbb{R}^n$. Self-adjoint differential operators also generate basis vectors (functions). Recall the definition of the adjoint $L^*$ of an operator $L$

    (Lu, v) = (u, L^*v)

Let us apply this definition to the operator $L = d/dx$ in the interval


Figure 2.6: Function f(x) = exp(-8x^2) and truncated Legendre-Fourier series approximations with n = 2, 5, 10.

$[0, 1]$ and the usual, i.e., uniformly weighted, inner product

    (Lu, v) = \int_0^1 u'(x)v(x)\,dx = u(1)v(1) - u(0)v(0) - \int_0^1 u(x)v'(x)\,dx

Since $L$ is here a first derivative, any differential equation involving it requires specification of one boundary condition. As an example, we require that $u(0) = 0$. Now the boundary term at $x = 0$ vanishes. Now observe that if we require that $v(1) = 0$, the boundary term at $x = 1$ also vanishes, leaving the result

    (Lu, v) = -\int_0^1 u(x)v'(x)\,dx = (u, L^*v)

where $L^* = -d/dx$. Therefore, if $L$ is $d/dx$, operating on functions that vanish at $x = 0$, then from the above equation, $L^* = -d/dx$, operating


Figure 2.7: Function f(x) = H(x) and truncated Legendre-Fourier series approximations with n = 10, 50, 100.

on functions that vanish at $x = 1$. The first derivative operator is not self-adjoint.
If, however, we let $L = d^2/dx^2$ and require that $u(0) = u(1) = 0$, then the same procedure (but using integration by parts twice) shows that $L^*$ is also $d^2/dx^2$ operating on the same domain. The second-derivative operator, therefore, with appropriate boundary conditions, is self-adjoint. More generally, consider a class of second-order differential operators called STURM-LIOUVILLE operators. These operators have the general form

    Lu = \frac{1}{w(x)}\left[\frac{d}{dx}\left(p(x)\frac{du}{dx}\right) + r(x)u\right]        (2.31)

in the domain $a < x < b$, with homogeneous boundary conditions

    \alpha u(a) + \beta u'(a) = 0,    \gamma u(b) + \delta u'(b) = 0

To avoid the possibility of singular points, $p(x)$ must be positive in the domain.

Furthermore, take the inner product to be

    (u, v)_w = \int_a^b u(x)v(x)w(x)\,dx

The function $w(x)$ here is the same as in (2.31). For this integral to be a proper inner product, we must require that $w(x) > 0$ in the domain. We now show that Sturm-Liouville operators are self-adjoint. Repeated integration by parts yields

    (Lu, v)_w = \int_a^b \frac{1}{w(x)}\left[\frac{d}{dx}\left(p\frac{du}{dx}\right) + r(x)u\right]v\,w\,dx        (2.32)

    = p(b)\left(u'(b)v(b) - u(b)v'(b)\right) - p(a)\left(u'(a)v(a) - u(a)v'(a)\right) + \int_a^b u\,\frac{1}{w(x)}\left[\frac{d}{dx}\left(p\frac{dv}{dx}\right) + r(x)v\right]w\,dx        (2.33)
If the boundary terms vanish, then this expression satisfies the self-adjointness condition $(Lu, v) = (u, Lv)$. This is the case if the above boundary conditions apply on both $u$ and $v$. The restriction on the boundary conditions can be relaxed if $p(x)$ vanishes at one or both boundaries, in which case only boundedness of the function and its derivative is required at that boundary. The latter case is called a singular Sturm-Liouville operator, because it has a singular point at the boundary or boundaries where $p$ vanishes. Finally, the boundary terms also vanish if $p(a) = p(b)$ and PERIODIC BOUNDARY CONDITIONS are imposed: $u(a) = u(b)$, $u'(a) = u'(b)$ and likewise for $v$.

Next consider the eigenvalue problem associated with the Sturm-Liouville operator⁴

    Lu + \lambda u = 0

As with all self-adjoint operators, the eigenvalues are real and the eigenvectors, now called eigenfunctions because they are elements of a function space, are orthogonal with respect to the inner product weighted by $w(x)$. Furthermore, and very importantly, there are an infinite number of eigenfunctions and they form a complete basis for $L_{2,w}(a, b)$. We next consider three Sturm-Liouville operators that produce some famous eigenfunctions that are popular choices for use as basis functions.

⁴This is the conventional form for writing differential eigenvalue problems. Unfortunately, it is different from the convention for algebraic problems.


Example 2.7: Generating trigonometric basis functions

Consider the operator $L = d^2/dx^2$, with boundary conditions $u(0) = u(l) = 0$. The eigenvalue problem for this operator is

    \frac{d^2 u}{dx^2} + \lambda u = 0,    u(0) = u(l) = 0        (2.34)

What are the eigenvalues and eigenfunctions?


Solution

This equation has the general solution

    u(x) = c_1\sin\sqrt{\lambda}\,x + c_2\cos\sqrt{\lambda}\,x

We have thus taken $\lambda \geq 0$: a negative value of $\lambda$ would lead to a general solution consisting of growing and decaying exponentials, which cannot satisfy homogeneous boundary conditions on both boundaries, as can be easily checked. The boundary condition $u(0) = 0$ requires that $c_2 = 0$. Setting $c_1 = 0$ leaves only the trivial solution $u = 0$, so to satisfy the remaining boundary condition, we require that

    \sin\sqrt{\lambda}\,l = 0

This is the characteristic equation for this eigenvalue problem; it has infinitely many roots

    \lambda_n = n^2\pi^2/l^2    for n = 1, 2, 3, \ldots, \infty

The case $n = 0$ does not result in an eigenvalue since $\sin 0 = 0$. Thus the eigenfunctions are

    u_n(x) = \sin\frac{n\pi x}{l}

with $(u_m, u_n) = \frac{l}{2}\delta_{mn}$. The result that Sturm-Liouville eigenfunctions form a basis for functions in $L_2(0, l)$ implies that we can write any function in that space as a Fourier series

    f(x) = \sum_{n=1}^{\infty} c_n\sin\frac{n\pi x}{l}

where

    c_n = \frac{\left(f(x), \sin\frac{n\pi x}{l}\right)}{\left(\sin\frac{n\pi x}{l}, \sin\frac{n\pi x}{l}\right)} = \frac{2}{l}\left(f(x), \sin\frac{n\pi x}{l}\right)

This is the FOURIER SINE SERIES of $f(x)$.
Now consider the same operator but with periodic boundary conditions $u(0) = u(l)$, $u'(0) = u'(l)$. The boundary terms in (2.33) also vanish in this case, because here $p(a) = p(b) = 1$. Now the solution to (2.34) is

    u = \exp\left(i\sqrt{\lambda}\,x\right)

which satisfies the periodicity requirement if $\sqrt{\lambda} = 2n\pi/l$ for any integer $n$. Thus the eigenfunctions of $d^2/dx^2$ with periodic boundary conditions in $(0, l)$ are

    u_n = \exp\left(i\frac{2n\pi x}{l}\right)

Taking $l = 2\pi$, we recover the first set of basis functions we considered in Section 2.4.1.

Example 2.8: Bessel's equation revisited

The operator

L = −(1/x) d/dx (x d/dx)

arises in many differential equations originating in problems in polar coordinates, e.g., diffusion in a cylinder. It has Sturm-Liouville form with w = p = x, r = 0. The eigenvalue problem for this operator can be written

−(1/x)(xu′)′ = λu

or, multiplying through by x², as

x²u″ + xu′ + λx²u = 0

What are its eigenfunctions and eigenvalues?

Solution

This is a variable-coefficient problem with a regular singular point at x = 0, so we can seek solutions by the method of Frobenius. Alternately, in the present case we can make the substitution z = x√λ, thus rewriting the equation as

z² d²u/dz² + z du/dz + z²u = 0

which is in fact Bessel's equation of order zero. We already found that this equation has the general solution u(z) = c₁J₀(z) + c₂Y₀(z), or, reverting to the original independent variable,

u(x) = c₁J₀(√λ x) + c₂Y₀(√λ x)

To complete the specification of the eigenvalue problem requires choosing the domain and imposing specific boundary conditions. Consider the domain 0 ≤ x ≤ l with the conditions that u be bounded at x = 0 and u(l) = 0; boundedness is all that is required at x = 0, since p(0) = 0. Boundedness requires that c₂ = 0, because Y₀ diverges logarithmically at the origin. Satisfaction of the boundary condition u(l) = 0 then requires that

J₀(√λ l) = 0

The top center plot of Table 2.3 shows J₀(x); the positions of its zeros determine the eigenvalues λ. The first several of these are at approximately x = 2.4, 5.5, 8.7, 11.8, ..., and are tabulated in many places, including Abramowitz and Stegun (1970). Thus λ₁ = (2.4/l)², etc. The functions

u_n(x) = J₀(√λ_n x)

form an orthogonal basis for L²_w(0, l). Referring again to Table 2.3, u₁ is the function J₀ scaled so that its first zero is at x = l, u₂ is the same function, but scaled so that its second zero is at x = l, etc.

Other boundary conditions could be chosen. For example, one could require u(a) = 0, u(b) = 0. In this case the eigenfunctions involve both J₀ and Y₀, and the eigenfunctions and eigenvalues are determined by the solution to the coupled nonlinear equations

J₀(√λ a) + c₂Y₀(√λ a) = 0
J₀(√λ b) + c₂Y₀(√λ b) = 0

Since c₁ is arbitrary, it has been set to unity for convenience. Here c₂ and λ are the unknowns. Solution of these highly nonlinear equations is nontrivial.
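The zeros quoted above can be reproduced without a special-function library. The NumPy-only sketch below (my illustration, not the authors' code) evaluates J₀ through its integral representation J₀(z) = (1/π)∫₀^π cos(z sin θ) dθ, brackets sign changes by scanning, refines each bracket by bisection, and then forms the eigenvalues λ_n = (z_n/l)².

```python
import numpy as np

def J0(z):
    # integral representation J0(z) = (1/pi) * integral of cos(z sin(theta)), midpoint rule
    th = (np.arange(4000) + 0.5) * np.pi / 4000
    return np.mean(np.cos(z * np.sin(th)))

def bisect(g, a, b, iters=80):
    # simple bisection; assumes g(a) and g(b) have opposite signs
    ga = g(a)
    for _ in range(iters):
        m = 0.5 * (a + b)
        if ga * g(m) <= 0.0:
            b = m
        else:
            a, ga = m, g(m)
    return 0.5 * (a + b)

# scan for sign changes of J0 on (0, 13] and refine each bracket
zgrid = np.linspace(0.1, 13.0, 1301)
vals = np.array([J0(z) for z in zgrid])
zeros = [bisect(J0, zgrid[i], zgrid[i + 1])
         for i in range(len(zgrid) - 1) if vals[i] * vals[i + 1] < 0]

l = 1.0
eigenvalues = [(z / l) ** 2 for z in zeros]
print(np.round(zeros, 3))  # approximately 2.405, 5.520, 8.654, 11.792
```

These agree with the tabulated values 2.4, 5.5, 8.7, 11.8 cited from Abramowitz and Stegun.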

Example 2.9: Legendre's differential equation and Legendre polynomials

Consider the Sturm-Liouville eigenvalue problem with p(x) = 1 − x², w(x) = 1, r(x) = 0 in the domain −1 < x < 1

(1 − x²)u″ − 2xu′ + λu = 0

It has regular singular points at x = ±1, while the origin is an ordinary point. Because p(x) = 0 at x = ±1, only boundedness at these points is required of the eigenfunctions. What are the eigenvalues and eigenfunctions?

Solution

Seeking a series solution around x = 0 reveals (Exercise 2.35) that, if λ = l(l + 1) with l ≥ 0 an integer, then one of the solutions is a Legendre polynomial P_l of degree l, and using the method of Frobenius one can learn that the other has logarithmic singularities at x = ±1, because the radius of convergence of a power series solution is given by the distance to the nearest singular point (Ablowitz and Fokas, 2003). Otherwise, there is no solution that is bounded at both x = 1 and x = −1. Therefore, the eigenvalues of (2.9) are λ = l(l + 1) with l = 0, 1, 2, ..., and the corresponding eigenfunctions are the Legendre polynomials P_l.

Legendre polynomials are the simplest of a broad class of ORTHOGONAL POLYNOMIALS that come from Sturm-Liouville eigenvalue problems and are orthogonal with respect to various weighted inner products.
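NumPy ships the Legendre polynomials in its numpy.polynomial package, which makes both claims easy to spot-check numerically: orthogonality on [−1, 1] with (P_m, P_n) = 2/(2n + 1)δ_mn, and the eigenrelation (1 − x²)P_l″ − 2xP_l′ + l(l + 1)P_l = 0. This is an illustrative sketch, not part of the text.

```python
import numpy as np
from numpy.polynomial import Legendre, legendre

# Gram matrix of the first five Legendre polynomials, by Gauss-Legendre quadrature
x, w = legendre.leggauss(20)  # exact for polynomial integrands of degree <= 39
P = [Legendre.basis(n)(x) for n in range(5)]
gram = np.array([[np.sum(w * P[m] * P[n]) for n in range(5)] for m in range(5)])
print(np.round(gram, 12))  # diagonal entries 2/(2n+1); off-diagonals vanish

# Eigenrelation for l = 3, lambda = l(l+1) = 12
P3 = Legendre.basis(3)
xs = np.linspace(-1.0, 1.0, 11)
resid = (1 - xs**2) * P3.deriv(2)(xs) - 2 * xs * P3.deriv(1)(xs) + 12 * P3(xs)
print(np.max(np.abs(resid)))  # residual of the ODE: essentially zero
```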

2.4.3 Existence and Uniqueness of Solutions

Homogeneous Boundary Conditions

Consider the nonhomogeneous second-order differential equation with the homogeneous boundary conditions

Lu = f   B₁u = 0   B₂u = 0    (2.35)

Define the null space of the operator

N(L) = {u | Lu = 0, B₁u = 0, B₂u = 0}

and the null space of the adjoint operator

N(L*) = {v | L*v = 0, B₁*v = 0, B₂*v = 0}

then the following theorem characterizes existence and uniqueness of solutions to (2.35) (Stakgold, 1998, pp. 210-211).

Theorem 2.10 (Alternative theorem). For the boundary-value problem in (2.35), we have the following two alternatives.

(a) Either N(L) contains only the zero function, in which case N(L*) contains only the zero function and (2.35) has exactly one solution for every f.

(b) Or N(L) contains n linearly independent functions, in which case N(L*) contains n linearly independent functions

N(L) = {u₁, u₂, ..., u_n}   N(L*) = {v₁, v₂, ..., v_n}

and (2.35) has a solution if and only if

(f, v_k) = 0   k = 1, 2, ..., n

and the general solution is

u(x) = u_p(x) + Σ_{k=1}^{n} α_k u_k(x)

in which u_p(x) is any particular solution and the α_k are arbitrary scalars.

Next we present two heat-conduction problems that display the two alternatives.

Example 2.11: Steady-state temperature profile with fixed end temperatures

Apply the alternative theorem to the steady-state heat-conduction problem with heat generation rate q(x) and specified end-temperature boundary conditions

−k d²T(x)/dx² = q(x)   T(0) = T₀   T(l) = T₁

What can you conclude about existence and uniqueness of the steady-state temperature profile?

Solution

First it is convenient to make the boundary conditions homogeneous by defining

u(x) = T(x) − T₀(l − x)/l − T₁x/l

and dividing by the thermal conductivity to give

Lu = f   B₁u = 0   B₂u = 0

in which f = −q/k and

L = d²/dx²   B₁u = u(0)   B₂u = u(l)

Next we compute N(L). Setting Lu = 0 gives

u(x) = ax + b

Applying the boundary conditions gives

B₁u = u(0) = b = 0   B₂u = u(l) = al = 0

so a = b = 0, and we see that u = 0 is the only element of N(L). We can therefore conclude that N(L*) also contains only the zero element, and the steady-state temperature profile exists and is unique for any heat-removal rate f.

Example 2.11 illustrates the first alternative in Theorem 2.10. The following example illustrates the second alternative.
Example 2.12: Steady-state temperature profile with insulated ends

Replace the fixed-temperature boundary conditions in Example 2.11 with insulated-end boundary conditions. What can you conclude about existence and uniqueness of the steady-state temperature profile for these boundary conditions? What is the physical interpretation of the existence condition? Why is the solution not unique?

Solution

The boundary conditions for insulated ends are

T_x(0) = 0   T_x(l) = 0

and since the boundary conditions are already homogeneous, we have

LT = f   B₁T = 0   B₂T = 0

in which f = −q/k and

L = d²/dx²   B₁T = T_x(0)   B₂T = T_x(l)

Next we compute N(L). Setting LT = 0 gives T(x) = ax + b as before. Applying the boundary conditions gives

B₁T = T_x(0) = a = 0   B₂T = T_x(l) = a = 0

and now we have that T(x) = b is in N(L). With these boundary conditions L has a one-dimensional null space consisting of the constant function. Normalizing this element gives {1} as the basis function for the one-dimensional null space N(L). Since the problem is self-adjoint, N(L*) is identical to N(L). Applying the alternative theorem, we conclude that a steady-state temperature profile exists only if

∫₀ˡ f(x)dx = 0

and the general solution is

T(x) = T_p(x) + α

where T_p is any particular solution. Since f corresponds to a rate of heat removal (or addition when f < 0) to the domain, the restriction on f provides the physically intuitive fact that if the ends are insulated, just as much heat must be removed from the domain as is added for a steady-state temperature to exist. For f satisfying this restriction, the general solution indicates that a constant can be added to any steady-state solution to provide another steady-state solution.
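The two conclusions of Example 2.12 survive discretization. The finite-difference sketch below (my illustration, assuming NumPy; not from the text) builds the operator d²/dx² with zero-flux rows at both ends: the constant vector lies in its null space, and a least-squares solve shows the discrete system is consistent only when the forcing has zero integral.

```python
import numpy as np

# Discrete analog of Example 2.12: T'' = f on (0, l) with T_x(0) = T_x(l) = 0.
n, l = 201, 1.0
h = l / (n - 1)
x = np.linspace(0.0, l, n)

A = np.zeros((n, n))
for i in range(1, n - 1):
    A[i, i - 1:i + 2] = [1.0 / h**2, -2.0 / h**2, 1.0 / h**2]
A[0, :2] = [-1.0 / h, 1.0 / h]      # one-sided row enforcing T_x(0) = 0
A[-1, -2:] = [1.0 / h, -1.0 / h]    # one-sided row enforcing T_x(l) = 0

null_resid = np.linalg.norm(A @ np.ones(n))
print(null_resid)  # 0: constants span the null space, as for L itself

def residual(f):
    # least-squares solve; the residual is zero only when the system is consistent
    b = f.copy()
    b[0] = b[-1] = 0.0  # homogeneous data in the boundary rows
    T = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.linalg.norm(A @ T - b)

res_solvable = residual(np.cos(np.pi * x / l))    # integral of f is zero
res_unsolvable = residual(np.sin(np.pi * x / l))  # integral of f is 2l/pi, nonzero
print(res_solvable, res_unsolvable)  # tiny vs O(1): the solvability condition
```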
Nonhomogeneous Boundary Conditions

Next consider the nonhomogeneous second-order problem for u(x) on x ∈ [a, b] with the nonhomogeneous boundary conditions

Lu = f   B₁u = γ₁   B₂u = γ₂    (2.36)

The null spaces of the operator and the adjoint are defined as in the case with homogeneous boundary conditions

N(L) = {u | Lu = 0, B₁u = 0, B₂u = 0}
N(L*) = {v | L*v = 0, B₁*v = 0, B₂*v = 0}

When we define the adjoint operator, we perform integration by parts

(Lu, v) − (u, L*v) = J(u, v)|ₐᵇ

In the integration by parts, we have that J(u, v) is linear in both u and v and involves lower-order derivatives of u, v evaluated at the ends of the interval. Setting J(u, v)|ₐᵇ to zero is what determines the adjoint boundary functionals

J(u, v)|ₐᵇ = 0   for all u such that B₁u = 0, B₂u = 0, and all v such that B₁*v = 0, B₂*v = 0

To find the solvability condition for the nonhomogeneous boundary conditions, we take the difference

(Lu, v_k) − (u, L*v_k) = J(u, v_k)|ₐᵇ

in which v_k is any element of the null space of the adjoint and u is the solution to (2.36). Then, because Lu = f and L*v_k = 0, we have

(f, v_k) = J(u, v_k)|ₐᵇ    (2.37)

Evaluating J(u, v_k) for u satisfying B₁u = γ₁ and B₂u = γ₂, and v_k satisfying B₁*v_k = 0, B₂*v_k = 0, gives the solvability conditions for the nonhomogeneous problem. The next example and Exercise 2.40 derive the solvability conditions for problems with nonhomogeneous boundary conditions.

Example 2.13: Steady-state temperature profile with fixed flux ends

Consider again Example 2.12, but replace the insulated ends with fixed, nonzero fluxes at the ends

T_x(x) = γ₁   x = 0
T_x(x) = γ₂   x = l

For what f does the solution exist?

Solution

This fully nonhomogeneous problem can be written as

LT = f   B₁T = γ₁   B₂T = γ₂

in which f = −q/k and

L = d²/dx²   B₁T = T_x(0)   B₂T = T_x(l)

The null space N(L) is unchanged, so the constant function {1} is the basis function, and the problem was shown to be self-adjoint, so N(L*) is one dimensional as well, with v₁(x) = 1. Next we compute J(u, v) for this problem. Integration by parts gives

(Lu, v) − (u, L*v) = J(u, v)|₀ˡ = (u′v − uv′)|₀ˡ

For T satisfying the boundary conditions and v_k in N(L*), we have

B₁T = T_x(0) = γ₁   B₂T = T_x(l) = γ₂
B₁*v₁ = dv₁/dx|₀ = 0   B₂*v₁ = dv₁/dx|ₗ = 0

Substituting these into J gives

J(T, v₁)|₀ˡ = v₁(l)T_x(l) − v₁(0)T_x(0) − (T dv₁/dx)|₀ˡ = γ₂ − γ₁

since v₁(x) = 1 and dv₁/dx = 0. Substituting this into the solvability condition, (2.37), gives

∫₀ˡ f(x)dx = γ₂ − γ₁

and the general solution remains

T(x) = T_p(x) + α

The restriction on f now stipulates that the net heat generation must exactly balance the heat removed through the two ends. Again, for f satisfying this restriction, a constant can be added to any steady-state solution to provide another steady-state solution.


Figure 2.8: Solution to the initial-value problem with nonhomogeneous boundary conditions; top figure shows u(t) with step introduced at t = 0, and bottom figure shows resulting du/dt with impulse at t = 0.

Nonhomogeneous Boundary Conditions Revisited

We can use the delta function and its derivatives introduced in Section 2.2.5 to streamline the treatment of the nonhomogeneous case. Basically we replace the nonhomogeneous boundary conditions with homogeneous ones, but then compensate for this change by adding appropriate impulsive terms to the forcing term of the differential equation. In this way, we have to recall only how to solve problems with homogeneous boundary conditions, and we can use Theorem 2.10 to analyze existence and uniqueness even when a problem has nonhomogeneous boundary conditions.

It is perhaps easiest to introduce the approach with an example. Let's say we are interested in solving the first-order nonhomogeneous differential equation, with forcing term f(t), and nonhomogeneous boundary (initial) condition

du/dt = f(t)   u(0) = γ

The solution is sketched in Figure 2.8. Imagine instead that we solve the problem with the homogeneous boundary condition u(0⁻) = 0, and we push the boundary slightly to the left of zero, to t = 0⁻. Now we wish to make the solution jump by γ just after time 0 to value u(0⁺) = γ so that it agrees with the solution to the problem with the nonhomogeneous boundary condition at t = 0. This idea is also sketched in Figure 2.8.

To make u(t) jump discontinuously by amount γ at t = 0, we require du/dt to have an impulse of strength γ at t = 0, which is γδ(t). Since du/dt = f(t), we introduce a modified forcing term f̃ and choose it to be

f̃(t) = f(t) + γδ(t)

We conjecture that solving the problem with this modified forcing term and homogeneous boundary condition should give us the solution to the problem with the original f and nonhomogeneous boundary condition. Let's check this conjecture. By inspection, the solution to the differential equation is obtained by integration

du/dt = f̃(t)

∫_{0⁻}^{t} du = ∫_{0⁻}^{t} f̃(τ)dτ

u(t) − u(0⁻) = ∫_{0⁻}^{t} f̃(τ)dτ

u(t) = ∫_{0⁻}^{t} f̃(τ)dτ

Note that this solution satisfies the homogeneous boundary condition u(0⁻) = 0 as desired. Now we substitute the definition of f̃ to obtain the solution of the original problem

u(t) = ∫_{0⁻}^{t} (f(τ) + γδ(τ))dτ

u(t) = γ + ∫₀ᵗ f(τ)dτ   t ≥ 0

By inspection, the last equation is indeed the solution to the original problem with forcing term f and nonhomogeneous boundary condition u(0) = γ.
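The impulse argument can be mimicked numerically by smearing γδ(t) into a tall, narrow rectangle of area γ. The sketch below (my illustration, assuming NumPy; f(t) = cos t is an arbitrary choice) integrates the modified forcing from the homogeneous condition u(0⁻) = 0 and recovers u(t) = γ + ∫₀ᵗ f(τ)dτ once t is past the smeared impulse.

```python
import numpy as np

# Approximate gamma*delta(t) by a rectangle of height gamma/eps on [0, eps).
gamma, eps = 2.0, 1e-4
t = np.linspace(0.0, 3.0, 300001)
dt = t[1] - t[0]

f = np.cos(t)                                  # arbitrary forcing term
delta_eps = np.where(t < eps, 1.0 / eps, 0.0)  # smeared impulse with unit area
ftilde = f + gamma * delta_eps                 # modified forcing term

u = np.cumsum(ftilde) * dt                     # u(t) = integral of ftilde, u(0-) = 0

# for t well past eps this should agree with gamma + sin(t)
mask = t > 10 * eps
err = np.max(np.abs(u[mask] - (gamma + np.sin(t[mask]))))
print(err)  # small: the impulse reproduces the jump of size gamma
```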
We can generalize this approach to cover any nonhomogeneity in the boundary conditions by adding appropriate impulsive forcing terms to the original problem's differential equation. We revisit Example 2.13 to illustrate this technique.

Example 2.14: Fixed flux revisited

Rederive the existence and uniqueness conditions for Example 2.13 using the alternative theorem, which applies only to homogeneous problems.

Solution

We replace the nonhomogeneous boundary conditions of Example 2.13 with the homogeneous version

B₁T = T_x(0⁻) = 0   B₂T = T_x(l⁺) = 0

In this example we require that T_x jump from zero to value γ₁ at the left boundary, x = 0. That requires an impulse γ₁δ(x) to be added to f so that T_xx sees an impulse and T_x sees a jump at x = 0. We also require T_x to jump from value γ₂ to zero as x passes through x = l at the right boundary. We add −γ₂δ(x − l) to f to cause T_x to jump by this amount. The modified f̃ is therefore5

f̃(x) = f(x) + γ₁δ(x) − γ₂δ(x − l)

5Note that if we had nonhomogeneous boundary conditions on T rather than T_x, we would require T_x to have an impulse and T_xx to have a doublet, and we would add γ₁δ′(x) − γ₂δ′(x − l) to f.

The null space N(L) is unchanged, and the problem is self-adjoint so this is also N(L*). The solvability condition applied to f̃ gives

(f̃, 1) = ∫_{0⁻}^{l⁺} (f(x) + γ₁δ(x) − γ₂δ(x − l))dx = ∫₀ˡ f(x)dx + γ₁ − γ₂ = 0

The last equation implies the solution exists for f satisfying

∫₀ˡ f(x)dx = γ₂ − γ₁

and the general solution remains

T(x) = T_p(x) + α

We see that we have reached the same solvability condition found in Example 2.13. By introducing f̃ and using homogeneous boundary conditions, we avoid the additional complication of introducing and evaluating J(u, v) as explained in Section 2.4.3. Evaluating J(u, v) is about the same work as determining the appropriate f̃. But using delta functions expands the applicability of Theorem 2.10, and allows this one theorem to cover both homogeneous and nonhomogeneous boundary condition cases, which is not an insignificant benefit.

Example 2.15: Nonhomogeneous boundary-value problem and the Green's function

The following second-order nonhomogeneous boundary-value problem arises in solving the transient wave equation for propagation of sound. We wish to solve the following BVP for u(x), x ∈ [0, l]

Lu = f   B₁u = 0   B₂u = 0

in which the second-order differential operator is Lu = d²u/dx² − k²u, and the two boundary functionals are B₁u = u(0), B₂u = u(l). The constant k is real and the function f(x) is an arbitrary forcing function.

(a) Take the Laplace transform of the BVP with the x variable playing the role of time. Note that the value of u(0) and u_x(0) shows up in the transform. Evaluate u(0) and leave u_x(0) as an unknown constant.

(b) Invert the transform to obtain u(x).

(c) Solve for u_x(0) using the solution in the previous part and the other boundary condition. Plug the expression for u_x(0) back into your solution to obtain the complete solution to the problem.

(d) Next express the solution as

u(x) = ∫₀ˡ G(x, ξ)f(ξ)dξ

The function G(x, ξ) is known as the Green's function for the nonhomogeneous problem.6 Write out the Green's function G(x, ξ) for this problem.

(e) Establish that the Green's function G(x, ξ) is symmetric for this boundary-value problem, i.e., G(x, ξ) = G(ξ, x). Hint: you may find the hyperbolic difference formula useful: sinh(a − b) = sinh a cosh b − cosh a sinh b.

6The Green's function concept is explored in greater detail in Chapter 3, Section 3.3.5.

Solution

(a) Taking the Laplace transform of the differential equation gives

s²ū(s) − su(0) − u_x(0) − k²ū(s) = f̄(s)

Setting u(0) = 0 and solving for ū gives

(s² − k²)ū = f̄ + u_x(0)

ū = f̄/(s² − k²) + u_x(0)/(s² − k²)

(b) Using the transform pair

sinh(kx)/k ⟷ 1/(s² − k²)

and the convolution theorem gives

u(x) = (1/k)∫₀ˣ sinh(k(x − ξ))f(ξ)dξ + u_x(0) sinh(kx)/k

(c) Evaluating the solution at x = l and solving for the unknown u_x(0) gives

u_x(0) = −(1/sinh(kl)) ∫₀ˡ sinh(k(l − ξ))f(ξ)dξ

Substituting u_x(0) into the previous solution gives

u(x) = (1/k)∫₀ˣ sinh(k(x − ξ))f(ξ)dξ − (sinh(kx)/(k sinh(kl))) ∫₀ˡ sinh(k(l − ξ))f(ξ)dξ

(d) Combining these two integrals into one gives

u(x) = ∫₀ˡ G(x, ξ)f(ξ)dξ

with

G(x, ξ) = (1/k)sinh(k(x − ξ)) − sinh(kx)sinh(k(l − ξ))/(k sinh(kl))   ξ < x

G(x, ξ) = −sinh(kx)sinh(k(l − ξ))/(k sinh(kl))   ξ > x

(e) We work on the first part of G(x, ξ) using the sinh difference formula

sinh(a − b) = sinh a cosh b − cosh a sinh b

We have for ξ < x that

(1/k)sinh(k(x − ξ)) − sinh(kx)sinh(k(l − ξ))/(k sinh(kl)) = (1/(k sinh(kl)))( sinh(kl)sinh(k(x − ξ)) − sinh(kx)sinh(k(l − ξ)) )

Using the sinh difference formula on the terms in parentheses

sinh(kl)sinh(k(x − ξ)) = sinh(kl)( sinh(kx)cosh(kξ) − cosh(kx)sinh(kξ) )

sinh(kx)sinh(k(l − ξ)) = sinh(kx)( sinh(kl)cosh(kξ) − cosh(kl)sinh(kξ) )

Canceling the cosh(kξ) terms gives

sinh(kl)sinh(k(x − ξ)) − sinh(kx)sinh(k(l − ξ)) = −sinh(kl)cosh(kx)sinh(kξ) + sinh(kx)cosh(kl)sinh(kξ)

Factoring out the sinh(kξ) term and using the difference formula again

sinh(kl)sinh(k(x − ξ)) − sinh(kx)sinh(k(l − ξ)) = sinh(kξ)( sinh(kx)cosh(kl) − cosh(kx)sinh(kl) ) = sinh(kξ)sinh(kx − kl) = −sinh(kξ)sinh(k(l − x))

Substituting this result into the equation for G(x, ξ) gives

G(x, ξ) = −sinh(kξ)sinh(k(l − x))/(k sinh(kl))   ξ < x

G(x, ξ) = −sinh(kx)sinh(k(l − ξ))/(k sinh(kl))   ξ > x

and we have established that G(x, ξ) = G(ξ, x); the Green's function for this operator is symmetric, a consequence of the self-adjointness of L in this case.
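The Green's function just derived can be validated numerically. The sketch below (assumes NumPy; the test forcing f(ξ) = sin(πξ/l) is my choice, for which u = −sin(πx/l)/((π/l)² + k²) solves the BVP exactly) checks both the symmetry and the integral representation.

```python
import numpy as np

k, l = 2.0, 1.0

def G(x, xi):
    # symmetric form from part (e): G = -sinh(k*min) sinh(k(l - max)) / (k sinh(kl))
    lo, hi = np.minimum(x, xi), np.maximum(x, xi)
    return -np.sinh(k * lo) * np.sinh(k * (l - hi)) / (k * np.sinh(k * l))

# midpoint quadrature nodes for the integral of G(x, xi) f(xi)
m = 4000
xi = (np.arange(m) + 0.5) * l / m
dxi = l / m
fvals = np.sin(np.pi * xi / l)

xs = np.linspace(0.1, 0.9, 9)
u_green = np.array([np.sum(G(x, xi) * fvals) * dxi for x in xs])
u_exact = -np.sin(np.pi * xs / l) / ((np.pi / l) ** 2 + k**2)

sym_err = np.max(np.abs(G(xs[:, None], xs[None, :]) - G(xs[None, :], xs[:, None])))
sol_err = np.max(np.abs(u_green - u_exact))
print(sym_err, sol_err)  # both near zero
```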

2.5 Lyapunov Functions and Stability


2.5.1 Types of Stability
Consider a system model of interest to be an autonomous initial-value problem

dx/dt = f(x)   x(0) = x₀    (2.38)

We are interested in the behavior of solutions to this system. Since the solution depends on the initial condition, we denote by φ(t; x) the solution to the initial-value problem at time t ≥ 0, which has value x at time t = 0. So the solution to the initial-value problem above is given by φ(t; x₀), t ≥ 0. But we are also interested in the solution as we vary the initial value x. Steady-state solutions to the model, if any exist, satisfy

f(x_s) = 0

We can always shift a steady state of interest to the origin by defining a new coordinate, x̃ = x − x_s, and f̃(x̃) = f(x̃ + x_s) so that

dx̃/dt = dx/dt = f(x) = f̃(x̃)

So we assume without loss of generality that x_s = 0, i.e., the origin is the steady state of interest. Unlike a linear system, when dealing with a nonlinear system, stability depends on the solution of interest, and we may have some solutions that are stable, while others are unstable. For a given linear system, the stability of all solutions is identical, and to reflect this special situation, we often refer to stability of the system, rather than stability of a solution.

Figure 2.9: Solution behavior; stability (left) and asymptotic stability (right).
There are several aspects to stability, and we define these next. The first most basic characteristic of interest is whether a small perturbation to x away from the steady-state solution results in a small subsequent deviation for all future times. The general term stability is commonly reserved for this most basic notion; we use the more precise term Lyapunov stability or stable in the sense of Lyapunov if we need to ensure that there is no confusion. The definition is as follows.

Definition 2.16 ((Lyapunov) Stability). The origin is (Lyapunov) stable if for every ε > 0, there exists δ > 0 such that ‖x‖ ≤ δ implies ‖φ(t; x)‖ ≤ ε for all t ≥ 0.

The stability concept is illustrated on the left side of Figure 2.9. A solution that is not stable is termed UNSTABLE. The next characteristic of interest is whether small perturbations to the initial state die away as time increases. The idea here is whether the origin attracts solutions starting nearby.
Definition 2.17 (Attractivity). The origin is attractive if there exists δ > 0 such that ‖x‖ ≤ δ implies that

lim_{t→∞} φ(t; x) = 0

Asymptotic stability is then the combination of these two properties.

Definition 2.18 (Asymptotic stability). The origin is asymptotically stable if it is (i) stable and (ii) attractive.

The right side of Figure 2.9 shows a representative solution trajectory when the origin is asymptotically stable.7 One might wonder why Lyapunov stability is a requirement of asymptotic stability, or even whether the origin can be attractive, and not Lyapunov stable. The answer is yes, the origin in a nonlinear system may be globally attractive and still not Lyapunov stable. The problem with these systems is that there exist starting points, arbitrarily close to the origin, for which the resulting trajectories become large before they asymptotically approach zero as time tends to infinity. Because we cannot bound how large the solution transient becomes by constraining the size of its initial value, we classify the origin as unstable.8 Note that the system must be nonlinear for a solution to be attractive and unstable. For linear systems, attractivity and asymptotic stability are identical; see Exercise 2.60.

7Asymptotic stability is probably the most common notion of stability that people have in mind, and sometimes it is referred to simply as stability. Of course, this usage may cause confusion because now the term stability is being used in two ways: as Lyapunov stability and as asymptotic stability; and one is a subset of the other.

8One is obviously free to define words as one pleases, but defining asymptotic stability in this way precludes a possible solution behavior that is not expected of "nice" or "stable" solutions. Regardless of terminology, the important point is to be aware that solutions can be globally attractive and not Lyapunov stable.

A stronger form of asymptotic stability known as exponential stability is often useful, especially when dealing with linear dynamics.

Definition 2.19 (Exponential stability). The origin is exponentially stable if there exists δ > 0 such that ‖x‖ ≤ δ implies that there exist c, λ > 0 for which

‖φ(t; x)‖ ≤ c‖x‖e^(−λt)

for all t ≥ 0.

We leave it as an exercise for the reader to show that the definition of exponential stability implies also Lyapunov stability.
2.5.2 Lyapunov Functions

Now we consider a scalar function of x, denoted V(x), whose characteristics are going to enable us to analyze the stability of the origin without requiring us to first solve completely the model dx/dt = f(x). The motivation for this class of functions is the role that mechanical energy plays in a mechanical system. Consider mechanical energy to be the sum of kinetic and potential energies, T and K, and let total energy be the sum of mechanical energy and internal energy

E_M = T + K   E = E_M + U

If we start an isolated mechanical system, such as the particle on a track depicted in Figure 2.10, at some system temperature with some initial kinetic and potential energies, and monitor the mechanical energy with time, we observe that although the total energy E is conserved, the mechanical energy E_M steadily drops as some of that form of energy is converted into heat by friction.9 The temperature of the system slowly increases due to the conversion of energy into heat, and the internal energy U of the system increases to maintain the total energy constant. If we define the height of the track at its lowest point as h = 0, we then have E_M = (1/2)mv² + mgh, and since h ≥ 0, m > 0, and v² ≥ 0, we have that E_M ≥ 0. The mechanical energy is therefore a scalar function satisfying

E_M ≥ 0   dE_M/dt ≤ 0

9This conversion of mechanical energy into heat is what causes the system's entropy to increase.

Figure 2.10: A simple mechanical system with total energy E, internal energy U, kinetic energy T = (1/2)mv², and potential energy K = mgh. The mechanical energy is E_M = T + K, and the total energy is E = E_M + U.

Because E_M decreases with time and is bounded below by zero, we expect that its only possible steady state is E_M = 0, and E_M = 0 implies both v = 0 and h = 0. So by analyzing the energy function E_M in this fashion, we conclude that the marble at rest at the bottom of the track is an asymptotically stable steady state, and we do not have to solve the complicated equations of motion of the system to deduce this fact.

We wish to generalize this concept, and the key idea is to define V(x) to be a nonnegative scalar function V : ℝⁿ → ℝ≥0, with a negative time derivative V̇(x(t)) ≤ 0. To compute the time derivative of V(x(t)), we apply the chain rule giving10

V̇ = (∂V/∂x)ᵀ dx/dt = (∂V/∂x)ᵀ f(x)    (2.39)

This generalization of mechanical energy is the concept of a Lyapunov function for the system dx/dt = f(x). A precise definition is as follows.

10See Appendix A for various notations for derivatives with respect to vectors. Some readers may be more familiar with this equation in the form V̇(x) = ∇V · f(x).

Definition 2.20 (Lyapunov function). Consider a compact (closed and bounded) set D ⊂ ℝⁿ containing the origin in its interior and let function V : ℝⁿ → ℝ≥0 be continuously differentiable and satisfy11

V(0) = 0 and V(x) > 0 for x ∈ D \ 0    (2.40)

V̇(x) ≤ 0 for x ∈ D    (2.41)

Then V(·) is a Lyapunov function for the system dx/dt = f(x).
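As a concrete instance of Definition 2.20 (my example, not the text's): for the damped oscillator dx₁/dt = x₂, dx₂/dt = −x₁ − cx₂ with c > 0, the energy-like choice V(x) = (x₁² + x₂²)/2 gives, by (2.39), V̇ = x₁x₂ + x₂(−x₁ − cx₂) = −cx₂² ≤ 0, so V is a Lyapunov function. The sketch below (assuming NumPy) integrates the system with a Runge-Kutta step and confirms V(x(t)) never increases.

```python
import numpy as np

c = 0.5  # damping coefficient

def f(x):
    # damped oscillator written as a first-order system
    return np.array([x[1], -x[0] - c * x[1]])

def rk4(x, h):
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x = np.array([1.0, 0.0])
h = 0.01
V = [0.5 * x @ x]  # V(x) = (x1^2 + x2^2)/2
for _ in range(5000):
    x = rk4(x, h)
    V.append(0.5 * x @ x)
V = np.array(V)

print(V[0], V[-1])         # V decays toward zero along the trajectory
print(np.max(np.diff(V)))  # never positive: V is nonincreasing, as (2.41) requires
```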
The big payoff for having a Lyapunov function for a system is the immediate stability analysis that it provides. We present next a few representative theorems stating these results. We mainly follow Khalil (2002) in the following presentation, and the interested reader may wish to consult that reference for further results on Lyapunov functions and stability theory. We require two fundamental results from real analysis to prove the Lyapunov stability theorems. The first concerns a nonincreasing function of time that is bounded below, which is a property we shall establish for V(x(t)) considered as a function of time. One of the fundamental results from real analysis is that such a function converges as time tends to infinity (Bartle and Sherbert, 2000, Theorems 3.3.2 and 4.3.11). The second result is that a continuous function defined on a compact (closed and bounded) set achieves its minimum and maximum values on the set. For scalar functions, i.e., f : ℝ → ℝ, this "extreme-value" or "maximum-minimum" theorem is a fundamental result in real analysis (Bartle and Sherbert, 2000, p. 130), and is often associated with Weierstrass or Bolzano. The result also holds for multivariate functions like the Lyapunov function V : ℝⁿ → ℝ≥0, which we require here, and is a highly useful tool in optimization theory (Mangasarian, 1994, p. 198) (Polak, 1997, Corollary 5.1.25) (Rockafellar and Wets, 1998, p. 11) (Rawlings and Mayne, 2009, Proposition A.7).

Theorem 2.21 (Lyapunov stability). Let V(·) be a Lyapunov function for the system dx/dt = f(x). Then the origin is (Lyapunov) stable.

Proof. Given ε > 0 choose r ∈ (0, ε] such that

B_r = {x ∈ ℝⁿ | ‖x‖ ≤ r} ⊆ D

The symbol B_r denotes a BALL of radius r. Such an r > 0 exists since D contains the origin in its interior. The sets D and B_r are depicted in Figure 2.11. Define α by

α = min_{x∈D, ‖x‖≥r} V(x)

11For two sets A and B, the notation A \ B is defined to be the elements of A that are not elements of B, or, equivalently, the elements of A remaining after removing the elements of B.

Figure 2.11: The origin and sets D, B_r, V_β (shaded), and B_δ.

Note that α is well defined because it is the minimization of a continuous function on a compact set, and α > 0 because of (2.40). Choose β ∈ (0, α) and consider the sublevel set

V_β = {x ∈ D | V(x) ≤ β}

Note that, as shown in Figure 2.11, sublevel sets do not need to be connected. Regardless, we can readily establish that V_β is contained in the interior of B_r as follows. A point p not in the interior of B_r has ‖p‖ ≥ r and therefore satisfies V(p) ≥ α due to α's definition, and is therefore not in the set V_β since β < α. Notice also that any solution starting in V_β remains in V_β for all t ≥ 0, which follows from (2.41) since V̇(x(t)) ≤ 0 implies that V(x(t)) ≤ V(x(0)) ≤ β for all t ≥ 0. A set with this property is called an INVARIANT SET, or sometimes a POSITIVE INVARIANT SET, to indicate that the set is invariant for time running in the positive direction. Next notice that V_β contains the origin in its interior since β > 0. Therefore we can choose δ > 0 such that the ball B_δ is contained in V_β. Therefore, if we choose initial x ∈ B_δ, we have for all t ≥ 0

‖x‖ ≤ δ implies φ(t; x) ∈ V_β ⊆ B_r, so ‖φ(t; x)‖ ≤ r ≤ ε

and Lyapunov stability is established.


Theorem 2.22 (Asymptotic stability). Let V(·) be a Lyapunov function for the system dx/dt = f(x). Moreover, let V(·) satisfy

V̇(x) < 0 for x ∈ D \ 0    (2.42)

Then the origin is asymptotically stable.

Proof. We conclude that the origin is stable from the previous proof. To prove asymptotic stability we need to show only that the origin is attractive. Since V(·) is continuous and vanishes only at zero, it is sufficient to establish that V(x(t)) goes to zero as t → ∞. As in the proof of Lyapunov stability, we choose δ > 0 such that trajectories satisfying ‖x(0)‖ ≤ δ remain in V_β ⊆ D. Since V̇(x(t)) ≤ 0 for all x(t), V(x(t)) is a nonincreasing function of time, and it is bounded below by zero. Therefore it converges. We need to show that it converges to zero. Assume the contrary, that V(x(t)) converges to some c > 0, and we establish a contradiction.

Consider the level set V_c = {x | V(x) = c}. This level set does not contain the origin, so we can choose d > 0 such that max_{‖x‖≤d} V(x) < c. Since V(x(t)) is nonincreasing and approaches c as t → ∞, we have that x(t) is outside B_d for all t ≥ 0. Next define γ as

γ = max_{d≤‖x‖≤r} V̇(x)

Note that γ is well defined because V̇(x) is continuous due to (2.39) and the fact that ∂V(x)/∂x and f(x) are continuous. We know γ < 0 due to (2.42). Therefore

V(x(t)) = V(x(0)) + ∫₀ᵗ V̇(x(τ))dτ ≤ V(x(0)) + γt

The right-hand side becomes negative for finite t for any x(0) ∈ B_δ, which contradicts nonnegativity of V(·), and we conclude c = 0 and V(x(t)) → 0, and hence x(t) → 0, as t → ∞.

Under the stronger assumption of Theorem 2.22, i.e., (2.42), establishing continuity of the solution φ(t; x) in t for all t ≥ 0 and all x in a sublevel set V_β contained in B_r also implies that the sublevel set V_β is connected. This follows because every point x ∈ V_β is then connected to the origin by a continuous curve φ(t; x) that remains in the positive invariant set V_β for all t ≥ 0.

Next we consider a further strengthening of the properties of the Lyapunov function to ensure exponential stability. We have the following result.

Theorem 2.23 (Exponential stability). Let V(·) be a Lyapunov function for the system dx/dt = f(x). Moreover, let V(·) satisfy for all x ∈ D

a‖x‖^σ ≤ V(x) ≤ b‖x‖^σ    (2.43)

V̇(x) ≤ −c‖x‖^σ    (2.44)

for some a, b, c, σ > 0. Then the origin is exponentially stable.

Proof. Consider an arbitrary r > 0 and, as in the proof of Theorem 2.21, a sublevel set V_β with r > 0 small enough so that V_β ⊆ D. Such an r exists since V(·) is continuous and V(0) = 0. We know that trajectories starting in V_β remain in V_β. The upper-bounding inequality on V(·) implies that ‖x‖^σ ≥ V(x)/b, which gives the bound on the time derivative of V

V̇(x) ≤ −c‖x‖^σ ≤ −(c/b)V(x)

Notice that the scalar time function v(t) = V(x(t)) satisfies the ODE inequality v̇ ≤ −(c/b)v and therefore v(t) ≤ v(0)e^(−(c/b)t). Translating this statement back to V(·) gives

V(x(t)) ≤ V(x(0))e^(−(c/b)t) for all t ≥ 0

Using the lower-bounding inequality for V(·) gives

a‖x(t)‖^σ ≤ V(x(t)) ≤ V(x(0))e^(−(c/b)t)

Using the upper-bounding inequality again gives for all x ∈ V_β and all t ≥ 0

‖x(t)‖ ≤ (b/a)^(1/σ) ‖x(0)‖ e^(−(c/(bσ))t)

We can choose δ > 0 such that the ball B_δ is contained in V_β as shown in Figure 2.11. We then have that for all ‖x(0)‖ ≤ δ

‖x(t)‖ ≤ c‖x(0)‖e^(−λt) for all t ≥ 0

in which c = (b/a)^(1/σ) > 0 and λ = c/(bσ) > 0, and exponential stability of the origin is established.

2.5.3 Application to Linear Systems

Lyapunov function analysis of stability can of course be applied to linear systems, but this is mainly for illustrative purposes. We have many ways to analyze stability of linear systems because we have the analytical solution available. The true value of Lyapunov functions lies in analysis of nonlinear systems, for which we have few general purpose alternatives. To build up some expertise in using Lyapunov functions, we consider again the linear continuous time differential equation

    dx/dt = Ax    x(0) = x₀    (2.45)

in which x ∈ ℝⁿ and A ∈ ℝⁿˣⁿ. We have already discussed in Section 2.2.2 the stability of this system and shown that x(t) = 0 is an

154    Ordinary Differential Equations

asymptotically stable steady state if and only if Re(eig(A)) < 0, i.e., all eigenvalues of A have strictly negative real parts. Let's see how we can construct a Lyapunov function for this system. Consider as a candidate

    V(x) = xᵀSx

in which S ∈ ℝⁿˣⁿ is positive definite, denoted S > 0. With this choice we have that V: ℝⁿ → ℝ≥0, which is the first requirement, i.e., V(0) = 0 and V(x) > 0 for x ≠ 0. We wish to evaluate the evolution of V with time as x evolves according to (2.45). Taking the time derivative of V gives

    (d/dt) V(x(t)) = (d/dt)(xᵀSx)
                   = (dx/dt)ᵀ S x + xᵀ S (dx/dt)
                   = xᵀAᵀSx + xᵀSAx
                   = xᵀ(AᵀS + SA)x

and the initial condition is V(0) = x₀ᵀSx₀. One means to ensure that V(x(t)) is decreasing with time when x ≠ 0 is to enforce that the matrix AᵀS + SA is negative definite. We choose some positive definite matrix Q > 0 and attempt to find a positive definite S that satisfies

    AᵀS + SA = −Q    (2.46)

so that

    dV/dt = −xᵀQx

Equation (2.46) is known as the matrix Lyapunov equation. It says that given a Q > 0, if we can find a positive definite solution S > 0 of (2.46), then V(x) = xᵀSx is a Lyapunov function for linear system (2.45), and the steady-state solution x = 0 is asymptotically (in fact, exponentially) stable. This requirement can be shown to be also necessary for the system to be asymptotically (exponentially) stable, which we verify shortly. We seem to have exactly characterized the stability of the linear system (2.45) without any reference to the eigenvalues of matrix A. Of course, since the condition on the eigenvalues as well as the condition on the matrix Lyapunov equation are both necessary and sufficient conditions for asymptotic stability, they must be equivalent. Indeed, we have the following result stating this equivalence.


Theorem 2.24 (Lyapunov function for linear systems). The following statements are equivalent (Sontag, 1998, p. 231).

(a) A is asymptotically stable, i.e., Re(eig(A)) < 0.

(b) For each Q ∈ ℝⁿˣⁿ, there is a unique solution S of the matrix Lyapunov equation

    AᵀS + SA = −Q

and if Q > 0 then S > 0.

(c) There is some S > 0 such that AᵀS + SA < 0.

(d) There is some S > 0 such that V(x) = xᵀSx is a Lyapunov function for the system ẋ = Ax.
Exercise 2.62 asks you to establish the equivalence of (a) and (b).
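Statement (b) of Theorem 2.24 is also easy to exercise numerically. The sketch below (not part of the text; it assumes SciPy is available) uses scipy.linalg.solve_continuous_lyapunov, which solves aX + Xaᵀ = q, so passing a = Aᵀ and q = −Q yields exactly (2.46):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # eigenvalues -1 and -2: asymptotically stable
Q = np.eye(2)

# solve_continuous_lyapunov(a, q) solves a@X + X@a.T = q;
# with a = A.T and q = -Q this is A.T@S + S@A = -Q, i.e., (2.46).
S = solve_continuous_lyapunov(A.T, -Q)

# S should be symmetric positive definite (statement (b) of Theorem 2.24).
assert np.allclose(S, S.T)
assert np.all(np.linalg.eigvalsh(S) > 0)

# Check Vdot = x^T (A^T S + S A) x = -x^T Q x for a sample state.
x = np.array([1.0, -0.5])
vdot = x @ (A.T @ S + S @ A) @ x
assert np.isclose(vdot, -(x @ Q @ x))
```

The matrix A here is merely a stable example chosen for illustration; any A with Re(eig(A)) < 0 would do.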
2.5.4 Discrete Time Systems

Next we consider discrete time systems modeled by

    x(k + 1) = f(x(k))    x(0) = x₀

in which the sample time k is an integer, k = 0, 1, 2, .... To streamline the presentation we assume throughout that f(·) is continuous on its domain of definition. Steady states are now given by solutions to the equation xs = f(xs), and we again assume without loss of generality that f(0) = 0 so that the origin is a steady state of the discrete time model. Discrete time models arise when time is discretized, as in digital control systems for chemical plants. But discrete time models also arise when representing the behavior of an iterative algorithm, such as the Newton-Raphson method for solving nonlinear algebraic equations discussed in Chapter 1. In these cases, the integer k represents the algorithm iteration number rather than time. We compress notation by defining the superscript + operator to denote the variable at the next sample time (or iteration), giving

    x⁺ = f(x)    x(0) = x₀    (2.47)

Notice that this notation also emphasizes the similarity with the continuous time model ẋ = f(x) in (2.38). We again denote solutions to (2.47) by φ(k; x) with k ≥ 0 that start at state x at k = 0. The discrete time definitions of stability, attractivity, and asymptotic stability of the origin are then identical to their continuous time counterparts given in

Definitions 2.16, 2.17, and 2.18, respectively, with integer k ≥ 0 replacing real-valued time t ≥ 0. In discrete time, the definition of exponential stability is modified slightly to the following.

Definition 2.25 (Exponential stability (discrete time)). The origin is exponentially stable if there exist c > 0, λ ∈ (0, 1), and δ > 0 for which ‖x‖ ≤ δ implies that

    ‖φ(k; x)‖ ≤ c ‖x‖ λᵏ  for all k ≥ 0

We see that λᵏ with λ < 1 is the characteristic rate of solution decay.
Lyapunov functions. The main difference in constructing Lyapunov functions for discrete time systems compared to those for continuous time systems is that we compare the value of V at two successive sample times, i.e., V(x(k + 1)) − V(x(k)). If this change is negative, then we have the analogous behavior in discrete time that we have when V̇ is negative in continuous time, i.e., V(x(k)) is decreasing when evaluated along the solution x(k). We define the ΔV notation

    ΔV(x) = V(f(x)) − V(x) = V(x⁺) − V(x)

to denote the change in V starting at state x and proceeding to successor state x⁺ = f(x). Another significant change is that we do not require differentiability of the Lyapunov function V(·) in discrete time since we do not require the chain rule to compute the time derivative. We do require continuity of V(·) at the origin, however. For consistency with the earlier continuous time results, we assume here that V(·) is continuous everywhere on its domain of definition.¹² The definition of the (continuous) Lyapunov function for discrete time is as follows.
Definition 2.26 (Lyapunov function (discrete time)). Consider a compact (closed and bounded) set D ⊆ ℝⁿ containing the origin in its interior and let V: D → ℝ≥0 be continuous on D and satisfy

    V(0) = 0 and V(x) > 0 for x ∈ D \ 0    (2.48)
    ΔV(x) ≤ 0 for x ∈ D    (2.49)

Then V(·) is a Lyapunov function for the system x⁺ = f(x).


¹²For those needing discontinuous V(·) for discrete time systems, see Rawlings and Mayne (2009, Appendix B) for the required extension.


Notice that ΔV(x) also is continuous on its domain of definition since both V(·) and f(·) are assumed continuous.

Theorem 2.27 (Lyapunov stability (discrete time)). Let V(·) be a Lyapunov function for the system x⁺ = f(x). Then the origin is (Lyapunov) stable.

Theorem 2.28 (Asymptotic stability (discrete time)). Let V(·) be a Lyapunov function for the system x⁺ = f(x). Moreover, let V(·) satisfy

    ΔV(x) < 0 for x ∈ D \ 0    (2.50)

Then the origin is asymptotically stable.

Theorem 2.29 (Exponential stability (discrete time)). Let V(·) be a Lyapunov function for the system x⁺ = f(x). Moreover, let V(·) satisfy for all x ∈ D

    a‖x‖^σ ≤ V(x) ≤ b‖x‖^σ    (2.51)
    ΔV(x) ≤ −c‖x‖^σ    (2.52)

for some a, b, c, σ > 0. Then the origin is exponentially stable.

The proofs of Theorems 2.27, 2.28, and 2.29 are essentially identical to their continuous time counterparts, Theorems 2.21, 2.22, and 2.23, respectively, with integer k replacing real t and the difference ΔV replacing the derivative V̇. An essential difference between continuous and discrete time cases is that the solution of the continuous time model φ(t; x) is continuous in t, while the solution of the discrete time model φ(k; x) has no continuity with index k since k takes on discrete values. Notice that in the proofs of the continuous time results, we did not follow the common practice of appealing to continuity of φ(t; x) in t, so the supplied arguments are valid for both continuous and discrete cases.

Linear systems. The time-invariant discrete time linear model is

    x⁺ = Ax    x(0) = x₀

and in analogy with the continuous time development, we try to find a Lyapunov function of the form V(x) = xᵀSx for some positive definite matrix S > 0. Computing the change in the Lyapunov function at state x gives

    ΔV(x) = V(x⁺) − V(x) = (Ax)ᵀS(Ax) − xᵀSx
          = xᵀ(AᵀSA − S)x


Choosing a positive definite Q > 0, if we can find S > 0 that satisfies

    AᵀSA − S = −Q    (2.53)

then we have succeeded in finding a V(·) with the desired properties: V(x) = xᵀSx > 0 and ΔV(x) = −xᵀQx < 0 for all x ≠ 0. Equation (2.53) is known as the discrete matrix Lyapunov equation. Exercise 2.63 asks you to state the discrete time version of Theorem 2.24, listing the connections between the solution of the discrete Lyapunov equation and the eigenvalues of A. These connections often come in handy when analyzing the stability of discrete linear systems.
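The discrete equation (2.53) can be checked the same way (again a sketch, not from the text; SciPy assumed): scipy.linalg.solve_discrete_lyapunov(a, q) solves aXaᵀ − X = −q, so passing a = Aᵀ gives (2.53), and V(x) = xᵀSx then decreases along trajectories of x⁺ = Ax:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.5, 0.2],
              [0.0, 0.8]])   # eigenvalues 0.5 and 0.8: inside the unit circle
Q = np.eye(2)

# solve_discrete_lyapunov(a, q) solves a@X@a.T - X = -q;
# with a = A.T this is A.T@S@A - S = -Q, i.e., (2.53).
S = solve_discrete_lyapunov(A.T, Q)
assert np.allclose(A.T @ S @ A - S, -Q)
assert np.all(np.linalg.eigvalsh(S) > 0)

# V(x) = x^T S x decreases along x+ = A x since DeltaV = -x^T Q x < 0.
x = np.array([1.0, 1.0])
for _ in range(5):
    xp = A @ x
    assert xp @ S @ xp < x @ S @ x
    x = xp
```

The example matrix A is an arbitrary Schur-stable choice used only for illustration.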

2.6 Asymptotic Analysis and Perturbation Methods


2.6.1 Introduction

Typical mathematical models have a number of explicit parameters. Often we are interested in how the solution to a problem depends on a certain parameter. Asymptotic analysis is the branch of applied mathematics that deals with the construction of precise approximate solutions to problems in asymptotic cases, i.e., when a parameter of the problem is large or small. In chemical engineering problems, small parameters often arise as ratios of time or length scales. Important limiting cases arise for example in the limits of large or small Reynolds, Péclet, or Damköhler numbers. In many cases, an analytical solution can be found, even if the problem is nonlinear. In others, the scaling behavior of the solution (e.g., the correct exponent for the power-law dependence of one quantity on another) can be found without even solving the problem. In still others, the asymptotic analysis yields an equation that must be solved numerically, but is much less complicated than the original model. The goal here is to provide a background on the basic concepts and techniques of asymptotic analysis, beginning with some notation and basic ideas about series approximations.

2.6.2 Series Approximations: Convergence, Asymptoticness, Uniformity

As this section deals extensively with how one function approximates another, we begin by introducing symbols that describe degrees of identification between different functions.


a = b    a is equal to b
a ∼ b    a is asymptotically equal to b (in some given/implied limit), i.e., a/b → 1
a ≈ b    a is approximately equal to b (in any useful sense)
a ∝ b    a is proportional to b

It is important to note that ∼ implies a limit process, while ≈ does not. In this section we will be careful to use the symbol ∼ in the precise manner defined here, though one must be aware that it often means different things in different contexts (and different parts of this book). Closely related to these symbols are ORDER SYMBOLS, which give a qualitative description of the relationships between functions in limiting cases. Consider a function f(ε) whose behavior we wish to describe relative to another function δ(ε) (a GAUGE FUNCTION). The order symbols "O", "o", and "ord" describe the relationships

    f(ε) = O(δ(ε)) as ε → 0  if lim_{ε→0} |f(ε)/δ(ε)| < ∞
    f(ε) = o(δ(ε)) as ε → 0  if lim_{ε→0} f(ε)/δ(ε) = 0
    f(ε) = ord(δ(ε)) as ε → 0  if f(ε) = O(δ(ε)) but not o(δ(ε))

In the latter case, f is said to be strictly order δ. Often, authors write "f(ε) ∼ δ(ε)" to mean "f(ε) = O(δ(ε))", though the latter only implies equality to within a multiplicative constant as ε → 0, while as defined here the former implies equality.
Asymptotic approximations take the form of series, the most familiar of which is the truncated Taylor series approximation that forms the basis of many engineering approximations. An infinite series

    f(x) = Σ_{n=0}^∞ fₙ(x)

CONVERGES at a particular value of x if and only if, for every ε > 0, there exists N₀ such that

    |f(x) − Σ_{n=0}^N fₙ(x)| < ε  for N > N₀

In contrast, an ASYMPTOTIC SERIES Σ_{n=0}^N fₙ(ε) satisfies

    lim_{ε→0} [f(ε) − Σ_{n=0}^M fₙ(ε)] / f_M(ε) = 0  for each M ≤ N

In words, the remainder is much smaller than the last term kept. This property is the source of the usefulness of asymptotic series. If this property is satisfied, we write

    f(ε) ∼ Σ_{n=0}^N fₙ(ε)  as ε → 0

In general, we do not care whether the series converges if we let N → ∞. Often it does not. The important point is that the finite sum (often just the first term or two) provides a useful approximation to a function for small ε. This is in stark contrast to convergent infinite series, which, although they converge, often require a large number of terms to be evaluated to obtain a reasonably accurate approximation.

We typically construct asymptotic series in this form

    f(ε) ∼ Σₙ aₙ δₙ(ε)    (2.54)

where

    δ₀(ε) ≫ δ₁(ε) ≫ δ₂(ε) ≫ ···

for small ε. We also require that aₙ = ord(1) as ε → 0. In practice, the δₙ are not generally known a priori, but must be determined as part of the solution procedure to satisfy the requirement that the coefficients aₙ be ord(1). This procedure is best illustrated by example as we do in several instances below. In principle, we can construct a series approximation with N as large as we like, as long as the aₙ remain ord(1) and δ_{N+1}(ε) ≪ δ_N(ε) at the value of ε of interest. However, the most common application of asymptotic analysis is to the construction of a one- or two-term approximation that captures the most important behavior as ε → 0.
As an example of the difference between convergent and asymptotic series, we look at the error function, written here as

    erf(x) = (2/√π) ∫₀ˣ e^(−t²) dt

By Taylor expanding the integrand around the origin and integrating term by term, a power series convergent for all x can be constructed for this function

    erf(x) = (2/√π) Σ_{n=0}^∞ (−1)ⁿ x^(2n+1) / ((2n+1) n!)

Although convergent, this expression may require many terms for reasonable accuracy to be obtained, especially when x is large. One could try setting w = 1/x and Taylor expanding e^(−1/w²) around w = 0. This leads to immediate difficulty because

    lim_{w→0} (dⁿ/dwⁿ) e^(−1/w²) = 0

for all n; this Taylor expansion is identically zero! This difficulty arises because e^(−x²) decays to zero faster than any negative power of x as x → ∞.

On the other hand, for x ≫ 1, an asymptotic series for the function may be constructed by repeated integration by parts (a common trick for the asymptotic approximation of integrals). This approximation is

    erf(x) = 1 − (2/√π) ∫ₓ^∞ e^(−t²) dt
           = 1 − e^(−x²)/(x√π) + (2/√π) ∫ₓ^∞ (e^(−t²)/(2t²)) dt

and, continuing the integration by parts,

    erf(x) ∼ 1 − (e^(−x²)/(x√π)) [1 − 1/(2x²) + 3/(2x²)² + O(x⁻⁶)]

If continued indefinitely, this series would diverge. The truncated series, however, is useful. In particular, the "leading order" term 1 − e^(−x²)/(x√π), the expression that includes the first correction for finite but large x, precisely indicates the behavior of erf(x) for large values of x. Furthermore, the truncated series can be used to provide accurate numerical values and is the basis of modern algorithms for doing so (Cody, 1969).
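The practical value of the truncated series is easy to demonstrate. The following sketch (not from the text; plain Python standard library only) sums the first few terms of the asymptotic series above and compares against math.erf:

```python
import math

def erf_asymptotic(x, n_terms=3):
    # Truncated large-x asymptotic series:
    # erf(x) ~ 1 - e^{-x^2}/(x sqrt(pi)) * (1 - 1/(2x^2) + 3/(2x^2)^2 - ...)
    # The k-th correction term is (-1)^k (2k-1)!! / (2x^2)^k.
    s = 0.0
    term = 1.0
    for k in range(n_terms):
        s += term
        term *= -(2*k + 1) / (2.0 * x * x)
    return 1.0 - math.exp(-x*x) / (x * math.sqrt(math.pi)) * s

# Just three terms give excellent accuracy at moderate x, whereas the
# convergent Taylor series at x = 4 needs dozens of terms with severe
# cancellation before settling down.
x = 4.0
approx = erf_asymptotic(x, n_terms=3)
assert abs(approx - math.erf(x)) < 1e-9
```

Note the contrast with convergence: adding ever more terms of the asymptotic series at fixed x would eventually make the approximation worse, not better.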
Now consider a function f of ε and some other parameter or variable, x

    f(x, ε) ∼ Σₙ aₙ(x) δₙ(ε)

If the approximation is asymptotic as ε → 0 for each fixed x, then we say it is POINTWISE ASYMPTOTIC. Now consider the particular case

    √(x + ε) ∼ √x + ε/(2√x)

This is pointwise asymptotic, but for fixed ε, the second term blows up as x → 0. So obviously, it cannot remain much smaller than the first term, which is our requirement for asymptoticness. The approximation is not UNIFORMLY VALID. Put another way

    lim_{x→0} lim_{ε→0} ε/√x ≠ lim_{ε→0} lim_{x→0} ε/√x

To be precise, a function u(x, ε) CONVERGES UNIFORMLY to u(x, 0) on the interval x ∈ [0, a], if, given E > 0, there is a D > 0 such that

    |u(x, ε) − u(x, 0)| < E,  for ε < D and all x ∈ [0, a]

Nonuniformity is a feature of many practical singular perturbation problems. A major challenge of asymptotic analysis is the construction of UNIFORMLY VALID approximations. We shall see a number of techniques for doing this.

2.6.3 Scaling, and Regular and Singular Perturbations

Before proceeding to discuss perturbation methods for differential equations, we introduce some important concepts in the context of algebraic equations. First, consider the quadratic equation

    x² + εx − 1 = 0,  ε ≪ 1    (2.55)

If ε = 0, x = ±1. We would like to characterize how these solutions are perturbed when 0 < ε ≪ 1. To do so, we posit a solution of the form (2.54)

    x(ε) = δ₀x₀ + δ₁(ε)x₁ + δ₂(ε)x₂ + o(δ₂)    (2.56)

where xᵢ = ord(1) (independent of ε) and the functional forms of δ₁(ε) and δ₂(ε) remain to be determined. Substituting into the quadratic and neglecting the small o(δ₂) terms yields

    δ₀²x₀² − 1 + 2δ₀δ₁x₀x₁ + εδ₀x₀ + 2δ₀δ₂x₀x₂ + δ₁²x₁² + εδ₁x₁ + ··· = 0    (2.57)


At ε = 0, the solution is x = x₀ = ±1. So we let δ₀ = 1 and for the moment consider the root x₀ = 1. Now (2.57) becomes

    2δ₁x₁ + ε + 2δ₂x₂ + δ₁²x₁² + εδ₁x₁ + ··· = 0    (2.58)

Observe that all but the first two terms are o(δ₁) or o(ε). Neglecting these, we would find that

    2δ₁x₁ + ε = 0    (2.59)

Since x₁ is independent of ε, we set δ₁ = ε, in which case x₁ = −1/2. Now (2.58) becomes

    2δ₂x₂ + ε²x₁² + ε²x₁ + εδ₂x₂ + ··· = 0    (2.60)

Now, since εδ₂ = o(δ₂), we neglect the term containing it and substitute x₁ = −1/2 to get

    −ε²/4 + 2δ₂x₂ = 0    (2.61)

for which there is a solution with x₂ = ord(1) if δ₂(ε) = ε² and x₂ = 1/8. Thus we have constructed an asymptotic approximation

    x = 1 − ε/2 + ε²/8 + O(ε³)    (2.62)

Observe that to determine δ₁(ε) and δ₂(ε), we have found a DOMINANT BALANCE: a self-consistent choice of δₖ(ε), where it is comparable in size to the largest term not containing a δₖ and where all the other terms containing δs and εs are smaller as ε → 0.

To find how the second root x₀ = −1 depends on ε, we use the lessons learned in the previous paragraph to streamline the solution process. That analysis suggests that δₖ(ε) = εᵏ, so we seek a solution

    x = −1 + εx₁ + ε²x₂ + O(ε³)

which upon substitution into (2.55) yields

    −ε − 2εx₁ + ε²x₁ + ε²x₁² − 2ε²x₂ + O(ε³) = 0

Since by assumption the xₖ are independent of ε, this expression can only hold in general if it holds power by power in ε. We have already zeroed out the ε⁰ term by setting x₀ = −1. The ε¹ and ε² terms yield

    ε¹:  −2x₁ − 1 = 0
    ε²:  −2x₂ + x₁ + x₁² = 0

Notice that there is one-way coupling between these equations: the equation for xₖ depends on xₗ with l < k. The solutions to these are x₁ = −1/2 and x₂ = −1/8, so the second root is

    x = −1 − ε/2 − ε²/8 + O(ε³)    (2.63)

In the limit ε → 0 both solutions (2.62) and (2.63) reduce to the solutions when ε = 0. Cases such as this are called REGULAR PERTURBATION problems. The situation is much more interesting when the limit ε → 0 is qualitatively different from ε = 0. Cases like this are called SINGULAR PERTURBATION problems. Consider the equation

    εx² + x − 1 = 0    (2.64)

When ε = 0, this has the unique exact solution x = 1, while for any ε ≠ 0, it has two solutions. This problem is singular because the small parameter multiplies the highest power in the equation: when the parameter is zero the polynomial becomes lower degree so it has one fewer root. To analyze this problem, we define a scaled variable X = x/δ where δ = δ(ε) and X = ord(1). Thus δ measures the size of x as ε → 0. Substitution into (2.64) yields

    εδ²X² + δX − 1 = 0

Now we examine the possibility of finding a dominant balance between different terms with various guesses for δ. If we let δ = 1, then the second and third of these terms balance as ε → 0 while the first is small. This scaling gives the root x = 1 + O(ε). If we let δ = o(1), then the first and second terms are small, while the third term is still ord(1). There is no balance of terms for this scaling. On the other hand, if we let δ = ε⁻¹, then we can get the first and second terms to balance. Applying this scaling yields

    X² + X − ε = 0

which clearly has the solution X = −1 + O(ε), or x = −ε⁻¹ + O(1). As ε → 0 the root goes off to infinity. Although the first term in (2.64) contains the small parameter, it can multiply a large number so that


the term overall is not small. This characteristic is typical of singular perturbation problems. A more subtle singular perturbation problem is

    (1 − ε)x² − 2x + 1 = 0    (2.65)

When ε = 0 this has the double root x = 1. When ε < 0 there are no real solutions, whereas when ε > 0 there are two. Clearly δ₀ = 1, x₀ = 1, so we seek a solution

    x = 1 + δ₁x₁ + δ₂x₂ + o(δ₂)

Substitution into (2.65) gives

    δ₁²x₁² + 2δ₁δ₂x₁x₂ + δ₂²x₂² − ε − 2εδ₁x₁ − 2εδ₂x₂ + ··· = 0

Since 1 ≫ δ₁ ≫ δ₂, we can conclude that δ₁²x₁² and ε are the largest (dominant) terms. These balance if δ₁² = ord(ε). Thus we set δ₁ = ε^(1/2), which implies that x₁ = ±1. So the solutions are

    x = 1 ± ε^(1/2) + O(ε)    (2.66)

As an exercise, find that δ₂ = ε and that the solutions to (2.65) can be written as an asymptotic series in powers of ε^(1/2).
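These expansions are easy to verify numerically. The sketch below (not from the text; it assumes NumPy) compares the two-term expansions for the regular problem (2.55) and the dominant-balance predictions for the singular problem (2.64) against roots computed with np.roots:

```python
import numpy as np

eps = 1e-3

# Regular problem (2.55): x^2 + eps*x - 1 = 0.
# Perturbation series: x = 1 - eps/2 + eps^2/8 and x = -1 - eps/2 - eps^2/8.
roots = np.sort(np.roots([1.0, eps, -1.0]).real)
x_minus = -1.0 - eps/2 - eps**2/8
x_plus = 1.0 - eps/2 + eps**2/8
assert abs(roots[0] - x_minus) < 1e-9
assert abs(roots[1] - x_plus) < 1e-9

# Singular problem (2.64): eps*x^2 + x - 1 = 0.
# Dominant balance gives x ~ 1 (regular root) and x ~ -1/eps (singular root).
roots = np.sort(np.roots([eps, 1.0, -1.0]).real)
assert abs(roots[1] - 1.0) < 2*eps          # regular root: 1 - eps + ...
assert abs(roots[0] - (-1.0/eps)) < 2.0     # singular root: -1/eps - 1 + ...
```

For (2.55) the exact roots happen to agree with the two-term series through O(ε³), so the agreement at ε = 10⁻³ is extremely close.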

2.6.4 Regular Perturbation Analysis of an ODE

One attractive feature of perturbation methods is their capacity to provide analytical, albeit approximate, solutions to complex problems. For regular perturbation problems, the approach is rather straightforward. As an illustration we consider the problem of a second-order reaction occurring in a spherical catalyst pellet, which we can model at steady state by

    (1/r²) d/dr (r² dc/dr) = Da c²    (2.67)

with c = 1 at r = 1 and c bounded at the origin. If D, R, k, and c_B are the diffusivity, particle radius, rate constant, and dimensional surface concentration respectively, then Da = k c_B R²/D is the DAMKÖHLER NUMBER. The problem is nonlinear, so a simple analytical solution is unavailable. An approximate solution for Da ≪ 1 can be constructed, however, using a regular perturbation approach.


co + ecl + dc 2
of the form c(r)
+
solution
a
seek
like
powers
yields
equating
Let c = Da. We
into (2.67) and
Substituting
O (3).

1 d 2dco

r 2dr dr
1 d 2dC1
r 2dr dr
1 d 2dC2r 2dr dr

-1
co(l)

co, Cl(l) = O

2c1c(), c2(1) = O

order has the same operator but difeach


at
solution
the
Observethat
solution at lower order. This Structureis
ferent "forcing"from the
problems. The solution at 0 is trivial:
perturbation
regular
of
typical

we have
co = 1 for all r. At 1,
1 d 2dC1

r2dr dr

(r 2 1) /6. The solution to the


The solution to this equation is Although this problem is nonlinear,
2problem is left to Exercise2.66.
a simple approximate closedthe regular perturbation method provides
form solution.
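As a quick symbolic sanity check (not in the text; it assumes SymPy is available), we can confirm that c₁ = (r² − 1)/6 satisfies the O(ε) problem, the surface boundary condition, and regularity at the origin:

```python
import sympy as sp

r = sp.symbols('r', positive=True)
c1 = (r**2 - 1) / 6

# O(eps) problem: (1/r^2) d/dr (r^2 dc1/dr) = c0^2 = 1, with c1(1) = 0.
lhs = sp.simplify(sp.diff(r**2 * sp.diff(c1, r), r) / r**2)
assert sp.simplify(lhs - 1) == 0           # ODE satisfied
assert c1.subs(r, 1) == 0                  # boundary condition at r = 1
assert sp.diff(c1, r).subs(r, 0) == 0      # bounded (symmetric) at the origin
```

The same pattern can be used to check a candidate c₂ once Exercise 2.66 is worked.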

2.6.5 Matched Asymptotic Expansions

The regular perturbation approach above provided an approximate solution for Da ≪ 1. We can also pursue a perturbation solution in the opposite limit, Da ≫ 1. Now letting ε = Da⁻¹ we have

    ε (1/r²) d/dr (r² dc/dr) = c²    (2.68)

If we naively seek a regular perturbation solution c = c₀ + εc₁ + O(ε²), the leading-order equation will be

    c₀² = 0

This has solution c₀ = 0, which satisfies the boundedness condition at r = 0 and makes physical sense for the interior of the domain because when Da ≫ 1, reaction is fast compared to diffusion so we expect the concentration in the particle to be very small. On the other hand, this solution cannot be complete, as it cannot satisfy the boundary condition c = 1 at r = 1. The inability of the solution to satisfy the boundary condition arises from the fact that the small parameter

ε multiplies the highest derivative in the equation. It is thus absent from the leading-order problem, so the arbitrary constants required to satisfy the boundary conditions are not available.

The resolution to this issue lies in proper scaling. Although ε is small, it multiplies a second derivative. If the gradient of the solution is large in some region, then the product of the small parameter and large gradient may result in a term that is not small. In the present case, we can use physical intuition to guess where the gradients are large. At high Da the reaction occurs rapidly, so we expect the concentration to be small in most of the catalyst particle. Near r = 1, however, reactant is diffusing in from the surroundings and indeed right at r = 1 the concentration must be unity. Thus we define a new spatial variable η = (1 − r)/Δ(ε), where Δ is a length scale that is yet to be determined. Applying this change of variable to (2.68) yields

    (ε/Δ²) (1/(1 − Δη)²) d/dη [(1 − Δη)² dc/dη] = c²    (2.69)

The first term contains εΔ⁻². If we take Δ = ε^(1/2) then this term is ord(1) as ε → 0 and can balance the term c² to yield a nontrivial solution. This scaling implies that near r = 1 the steepness of the concentration gradient scales as ε^(−1/2). Proceeding with this scaling, (2.69) becomes

    (1/(1 − ε^(1/2)η)²) d/dη [(1 − ε^(1/2)η)² dc/dη] = c²    (2.70)
Now we seek a perturbation solution of this rescaled problem: c(η) = c₀ + ε^(1/2)c₁ + O(ε). The choice of ε^(1/2) comes from the observation that the Taylor expansion (1 − ε^(1/2)η)² = 1 − 2ε^(1/2)η + O(ε) introduces corrections of O(ε^(1/2)). This gives the leading-order problem

    d²c₀/dη² = c₀²    (2.71)

Although this equation is nonlinear, it has a special form that facilitates solution.¹³ Let w = c₀′ where ′ denotes d/dη. Now we can write the second-order equation as the first-order system

    w′ = c₀²,    c₀′ = w

¹³If we had considered first-order kinetics instead, the solution would be simple.


As constructed, this system has the special property that

    dH/dη = 0,  where H = w²/2 − c₀³/3

since dH/dη = ww′ − c₀²c₀′ = wc₀² − c₀²w = 0. Therefore, curves on which w²/2 − c₀³/3 = K, where K is a constant, are solutions. As η becomes large, i.e., at positions much farther than distance ε^(1/2) from the interface, we expect the concentration and its gradient to go to zero, so we take K = 0, giving

    dc₀/dη = −√(2/3) c₀^(3/2)

The negative sign must be chosen so that c₀ decays with increasing η. This equation can be integrated and the boundary condition c₀(η = 0) = 1 applied to yield

    c₀(η) = (1 + η/√6)⁻²

In terms of the original variables this becomes

    c₀(r) = (1 + (1 − r)/√(6ε))⁻²    (2.72)

This decays to zero once 1 − r is larger than O(ε^(1/2)). Thus the concentration changes rapidly in a BOUNDARY LAYER with thickness of O(ε^(1/2)) that is located near the catalyst particle surface. Outside this thin boundary layer, in the interior of the particle, the concentration is very small, going to zero as ε → 0. One can carry this analysis to higher order terms. For example, the first effects of the particle shape on the result appear at O(ε^(1/2)), but it should be clear that the primary structure of the solution behavior has been captured by this leading-order solution.

This example is a simple instance of a singular perturbation method. The solution c₀ = 0 that we obtained before rescaling is called the OUTER SOLUTION. It is valid away from the boundary r = 1. The solution (2.72) that we obtained after rescaling is called the INNER SOLUTION. In this simple example the inner solution decays to zero, automatically matching the outer solution. In general, the outer solution is not simply a constant, and a MATCHING CONDITION must be imposed to properly connect the two solutions to one another. This process is the origin of the term MATCHED ASYMPTOTIC EXPANSIONS.
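A quick symbolic check (not in the text; SymPy assumed) confirms that the leading-order inner solution satisfies (2.71), the surface condition, and decay into the particle interior:

```python
import sympy as sp

eta = sp.symbols('eta', nonnegative=True)
c0 = (1 + eta/sp.sqrt(6))**(-2)

# Leading-order inner problem: c0'' = c0^2 with c0(0) = 1,
# and c0 -> 0 as eta -> oo (the matching/decay condition).
assert sp.simplify(sp.diff(c0, eta, 2) - c0**2) == 0
assert c0.subs(eta, 0) == 1
assert sp.limit(c0, eta, sp.oo) == 0
```

The decay as η → ∞ is exactly what makes matching to the zero outer solution automatic in this example.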


Matching can be accomplished with a number of different procedures. We describe here a simple approach that works for many problems. More sophisticated and general approaches are described in Hinch (1991). In the simple approach, we denote the outer solution as uₙ(x) and the inner solution as Uₙ(ξ), where x and ξ = x/Δ(ε) are the outer and inner variables, respectively. The simple matching procedure just requires that at each order n = 0, 1, ..., N

    lim_{x→0} uₙ(x) = lim_{ξ→∞} Uₙ(ξ)    (2.73)

In words, the inner limit of the outer solution equals the outer limit of the inner solution. In the example above, where the outer solution is zero and the inner solution decays to zero as η → ∞, this expression is satisfied trivially. In general, neither the inner nor the outer solution is valid throughout the entire domain, but the matching procedure provides a means to construct a uniformly valid solution. This so-called COMPOSITE SOLUTION is given by

    uₙᶜ(x) = uₙ(x) + Uₙ(ξ) − lim_{ξ→∞} Uₙ(ξ)    (2.74)

The last term avoids double counting of the overlapping parts of the two solutions. These ideas are illustrated in the following example.

Example 2.30: Matched asymptotic expansion analysis of the reaction equilibrium assumption

Consider the following reactions

    A ⇌ B  (k₁, k₋₁),    B → C  (k₂)

in which rate constants k₁, k₋₁ are much larger than the rate constant k₂, so the first reaction equilibrates quickly. In a batch system where c_A, c_B, and c_C are the concentrations, the governing equations are

    dc_A/dt = −k₁c_A + k₋₁c_B
    dc_B/dt = k₁c_A − k₋₁c_B − k₂c_B
    dc_C/dt = k₂c_B

The reaction equilibrium assumption takes c_A and c_B to be in equilibrium so that c_B = Kc_A where K = k₁/k₋₁. Further assume that k₋₁ is the largest rate constant. Initial concentrations in the reactor are c_A(0) = c_A0, c_B(0) = c_C(0) = 0.

(a) Find a proper nondimensionalization so that a systematic perturbation expansion can be performed.

(b) Use matched asymptotic expansions to show that the reaction equilibrium approximation corresponds to the leading-order outer solution of the kinetic equations. Also find the equations for the O(ε¹) terms in the outer solution.

(c) Find the leading-order inner solution for the dynamics on the fast time scale 1/k₋₁, match the inner and outer solutions, and find a composite solution that is uniformly valid for all time.

Solution

(a) Let u = c_A/c_A(0), v = c_B/c_A(0), w = c_C/c_A(0), so u(0) = 1, v(0) = w(0) = 0. Define a scaled "slow" time variable t_s = k₂t so that an O(1) change in t_s corresponds to a time interval of O(1/k₂), and define the small parameter ε = k₂/k₋₁. In these variables, the rate equations are

    du/dt_s = (1/ε)(−Ku + v)
    dv/dt_s = (1/ε)(Ku − v) − v
    dw/dt_s = v

Since c_C is determined completely by c_B we do not include its evolution in the following development.

(b) Multiplying the dimensionless equations by ε yields

    ε du/dt_s = −Ku + v
    ε dv/dt_s = Ku − v − εv

Assuming a power series form, the outer solution is obtained by letting

    u(t_s) = u₀(t_s) + εu₁(t_s) + O(ε²)
    v(t_s) = v₀(t_s) + εv₁(t_s) + O(ε²)


Substituting and considering only the terms of O(ε⁰) yields

    Ku₀ = v₀

for both of these equations. This is the reaction equilibrium assumption in dimensionless form. Although physically reasonable, observe that this assumption is not consistent with the initial conditions u(0) = 1, v(0) = 0. Similarly, because the time derivatives are multiplied by ε, they do not appear in the leading-order outer problem, so we do not have differential equations whose solutions include the arbitrary constants that are determined by the initial conditions. Keeping this issue in mind, we collect O(ε¹) terms to yield

    du₀/dt_s = −Ku₁ + v₁
    dv₀/dt_s = Ku₁ − v₁ − v₀

Although this equation is valid, it is not yet useful because we do not know the values of u₀ and v₀. To obtain these we consider the inner solution.

(c) The problem with the outer solution can be traced to the loss of the time-derivative terms. Recognizing that the derivatives can be large at short times because k₁ and k₋₁ are much larger than k₂, we define a new fast time scale t_f = k₋₁t = t_s/ε. Now t_f changes an O(1) amount in a dimensional time of about 1/k₋₁. Rewriting the equations with this new time scaling yields

    du/dt_f = −Ku + v
    dv/dt_f = Ku − v − εv

Now we seek an inner solution

    u(t_f) = U₀(t_f) + εU₁(t_f) + O(ε²)
    v(t_f) = V₀(t_f) + εV₁(t_f) + O(ε²)

Substituting and extracting the O(ε⁰) terms yields

    dU₀/dt_f = −KU₀ + V₀
    dV₀/dt_f = KU₀ − V₀

with initial conditions U₀(0) = 1, V₀(0) = 0. This coupled pair of equations could be solved, for example, by Laplace transforms, or more simply


by noting that d(U₀ + V₀)/dt_f = 0, so U₀ + V₀ = 1, to just solve

    dU₀/dt_f = −KU₀ + 1 − U₀

which has solution

    U₀ = 1/(1 + K) + (K/(1 + K)) e^(−(1+K)t_f)

Using this, we obtain

    V₀ = (K/(1 + K)) (1 − e^(−(1+K)t_f))

By analogy with the reaction-diffusion example above, this inner solution corresponds to a boundary layer in time, rather than space.

With inner and outer solutions in hand, we can use (2.73) to match them. The "outer limit" of the inner solution is

    lim_{t_f→∞} U₀ = 1/(1 + K),    lim_{t_f→∞} V₀ = K/(1 + K)

which satisfies the equilibrium assumption Ku = v. The inner limit of the outer solution is simply u₀(0), v₀(0), and using the previous result yields

    u₀(0) = 1/(1 + K),    v₀(0) = K/(1 + K)

Now we have initial conditions for the outer solution. Adding the two differential equations at O(ε¹) and differentiating the algebraic equation (reaction equilibrium result) at O(ε⁰) give

    du₀/dt_s + dv₀/dt_s = −v₀,    K du₀/dt_s = dv₀/dt_s

Solving these two equations for the two time derivatives gives

    du₀/dt_s = −(K/(1 + K)) u₀,    dv₀/dt_s = −(K/(1 + K)) v₀

Solving these with their respective initial (matching) conditions u₀(0) = 1/(1 + K), v₀(0) = K/(1 + K) gives the full leading-order outer solution

    u₀ = (1/(1 + K)) e^(−(K/(1+K)) t_s),    v₀ = (K/(1 + K)) e^(−(K/(1+K)) t_s)

and Perturbation Methods


AsymptoticAnalysis

173

[Figure 2.12 appears here: a plot of c_A(t)/c_A0 versus t showing the inner, outer, and composite approximations.]

Figure 2.12: Leading-order inner U₀, outer u₀, and composite solutions u₀ᶜ, for Example 2.30 with ε = 0.2, K = 1, and c_A0 = 1.

or, reverting to dimensional form

cA(t)/cA0 = (1/(1+K)) e^{-(K/(1+K)) k_2 t}
cB(t)/cA0 = (K/(1+K)) e^{-(K/(1+K)) k_2 t}

This is precisely the solution that would be obtained via uncritical
application of the reaction equilibrium approximation. Now we see this
approximation in more precise terms.

Finally, we construct a uniformly valid composite solution via
(2.74). To leading order in dimensional variables

cA(t)/cA0 = (1/(1+K)) e^{-(K/(1+K)) k_2 t} + (K/(1+K)) e^{-(1+K) k_2 t/ε}
cB(t)/cA0 = (K/(1+K)) e^{-(K/(1+K)) k_2 t} - (K/(1+K)) e^{-(1+K) k_2 t/ε}

Figure 2.12 shows the leading-order inner, outer, and composite
solutions for u(t) = cA(t)/cA0 with ε = 0.2 and K = 1.
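The accuracy of the composite solution is easy to check numerically. The sketch below (plain Python with a hand-rolled RK4 integrator; the rate constants k_1 = k_{-1} = 5, k_2 = 1 are illustrative choices giving K = 1 and ε = 0.2) integrates the full kinetics of A ⇌ B → C and measures the error of the leading-order composite approximation for cA/cA0:

```python
# Full kinetics of A <=> B -> C versus the leading-order composite solution.
# k1 = km1 = 5, k2 = 1 are illustrative values giving K = 1 and eps = 0.2.
import math

k1, km1, k2 = 5.0, 5.0, 1.0
K, eps = k1 / km1, k2 / km1

def rhs(cA, cB):
    # dimensional mass balances for cA and cB (with cA0 = 1)
    return (-k1 * cA + km1 * cB, k1 * cA - km1 * cB - k2 * cB)

def rk4(cA, cB, dt, nsteps):
    for _ in range(nsteps):
        a = rhs(cA, cB)
        b = rhs(cA + 0.5 * dt * a[0], cB + 0.5 * dt * a[1])
        c = rhs(cA + 0.5 * dt * b[0], cB + 0.5 * dt * b[1])
        d = rhs(cA + dt * c[0], cB + dt * c[1])
        cA += dt / 6 * (a[0] + 2 * b[0] + 2 * c[0] + d[0])
        cB += dt / 6 * (a[1] + 2 * b[1] + 2 * c[1] + d[1])
    return cA, cB

def composite_cA(t):
    # slowly decaying outer part plus the fast boundary-layer correction
    return (1 / (1 + K)) * math.exp(-K / (1 + K) * k2 * t) \
         + (K / (1 + K)) * math.exp(-(1 + K) * k2 * t / eps)

cA_num, _ = rk4(1.0, 0.0, 1e-4, 10000)   # integrate to t = 1
err = abs(cA_num - composite_cA(1.0))
print(err)
```

With these values the discrepancy at t = 1 is a few percent, consistent with the O(ε) error expected of a leading-order composite solution.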


2.6.6 Method of Multiple Scales

The method of matched asymptotic expansions deals with problems in
which different time or length scales dominate in different regions. In
many problems, however, dynamics occur concurrently on disparate scales,
a situation that calls for the method of multiple scales. Problems
amenable to this approach include dynamical systems with multiple natural
frequencies or decay times,14 nonlinear systems with widely separated
time scales, and problems of propagation (wavelike or diffusive) in
inhomogeneous media.

As an introduction to this approach, we consider a weakly damped
harmonic oscillator

ẍ + 2εẋ + ω²x = 0,   x(0) = 1,   ẋ(0) = 0

On physical grounds, we expect two time scales to act simultaneously
in this problem: harmonic oscillation, with natural period 2π/ω
(assumed to be ord(1)), and exponential decay, with time scale of
ord(1/ε). If we proceed naively, looking for a regular perturbation
solution x(t) = x0(t) + εx1(t) + O(ε²), we find that

ẍ0 + ω²x0 = 0,   x0(0) = 1,   ẋ0(0) = 0
ẍ1 + ω²x1 = -2ẋ0,   x1(0) = 0,   ẋ1(0) = 0

with solution

x(t) = cos ωt + ε((1/ω) sin ωt - t cos ωt) + O(ε²)
The equation at O(ε) has the same differential operator as does the
zeroth-order problem, but has a resonant forcing term proportional to
sin ωt that leads to the t cos ωt SECULAR TERM in the solution. When
t = ord(1/ε), this term destroys the asymptoticness of the expansion;
the approximation is not uniformly valid, failing at large times. The
method of multiple scales avoids this nonuniformity by explicitly
recognizing the existence of two time scales in the problem, by letting
t0 = t, t1 = εt and looking for a solution of the form x(t0, t1, ε). Now

dx/dt = ∂x/∂t0 + ε ∂x/∂t1

14For extensive application of the method in this context, see Nayfeh
and Mook (1979).


and we look for a solution of the form

x(t0, t1) = x0(t0, t1) + εx1(t0, t1) + O(ε²)

Defining D0 = ∂/∂t0 and D1 = ∂/∂t1, the leading-order equation becomes
a partial differential equation

D0²x0 + ω²x0 = 0

This has the solution x0 = A(t1) cos ωt0, where A(0) = 1, but is as yet
otherwise undetermined. At the next order, we have

D0²x1 + ω²x1 = 2ω (dA/dt1) sin ωt0 + 2ωA sin ωt0,   x1(0) = 0,   D0x1(0) = 0

Again, a resonant forcing term is present on the right-hand side. Unless
this is zero, a secular term again shows up in the solution and
the approximation will not be asymptotic. However, we now have the
possibility of eliminating this term. Notice that if

dA/dt1 + A = 0

the resonant term vanishes. This equation is called the SOLVABILITY
CONDITION or secularity condition or integrability condition. It is an
amplitude equation, determining the evolution of the amplitude of the
solution over the slow time scale t1. From the leading-order result, we
have that A(0) = 1, so A = e^{-t1}. At leading order, the solution
is therefore

x0 = e^{-εt} cos ωt

This is the type of solution we expect intuitively: a very slowly decaying
harmonic oscillation. A couple of final comments on this example: the
solution x1 is identically zero, but another resonance term shows up
in the equation for x2. This nonuniformity does not show up until
t = ord(1/ε²), by which time the amplitude has nearly decayed to zero,
but if desired, it could be eliminated by including a "superslow" scale
t2 = ε²t. This time scale arises because the damping causes a very
small (O(ε²)) change in the frequency of oscillation.
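A quick numerical check (a sketch with the illustrative values ε = 0.05, ω = 1) confirms that the multiple-scales result e^{-εt} cos ωt remains close to the true solution at t = ord(1/ε), where the regular perturbation solution has already failed:

```python
# Weakly damped oscillator: xdd + 2*eps*xd + w**2 * x = 0, x(0) = 1, xd(0) = 0.
import math

eps, w = 0.05, 1.0

def solve(t_final, dt=1e-3):
    # RK4 on the equivalent first-order system (x, v)
    x, v = 1.0, 0.0
    f = lambda x, v: (v, -2 * eps * v - w * w * x)
    for _ in range(int(round(t_final / dt))):
        a = f(x, v); b = f(x + 0.5*dt*a[0], v + 0.5*dt*a[1])
        c = f(x + 0.5*dt*b[0], v + 0.5*dt*b[1]); d = f(x + dt*c[0], v + dt*c[1])
        x += dt/6 * (a[0] + 2*b[0] + 2*c[0] + d[0])
        v += dt/6 * (a[1] + 2*b[1] + 2*c[1] + d[1])
    return x

t = 1.0 / eps                                  # where the naive expansion breaks down
x_true = solve(t)
x_ms = math.exp(-eps * t) * math.cos(w * t)    # multiple-scales approximation
x_naive = math.cos(w*t) + eps * (math.sin(w*t) / w - t * math.cos(w*t))
print(abs(x_ms - x_true), abs(x_naive - x_true))
```

At t = 1/ε the secular term makes the naive expansion substantially worse than the multiple-scales result.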
This simple example illustrates the procedure and resulting structure. The recurring theme is the existence of a secularity condition,
whose satisfaction requires the solution of an amplitude equation. This
amplitude equation determines the evolution of the system at its largest


scale. If the underlying problem is linear, so is the amplitude equation;
a nonlinear equation leads to a nonlinear amplitude equation. The
following example illustrates this.

Example 2.31: Oscillatory dynamics of a nonlinear system

From Section 2.2.2, we have a complete understanding of the linear
system ẋ = Ax. When A has complex conjugate eigenvalues σ ± iω,
the origin is a stable or unstable spiral depending on the sign of
σ. When |σ| ≪ |ω|, the growth or decay of solutions occurs on a
time scale much longer than the period of oscillation. In this situation
the method of multiple scales can be used to show very generally the
dynamics of the nonlinear system ẋ = Ax + N(x), where N(x) contains
no linear part. In this example we apply the method of multiple scales
to the system of equations

dx/dt = (σ  -ω; ω  σ) x + N2(x, x) + N3(x, x, x)

where σ = εu, with ε ≪ 1, while u and ω are ord(1). The steady state
x = 0 of this system is very weakly stable or unstable, depending on
the sign of u. Since the problem is nonlinear, finding the proper scaling
of x is an important part of the solution procedure. The oscillatory
nature of the linearized equation suggests that a solution can be found
in terms of amplitude ||x|| and phase φ.

Solution

Although we consider here a specific form for the nonlinearity, the
multiple-scales solution will lead to equations for r and φ whose general
structure is both extremely simple and extremely general. The time
scaling of this problem is similar to that of the linear example above,
so we consider a multiple-scales expansion with t0 = t, t1 = εt. The
time scale t0 reflects the time scale of the oscillation, while t1 reflects
the scale for growth or decay of the amplitude of the solution. To
determine the proper scaling of the solution amplitude, we let x = δX,
where X = ord(1) as ε → 0. Now the equation becomes

D0X + εD1X = (εu  -ω; ω  εu) X + δN2(X, X) + δ²N3(X, X, X)


where N2(X, X) and N3(X, X, X) are the quadratic and cubic terms written
in a form convenient for perturbation expansions. For general vectors
u = [u1, u2]ᵀ, v = [v1, v2]ᵀ, w = [w1, w2]ᵀ, the nonlinear terms
for this problem are

N2(u, v) = … ,   N3(u, v, w) = …

Any polynomial nonlinearity can be written as a sum of terms with this
structure.
If we tentatively let δ = ε and X = X0 + εX1 + O(ε²), then the problems
at O(ε⁰) and O(ε¹) become, respectively,

D0X0 - (0  -ω; ω  0) X0 = 0

and

D0X1 - (0  -ω; ω  0) X1 = uX0 - D1X0 + N2(X0, X0)

The solution at O(ε⁰) is

X0(t0, t1) = r(t1) [cos(ωt0 + φ(t1)), sin(ωt0 + φ(t1))]ᵀ

Turning to the O(ε) equation, the term N2(X0, X0) does not lead to
resonance because it is quadratic and thus contains no terms with
frequency ω. The solvability condition for this choice of scaling is thus

D1X0 = uX0

This equation is linear, leading to exponential growth on the time scale
t1 when u > 0 and eventually violating the scaling assumption x =
O(ε). Thus the choice δ = ε is not self-consistent.


Weneed a different guess for . The term N2 does not lead to resonanceat leading order, but the term N3 can. For example sin3 (Oto =
(3sinwto sin 3wto) /4. A balance between the linear term, which is
O(e) and this cubic term, which is O(3) would imply that = 61/2
Thus we seek a solution of the form x = El /2(Xo + El /2Xl + EX2 +
Withthis scaling the O(0) problem and its solution remain
the same as above, while the o (1/2)equation becomes
LXI =

Ordinary Differential
178

Equations

where

(Xo,Xo)
As noted above, M
solution can be found

contains no resonant terms. A Particular


(cos 20 + sin 20)

- (cos20 + sin 20))

Xl (to, tl)

Observe that Xl has frequency 2(0.


where O= (Oto((tl).
The leading-order amplitude and phase, r(t1) and φ(t1), have not
yet been determined, so we turn to the equation at O(ε), which
determines X2:

LX2 = uX0 - D1X0 + 2N2(X0, X1) + N3(X0, X0, X0)

For brevity, we denote the right-hand side as R. Resonance will occur
if R has terms proportional to [cos ωt0, sin ωt0]ᵀ or [-sin ωt0, cos ωt0]ᵀ,
which in general it does. To obtain the solvability conditions, we thus
require that R be orthogonal to these terms

∫_0^{2π/ω} [cos ωt0  sin ωt0] R dt0 = 0

∫_0^{2π/ω} [-sin ωt0  cos ωt0] R dt0 = 0

Omitting the detailed calculation, which involves elementary but
extensive trigonometric manipulations, we find that

dr/dt1 = ur + ar³
dφ/dt1 = br²

where a and b are constants determined by the nonlinearity given here.
This is a remarkably general result. These simple differential equations
govern the leading-order behavior for small ε, and their form
is completely insensitive to the nature of the nonlinearity; the entire
structure of N2 and N3 is distilled into the constants a and b. Furthermore,
even for a more general nonlinearity containing higher powers,


only the quadratic and cubic terms contribute. For example, a quartic
term does not appear until O(ε²) and thus does not contribute to R.
Because of its generality, this result is known as the normal or canonical
form for this class of nonlinear problems.15

The equation for the oscillation amplitude r is the most important.
It has steady-state solutions r = 0 and r = (-u/a)^{1/2}. Including the
scaling δ = ε^{1/2}, the latter solution becomes ||x|| = (-uε/a)^{1/2}. Therefore
real nontrivial solutions exist if uε/a < 0. We return to this example
and related ones in a more general context in Section 2.7.5.
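The amplitude prediction is easy to test on a concrete case. The sketch below does not use the particular N2 and N3 of this example; instead it takes the simple cubic nonlinearity N(x) = -(x1² + x2²)x (an illustrative choice for which the normal-form constants are a = -1 and b = 0), so the predicted steady amplitude is ||x|| = (uε)^{1/2}:

```python
# Weakly unstable spiral plus a stabilizing cubic term (illustrative choice):
# the normal form predicts ||x|| -> sqrt(u*eps/|a|) with a = -1 here.
import math

eps, u, w = 0.1, 1.0, 1.0

def f(x, y):
    s = x*x + y*y
    return (eps*u*x - w*y - x*s, w*x + eps*u*y - y*s)

x, y, dt = 0.01, 0.0, 0.01
for _ in range(20000):                      # RK4 to t = 200
    a = f(x, y); b = f(x + 0.5*dt*a[0], y + 0.5*dt*a[1])
    c = f(x + 0.5*dt*b[0], y + 0.5*dt*b[1]); d = f(x + dt*c[0], y + dt*c[1])
    x += dt/6 * (a[0] + 2*b[0] + 2*c[0] + d[0])
    y += dt/6 * (a[1] + 2*b[1] + 2*c[1] + d[1])

r_final = math.hypot(x, y)
print(r_final, math.sqrt(u * eps))          # numerical amplitude vs prediction
```

For this nonlinearity r obeys dr/dt = εur - r³ exactly, so the long-time amplitude matches the normal-form prediction to integrator accuracy.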

2.7 Qualitative Dynamics of Nonlinear Initial-Value Problems

2.7.1 Introduction

The dynamics of nonlinear differential equations can be extremely complex.
In this section we introduce a number of the issues that arise in
these systems. Questions that we address include:

• How do nonlinear systems differ from linear ones?
• What general qualitative (geometrical) structure can be found in
nonlinear systems?
• What kinds of steady-state and time-dependent behaviors are typical?
• How do solutions change as parameters change?


2.7.2 Invariant Subspaces and Manifolds

We begin with an introduction to the geometry of differential equations,
by describing invariant manifolds, regions of phase space in which
solutions to an equation remain for all time. We shall see that these regions
organize the dynamics of initial-value problems. For linear constant-coefficient
systems, thinking of the solution to ẋ = Ax in terms of the
eigenvectors leads toward a geometric view of solutions to differential
equations. An important point to notice is this: a point lying on a line

15Guckenheimer and Holmes (1983) give a general formula for construction of the
normal form, including explicit formulas for a and b, derived using a rigorous and
elegant method of nonlinear coordinate transformations.


defined by one of the eigenvectors vi of A never leaves this line and
never has left. These lines are invariant under the solution operator
e^{At}. Recall from Section 2.2.2 that if x(0) = cvi, then

x(t) = e^{At} c vi = c e^{λi t} vi   for all t

Thus we call the line defined by vi an invariant subspace of the
system. The most important invariant linear subspaces of a phase
space are defined as follows. Let u1, ..., uns be the (possibly
generalized) eigenvectors whose eigenvalues have negative real parts;
v1, ..., vnu be those whose eigenvalues have positive real parts; and
w1, ..., wnc those whose eigenvalues have zero real parts. Now we can
define three invariant subspaces

ES = span{u1, ..., uns}    stable subspace
EU = span{v1, ..., vnu}    unstable subspace
EC = span{w1, ..., wnc}    center subspace

An initial condition in ES will remain in ES and eventually decay to
zero, one in EU will remain in EU and grow exponentially with time,
and one in EC will remain in EC, staying the same magnitude or growing
with at most a polynomial time dependence. Figure 2.13 shows some
examples of these invariant subspaces in linear systems. In general,
a system with eigenvalues with zero real parts is not robust: a small
change in the system, e.g., the parameters, moves the eigenvalues off
the imaginary axis and the invariant subspace EC vanishes. A system
like this, for which an arbitrarily small change in the system changes
the qualitative behavior, is said to be STRUCTURALLY UNSTABLE. In
contrast, if all the eigenvalues have nonzero real parts, no qualitative
change occurs if the system is changed slightly. Such a system is said
to be STRUCTURALLY STABLE. Similarly, if a system linearized around
a steady state has no eigenvalues with zero real parts, the steady state
is said to be HYPERBOLIC. Otherwise it is nonhyperbolic.
So what happens when we allow nonlinearity to creep in? Consider a
nonlinear system in the vicinity of a steady state xs. Letting z = x - xs,
we can write the system ẋ = f(x) as

żi = (∂fi/∂xj)|_{z=0} zj + (1/2)(∂²fi/∂zj∂zk)|_{z=0} zj zk + O(||z||³)

or

ż = Jz + N2(z, z) + O(||z||³)

Figure 2.13: Examples of invariant subspaces for linear systems.

where Jij = (∂fi/∂xj)|_{z=0} is the Jacobian of f evaluated at xs and
N2(z, z) contains all terms that are quadratic in z. Since Jz = O(z) and
N2(z, z) = O(z²), the leading-order behavior for small z is determined
by the linearized system, as long as all the eigenvalues of J have nonzero
real parts, i.e., the steady state xs is hyperbolic. The rigorous and general
statement of this fact is called the HARTMAN-GROBMAN THEOREM
(Guckenheimer and Holmes, 1983). If there is an eigenvalue with zero
real part, then the linearized problem gives that

(d/dt)||z||² = 0

for some values of z, in which case the quadratic term N2(z, z) appears
at leading order.
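Hyperbolicity is straightforward to test numerically: build the Jacobian by finite differences and inspect the real parts of its eigenvalues. A minimal sketch for a planar system (a damped pendulum, used purely as an illustration) classifies the origin from the trace and determinant of J:

```python
import math

def f(x1, x2):
    # damped pendulum (illustrative): x1 = angle, x2 = angular velocity
    return (x2, -math.sin(x1) - 0.1 * x2)

def jacobian(f, xs, h=1e-6):
    # central-difference Jacobian at the steady state xs
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        xp, xm = list(xs), list(xs)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(*xp), f(*xm)
        for i in range(2):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

J = jacobian(f, (0.0, 0.0))
tr = J[0][0] + J[1][1]
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
disc = tr * tr - 4 * det
if disc < 0:                         # complex pair: real part is tr/2
    re_parts = (tr / 2, tr / 2)
else:                                # real eigenvalues
    s = math.sqrt(disc)
    re_parts = ((tr + s) / 2, (tr - s) / 2)
hyperbolic = all(abs(re) > 1e-8 for re in re_parts)
print(re_parts, hyperbolic)
```

Here both real parts are negative and nonzero, so the origin is a hyperbolic (stable) steady state and the linearization governs the local behavior.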

Restricting ourselves to the usual situation, when xs is hyperbolic,
we now generalize the ideas of the stable and unstable subspaces to
the nonlinear case. We define the STABLE AND UNSTABLE MANIFOLDS16
of xs as follows:

• The stable manifold Ws(xs) is the set of points that tend to xs as
t → +∞.
• The unstable manifold Wu(xs) is the set of points that tend to xs
as t → -∞.

16A manifold for our purposes is simply a curve or surface. We use the term because
Ws and Wu are not generally linear subspaces of R^n, while ES and EU are.


Figure 2.14: Invariant subspaces of the linearized system (a) and
invariant manifolds of the nonlinear system (b).

These have the same dimensions ns and nu as the subspaces ES and
EU of the linearized system, and are tangent to them at x = xs. The
relationship between them is shown in Figure 2.14. Convince yourself
that the definitions of Ws and Wu are equivalent to those given above
for ES and EU for a hyperbolic linear system.

For many interesting situations, a steady state of interest is stable;
there is no unstable manifold. Recall, however, that in the linear case
each individual eigendirection defines an invariant manifold, so ES contains
within it further invariant subspaces. This fact gives us a tool
for understanding the approach to a steady state and possibly for
constructing simplified models of the dynamics near a steady state. As
an example, consider the following pair of differential equations, with
ε ≪ 1:

ε dx1/dt = f1(x1, x2)
dx2/dt = f2(x1, x2)

Let [x1, x2]ᵀ = (0, 0) be a stable steady state. Furthermore, assume
that we have written the equations in coordinates where

J = (-1/ε  0; 0  -1)

Thus the eigenvalues are -ε⁻¹ and -1, with corresponding eigenvectors
uf = [1, 0]ᵀ and us = [0, 1]ᵀ. The "s" and "f" stand for "slow" and
"fast" respectively, because the dynamics in the us direction occur on

an O(1) time scale, while those in the uf direction occur on an O(ε)
time scale. We can thus define a "slow" subspace ESs and a fast subspace
ESf, with nonlinear extensions Wss and Wsf. Initial conditions
(sufficiently close to the origin) approach Wss in a time of O(ε), so
after this transient all the dynamics are along the slow manifold Wss. To
leading order in ε, Wss is defined by the equation f1(x1, x2) = 0. This is
the result we would get by just setting ε to zero above, and can be
found as an outer solution in a matched asymptotic expansions analysis.
Close to the origin, this equation can be rewritten x1 = h(x2), so
we can reduce the pair of equations to a single equation

dx2/dt = f2(h(x2), x2)

What we have just done is a form of the quasi-steady-state approximation
used in all areas of chemical engineering analysis. It illustrates a
very important and general property of initial-value problems: beyond
initial transients, solutions often evolve on a subspace or manifold that
has many fewer dimensions than the entire phase space. This fact is
both conceptually and computationally important. It means that the
behavior of large systems can often be understood by only considering
a few dimensions, and also that computations might be performed
with many fewer degrees of freedom than formally required.
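A small numerical experiment makes this collapse onto the slow manifold concrete. The functions below are illustrative choices, f1 = -x1 + x2² (so that h(x2) = x2²) and f2 = -x2; after the O(ε) transient, the full stiff system tracks x1 = h(x2) to within O(ε):

```python
# Fast-slow system: eps * dx1/dt = -x1 + x2**2, dx2/dt = -x2 (illustrative).
eps = 0.01

def f(x1, x2):
    return ((-x1 + x2 * x2) / eps, -x2)

x1, x2, dt = 2.0, 1.0, 1e-3      # start off the slow manifold
for _ in range(1000):            # RK4 to t = 1, well past the O(eps) transient
    a = f(x1, x2); b = f(x1 + 0.5*dt*a[0], x2 + 0.5*dt*a[1])
    c = f(x1 + 0.5*dt*b[0], x2 + 0.5*dt*b[1]); d = f(x1 + dt*c[0], x2 + dt*c[1])
    x1 += dt/6 * (a[0] + 2*b[0] + 2*c[0] + d[0])
    x2 += dt/6 * (a[1] + 2*b[1] + 2*c[1] + d[1])

print(abs(x1 - x2 * x2))         # distance from the slow manifold: O(eps)
```

The reduced model dx2/dt = f2(h(x2), x2) = -x2 reproduces x2 exactly here; the full simulation deviates from the manifold x1 = x2² only by an O(ε) correction.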

2.7.3 Some Special Nonlinear Systems

Gradient Systems

Imagine a small particle suspended in a viscous fluid, and moving under
the influence of a force that can be written as the gradient of a scalar
potential function U(x). This situation is described by

dx/dt = -∇U(x)    (2.75)

In general, systems of this form are called gradient systems. Recall
that the vector ∇U is always normal to surfaces of constant U, so
trajectories of this type of system are always moving "downhill" on the
"energy landscape" defined by U. In other words, the potential U is a
Lyapunov function for (2.75). The only steady states of gradient systems
are sources, saddle points, and sinks (can you show this?). A
two-dimensional example is shown in Figure 2.15. Some more insight
into the behavior of a gradient system is gained by asking how the
"potential energy" of a trajectory evolves with time. The rate of change of


Figure 2.15: Contours of an energy function U(x1, x2). Black arrows
denote directions of motion on the energy surface for a gradient
system, while gray ones denote motion for a Hamiltonian system.

U on a trajectory is

dU/dt = Σi (∂U/∂xi)(dxi/dt) = -Σi (∂U/∂xi)(∂U/∂xi) = -||∇U||² ≤ 0

with equality only at steady states, where ∇U = 0. So whatever the
trajectory of the vector equation for x, it satisfies this scalar equation,
showing that the rate of decrease of potential energy is the square of the
gradient of the potential. Trajectories roll downhill until they reach a
minimum in U. In a high-dimensional problem, the potential surface
can be very complex, with many minima, and saddle points where there
are many "downhill" directions for the system to choose from.
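The monotone decay of U is easy to observe in a simulation. The sketch below uses a double-well potential U = (x1² - 1)² + x2² (an illustrative choice, not the potential of Figure 2.15) and checks that U never increases along a forward-Euler gradient-descent trajectory:

```python
# Gradient system xdot = -grad U for an illustrative double-well potential.
def U(x1, x2):
    return (x1 * x1 - 1) ** 2 + x2 * x2

def grad_U(x1, x2):
    return (4 * x1 * (x1 * x1 - 1), 2 * x2)

x1, x2, dt = 0.3, 1.0, 1e-3
history = [U(x1, x2)]
for _ in range(10000):           # forward Euler suffices for a monotonicity check
    g1, g2 = grad_U(x1, x2)
    x1 -= dt * g1
    x2 -= dt * g2
    history.append(U(x1, x2))

monotone = all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
print(monotone, x1, x2)          # U decreases; trajectory ends near the minimum (1, 0)
```

Starting from x1 > 0, the trajectory rolls downhill into the basin of the minimum at (1, 0); a start with x1 < 0 would reach (-1, 0) instead.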

Hamiltonian Systems

Consider again the landscape of Figure 2.15, but now call the energy
function H rather than U. Now imagine a dynamical system where
trajectories are not normal to the contours of constant H but tangent to
them. To do so, we modify the gradient system:

dx1/dt = ∂H/∂x2,   dx2/dt = -∂H/∂x1    (2.76)

By the same exercise we performed above for U along trajectories, we
find

dH(x, t)/dt = 0

so the energy function H is conserved on trajectories that follow (2.76).

The above situation is a special case of a very general and important
class of equations. Consider a system of particles, e.g., molecules. For
each particle, there is a number of coordinates (typically three) that
describes the position of the particle, and for each coordinate there is
an associated momentum. We denote the full set of coordinates as q
and the momenta as p. The total (kinetic plus potential) energy H of
the system is a function of the positions and momenta and is called
the HAMILTONIAN. In the absence of friction (always true at the
atomic level), the sum of kinetic and potential energy is conserved, so

dH/dt = Σi ((∂H/∂qi)(dqi/dt) + (∂H/∂pi)(dpi/dt)) = 0

In general, this holds only if

dqi/dt = ∂H/∂pi,   dpi/dt = -∂H/∂qi

These equations are called HAMILTON'S EQUATIONS. A system whose
dynamics are described by a model of this form is said to be Hamiltonian
(Goldstein, 1980).

In addition to the property that H is constant along trajectories,
Hamiltonian systems have another important attribute: phase space
volume is conserved along trajectories. In other words, a "blob" of
initial conditions may deform and rotate with time, but it cannot shrink
or grow. We can see this by looking at the divergence17 in phase space of the
vector field for a Hamiltonian system:

∇·f = Σi ((∂/∂qi)(dqi/dt) + (∂/∂pi)(dpi/dt)) = Σi (∂²H/∂qi∂pi - ∂²H/∂pi∂qi) = 0

This result is known as LIOUVILLE'S THEOREM. In general, vector fields
with ∇·f = 0 are said to be conservative; if ∇·f < 0 the system is
dissipative.18 What is ∇·f for a gradient system?

An important class of conservative vector fields is the velocity fields
of incompressible flows. In two dimensions, equations for motion of
a fluid element are Hamiltonian, with the Hamiltonian function being
simply the stream function. A three-dimensional incompressible flow
field, although conservative, cannot generally be Hamiltonian. Why
not?
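Both properties are easy to verify by computing ∇·f with central differences. In the sketch below the Hamiltonian field comes from the pendulum Hamiltonian H = p²/2 - K cos q, and the gradient field from the convex potential U = q² + p² (both illustrative choices):

```python
import math

def divergence(f, q, p, h=1e-5):
    # central-difference estimate of d(f1)/dq + d(f2)/dp
    return ((f(q + h, p)[0] - f(q - h, p)[0]) / (2 * h)
          + (f(q, p + h)[1] - f(q, p - h)[1]) / (2 * h))

K = 2.0
hamiltonian = lambda q, p: (p, -K * math.sin(q))   # qdot = dH/dp, pdot = -dH/dq
gradient = lambda q, p: (-2 * q, -2 * p)           # xdot = -grad U, U = q^2 + p^2

div_h = divergence(hamiltonian, 0.7, 0.3)
div_g = divergence(gradient, 0.7, 0.3)
print(div_h, div_g)   # ~0 (conservative) and -4 (dissipative)
```

The Hamiltonian field is divergence-free everywhere, while the gradient field here has ∇·f = -∇²U = -4 at every point.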

Single Degree-of-Freedom Hamiltonian Systems

A mechanical system with only one degree of freedom, such as a particle
moving along a line or a pendulum restricted to swing in a single
plane, illustrates some of the important features of nonlinear differential
equations. In this case p and q are scalars. Often the Hamiltonian
can be written in this simple form

H = (1/2)p² + V(q)

Along any trajectory, H is constant, so we can solve for the momentum
in terms of the position

p = ±(2(H - V(q)))^{1/2}

Trajectories in phase space are thus symmetric across p = 0. Furthermore,
this formula can be used to construct the energy landscape, the
curves of H = constant on the (q, p) plane. From Hamilton's equations

17See Section 3.2.
18Grmela and Öttinger (1997) have developed a formalism for continuum models of
materials, in which the vector field is simply a sum of a Hamiltonian part and a gradient
part.


and the expression for p, we can see that steady states occur when
p = 0 and V′(q) = 0. For the pendulum, V(q) = -K cos q, where
K is a constant; the energy landscape and phase-plane trajectories are
shown in Figure 2.16 for K = 2. Note in particular the trajectories
that round the "hills" or "valleys," connecting two saddle points. These
special trajectories are called HETEROCLINIC ORBITS. Denoting the two
steady states involved as P and Q, the heteroclinic orbit is part of both
the unstable manifold of P and the stable manifold of Q. If the potential
energy is changed to V(q) = -(1/2)q² + (1/4)q⁴, the landscape and
trajectories are as shown in Figure 2.17. Now we have two trajectories
connecting a saddle point to itself, called HOMOCLINIC ORBITS. The
homoclinic orbit is part of both the unstable and stable manifold of
the steady state. Homoclinic and heteroclinic orbits are examples of
global features of a dynamical system, because their existence cannot
be deduced by only looking at behavior in a small neighborhood of a
particular point. Hamiltonian systems are not structurally stable; physically
we can understand this by noting that any dissipation of energy
leads to "downhill" motion on the energy landscape and the special
properties that H = constant on trajectories and ∇·f = 0 are lost.
Similarly, homoclinic and heteroclinic orbits are not structurally stable
features, but they remain important for general systems because
they can arise at particular points in parameter space, called GLOBAL
BIFURCATIONS (Guckenheimer and Holmes, 1983).
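Conservation of H along trajectories can be confirmed directly by integrating Hamilton's equations for the pendulum (K = 2, matching Figure 2.16). An accurate integrator keeps H essentially constant:

```python
import math

K = 2.0
H = lambda q, p: 0.5 * p * p - K * math.cos(q)
f = lambda q, p: (p, -K * math.sin(q))     # Hamilton's equations for this H

q, p, dt = 0.5, 0.0, 1e-3
H0 = H(q, p)
for _ in range(10000):                     # RK4 to t = 10
    a = f(q, p); b = f(q + 0.5*dt*a[0], p + 0.5*dt*a[1])
    c = f(q + 0.5*dt*b[0], p + 0.5*dt*b[1]); d = f(q + dt*c[0], p + dt*c[1])
    q += dt/6 * (a[0] + 2*b[0] + 2*c[0] + d[0])
    p += dt/6 * (a[1] + 2*b[1] + 2*c[1] + d[1])

drift = abs(H(q, p) - H0)
print(drift)   # tiny: RK4 is not exactly conservative, but the drift is very small
```

RK4 conserves H only approximately; for long-time Hamiltonian simulation, a symplectic method would keep the drift bounded rather than slowly accumulating.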

2.7.4 Long-Time Behavior and Attractors

A question of significant practical interest when studying a mathematical
model of a process is: what happens to the dynamics after a long
time, i.e., as t → ∞? In one- or two-dimensional phase spaces, the
possibilities are quite limited and we describe them essentially completely.
In three or more dimensions, very complex behavior is possible and we
shall only touch on the topic.

One Dimension

If x is a scalar, then the autonomous equation ẋ = f(x)
can always be written in gradient system form

dx/dt = -dV/dx


Figure 2.16: Energy landscape for a pendulum; H = (1/2)p² - K cos q;
K = 2.

where V(x) = -∫ f(x′) dx′. All initial conditions must end up at a
steady state, or roll downhill forever toward ±∞.
Two Dimensions: Planar Systems

Not every two-dimensional vector field can be written as the gradient
of a potential, so two-dimensional (or PLANAR19) systems are not
quite as restricted as one-dimensional ones. Nevertheless, they are
still fairly constrained by the topology of the plane. Let us write a
two-dimensional system as

dx/dt = f(x)

19Not all two-dimensional systems are planar. For example, consider a system whose
trajectories are restricted to the surface of a torus, i.e., a doughnut with a hole. This
surface cannot be mapped onto an unbounded plane. We discuss this case when
considering three-dimensional systems. On the other hand, it turns out that the surface of
a sphere can be mapped onto a plane, but the mapping is singular. Another nontrivial
two-dimensional surface is a Möbius strip.


Figure 2.17: Landscape for H = (1/2)p² - (1/2)q² + (1/4)q⁴.

where x = (x1, x2)ᵀ ∈ R². The steady states of this system are simply the
intersections of the curves f1(x1, x2) = 0 and f2(x1, x2) = 0. Near
these steady states, the behavior is described by the linearizations, if
the eigenvalues have nonzero real parts. In addition to steady states,
we know that closed trajectories (oscillations) can arise, as we saw in
the Hamiltonian examples described previously. Can anything else happen
as t → ∞? Can, for example, a periodic orbit have a figure-eight
shape? The answer to this is no; for trajectories in phase space to cross
would require two values of the vector field (f1, f2)ᵀ for the same point
(x1, x2)ᵀ, which cannot occur. This prohibition on trajectories crossing
applies in any number of dimensions, but in two dimensions it severely
constrains the possible behavior. One very important consequence of
the constraint is the POINCARÉ-BENDIXSON THEOREM.

Theorem 2.32 (Poincaré-Bendixson). If D is a closed region of R² and
a solution (x1(t), x2(t))ᵀ ∈ D for all t > t0, then the solution either is a
closed path, approaches a closed path as t → ∞, or approaches a fixed
point (steady state).


As an application of this theorem, consider the system

dx/dt = x - y - x(x² + 2y²)
dy/dt = x + y - y(x² + y²)

Transforming to polar coordinates gives

dθ/dt = 1 + (1/2) r² sin²θ sin 2θ
dr/dt = r(1 - r²(1 + (1/4) sin² 2θ))

From this form of the equations we see that the origin is the only steady
state and that it is unstable. So where do the trajectories go? Note that
dr/dt < 0 for all r > 1, so the trajectories must be bounded.
Furthermore, dr/dt ≥ 0 for all θ on the circle r = 2/√5 and dr/dt ≤ 0
for all θ on the circle r = 1, so all trajectories entering the annulus
between these two circles (let us call it D) never leave it. This region
is the area between the two gray circles in Figure 2.18. Since dθ/dt > 0
throughout D, there can be no steady states in this region. The
Poincaré-Bendixson theorem thus requires that there be at least one
closed path (periodic orbit) in this region. Numerical integration
reveals that for this problem there is one asymptotically
stable periodic orbit, which is also known as a LIMIT CYCLE. Part of
a trajectory that starts near the origin, as well as the limit cycle it
approaches, are shown in Figure 2.18.
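The Poincaré-Bendixson argument is easy to confirm numerically: integrating this system from an initial condition near the origin, the trajectory spirals outward and settles into the trapping annulus (a sketch with a hand-rolled RK4 integrator):

```python
# Trajectory of the planar system above, started near the unstable origin.
import math

def f(x, y):
    return (x - y - x * (x*x + 2*y*y), x + y - y * (x*x + y*y))

x, y, dt = 0.01, 0.0, 1e-3
for _ in range(50000):           # RK4 to t = 50
    a = f(x, y); b = f(x + 0.5*dt*a[0], y + 0.5*dt*a[1])
    c = f(x + 0.5*dt*b[0], y + 0.5*dt*b[1]); d = f(x + dt*c[0], y + dt*c[1])
    x += dt/6 * (a[0] + 2*b[0] + 2*c[0] + d[0])
    y += dt/6 * (a[1] + 2*b[1] + 2*c[1] + d[1])

r = math.hypot(x, y)
print(r)   # final radius lies in the trapping annulus D
```

After the transient, the radius of the numerically computed orbit stays between roughly 2/√5 and 1, consistent with the trapping-region argument.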

At this point we have seen two types of behavior that trajectories
may tend to as t → ∞: a steady state and a limit cycle. These are simple
examples of ATTRACTORS. A good working definition of an attractor is
the following.

An attractor A of a dynamical system is a set of points that
is invariant under time evolution of the system, and that is
the ultimate destination as t → ∞ of all initial conditions
that begin sufficiently near it, i.e., in a neighborhood U.20

For planar systems, the Poincaré-Bendixson theorem dictates that the
only attractors are steady states and limit cycles. Note that the two-dimensional
Hamiltonian systems discussed above also have periodic
orbits; these are not attractors because an initial condition close to one
such orbit does not approach it as t → ∞. The fact that trajectories of
Hamiltonian systems lie on constant energy surfaces precludes them
from having attractors.

20See, for example, Guckenheimer and Holmes (1983) for a discussion of various
definitions of attractors and the difficulties in developing a satisfactory general definition.


Figure 2.18: A limit cycle (thick dashed curve) and a trajectory (thin
solid curve) approaching it. The region D is bounded by the two gray
curves.

Three Dimensions

Trajectories in the three-dimensional phase space R³ are much less
topologically constrained than are those in one or two dimensions.
There is no three-dimensional analog of the Poincaré-Bendixson theorem
and thus no restriction that all attractors be either steady states
or periodic orbits. We look first at a simple, geometrically defined
example. Consider a torus (a donut-shaped surface) floating in three
dimensions and assume that all trajectories asymptotically approach the
surface of the torus, so that we only need consider what happens on
the torus itself. Further assume that there are no steady states on the
torus.

Figure 2.19: Periodic (left) and quasiperiodic (right) orbits on the
surface of a torus. The orbit on the right eventually passes through
every point in the domain.

Now any point on the torus can be represented by two angular
positions, θ ∈ [0, 2π) and φ ∈ [0, 2π), so we can represent the
phase space by a square where any trajectory that leaves one side of the
square reenters on the opposite side. Consider a very simple evolution
of these variables

dθ/dt = p,   dφ/dt = q

where p and q are constants. Eliminating time and integrating, we find
an explicit solution for the trajectories: θ = (p/q)(φ - φ0) + θ0, where
(θ0, φ0) is the value of (θ, φ) at a chosen value of t. Now since θ
and φ are in [0, 2π), the trajectory will return to (θ0, φ0) if θ - θ0 =
2πm when φ - φ0 = 2πn, where m and n are (as yet unspecified)
integers. This requires that qm = pn, which can only hold if p/q =
m/n for some pair of integers m and n. That is, p/q must be a rational
number; this situation is a form of resonance. Otherwise, the trajectory
will never repeat and will eventually pass through every point on the
torus! Such an orbit is called QUASIPERIODIC, because θ(t) and φ(t)
are individually time periodic, but the pair (θ(t), φ(t)) is not. Figure
2.19 shows trajectories for the cases p/q = 9/7 (left) and an irrational
value of p/q (right). The qualitative distinction should be clear. From
this example we see a new type of dynamical behavior, the quasiperiodic
orbit.
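The resonance condition can be checked directly from the explicit solution θ = pt mod 2π, φ = qt mod 2π. A sketch: for p/q = 9/7 the orbit closes exactly when φ has made seven full revolutions, while for an irrational ratio (√2 is used here as an illustration) the orbit never returns to its starting point:

```python
import math

TWO_PI = 2 * math.pi

def dist_to_start(p, q, t):
    # distance from (theta, phi) = (0, 0) on the torus, with wraparound
    wrap = lambda a: min(a % TWO_PI, TWO_PI - a % TWO_PI)
    return math.hypot(wrap(p * t), wrap(q * t))

# rational winding: p/q = 9/7 closes at t = 2*pi (9 and 7 full revolutions)
closed = dist_to_start(9.0, 7.0, TWO_PI)
print(closed)

# irrational winding: the minimum return distance over many turns stays finite
min_dist = min(dist_to_start(math.sqrt(2), 1.0, n * TWO_PI) for n in range(1, 200))
print(min_dist)
```

The rational orbit returns to its starting point to machine precision; the irrational one never does, and given enough turns it comes arbitrarily close to every point on the torus without ever repeating.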

Finally, we present one example of an even more complex type of
attractor that can occur in phase spaces of dimension 3 or higher. Consider
the system

dx/dt = -y - z
dy/dt = x + ay
dz/dt = b + z(x - c)

known as the RÖSSLER system. If a = b = 0.2, c = 1, the system
displays a limit cycle, as shown in Figure 2.20. If c = 5.7, the system
has the attractor shown in Figure 2.21. This attractor is
neither periodic nor quasiperiodic; in fact, nearby initial conditions will
follow similar paths but will eventually diverge from one another.
This property is known as SENSITIVITY TO INITIAL CONDITIONS and
is characteristic of CHAOTIC dynamics. Loosely speaking, an attractor
on which the dynamics are chaotic is called a STRANGE ATTRACTOR
(Guckenheimer and Holmes, 1983; Strogatz, 1994).
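Sensitivity to initial conditions can be demonstrated directly. The sketch below integrates the Rössler system at the chaotic parameter values from two initial conditions differing by 10⁻⁶ and measures their final separation; over 100 time units it grows by orders of magnitude while both trajectories remain bounded on the attractor:

```python
import math

def rossler(x, y, z, a=0.2, b=0.2, c=5.7):
    return (-y - z, x + a * y, b + z * (x - c))

def integrate(x, y, z, t_final, dt=0.01):
    for _ in range(int(round(t_final / dt))):   # RK4
        k1 = rossler(x, y, z)
        k2 = rossler(x + 0.5*dt*k1[0], y + 0.5*dt*k1[1], z + 0.5*dt*k1[2])
        k3 = rossler(x + 0.5*dt*k2[0], y + 0.5*dt*k2[1], z + 0.5*dt*k2[2])
        k4 = rossler(x + dt*k3[0], y + dt*k3[1], z + dt*k3[2])
        x += dt/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        y += dt/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        z += dt/6 * (k1[2] + 2*k2[2] + 2*k3[2] + k4[2])
    return (x, y, z)

p1 = integrate(1.0, 1.0, 1.0, 100.0)
p2 = integrate(1.0 + 1e-6, 1.0, 1.0, 100.0)    # nearby initial condition
sep = math.dist(p1, p2)
print(sep)   # far larger than the initial 1e-6 separation
```

The exponential growth rate of this separation is the largest Lyapunov exponent of the attractor; a positive value is the quantitative signature of chaos.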

2.7.5 The Fundamental Local Bifurcations of Steady States

We now have seen a variety of possible behaviors for nonlinear dynamical
systems: steady states, periodic orbits, quasiperiodic orbits,
strange attractors, heteroclinic orbits, homoclinic orbits.... Our focus
now shifts to understanding the ways in which the qualitative behavior
of a system changes as we change parameters. This branch of the theory
of differential equations is called BIFURCATION THEORY (Iooss and
Joseph, 1990).

We begin the discussion just by thinking generally about the steady
states of

dx/dt = f(x; μ)

where we now explicitly indicate the dependence of the vector field f
on the parameter μ. For definiteness, assume that derivatives of f of all
orders exist. Let xs(μ) be a steady state, i.e., f(xs(μ); μ) = 0. We can
determine from the linearization of f at xs whether this steady state is
hyperbolic. If it is, then we know, from the Hartman-Grobman theorem,
that a small change in μ does not change the qualitative behavior near
xs. Thus our attention focuses on behavior near values of μ where xs
is not hyperbolic, where the linearization has eigenvalues with zero
real part. This is where qualitative changes in the local behavior near
xs can occur.21 We denote a value of μ where xs is not hyperbolic as a
BIFURCATION POINT.

21If the system has a special structure, like a Hamiltonian, then additional conditions
must be satisfied for bifurcation to occur.

Ordinary Differential Equations

Figure 2.20: A limit cycle for the Rössler system, a = b = 0.2, c = 1.

Figure 2.21: A strange attractor for the Rössler system, a = b = 0.2, c = 5.7.

It may be productive to begin our examination of bifurcations with one-dimensional systems: x ∈ R¹, μ ∈ R¹. We shall see later how this discussion generalizes to higher-dimensional systems. Without loss of generality, we can specify that the bifurcation point is at μ = 0 and define a new dependent variable y = x − xs(0). Taylor expanding around y = 0, μ = 0 and using the facts that f = fx = 0 there give

ẏ = fμ μ + ½(fxx y² + 2fxμ yμ + fμμ μ²) + (1/6)(fxxx y³ + 3fxxμ y²μ + 3fxμμ yμ² + fμμμ μ³) + ···    (2.77)

Here the subscript denotes partial differentiation, fμ = ∂f/∂μ, etc., with all derivatives evaluated at the bifurcation point. We now examine the structure of solutions to this equation in the most important cases.

Saddle-Node Bifurcation

We begin with the most general case: the partial derivatives of f (other than fx) involved in the leading-order behavior are nonzero at the bifurcation point. This gives the GENERIC bifurcation behavior; the behavior that arises in the absence of special conditions on f. For small μ and y, the dominant balance in (2.77) is

ẏ = fμ μ + ½ fxx y²    (2.78)

This has steady states

y = ±(−2 fμ μ / fxx)^(1/2)

(To see this, check that when y = O(μ^(1/2)) the terms in (2.77) that we neglected to get (2.78) are small compared to the ones that we kept.) Therefore, depending on the sign of fμ/fxx, there are two real solutions for μ > 0 and none for μ < 0, or vice versa. The point μ = 0 is thus quite special in that on one side of it there are no steady states near y = 0 and on the other there are two. This type of bifurcation point is called variously a LIMIT POINT, TURNING POINT, or SADDLE-NODE bifurcation point. It arises when the conditions fx = 0, fμ ≠ 0, fxx ≠ 0 are satisfied. By rescaling, we can write the NORMAL FORM for this bifurcation as

ẏ = μ − y²    (2.79)

Figure 2.22: Bifurcation diagram for the saddle-node bifurcation. Every bifurcation of this type looks like this modulo a vertical and/or horizontal reflection across y = 0, μ = 0. The branch of stable solutions is the solid curve; the unstable branch is dashed.

Now the steady states are simply y = ±μ^(1/2); the positive root is stable and the negative unstable. When μ = 0, there is a single (repeated) root, which is stable from the right but not the left, and when μ < 0 there is no steady state, although trajectories that pass close to y = 0 move very slowly through that region. The time spent in the interval [−1, 1] is approximately π/(−μ)^(1/2). The BIFURCATION DIAGRAM associated with the saddle-node bifurcation is shown in Figure 2.22. It summarizes the position and stability of the steady states as μ changes.
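The slow passage near y = 0 for μ < 0 is easy to check numerically. For ẏ = μ − y² the exact transit time from y = 1 to y = −1 is (2/(−μ)^(1/2)) arctan(1/(−μ)^(1/2)), which approaches π/(−μ)^(1/2) as μ → 0⁻. The sketch below verifies this with a fourth-order Runge-Kutta integration; the step size and the value μ = −0.01 are illustrative choices.

```python
import math

def rk4_step(f, y, t, dt):
    # One classical fourth-order Runge-Kutta step for a scalar ODE.
    k1 = f(y, t)
    k2 = f(y + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(y + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(y + dt * k3, t + dt)
    return y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

mu = -0.01
f = lambda y, t: mu - y * y      # saddle-node normal form; no steady state for mu < 0

y, t, dt = 1.0, 0.0, 0.001
while y > -1.0:                  # march until the trajectory exits [-1, 1]
    y, t = rk4_step(f, y, t, dt), t + dt

T_exact = 2.0 / math.sqrt(-mu) * math.atan(1.0 / math.sqrt(-mu))
print(t, T_exact, math.pi / math.sqrt(-mu))
```

With μ = −0.01 the transit takes roughly 30 time units even though the trajectory covers only the interval [−1, 1]; almost all of that time is spent creeping through the neighborhood of y = 0 where the two steady states annihilated.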
Transcritical Bifurcation

In the above scenario, steady states exist only on one side or the other of the bifurcation point. What type of bifurcation do we expect to see if we know, on physical grounds, for example, that solutions exist on both sides of the bifurcation point? To capture this situation, we impose the additional condition that fμ = 0 at the bifurcation point. Now we find that y = O(μ) and the leading-order equation for the dynamics becomes

ẏ = ½(fxx y² + 2fxμ μy + fμμ μ²)

Figure 2.23: Bifurcation diagram for the transcritical bifurcation. Every bifurcation of this type looks like this modulo a vertical and/or horizontal reflection across y = 0, μ = 0. The stable branch is solid, and the unstable branch is dashed.

This has steady states

y = (1/fxx)(−fxμ ± (fxμ² − fxx fμμ)^(1/2)) μ

so the steady states are (locally) lines in the (y, μ) space, which cross at (y, μ) = (0, 0). Since steady states persist on both sides of the bifurcation point, this scenario is called a TRANSCRITICAL bifurcation. It arises when the conditions fx = 0, fμ = 0, fxμ ≠ 0, fxx ≠ 0 are satisfied. We can make the presentation simpler without loss of generality by setting fμμ = 0 and rescaling, which gives us the normal form for the transcritical bifurcation

ẏ = y(μ + ay),   a = ±1    (2.80)

We can show that the steady state y = 0 is stable when μ < 0 and unstable when μ > 0, and the nontrivial steady state y = −μ/a has the opposite stability characteristics. The solutions are sometimes said to "exchange stability" at the bifurcation point. The bifurcation diagram for the transcritical bifurcation is shown for a < 0 in Figure 2.23.

Figure 2.24: Bifurcation diagrams for the pitchfork bifurcation. Top: supercritical bifurcation, a = −1. Bottom: subcritical bifurcation, a = 1. The stable branches are solid, the unstable are dashed.

Pitchfork Bifurcation

Many physical problems have some symmetry that constrains the type of bifurcation that can occur. For example, for problems with a reflection symmetry, a one-dimensional model may satisfy the condition

f(x − xs; μ) = −f(−(x − xs); μ)

for all values of μ. With y = x − xs we have that f(y; μ) = −f(−y; μ), so y = 0 is always a solution and f is odd with respect to y = 0. Therefore, at a bifurcation point y = 0, μ = 0 we have that 0 = f = fxx = fxxxx = ··· and 0 = fμ = fμμ = ···. Our Taylor expansion becomes

ẏ = fxμ μy + (1/6) fxxx y³

thus y = O(μ^(1/2)). Rescaling, we find the normal form

ẏ = y(μ + ay²),   a = ±1    (2.81)

This has steady states y = 0 and y = ±(−μ/a)^(1/2). The steady states and their stability for this bifurcation are shown in Figure 2.24. For obvious reasons, this scenario is called a PITCHFORK BIFURCATION. It arises when the conditions f(y; μ) = −f(−y; μ), fx = 0, fxμ ≠ 0, fxx = 0, fxxx ≠ 0 are satisfied. If a = −1, then the nontrivial steady-state branch exists only for μ > 0 and is stable; this case is said to be SUPERCRITICAL. If a = +1, the nontrivial branch exists for μ < 0 and is unstable; this is the SUBCRITICAL case. Note that in the latter case, the linearly stable trivial branch will not be approached by initial conditions with magnitude greater than (−μ/a)^(1/2); so although small perturbations from the steady state y = 0 decay, larger ones grow.
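The supercritical case is easy to check numerically: for a = −1 and μ > 0, a small perturbation from y = 0 should grow and saturate at the nontrivial steady state y = μ^(1/2). A minimal forward-Euler sketch (the step size, parameter values, and initial condition are illustrative choices):

```python
import math

mu, a = 0.5, -1.0          # supercritical pitchfork; nontrivial states at y = +/- sqrt(mu)
y, dt = 0.01, 0.001        # small initial perturbation from y = 0
for _ in range(200000):    # integrate the normal form to t = 200
    y += dt * y * (mu + a * y * y)

print(y, math.sqrt(mu))    # y approaches sqrt(0.5), about 0.7071
```

Repeating the experiment with μ < 0 (or with a = +1 and a large enough initial condition) shows the other behaviors predicted by the normal form: decay to zero, or unbounded growth in the subcritical case.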
Hopf Bifurcation

In all of the above scenarios, solutions either monotonically approach a steady state or go off to ∞ (or more precisely, to where higher-order terms in the Taylor expansion are important). We now consider the case where we expect oscillatory behavior, i.e., where the linearized version of the problem has complex conjugate eigenvalues λ = σ ± iω. Obviously, we must move from one- to two-dimensional systems for this behavior to occur. As above, we expect a bifurcation when the steady state is nonhyperbolic, so σ = 0 and the eigenvalues of J are purely imaginary. In this instance, the steady-state solution persists on both sides of the bifurcation point, as long as ω ≠ 0 when σ is small. We let σ = εμ with μ = O(1) and write the model as

ẋ = Jx + N₂(x, x) + N₃(x, x, x) + O(|x|⁴)

The behavior of the linearized system is characterized by oscillation on a time scale of ω⁻¹ (which we assume remains finite as ε → 0), and slow growth or decay on an O(ε⁻¹) scale. In Example 2.31, we used the method of multiple scales to show that for small ε, balancing the linear growth terms with the nonlinearity requires that x = O(ε^(1/2)), as in the saddle-node and pitchfork cases above, and that the solution has the form of an O(1) oscillation with total phase ωt + φ(t) and slowly varying amplitude,

x(t) = ε^(1/2) r(t) × (oscillatory factor)

where the amplitude r and phase φ of the solution are given by

ṙ = εμr + aεr³    (2.82)
φ̇ = bεr²    (2.83)

The constants a and b are functions of the nonlinearity and of ω (Guckenheimer and Holmes, 1983; Iooss and Joseph, 1990). These equations comprise the normal form for the so-called HOPF BIFURCATION, the generic bifurcation connecting steady states (r = 0) to periodic orbits (r ≠ 0). Notice that the equation for r is identical in form to that for the pitchfork bifurcation. So if a < 0, we have a supercritical Hopf bifurcation, a transition with increasing μ from a stable steady state to a stable limit cycle whose amplitude is ε^(1/2)(−μ/a)^(1/2). For the subcritical case a > 0 there is a periodic solution, but it exists for μ < 0 and is unstable. Turning to the phase equation, we see that on the limit cycle, r² = −μ/a, so φ̇ = −bεμ/a and the frequency of the solution is ω − bεμ/a. It changes linearly as μ increases from zero, with a rate determined by b/a.
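These predictions can be checked on a concrete planar system. The sketch below integrates ẋ = μx − ωy + ax(x² + y²), ẏ = ωx + μy + ay(x² + y²), a standard Cartesian realization whose polar form is ṙ = μr + ar³, θ̇ = ω (this particular realization, with b = 0, is an illustrative choice, not a system from the text). With a < 0 and μ > 0 trajectories should settle onto a limit cycle of radius (−μ/a)^(1/2).

```python
import math

mu, omega, a = 0.25, 1.0, -1.0    # supercritical case: stable cycle of radius sqrt(-mu/a)

def rhs(state):
    x, y = state
    r2 = x * x + y * y
    return (mu * x - omega * y + a * x * r2,
            omega * x + mu * y + a * y * r2)

def rk4_step(state, dt):
    # Classical fourth-order Runge-Kutta step for the planar system.
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (c1 + 2 * c2 + 2 * c3 + c4)
                 for s, c1, c2, c3, c4 in zip(state, k1, k2, k3, k4))

state, dt = (0.05, 0.0), 0.01     # start near the (unstable) steady state
for _ in range(20000):            # integrate to t = 200
    state = rk4_step(state, dt)

r = math.hypot(*state)
print(r, math.sqrt(-mu / a))      # radius approaches sqrt(0.25) = 0.5
```

The small initial perturbation spirals outward and saturates at the predicted amplitude, just as the r equation of the normal form requires.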

2.8 Numerical Solutions of Initial-Value Problems

We have seen that for linear constant-coefficient problems, a complete theory exists and the general solution can be found in terms of eigenvalues and eigenvectors. For systems of order greater than four, however, there is no general, exact way to find the eigenvalues. So even in the most well-understood case, numerical approximations must be introduced to find actual solutions. The situation is worse in general, because no simple quantitative theory exists for nonlinear systems. Most of them need to be treated numerically right from the start. Therefore it is important to understand how numerical solutions of ODEs are constructed. Here we consider initial-value problems (IVPs). We focus on the solution of a single first-order equation, because the generalization to a system is usually apparent. The equation

ẋ = f(x, t)

can be formally integrated from a time t to a future time t + Δt to read

x(t + Δt) = x(t) + ∫ from t to t+Δt of f(x(t′), t′) dt′    (2.84)

The central issue in the numerical solution of IVPs is the approximate evaluation of the integral on the right-hand side of this equation. With a good approximation and a small enough time step Δt, the above formula can be applied repeatedly for as long a time interval as we like, i.e., x(Δt) is obtained from x(0), x(2Δt) is obtained from x(Δt), etc. We use the shorthand notation x^(k) ≡ x(kΔt).

2.8.1 Euler Methods: Accuracy and Stability

The three key issues in the numerical solution of IVPs are SIMPLICITY, ACCURACY, and STABILITY. We introduce each of these issues in turn, in the context of the so-called Euler methods.

The simplest formula to approximate the integral in (2.84) is the rectangle rule. This can be evaluated at either t or t + Δt, giving these two approximations for x(t + Δt)

x^(k+1) = x^(k) + Δt f(x^(k), t^(k))    (2.85)
x^(k+1) = x^(k) + Δt f(x^(k+1), t^(k+1))    (2.86)

The first of these approximations is the EXPLICIT or FORWARD Euler scheme, and the second is the IMPLICIT or BACKWARD Euler scheme. The explicit Euler scheme is the simplest integration scheme that can be obtained. It simply requires one evaluation of f at each time step. The implicit scheme is not as simple, requiring the solution of an algebraic equation (or system of equations) at each step. Both of these schemes are examples of SINGLE-STEP schemes, as they involve quantities at the beginning and end of only one time step.

To consider the accuracy of the forward Euler method, we rewrite it like this

x^(k+1) = x^(k) + Δt f(x^(k), t^(k)) + εΔt

where ε is the LOCAL TRUNCATION ERROR, the error incurred in a single time step. This can be determined by plugging into this expression the Taylor expansion of the exact solution

x(t^(k) + Δt) = x^(k) + ẋ^(k)Δt + ½ẍ^(k)Δt² + O(Δt³)

Since ẋ^(k) = f(x^(k), t^(k)), the first two terms on each side of this equation cancel, and we find that

ε = ½ẍ^(k)Δt + O(Δt²)

Thus ε → 0 as Δt → 0. The implicit Euler method obeys the same scaling. Since the error scales as Δt¹, the Euler methods are said to be "first-order accurate." Since the explicit method is simpler, is there any reason to use the implicit method? The answer is yes, and arises when we look at the third issue mentioned above, stability.

Consider a single linear equation ẋ = λx, so f(x, t) = λx. If Re(λ) < 0, then x(t) → 0 as t → ∞. It is not asking too much that a numerical approximation maintain the same property. The Euler approximations for this special case are

x^(k+1) = x^(k) + λΔt x^(k)
x^(k+1) = x^(k) + λΔt x^(k+1)

For the explicit Euler scheme, the iteration formula can be written in the general form x^(k+1) = G x^(k), where in the present case G = (1 + λΔt). We call G the GROWTH FACTOR or AMPLIFICATION FACTOR for the approximation. By applying this equation recursively from k = 0, we see that x^(k) = G^k x^(0), so if |G| > 1, then x^(k) → ∞ as k → ∞. Conversely, if |G| < 1, then x^(k) → 0 as k → ∞. Thus there is a NUMERICAL STABILITY CRITERION: |G| < 1. This is equivalent to G_R² + G_I² < 1, where subscripts R and I denote real and imaginary parts, respectively. For explicit Euler, G_R = 1 + λ_RΔt, G_I = λ_IΔt, yielding stability when

(1 + λ_RΔt)² + (λ_IΔt)² < 1

On a plane with axes λ_RΔt and λ_IΔt, this region is the interior of a circle centered at λ_RΔt = −1, λ_IΔt = 0. If λΔt is chosen to be within this circle, the time-integration process is numerically stable; otherwise it is not. If λ is real, instability occurs if λ > 0; this is as it should be, because the exact solution also blows up. But it also happens if λ < 0 and Δt > −2/λ, which leads to G < −1. This is pathological, because the exact solution decays. This situation is known as NUMERICAL INSTABILITY.

Figure 2.25: Approximate solutions to ẋ = −x using the explicit and implicit Euler methods with Δt = 2.1, along with the exact solution.

A numerically unstable solution is not a faithful approximation of the true solution. For a system of equations ẋ = Ax, numerical stability is obtained only if the time step satisfies the |G| < 1 criterion for all of the eigenvalues λᵢ. Observe that for systems with purely imaginary eigenvalues, i.e., purely oscillatory solutions, the explicit Euler method is never numerically stable.

Now consider the same analysis for the implicit Euler scheme. We can again write x^(k+1) = G x^(k), but now G = (1 − λΔt)⁻¹. Therefore

|G|² = G G̅ = (1 − 2λ_RΔt + |λ|²Δt²)⁻¹ < 1

whenever λ_R < 0. That is, if the exact solution decays, so does the approximation. The stability of this method is independent of Δt, so it is said to be ABSOLUTELY STABLE or A-stable.

Figure 2.25 shows plots of x(t) for the case λ = −1 starting from initial condition x₀ = 1 using the explicit and implicit Euler methods with Δt = 2.1, along with the exact solution e⁻ᵗ. The explicit Euler solution displays numerical instability, while the implicit Euler solution decays as the exact solution does.
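The behavior in Figure 2.25 can be reproduced in a few lines. For ẋ = λx the explicit Euler growth factor is G = 1 + λΔt and the implicit one is G = 1/(1 − λΔt); with λ = −1 and Δt = 2.1 these are −1.1 and 1/3.1, so the explicit iterates oscillate in sign and grow while the implicit iterates decay. A minimal sketch:

```python
lam, dt = -1.0, 2.1          # lambda*dt lies outside the explicit stability circle
x_exp = x_imp = 1.0
for _ in range(10):
    x_exp = x_exp + dt * lam * x_exp        # explicit (forward) Euler: G = 1 + lam*dt
    x_imp = x_imp / (1.0 - dt * lam)        # implicit (backward) Euler: G = 1/(1 - lam*dt)

print(x_exp, x_imp)   # explicit grows as (-1.1)^k; implicit decays as (1/3.1)^k
```

After ten steps the explicit iterate has magnitude (1.1)¹⁰ ≈ 2.6 even though the exact solution has decayed to e⁻²¹, while the implicit iterate is already essentially zero.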
2.8.2 Accuracy, Stability, and Stiff Systems

Say we have a differential equation model whose shortest time scale of interest is t_min. Obviously, we cannot choose a time step Δt that will jump right over the solution of interest, so accuracy requires that Δt < t_min. But if we use an explicit method, stability requires a time step smaller than the smallest time scale of the entire problem. For example, in a kinetics problem, this might be the reaction time for a free-radical intermediate whose kinetics are so fast that its concentration always remains near equilibrium. Problems where explicit methods require unreasonably small time steps are STIFF. Implicit methods are always used to solve such problems.

In general, for the problem ẋ = Ax we can write a single-step scheme as

x^(k+1) = G x^(k)

where G is now a matrix. For example, consider a 2 × 2 system ẋ = Ax in which the matrix A has eigenvalues −3 and −100, so its characteristic time scales are 1/3 and 1/100. In fact x₂(t) ∼ e⁻¹⁰⁰ᵗ, so it is negligible after only a very short time. The explicit Euler method must capture this time scale to remain stable. Specifically, G = I + ΔtA, whose eigenvalues are 1 − 3Δt and 1 − 100Δt, giving a stability limit Δt < 2/100. If implicit Euler is used instead, G = (I − ΔtA)⁻¹, whose eigenvalues are 1/(1 + 3Δt) and 1/(1 + 100Δt), which are both always less than one. Again, the implicit Euler method is always stable.
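Stiffness is easy to demonstrate with two decoupled modes having the eigenvalues above (a diagonal stand-in for A; any matrix with the same eigenvalues behaves identically for this purpose). With Δt = 0.03 > 2/100, explicit Euler blows up on the fast mode even though that mode is physically negligible, while implicit Euler remains stable:

```python
lams = (-3.0, -100.0)       # eigenvalues of the stiff system
dt, nsteps = 0.03, 200      # dt exceeds the explicit stability limit 2/100 = 0.02

x_exp = [1.0, 1.0]
x_imp = [1.0, 1.0]
for _ in range(nsteps):
    x_exp = [x + dt * lam * x for x, lam in zip(x_exp, lams)]      # G = 1 + lam*dt
    x_imp = [x / (1.0 - dt * lam) for x, lam in zip(x_imp, lams)]  # G = 1/(1 - lam*dt)

print(x_exp, x_imp)
# fast explicit mode: G = 1 - 3 = -2, so |x| = 2^200 -- explosive growth
# both implicit modes decay monotonically toward zero
```

The slow explicit mode is perfectly well behaved; it is the fast, already-decayed mode that destroys the computation and forces the tiny step size.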
2.8.3 Higher-Order Methods

The Euler methods are simple to implement and convenient for introducing the concepts of simplicity, accuracy, and stability, but they are not necessarily the most efficient schemes for solving real problems. For example, if an implicit method is required, the second-order ADAMS-MOULTON formula (AM2) is much preferable. This formula uses the trapezoid rule rather than the rectangle rule to evaluate the integral in (2.84) and therefore has second-order accuracy. The accuracy of an IVP method is usually given by a number p, the exponent in the expression error = O(Δt^p). Therefore, the Euler methods have p = 1 and AM2 has p = 2. The AM2 formula is

x^(k+1) = x^(k) + (Δt/2)(f^(k) + f^(k+1))

Like the backward Euler method, this formula requires the solution of an algebraic equation at each time step. Also like the backward Euler method, it is A-stable. It is preferable to the backward Euler method because it has higher accuracy, the same stability, and requires no more work. AM2 is widely used for stiff problems. Adams-Moulton formulas of arbitrary order are available, constructed by approximating the integrand in (2.84) by polynomial approximation. The higher-order formulas are not A-stable, however, and since they are expensive, they are rarely used (except in the context described later in this section).
The second-order ADAMS-BASHFORTH (AB2) method is an explicit method that also uses the trapezoid rule, but it extrapolates to the point f(x^(k+1), t^(k+1)) using current and past values of f. Denoting f(x^(k), t^(k)) by f^(k), AB2 approximates f^(k+1) by f^(k) + (f^(k) − f^(k−1)) (linear extrapolation), so it is a two-step scheme. Using this extrapolation in the trapezoid rule formula above yields

x^(k+1) = x^(k) + (Δt/2)(3f^(k) − f^(k−1))

The price that is paid for higher accuracy without more work is a stability limit that is twice as restrictive as the forward Euler limit, e.g., for real λ the stability limit is Δt < 1/|λ| instead of 2/|λ|. This stricter limit arises from the extrapolation that Adams-Bashforth uses, as seen in Figure 2.26. Adams-Bashforth formulas of arbitrary order also are available. The third-order formula, for example, uses f^(k), f^(k−1), and f^(k−2).

Stability can be improved by combining an explicit method for "predicting" x^(k+1) with an implicit method for "correcting" it. Such approaches are called PREDICTOR-CORRECTOR methods. Often the order of the predictor is chosen to be one less than that of the corrector.

We denote by APCn the scheme with an (n−1)st-order predictor combined with an nth-order corrector. The APC3 scheme is:

1. A predicted value of the solution at the next time step, denoted x̃^(k+1), is found with the second-order Adams-Bashforth formula

x̃^(k+1) = x^(k) + (Δt/2)(3f^(k) − f^(k−1))

2. This value is now corrected, using the implicit third-order Adams-Moulton formula

x^(k+1) = x^(k) + (Δt/12)(5f̃^(k+1) + 8f^(k) − f^(k−1))

where f̃^(k+1) = f(x̃^(k+1), t^(k+1)).

APC3 displays third-order accuracy with only one more function evaluation than explicit Euler and comparable stability. Figure 2.27 shows the stability regions for the APC2, APC3, and APC4 methods. If λΔt for each eigenvalue λ of the Jacobian of f is within the region, the method is stable. If the solutions are expected to be very smooth and function evaluations are expensive, the APC methods are very economical, because of their high-order accuracy with only two function evaluations per time step.

Adams predictor-corrector methods are multistep methods because they use information from prior time steps. RUNGE-KUTTA (RK) methods also have higher-order accuracy than Euler, but are one-step methods, a useful feature in situations where one may want to change the time step during the course of the integration. The simplest of these, RK2, uses the trapezoid rule to obtain second-order accuracy, extrapolating to f^(k+1) using a simple forward Euler step. Letting

k₁ = f(x^(k), t^(k))
k₂ = f(x^(k) + Δt k₁, t^(k) + Δt)

the trapezoid rule formula becomes

x^(k+1) = x^(k) + (Δt/2)(k₁ + k₂)

RK2 is in fact identical to APC2 (because a first-order Adams-Bashforth formula is simply an explicit Euler step), but RK4, the fourth-order

Figure 2.26: Stability regions for Adams-Bashforth methods applied to ẋ = λx; see also Canuto et al. (2006, Fig. D.1).

Runge-Kutta formula, has a larger stability limit than the corresponding APC4 method; see Figure 2.28. The RK4 formula is

x^(k+1) = x^(k) + (Δt/6)(k₁ + 2k₂ + 2k₃ + k₄)

in which

k₁ = f(x^(k), t^(k))
k₂ = f(x^(k) + (Δt/2)k₁, t^(k) + Δt/2)
k₃ = f(x^(k) + (Δt/2)k₂, t^(k) + Δt/2)
k₄ = f(x^(k) + Δt k₃, t^(k) + Δt)

If f were independent of x, this would reduce to the Simpson's rule formula. RK4 requires four function evaluations. Because they have better stability properties than APC formulas, Runge-Kutta methods are generally preferable for nonstiff problems unless evaluation of f is expensive. If f is stiff, AM2 is the method of choice.

Figure 2.27: Stability regions for Adams predictor-corrector methods applied to ẋ = λx; APCn uses an (n−1)st-order predictor and nth-order corrector; see also Canuto et al. (1988, Fig. 4.7).

2.9 Numerical Solutions of Boundary-Value Problems

2.9.1 The Method of Weighted Residuals

There are basically two ways to make a continuous problem, like an ODE, discrete. One is to choose a finite number of points (values of the independent variable) and find an approximate solution at those points. This is what we did to solve initial-value problems (IVPs). We picked a point a distance Δt from the current time step, and used various approximate integration techniques to find the solution at that point. This is a natural approach for IVPs, because the solution at each time depends only on the solution at the immediately previous time. The situation with boundary-value problems (BVPs) is different. In this case, the solution at any point is coupled to the solution at all other points in the interval because the boundary conditions are imposed at both ends of the interval (think of a diffusion problem). So if the

Figure 2.28: Stability regions for Runge-Kutta methods applied to ẋ = λx; see also Canuto et al. (2006, Fig. D.2).

solution at a point changes, so does the solution at the neighboring points. A natural way to take this fact into account is to approximate the solution as the sum of a finite number of functions, i.e., to choose a set of functions over the interval and represent the solution as a linear combination of those functions. A general and systematic approach to this approximation process is given by the METHOD OF WEIGHTED RESIDUALS (MWR).

Consider the linear ODE

Lu = f(x),   x ∈ [a, b]

We choose a set of TRIAL FUNCTIONS {φⱼ(x)} in which to represent the solution u(x) and let uₙ(x) be the approximate solution

uₙ(x) = Σⱼ₌₁ⁿ cⱼφⱼ(x)

For the moment we require that the solution u(x) and the trial functions satisfy homogeneous boundary conditions, though it is easy to
relax this requirement. As n → ∞, we expect uₙ to approach the exact solution. For finite n, we expect a finite error, or residual, R, which we define pointwise as

R = Luₙ − f

Obviously, if uₙ = u, then Luₙ = f, the equation is solved, and R = 0. In any case, we want R to be as small as possible. In what sense do we require R to be small? We choose a set of WEIGHT FUNCTIONS or TEST FUNCTIONS {ψᵢ(x)} and require that

(R, ψᵢ) = 0,   i = 1, 2, ..., n    (2.87)

This condition is equivalent to requiring that the residual be orthogonal to all of the test functions, with respect to the chosen inner product. We expect that an approximate solution uₙ(x) that satisfies these conditions will converge to the exact solution as n → ∞, because a function that is orthogonal to infinitely many basis functions must be zero. Using the expressions for R and uₙ, the condition becomes

Σⱼ₌₁ⁿ (Lφⱼ, ψᵢ) cⱼ = (f, ψᵢ),   i = 1, 2, ..., n

Setting

Aᵢⱼ = (Lφⱼ(x), ψᵢ(x))    (2.88)

and

bᵢ = (f(x), ψᵢ(x))    (2.89)

results in the linear algebraic system Aᵢⱼcⱼ = bᵢ. We know, of course, how to solve this. Once we have done so, we have the coefficients cⱼ in the series for uₙ and therefore we have our solution.

As yet, the trial and test functions have been left unspecified. We already have introduced several examples of trial functions and shall shortly see another. As for test functions, there are two common choices, which lead to two types of formulations:

1. Galerkin: ψᵢ(x) = φᵢ(x). If the trial functions are orthogonal, this approach simply forces the first n terms in the representation of R in the trial function basis to vanish.

2. Collocation: ψᵢ(x) = δ(x − xᵢ), where {xᵢ}, i = 1, 2, ..., n is a set of COLLOCATION POINTS. Since (R, δ(x − xᵢ)) = R(xᵢ), the collocation method simply requires the residual to be zero at the chosen set of points.

We introduce a number of specific MWR implementations using the model problem

d²u/dx² + u = 0,   u(0) = 0,   u(1) = 1    (2.90)

Since the boundary conditions are not homogeneous, let the new unknown be u − x, which we again denote by u. Now u(0) = u(1) = 0 and the equation becomes

d²u/dx² + u = −x    (2.91)

Galerkin Method

Finite element Galerkin method. In this method, the trial functions are low-order piecewise polynomials localized to small subsets of the domain, known as the elements, and are zero elsewhere. Consider the space L²(0, 1) and the set of functions φⱼ(x), where

φⱼ(x) = (x − xⱼ₋₁)/h,   xⱼ₋₁ ≤ x ≤ xⱼ
φⱼ(x) = (xⱼ₊₁ − x)/h,   xⱼ ≤ x ≤ xⱼ₊₁
φⱼ(x) = 0,   otherwise

with xⱼ = jh and h = 1/N. These functions are called "hat" functions and are shown in Fig. 2.29 for N = 2. Observe that φⱼ and φⱼ₊₁ are nonzero in overlapping regions; these regions are the "elements" to which the name of the method alludes. These functions are not orthogonal. Attractive features of this set are that the functions are spatially localized (important for multidimensional problems in complicated domains) and simple, and that the coefficients cⱼ are the actual values of the (approximate) solution at the points xⱼ: cⱼ = uₙ(xⱼ).

For (2.91), the boundary conditions u(0) = u(1) = 0 obviate the use of φ₀ and φ_N in the basis, since they do not satisfy the boundary conditions. In the Galerkin approach, ψᵢ(x) = φᵢ(x), so the weighted residual conditions become

(R, φᵢ) = 0,   i = 1, 2, ..., n

Figure 2.29: Hat functions for N = 2.

where n = N − 1. Thus

Aᵢⱼ = ∫₀¹ (−φᵢ′φⱼ′ + φᵢφⱼ) dx

which is nonzero only when |i − j| ≤ 1, and

bᵢ = ∫₀¹ (−x)φᵢ dx = −ih²

Note that integrating by parts is unnecessary if we are willing to deal with the delta function nature of φ″ for the hat functions. Now we have a linear algebra problem Aᵢⱼcⱼ = bᵢ, which can be solved by LU decomposition, for example. For this particular choice of basis, A has a special structure: only the diagonal elements and those just above and below the diagonal are nonzero. Such a matrix is called TRIDIAGONAL and can be LU decomposed quickly, i.e., in O(n) operations, since most of its entries are already zero. In general, an n × n matrix that only has O(n) nonzero elements is said to be SPARSE. Because the trial functions in this case are piecewise linear, the L² norm of the error decays rather slowly as n increases: ‖uₙ − u‖₂ = O(n⁻²). The maximum (L∞) error decays even more slowly

as n → ∞ (Hughes, 2000; Strang and Fix, 2008).

Figure 2.30: Approximate solutions to (2.91) using the finite element method with hat functions for N = 6 and N = 12. The exact solution also is shown.

Figure 2.30 shows finite element solutions for this problem, as well as the exact solution

u(x) = −x + csc(1) sin x
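The computation behind Figure 2.30 can be sketched in a few lines: assemble the tridiagonal Galerkin system above and solve it with the Thomas (tridiagonal LU) algorithm, then compare with the exact solution. The Thomas routine below is a generic textbook solver, not code from this book; N = 16 is an illustrative choice.

```python
import math

def solve_tridiagonal(sub, diag, sup, rhs):
    # Thomas algorithm: O(n) LU solve for a tridiagonal system.
    n = len(diag)
    d, r = diag[:], rhs[:]
    for i in range(1, n):
        m = sub[i - 1] / d[i - 1]
        d[i] -= m * sup[i - 1]
        r[i] -= m * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - sup[i] * x[i + 1]) / d[i]
    return x

N = 16
h = 1.0 / N
n = N - 1
# Hat-function Galerkin system: A_ij = int(-phi_i' phi_j' + phi_i phi_j) dx,
# b_i = int(-x phi_i) dx = -i h^2
diag = [-2.0 / h + 2.0 * h / 3.0] * n
off = [1.0 / h + h / 6.0] * (n - 1)
b = [-(i + 1) * h * h for i in range(n)]

c = solve_tridiagonal(off, diag, off, b)   # c_j = u_n(x_j)

exact = lambda x: -x + math.sin(x) / math.sin(1.0)
err = max(abs(c[i] - exact((i + 1) * h)) for i in range(n))
print(err)   # nodal error is small and shrinks as N grows
```

Doubling N drops the nodal error by roughly a factor of four, consistent with the O(n⁻²) convergence quoted above.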

The finite element method bears some similarities to FINITE DIFFERENCE methods, which instead of expanding solutions in basis functions, consider function values at distinct grid points in a domain and replace derivatives by difference formulas (Press, Teukolsky, Vetterling, and Flannery, 1992). For example, u′(x) can be approximated as

u′_f(xⱼ) = (u(xⱼ₊₁) − u(xⱼ))/h + O(h)    (2.92)

or

u′_b(xⱼ) = (u(xⱼ) − u(xⱼ₋₁))/h + O(h)    (2.93)

where xⱼ and h are defined as above. These two equations are known as FORWARD and BACKWARD difference formulas, respectively. The CENTRAL difference formula for the first derivative is given by

u′_c(xⱼ) = (u(xⱼ₊₁) − u(xⱼ₋₁))/(2h) + O(h²)    (2.94)

These formulas are easily verified by Taylor expansion. Combining the forward and backward difference formulas gives the central difference formula for the second derivative

u″_c(xⱼ) = (u(xⱼ₊₁) − 2u(xⱼ) + u(xⱼ₋₁))/h² + O(h²)    (2.95)

Using this formula to approximate the second derivative in (2.91) yields the following set of equations

(u(xⱼ₊₁) − 2u(xⱼ) + u(xⱼ₋₁))/h² + u(xⱼ) = −jh

with u(x₀) = u(xₙ₊₁) = 0. For comparison, writing the finite element formulation above in the same format gives

(u(xⱼ₊₁) − 2u(xⱼ) + u(xⱼ₋₁))/h² + (u(xⱼ₋₁) + 4u(xⱼ) + u(xⱼ₊₁))/6 = −jh

Observe that the term corresponding to the second derivative is identical in the two cases, as is the right-hand side. In many situations, finite difference and finite element formulations lead to similar sets of discretized equations. A great advantage of the finite element method, however, is its flexibility in dealing with multidimensional problems in complex geometries, as one does not need to develop multidimensional analogues of the difference formulas.
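The O(h²) accuracy of the central difference formula (2.95) can be confirmed directly: halving h should reduce the error by about a factor of four. A quick check on a smooth function (the test function sin x and the evaluation point are illustrative choices):

```python
import math

def second_diff(u, x, h):
    # Central difference approximation to u''(x), accurate to O(h^2).
    return (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)

u = math.sin               # u'' = -sin, so the exact value at x = 1 is -sin(1)
exact = -math.sin(1.0)
e1 = abs(second_diff(u, 1.0, 0.1) - exact)
e2 = abs(second_diff(u, 1.0, 0.05) - exact)
print(e1, e2, e1 / e2)     # ratio close to 4, confirming second-order accuracy
```

The same experiment with the one-sided formulas (2.92) and (2.93) gives a ratio near 2, the signature of first-order accuracy.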
Fourier-Galerkin method and eigenfunction expansion. Here, instead of the hat functions, we use the sine functions as trial and test functions, i.e., φⱼ(x) = sin jπx; we seek a solution in the form of a truncated Fourier sine series. In the present case these trial functions are eigenfunctions of L. Choosing the trial functions to be the eigenfunctions of the linear operator is called EIGENFUNCTION EXPANSION, and in this situation the matrix A defined by (2.88) becomes diagonal. For the example,

Aᵢᵢ = ½(1 − i²π²),   bᵢ = (−1)ⁱ/(iπ)

The diagonal nature of A makes the solution procedure for c simple once the above integrals have been performed:

cⱼ = bⱼ/Aⱼⱼ = 2(−1)ʲ / (jπ(1 − j²π²))

Because cⱼ ∼ j⁻³ for large j, the L² error decays algebraically with n. This error is smaller than for the finite element method, but not as small as it could be, because the solution to the problem is not a smooth periodic function, as the use of the Fourier basis implicitly assumes.
function, as the use of the Fourier basis implicitlyassumes.
Legendre-Galerkin method. The trigonometric functions used in the previous example were the eigenfunctions of a regular Sturm-Liouville problem. What happens if we instead use the eigenfunctions of a singular Sturm-Liouville problem, for example the Legendre polynomials? Our example problem is set in the domain (0, 1), while the natural domain for the Legendre polynomials is (−1, 1), so we change coordinates, letting z = 2x − 1, which gives the new equation

4 d²u/dz² + u = −(z + 1)/2

We let φⱼ(z) = Pⱼ₋₁(z), so

uₙ(z) = Σⱼ₌₀ⁿ⁻¹ cⱼ₊₁Pⱼ(z)

The Legendre polynomials do not satisfy the boundary conditions, so we need to use a slightly modified approach, called the GALERKIN TAU method:

1. Impose the weighted residual conditions only for i = 1, 2, ..., n − 2, yielding n − 2 equations for the n unknowns cⱼ.

2. Supplement these equations with expressions for the two boundary conditions on uₙ, i.e., uₙ(−1) = uₙ(1) = 0.

Now the first n − 2 rows of A and b contain the weighted residual equations, and the last two rows the equations needed to satisfy the boundary conditions. To construct the equations resulting from the weighted residuals, the following properties of Legendre polynomials are useful:

∫₋₁¹ Pⱼ(z)Pₖ(z) dz = (2/(2k+1)) δⱼₖ

Pⱼ′(z) = Σ (2k+1)Pₖ(z),   summed over k = 0, ..., j−1 with j − k odd

Pⱼ″(z) = Σ (k + ½)(j(j+1) − k(k+1))Pₖ(z),   summed over k = 0, ..., j−2 with j − k even

These can be derived from the recursion relations for the Legendre polynomials. For the sample problem, these results can be used to yield

Aᵢ₊₁,ⱼ₊₁ = 4(j(j+1) − i(i+1)),   j ≥ i + 2, j − i even
Aᵢ₊₁,ⱼ₊₁ = 2/(2i+1),   j = i
Aᵢ₊₁,ⱼ₊₁ = 0,   otherwise

for i = 0, ..., n − 3, j = 0, ..., n − 1, and

bᵢ₊₁ = −∫₋₁¹ ((z + 1)/2)Pᵢ(z) dz = −∫₋₁¹ ½(P₀(z) + P₁(z))Pᵢ(z) dz = −δᵢ₀ − (1/3)δᵢ₁

for i = 0, ..., n − 3. The expressions for the boundary conditions supply the final two rows of the system, with

bₙ₋₁ = bₙ = 0

We do not plot the comparison between approximate and exact solutions for this case, because even for n = 5, the two are visually indistinguishable. Rather, Figure 2.31 shows |cⱼ| versus j for n = 10. For j ≥ 4, the plot is nearly a straight line on a semilog plot, indicating that cⱼ decays exponentially with j.

Figure 2.31: Dependence of |cⱼ| on j for the Legendre-Galerkin approximation of (2.91) with n = 10.

This exponential or spectral convergence is characteristic of MWR methods that use trial functions chosen to be eigenfunctions of a singular Sturm-Liouville problem (Gottlieb and Orszag, 1977). For this reason these methods are often called SPECTRAL METHODS. The rapid convergence reflects the fact that the Galerkin approximation yields a solution very close to the truncated Fourier series of the exact solution in the trial function basis. The very high accuracy of spectral methods does come at a cost: the matrix A is not sparse, so it cannot generally be factorized in O(N) operations.


Collocation Method

Galerkin methods require evaluation of many integrals of products of trial functions. This fact is particularly cumbersome in nonlinear problems. In the collocation method, the integrals of (2.87) are simplified greatly by the fact that the test functions are delta functions. Another attractive feature of the collocation approach is that the solution can be directly represented by its values at the collocation points, rather than as coefficients in a series. To illustrate the structure of a collocation formulation, consider the trial function set {φ₁(x), φ₂(x), φ₃(x)} and three collocation points x₁, x₂, x₃. The approximate solution is thus

    u_n(x) = c₁φ₁(x) + c₂φ₂(x) + c₃φ₃(x)

The coefficients c₁ − c₃ are


uniquely determined if the values of u_n are known at three points, as we can see by writing in matrix form the equations for the values of u_n as

    [ φ₁(x₁)  φ₂(x₁)  φ₃(x₁) ] [ c₁ ]   [ u_n(x₁) ]
    [ φ₁(x₂)  φ₂(x₂)  φ₃(x₂) ] [ c₂ ] = [ u_n(x₂) ]
    [ φ₁(x₃)  φ₂(x₃)  φ₃(x₃) ] [ c₃ ]   [ u_n(x₃) ]

This equation can be written Sc = U, where S is the (invertible) transformation that relates the coefficient vector c with the vector of solution values at the collocation points U:

    U = [u_n(x₁)  u_n(x₂)  u_n(x₃)]ᵀ


We also can write the equations for du_n/dx at the collocation points

    [ φ₁′(x₁)  φ₂′(x₁)  φ₃′(x₁) ] [ c₁ ]   [ u_n′(x₁) ]
    [ φ₁′(x₂)  φ₂′(x₂)  φ₃′(x₂) ] [ c₂ ] = [ u_n′(x₂) ]
    [ φ₁′(x₃)  φ₂′(x₃)  φ₃′(x₃) ] [ c₃ ]   [ u_n′(x₃) ]

or S_d c = U′. Using the fact that c = S⁻¹U, we can write U′ = S_d S⁻¹ U, or U′ = D_n U, where D_n = S_d S⁻¹ is called the COLLOCATION DIFFERENTIATION MATRIX. With this formula, we can compute the derivative of the function u_n (evaluated at the collocation points) directly from the function values at the collocation points. All of the information about what basis functions have been used is absorbed into the operator D_n. Similarly, the second derivative matrix is simply D_n². Note that within

the space of functions that are spanned by the set of trial functions, the differentiation is exact. For example, if we use a polynomial basis, the derivative of any quadratic function is evaluated exactly by the collocation differentiation operator constructed above.
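The construction of D_n is a few lines of linear algebra. In the sketch below (Python/NumPy), the monomial trial set {1, x, x²} and the three collocation points are illustrative choices, not from the text; the script builds S, S_d, and D_n = S_d S⁻¹ and verifies the exactness property for a quadratic.

```python
import numpy as np

xc = np.array([0.0, 0.5, 1.0])                 # collocation points (illustrative)
S = np.vander(xc, 3, increasing=True)          # S[i, j] = phi_j(x_i) = x_i**j
Sd = np.column_stack([np.zeros(3), np.ones(3), 2 * xc])   # phi_j'(x_i)
Dn = Sd @ np.linalg.inv(S)                     # collocation differentiation matrix

u = 3 * xc**2 - 2 * xc + 1                     # a quadratic lies in the trial space
du = Dn @ u                                    # derivative values at the x_i
```

Since 3x² − 2x + 1 lies in the span of the trial functions, `Dn @ u` agrees exactly with 6x − 2 at the collocation points, and applying `Dn` again recovers the constant second derivative.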
The choice of collocation points depends on the basis functions and is based on the following idea. A weighted integral (inner product) of functions

    ∫ u(x)v(x)w(x) dx

can be approximated as a sum

    Σ_j w_j u(x_j)v(x_j)     (2.96)

where w_j ≠ w(x_j) in general. It can be shown that for certain choices of u(x) and v(x), the points x_j and weights w_j can be chosen so that (2.96) is exact. These points are the ideal choice for collocation points.
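This exactness is easy to demonstrate with Gauss-Legendre quadrature (a sketch in Python/NumPy; the two polynomials are arbitrary examples). With m points, products of polynomials up to total degree 2m − 1 are integrated exactly on (−1, 1).

```python
import numpy as np
from numpy.polynomial import Polynomial

m = 5
xg, wg = np.polynomial.legendre.leggauss(m)   # Gauss-Legendre points and weights

u = Polynomial([1, 0, 1])        # u(x) = 1 + x^2   (illustrative)
v = Polynomial([1, 0, 3])        # v(x) = 1 + 3x^2  (illustrative)

quad = np.sum(wg * u(xg) * v(xg))             # discrete inner product, as in (2.96)
uv = (u * v).integ()                          # antiderivative of the product
exact = uv(1.0) - uv(-1.0)                    # exact integral over (-1, 1)
```

Here the product has degree 4 ≤ 2m − 1 = 9, so the five-point sum matches the exact integral 88/15 to machine precision.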

For example, let u and v be periodic functions that can be written

    u(x) = Σ_{k=−n/2}^{n/2−1} û_k e^{ikx},   v(x) = Σ_{k=−n/2}^{n/2−1} v̂_k e^{ikx}

in the domain 0 ≤ x < 2π. Equation (2.96), modified to exclude the term j = n, which is redundant due to periodicity, yields the exact integral if x_j = 2πj/n and w_j = 2π/n. Similarly, if u and v are polynomials of degree n, (2.96) can be made exact using the GAUSSIAN INTEGRATION FORMULAS. Canuto et al. (2006) provide a detailed discussion.

Chebyshev collocation. Chebyshev polynomials are a particularly popular choice for the trial functions in the collocation method. These functions are the solutions to the Sturm-Liouville equation

    (1 − x²) d²y/dx² − x dy/dx + ν²y = 0     (2.97)

When ν is an integer, this equation always has a polynomial solution called a Chebyshev polynomial (of the first kind) T_ν(x); see Exercise 2.36. These polynomials form an orthogonal basis in the domain −1 < x < 1 with the weight function w(x) = (1 − x²)^{−1/2} and have the form

    T₀(x) = 1
    T₁(x) = x
    T_{ν+1}(x) = 2xT_ν(x) − T_{ν−1}(x)
As with Legendre polynomials, Chebyshev polynomials also arise from Gram-Schmidt orthogonalization of the set {1, x, x², ...}, but now using the weighted inner product. A particularly important property of Chebyshev's equation is that when using the coordinate transformation x = cos θ, it reduces to

    d²y/dθ² + ν²y = 0

and the Chebyshev polynomials become

    T_ν(θ) = cos(νθ)

in the domain −π ≤ θ ≤ π. In this domain, the optimal collocation points are uniformly spaced, which in the original domain −1 ≤ x ≤ 1 results in the points

    x_j = cos(πj/n),   j = 0, ..., n


These points are very closely spaced near x = ±1, making Chebyshev collocation an attractive approach for problems in which steep gradients near boundaries are expected. The differentiation operator is given by

    D_{n,lj} = (c̄_l/c̄_j)(−1)^{l+j}/(x_l − x_j),   l ≠ j
    D_{n,jj} = −x_j/[2(1 − x_j²)],   1 ≤ j ≤ n − 1
    D_{n,00} = (2n² + 1)/6 = −D_{n,nn}

where c̄_j = 1 + δ_{j0} + δ_{jn}.
As with the Legendre-Galerkin method, the natural setting for Chebyshev collocation is the domain (−1, 1). For our example problem, (2.91) transformed into this domain, the equations of the Chebyshev collocation approximation are

    4(D_n²U)_j + U_j = −(1 + x_j)/2,   j = 1, ..., n − 1

This gives n − 1 equations; the additional two equations come from the boundary conditions: U₀ = U_n = 0. This is a set of n + 1 algebraic equations in n + 1 unknowns and can be solved in the usual way. Because it uses orthogonal polynomials as trial functions, the Chebyshev collocation method also achieves the exponential convergence illustrated in the Legendre-Galerkin example.

2.10 Exercises
Exercise 2.1: A linear constant coefficient problem
Find the general solution to dx/dt = Ax, where

-1 -1
1 -1 -1
o

Express it so that only the arbitrary constants are (possibly) complex. You should be able to solve the problem without explicitly performing any similarity transformations, i.e., you should not need to invert any matrices.

Exercise 2.2: Phase plane dynamics of a linear problem


Find the general solution to dx/dt = Ax

where

    14  −16

Sketch the dynamics on the phase plane in the original coordinate system, being careful to show the invariant directions and the stability along those directions.

Exercise 2.3: Members of function spaces

Determine which of the following functions are in the linear space spanned by the set {1, sin 2x, cos 2x}:

1. cos²x
2. cos x (cos x + sin x)
3. 1 + sin²x
4. 1 + cos x

Hint: remember to look at the basic trigonometric identities.

Exercise 2.4: Weighted inner products and approximation of singular functions

Consider the function f(t) = 1/t in the interval (0, 1].

(a) Show that f(t) is not in L₂(0, 1), but that it is in the Hilbert space L₂,w(0, 1), where the inner product is given by

    (x, y)_w = ∫₀¹ x(t)y(t)w(t) dt

and w(t) = t².

(b) From the set {1, t, t², t³, t⁴}, construct a set of ON basis functions for L₂,w(0, 1). These are the first five Jacobi polynomials (Abramowitz and Stegun, 1970).

(c) Find a five-term approximation to 1/t with this inner product and basis. Plot
the exact function and five-term approximation. Computethe error betweenthe

exact and approximate solutions using the inner product aboveto define a norm.

This type of inner product is sometimes used in problemswherethe solution


is known to show a singularity. As your analysis will show,polynomialscan be
used to get a fairly good approximation except very near the singularity.
Hint:this problem is a good excuse to begin using a symbolicmanipulationprogram
like Mathematica. The calculations are not hard, but they are tedious and that is exactly

the kind of problem Mathematica is good at.
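The text suggests Mathematica; the Gram-Schmidt construction in (b) and the expansion coefficients in (c) can equally be scripted numerically. The sketch below (Python/NumPy) replaces symbolic integration with the exact monomial moments ∫₀¹ t^{k+2} dt = 1/(k+3); the normalization convention is an assumption.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def inner(a, b):
    """(x, y)_w = int_0^1 x(t) y(t) t^2 dt for polynomial coefficient arrays a, b."""
    c = P.polymul(a, b)
    k = np.arange(len(c))
    return float(np.sum(c / (k + 3)))        # int_0^1 t^(k+2) dt = 1/(k+3)

basis = []
for n in range(5):                            # Gram-Schmidt on {1, t, ..., t^4}
    v = np.zeros(n + 1)
    v[n] = 1.0
    for b in basis:
        v = P.polysub(v, inner(v, b) * b)     # remove projections on earlier members
    basis.append(v / np.sqrt(inner(v, v)))    # normalize

# coefficients of the five-term approximation of f(t) = 1/t:
# (1/t, phi_i)_w = int_0^1 phi_i(t) t dt, again exact monomial moments
coef = [float(np.sum(b / (np.arange(len(b)) + 2))) for b in basis]
```

The resulting basis is orthonormal in the weighted inner product, and `coef` gives the generalized Fourier coefficients for the singular function.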

Exercise 2.5: Fourier series of a real function

For a real function f(x) with Fourier series representation f(x) = Σ_k c_k e^{ikx}, show that the Fourier coefficients satisfy c_k = c̄_{−k}.

Exercise 2.6: Fourier series of a sawtooth function

Consider the "sawtooth" function in the domain 0 ≤ x ≤ 2π

    f(x) = x          if x < π
    f(x) = 2π − x     if x ≥ π

Find the Fourier coefficients for this function using the basis functions e^{ikx} and show that they decay as 1/k² as |k| → ∞.

Exercise 2.7: Fourier series of a square wave

Repeat the above exercise, but for the "square wave" function

    f(x) = 1      if x < π
    f(x) = −1     if x ≥ π

Avoid redoing all of the integrals by using the fact that this function is the derivative of the sawtooth (so its Fourier series is the derivative of the sawtooth Fourier series). Show specifically that the Fourier coefficients decay as 1/k as |k| → ∞. Use MATLAB to plot the 10-term approximation to this function, i.e., −10 ≤ k ≤ 10.
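These exercises suggest MATLAB; the same check can be sketched in Python/NumPy. The script below computes the sawtooth coefficients c_k by trapezoidal quadrature and confirms the 1/k² decay; the limiting constant k²|c_k| → 2/π for odd k is a property of this particular wave, worked out by hand.

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 20001)
f = np.where(x < np.pi, x, 2.0 * np.pi - x)       # sawtooth on [0, 2*pi]

def ck(k):
    """c_k = (1/2pi) int_0^{2pi} f(x) e^{-ikx} dx by the trapezoid rule."""
    g = f * np.exp(-1j * k * x)
    dx = x[1] - x[0]
    return np.sum(0.5 * (g[:-1] + g[1:])) * dx / (2.0 * np.pi)

decay = {k: k**2 * abs(ck(k)) for k in (1, 3, 5, 9)}   # should all be near 2/pi
```

Even-k coefficients vanish for this function, so the decay check uses odd k only.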

Exercise 2.8: Basis functions of the finite element method

Consider the hat functions described in Section 2.9.1.

(a) For N = 2, find the inner products (φ_i, φ_j), i, j = 0, ..., N. Is the set orthogonal?

(b) Approximate (in L₂(0, 1)) the function f(x) = 1 + x(1 − x) in terms of the hat functions with N = 2. That is, find and solve a linear system for the coefficients c_i in the expression f(x) ≈ Σ_i c_i φ_i(x).

Hint: use symmetry to save time evaluating integrals.

Exercise 2.9: Parseval's equality

Consider a function f(x), represented in an orthonormal basis as a generalized Fourier series: f(x) = Σ_i c_i φ_i(x), with c_i = (f(x), φ_i(x)).

(a) Show that ‖f‖² = Σ_i |c_i|². This result is called PARSEVAL'S EQUALITY.

(b) Consider a truncated approximation, f_n(x) = Σ_{i=0}^{n} c̃_i φ_i(x), where c̃_i might be different from c_i. Show that in fact, the truncation error ‖f − f_n‖ is smallest when c̃_i = c_i. This result shows that the generalized Fourier coefficients are the optimal coefficients for the truncated representation of f.

Exercise 2.10: Fourier series of a triangle wave

Consider the Fourier sine series approximation for the triangle wave depicted in Figure 2.32.

    f_M(x) = Σ_{n=1}^{M} a_n sin(nπx),   x ∈ [0, 1]

(a) Find the coefficients a_n, n = 1, 2, .... To save time you may find the following integral formulas useful

    ∫ (mx + b) sin(nπx) dx = −(mx + b)/(nπ) cos(nπx) + m/(nπ)² sin(nπx)

    ∫₀¹ sin(nπx) sin(mπx) dx = (1/2) δ_{nm},   n, m = 1, 2, ...

Figure 2.32: Triangle wave on [0, 1].
(b) Plot the function f_M(x) for M = 5, 10, 50 with parameter a = 0.1 to demonstrate convergence to f(x). How many terms are required to obtain good accuracy?

Exercise 2.11: Differentiating integrals

Use the Leibniz rule for differentiating integrals to solve the following two problems.

(a) Check that the solution to the differential equation

    dy/dt + ay = q(t)

with initial condition y(0) = y₀ is

    y(t) = y₀e^{−at} + ∫₀ᵗ e^{−a(t−t′)} q(t′) dt′

Remember to show the solution satisfies both the differential equation and initial condition.

(b) Derive a Leibniz rule for differentiating the double integral

    f(t) = ∫_{a(t)}^{b(t)} ∫_{c(t,p)}^{d(t,p)} h(t, p, s) ds dp

Your answer should not contain the derivatives of any integrals.

Exercise 2.12: Convolution theorem

(a) Use the definition of the Laplace transform to derive the convolution theorem

    L{ ∫₀ᵗ f(t′)g(t − t′) dt′ } = f̄(s)ḡ(s)

(b) Use the definition of the inverse Laplace transform to derive the convolution theorem going in the other direction

    L⁻¹{ f̄(s)ḡ(s) } = ∫₀ᵗ f(t′)g(t − t′) dt′

Which direction do you prefer and why?


Exercise 2.13: Final and initial-value theorems


Two useful theorems are the final- and initial-value theorems

    lim_{t→∞} f(t) = lim_{s→0} s f̄(s)

if and only if s f̄(s) is finite for all s such that Re(s) ≥ 0; otherwise lim_{t→∞} f(t) does not exist, and

    lim_{t→0} f(t) = lim_{s→∞} s f̄(s)

(a) The conditions on s f̄(s) for the final-value theorem are crucial. For the functions below, state which satisfy the conditions and give their final values.

1. 1/s
2. 1/s²
3. 1/(s(s − a)),   Re(a) > 0
4. 1/(s(s + a)),   Re(a) > 0

(b) What are the initial values, f (0+)?

(c) Invert each of the transforms to get f (t) and check your results.

Exercise 2.14: Network of four isomerization reactions

Consider the set of reversible, first-order reactions

taking place in a well-mixed, batch reactor. The reactions are all elementary reactions with corresponding first-order rate expressions. Let the concentrations of the species be arranged in a column vector

    c = [c_A  c_B  c_C  c_D  c_E]ᵀ

(a) Write the mass balance for the well-mixed, batch reactor of constant volume

    dc/dt = Kc

What is K for this problem?

(b) What is the solution of this mass balance for initial condition c(0) = c₀? What calculation do you do to find out if this solution is stable?

(c) Determine the rank of matrix K. Hint: focus on the rows of K. Justify your answer. From the fundamental theorem of linear algebra, what is the dimension of the null space of K?

(d) What is the condition for a steady-state solution of the model? Is the steady state unique? Why or why not?

Exercise 2.15: Network of first-order chemical reactions

Consider the generalization of Exercise 2.14 to the following set of n chemical reactions taking place in a well-mixed, batch reactor. The reaction rate for the ith reaction is first order. Let the concentrations of the species be arranged in a column vector

    c = [c_{A1}  c_{A2}  ⋯]ᵀ

(a) Write the mass balance for the well-mixed, batch reactor of constant volume

    dc/dt = Kc

(b) What is the solution of this mass balance for initial condition c(0) = c₀?

(c) What is the steady-state solution of the model? Is the steady state unique? Why or why not?

(d) What calculation would you do to decide if the steady state is stable?

Exercise 2.16: Using the inverse Laplace transform formula

Establish property 4 of the Laplace transform pair given in Section 2.2.4, which states

    L{t f(t)} = −df̄(s)/ds

This formula proves useful in Exercise 3.19.

Exercise 2.17: ODE review

Solve the following ODEs; unless boundary conditions are given, find the general solution:

(a) y′ = e^{x+y} (separable)

(b) y′ = y², y(0) = 1 (separable)

(c) (y − 2x)dy − (2y − x)dx = 0 (exact)

(d) (x² − y²)dy = 2xy dx (integrating factor)


(e) x dy + (y + eˣ)dx = 0 (integrating factor)

Exercise 2.18: General solution to a first-order linear system of ODEs

Find the general solution to y′ = Jy, where

oil

x 10

Exercise 2.19: A linear system: dynamics on the phase plane

Consider the system

-1

1 -1

x + h(t)

(a) Find the general solution to the homogeneous problem, i.e., with h(t) ≡ 0, and characterize its stability.

(b) Sketch the qualitative behavior of solutions on the x₁–x₂ plane. Where does this system fit on Figs. 2.1–2.2?

(c) Now solve the inhomogeneous problem with h(t) = (1, 1) and characterize its stability.

Exercise 2.20: Dynamics of a freely rotating rigid body

Consider the system of equations

    I₁ dω₁/dt = (I₂ − I₃) ω₂ω₃
    I₂ dω₂/dt = (I₃ − I₁) ω₃ω₁
    I₃ dω₃/dt = (I₁ − I₂) ω₁ω₂

with I₁ > I₂ > I₃ > 0. This set of equations describes the motion of a rigid body freely rotating in space. The I's are the moments of inertia of the body relative to each of the principal axes and the ω's are the angular velocities of the body with respect to those axes.

(a) If ω̄ = (ω̄₁, ω̄₂, ω̄₃) is a steady state of this system, find the linearized equation for the deviation δω = (δω₁, δω₂, δω₃) from the steady state.

(b) Find three steady states of the system that satisfy ω₁ + ω₂ + ω₃ = 1. Which are linearly stable?

(c) Sketch, in the (ω₁, ω₂, ω₃) phase space, the qualitative behavior of trajectories that begin near each of the steady states, using the linearized equations as your guide.

(d) Your results can be tested experimentally. The principal axes of a book are, in order of decreasing moment of inertia, the axis passing through the front and back covers, the right and left sides, and the top and bottom. Experimentally assess the stability of free rotation of a book with respect to these three axes. (You have to do something to keep the covers from flying open while the book spins.) Do the theory and experiment agree?
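The experiment in (d) can also be run in silico. The sketch below (Python/NumPy) integrates the rigid-body equations with hand-rolled RK4; the inertia values and initial perturbations are arbitrary choices, not from the text.

```python
import numpy as np

I1, I2, I3 = 3.0, 2.0, 1.0              # moments of inertia, I1 > I2 > I3 (illustrative)

def rhs(w):
    w1, w2, w3 = w
    return np.array([(I2 - I3) * w2 * w3 / I1,
                     (I3 - I1) * w3 * w1 / I2,
                     (I1 - I2) * w1 * w2 / I3])

def integrate(w, dt=0.01, nsteps=2000):
    """RK4 integration, returning the whole trajectory."""
    traj = [w.copy()]
    for _ in range(nsteps):
        k1 = rhs(w); k2 = rhs(w + 0.5 * dt * k1)
        k3 = rhs(w + 0.5 * dt * k2); k4 = rhs(w + dt * k3)
        w = w + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(w.copy())
    return np.array(traj)

mid = integrate(np.array([1e-3, 1.0, 1e-3]))     # spin near the intermediate axis
minor = integrate(np.array([1e-3, 1e-3, 1.0]))   # spin near the minor axis
```

The small perturbation in `mid` grows until the body tumbles, while `minor` stays in a small neighborhood of its initial state, consistent with the linearized analysis in (b).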

Exercise 2.21: Duffing's equation

DUFFING'S EQUATION describes the dynamics of an undamped buckling beam

    ẍ − λx + x³ = 0

where x is proportional to the displacement of the middle of the beam. When λ > 0 the beam buckles: the "unbuckled" state x = ẋ = 0 is unstable.

(a) The two nontrivial steady states are x = ±√λ, ẋ = 0. Find the eigenvalues of the linearizations around those states.
2.10 Exercises
(b) In this model there is no friction, so the total energy (kinetic plus elastic) is conserved:

    H = ẋ²/2 − λx²/2 + x⁴/4

A given initial condition will have a specified value of H, and the resulting trajectory must have the same value of H for all time, so a trajectory in phase space is a curve of constant H. Show that the trajectories near the two nontrivial steady states are closed curves and thus that the linearized equations give the correct qualitative picture.
Exercise 2.22: Predator-prey model

The following model describes a "predator-prey" system: species 1 eats the grass and species 2 eats species 1:

    ẋ₁ = x₁(1 − ⋯ x₂)
    ẋ₂ = ⋯

In this model, α > 0 and β > 1, and x₁ and x₂ represent the sizes of the prey and predator populations.

(a) There are three steady states to this model. Find them.

(b) Find the linear stability of each of the steady states. Since this is a 2-dimensional system, the trace and determinant criterion can be used.

(c) Draw the phase-plane behavior near each of these steady states.

Exercise 2.23: Cell in shear flow

The following differential equation arises from a model of a cell moving in a shear flow

    dθ/dt = A + cos 2θ

where θ is the orientation angle of the cell with respect to the flow direction and A is a parameter that is determined by the geometry and mechanics of the cell (Keller and Skalak, 1982).

(a) For A = 0, there are four steady states in the domain 0 ≤ θ < 2π. Find them and determine which ones are linearly stable.

(b) Draw the trajectories in phase space for A = 0, along with the steady states. Here phase space is simply the θ line, and since θ is periodic it can alternately be considered to be just a circle with unit radius.

(c) For A larger than a certain value, this equation has no steady-state solutions. What is that value? What do the phase-space dynamics look like, i.e., draw a picture, when A exceeds that critical value?

Exercise 2.24: Steady-state heat conduction in an annulus

Consider the steady-state conduction of heat in the solid annular region shown in Figure 2.33. There is uniform heat generation in the solid. The heat-generation rate is given by

    Q = s₀(1 + β(T − T₀))

Figure 2.33: Annulus with heat generation in the solid.

in which β is a dimensional constant. The inner wall of the annulus is insulated and the outer wall is at constant temperature T₀. The material has thermal conductivity k.

(a) Write the steady-state heat equation with the source term.

(b) Define dimensionless variables

    Θ = k(T − T₀)/(s₀R²),   ρ = r/R,   β̂ = βs₀R²/k

Show that the model reduces to

    (1/ρ) d/dρ (ρ dΘ/dρ) + β̂Θ = −1

with the boundary conditions dΘ/dρ = 0 at ρ = κ and Θ = 0 at ρ = 1.

(c) What is the complementary function?

(d) By inspection, what is a particular solution?

(e) Using the two boundary conditions, specify the two unknowns in the complementary function.

(f) Plot Θ(ρ) for κ = 0.8 and the values β̂ = 1, 3, 5, 7, 7.5.

Exercise 2.25: Existence of a positive steady-state temperature profile

Consider Exercise 2.24 again.

(a) Plot and compare the solution Θ(ρ) if you set κ = 0.8 and β̂ = 7.5. What happens as you increase β̂ in this problem?

(b) Look again at how you solve for the constants c₁, c₂. What are you assuming for this solution to exist?

(c) For κ values ranging from 0 to 0.99, find and plot the critical value of β̂ such that the solution for c₁, c₂ does not exist. If you exceed this critical value of β̂, what do you think happens in the transient heat-conduction problem?

Exercise 2.26: Flow through a porous medium in a tube

Brinkman's modification of Darcy's law for flow in porous media, applied to axial flow in a tube containing a porous medium, is

    (μ/r) d/dr (r dv_z/dr) − (μ/κ) v_z + ΔP/L = 0

    κ : permeability of the porous medium
    μ : viscosity of the fluid
    ΔP : pressure difference + gravity driving force
    v_z : z-component of the "superficial velocity" v
    R : radius of tube
    L : length of tube

Reasonable boundary conditions are v_z(R) = 0 and v_z(0) < ∞.
(a) Introduce a dimensionless velocity and radius

    φ = v_z μ L/(ΔP R²),   ρ = r/R

and rewrite the differential equation and the boundary conditions in terms of the dimensionless variables. How many dimensionless parameters does the new differential equation contain?

(b) Obtain a particular solution of the differential equation obtained in (a) by inspection.

(c) Obtain the solution of the homogeneous equation; it should contain two constants. One constant can be immediately evaluated from the boundary condition at ρ = 0. Why?

(d) Evaluate the remaining constant using the boundary condition at ρ = 1. Write the general solution φ(ρ) to the differential equation. Plot φ(ρ) for permeability κ/R² = 0.01, 0.1, 0.3, 1.0. Also include on this plot the velocity profile for Hagen-Poiseuille flow.

(e) Evaluate the average dimensionless velocity

    ⟨φ⟩ = ∫₀¹ φ(ρ) ρ dρ / ∫₀¹ ρ dρ

and show that it can be written in terms of the modified Bessel functions I₀ and I₁. Plot ⟨φ⟩ versus κ/R² with a log scale for the x-axis for 10⁻⁴ ≤ κ/R² ≤ 10².

(f) Show how the result in (e) simplifies in the limit of small permeability, κ.

(g) Show that in the limit as κ → ∞, the result in (e) approaches the result for flow in an empty tube (which is exactly Hagen-Poiseuille flow).

Exercise 2.27: Laguerre's equation

The ODE

    xy″ + (1 − x)y′ + λy = 0

where λ is a constant, is called Laguerre's equation. It arises in determining the wave function for the electrons of a hydrogen atom; the orbitals that you learned about in quantum mechanics (and thus the structure of the periodic table) emerge in part from its solutions.

(a) Show that x = 0 is a regular singular point.

(b) Determine the roots of the indicial equation and one solution to this problem.

(c) Show that when λ is a positive integer, this solution reduces to a polynomial. These polynomials are called the Laguerre polynomials.

Exercise 2.28: Hermite's equation

Hermite's differential equation is

    u″ − 2xu′ + 2ku = 0

Among other places, it arises in the solution of Schrödinger's equation for a particle in a potential well.

(a) Write Hermite's equation as Lu + λu = 0, where λ = 2k and L takes the standard form of a Sturm-Liouville operator with weight function w(x) = e^{−x²}. What are p(x) and r(x)?

(b) Consider the inner product

    (a, b)_w = lim_{C→∞} ∫_{−C}^{C} a(x)b(x)w(x) dx

where w(x) is as given above. What boundary conditions must we impose in the limit C → ∞ so that L is self-adjoint, i.e., so that (Lu, v)_w = (u, Lv)_w?

(c) The point x = 0 is an ordinary point for this equation. Find the general solution by series expansion around this point. Show that if k is an integer, one solution to the equation is a polynomial. These polynomials are known as the Hermite polynomials.

Exercise 2.29: Series solution

Find the general solution to the differential equation

    (x² − x)u″ − xu′ + u = 0

Start by seeking a solution of Frobenius form, expanding around x = 0.


Exercise 2.30: Another series solution

Find the general solution to

    5x²y″ + (x³ ⋯ ) = 0

Expand around x = 0 and keep terms up to quartic order.

Exercise 2.31: Bessel's equation: singular solution

The Bessel equation of order zero is

    x²y″ + xy′ + x²y = 0

and the associated Cauchy-Euler equation is

    x²y″ + xy′ = 0

(a) Find the general solution to this Cauchy-Euler equation.

(b) Motivated by this result, seek a second solution to the Bessel equation, of the form y₂(x) = J₀(x) ln x + g(x), where g(x) has a power series solution. It may be convenient to note that g is even and write it as g(x) = Σ_{n=0}^∞ c_n (x/2)^{2n}. Find the first two terms in the power series for g.

Exercise 2.32: Sturm-Liouville problem with mixed boundary condition

Consider the Sturm-Liouville eigenvalue problem

    u″ + λu = 0,   u(0) = 0,   u(1) + u′(1) = 0

Find the eigenfunctions of this problem and the nonlinear algebraic equation that determines the eigenvalues λ. (This equation cannot be solved analytically.) Draw a sketch that indicates that there will be an infinite number of these eigenvalues, and use your sketch to propose an approximation for the eigenvalues that is valid when the eigenvalues are large.
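For this problem the eigenfunctions are u = sin(√λ x), and the boundary condition at x = 1 gives the eigenvalue condition sin√λ + √λ cos√λ = 0, i.e., tan√λ = −√λ. The graphical argument can be checked numerically; the sketch below (Python, standard library only) brackets one root per branch of the tangent and bisects.

```python
import math

def g(s):
    # eigenvalue condition for u'' + lam*u = 0, u(0)=0, u(1)+u'(1)=0, with s = sqrt(lam)
    return math.sin(s) + s * math.cos(s)

def bisect(a, b, tol=1e-12):
    fa = g(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        if fa * g(m) <= 0.0:
            b = m
        else:
            a, fa = m, g(m)
    return 0.5 * (a + b)

# one root of tan(s) = -s lies in each interval ((k - 1/2) pi, (k + 1/2) pi)
roots = [bisect((k - 0.5) * math.pi + 1e-9, (k + 0.5) * math.pi - 1e-9)
         for k in range(1, 6)]
eigenvalues = [s * s for s in roots]
```

The kth root approaches (k − 1/2)π from above, so λ_k ≈ (k − 1/2)²π² for large k, which is the approximation the sketch in the exercise suggests.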

Exercise 2.33: A higher-order variable coefficient problem

Find the general solution to the third-order equation

    x³y‴ + 3x²y″ − 3xy′ = 0
Exercise2.34: A fourth-order variable coefficient ODE
The following differential equation arises in the analysis of time-dependent flow of a

polymericliquid

x 2D 2 -x 2 +2-2xD)

-2iD

where D = d/dx. This equation has solutions of Frobenius form. Find the roots of the indicial equation.

Exercise 2.35: Legendre's equation

Legendre's equation is

    (1 − x²)y″ − 2xy′ + l(l + 1)y = 0

Two linearly independent series solutions about x = 0 have the form

    y₁(x) = 1 − l(l+1) x²/2! + (l−2)l(l+1)(l+3) x⁴/4! − ⋯
    y₂(x) = x − (l−1)(l+2) x³/3! + (l−3)(l−1)(l+2)(l+4) x⁵/5! − ⋯

By examining the recursion relation, show that for every integer l, one of these series will truncate, becoming a polynomial. These are the Legendre polynomials.

Exercise 2.36: Chebyshev's equation

Chebyshev's equation is

    (1 − x²)u″ − xu′ + ν²u = 0

Its solutions are important in the approximation of functions and in numerical methods for boundary-value problems.

(a) Put this in the form of a Sturm-Liouville problem Lu + λu = 0 in the domain [−1, 1], with λ = ν² and w(x) = (1 − x²)^{−1/2}. What boundary conditions must u and u′ satisfy at x = ±1 for self-adjointness to hold?

(b) By expanding in a power series about x = 0, obtain two LI solutions of this equation. Show that when ν is a nonnegative integer, one of these is always a polynomial of degree ν. Because these satisfy a Sturm-Liouville problem, these polynomials form an orthogonal basis for L₂,w(−1, 1), with w(x) = (1 − x²)^{−1/2}.

(c) The points x = ±1 are regular singular points for this equation. As a first step toward finding the behavior of the solution near these points, find the roots of the indicial equation for a solution in Frobenius form expanded around x = 1.

Exercise 2.37: Laplace's equation as second-order, variable coefficient ODEs

Express the radial part of Laplace's equation ∇²y = 0 in the form a₂y″ + a₁y′ + a₀y = 0.

(a) What are a₀, a₁, a₂ for one-dimensional rectangular coordinates, cylindrical coordinates, and spherical coordinates?

(b) What are two linearly independent solutions for each coordinate system?

Exercise 2.38: How many solutions?

Consider the second-order differential equation

    d²u/dx² = 0

(a) How many linearly independent solutions exist for the single boundary condition u(0) = u(1)?

(b) How many linearly independent solutions exist for the single boundary condition u(0) = 0?

(c) How many linearly independent solutions exist for the two boundary conditions u(0) = 0 and u(1) = 0?

(d) What can you conclude about the second-order differential operator and its boundary conditions?

Exercise 2.39: Heat conduction with a temperature controller

Suppose we set up the heat conduction problem with a temperature controller that keeps the ends of the body at the same temperature.

(a) Identify the appropriate differential operator, L, and associated boundary functional, B₁, so that this problem can be written as

    LT = f,   B₁T = 0     (2.98)
(b) Notice that we do not have enough boundary conditions to expect to be able to solve (2.98) uniquely. Define the adjoint operator and adjoint boundary functionals so that

    (Lu, v) = (u, L*v)

for every admissible u(x) and v(x). Notice that since you are missing a boundary condition on u(x), you will require three boundary conditions on v(x). What are they?
(c) What are the null spaces of L and L* with their associated boundary conditions? For which f can (2.98) be solved? Is the solution unique? If not, what is the general form of the solution?

(d) Solve (2.98) using any method at your disposal. Laplace transforms would work, for example. Check your solution by substituting into the differential equation and boundary condition. Does your solution agree with the existence and uniqueness result you determined previously?

(e) What is the Green's function for this problem, i.e., identify the function g appearing in the T(x) solution as

    T(x) = ∫₀¹ g(x, ξ)f(ξ) dξ + terms not involving f

Exercise 2.40: Solvability conditions and general solution for a second-order differential operator

Consider the second-order differential operator

    Lu = d²u/dx² + u

and two boundary conditions

    B₁u = u(π) + u(0),   B₂u = −[du/dx(π) + du/dx(0)]

(a) Find the adjoint operator and boundary conditions, L*, B₁*, and B₂*.

(b) Find the null spaces N(L) and N(L*).

(c) For what f can you solve the nonhomogeneous problem

    Lu = f(x),   B₁(u) = γ₁,   B₂(u) = γ₂

answer: (f, sin x) = γ₁, (f, cos x) = γ₂

(d) For f satisfying this solvability condition, what is the general solution?

answer: u(x) = ∫₀ˣ f(ξ) sin(x − ξ) dξ + a cos x + b sin x

Exercise 2.41: Steady-state temperature profile

Solve the steady-state heat-conduction problems in Examples 2.11 and 2.12 using Laplace transforms:

    T_xx = f        T_xx = f,  T_x(0) = 0
Exercise 2.42: Heat-transfer boundary conditions

Consider the one-dimensional steady-state heat-conduction problem

    −k d²T(x)/dx² = Q̇(x),   i.e.,   d²T/dx² = f,   f = −Q̇/k

Consider Newton's law of cooling boundary conditions

    h₀(T_e0 − T(0)) = −kT_x(0)
    h₁(T(1) − T_e1) = −kT_x(1)

in which h₀, h₁ are the heat-transfer coefficients at the two ends, and T_e0, T_e1 are the temperatures providing the heat-transfer driving forces at the two ends.

(a) Write this problem as

    DT = f,   B₁T = γ₁,   B₂T = γ₂

What are D, B₁ and B₂, and γ₁ and γ₂?

(b) Solve for the steady-state temperature profile.

(c) For what f(x) does the solution exist? For these f(x), is the solution unique?

Exercise 2.43: Orthogonality of Sturm-Liouville eigenfunctions

Show that two eigenfunctions u₁ and u₂ of a Sturm-Liouville problem (pu′)′ + ru + λwu = 0 are orthogonal if the inner product weighted with w is used. Consider only zero boundary conditions u(a) = u(b) = 0. Multiply the equation for u₁ (setting λ = λ₁) by u₂, multiply the equation for u₂ (setting λ = λ₂ ≠ λ₁) by u₁, subtract and integrate over the interval. Use the boundary conditions and integration by parts to prove orthogonality.

Exercise 2.44: The convection-diffusion operator

For problems with convection and diffusion, an important differential operator is

    Lu = d²u/dx² + Pe du/dx

with boundary conditions u(0) = u(1) = 0. Pe is the PECLET NUMBER, measuring the relative importance of convection and diffusion.

(a) Find the adjoint of this operator, first with an inner product with a constant weight function w(x) = 1, and then with the weight function w(x) = exp(−Pe x).

(b) Solve the eigenvalue problem Lu + λu = 0 for arbitrary Pe. Hint: since the equation has constant coefficients, express the solution as y(x) = e^{iωx}. Plot the eigenfunction corresponding to the first (closest to zero) eigenvalue for Pe = 10.
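A quick numerical check of (b): substituting the exponential form gives eigenfunctions e^{−Pe x/2} sin(nπx) and eigenvalues λ_n = (nπ)² + Pe²/4. The sketch below (Python/NumPy) discretizes L with second-order central finite differences and compares; the grid size is an arbitrary choice.

```python
import numpy as np

Pe, N = 10.0, 400
h = 1.0 / N
# interior grid points x_1 ... x_{N-1}; u_0 = u_N = 0 built into the matrix
main = -2.0 / h**2 * np.ones(N - 1)
upper = (1.0 / h**2 + Pe / (2.0 * h)) * np.ones(N - 2)
lower = (1.0 / h**2 - Pe / (2.0 * h)) * np.ones(N - 2)
L = np.diag(main) + np.diag(upper, 1) + np.diag(lower, -1)   # u'' + Pe u'
lam = np.sort(np.linalg.eigvals(-L).real)                    # L u = -lam u
lam_exact = np.pi**2 + Pe**2 / 4.0                           # smallest eigenvalue
```

For Pe = 10 the smallest computed eigenvalue agrees with π² + 25 to the second-order accuracy of the discretization.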

Exercise 2.45: Testing a CSTR operating condition for stability²²

The reaction

    r = kc_A = k₀e^{−E/T} c_A

is carried out in a CSTR. The mass and energy balances are given by

    dc_A/dt = (c_{Af} − c_A)/τ − kc_A

    dT/dt = Q̇/(V_R ρ Ĉ_p) + (T_f − T)/τ − (ΔH_R/(ρĈ_p)) kc_A

Find the three steady states corresponding to the conditions in the following table. Determine whether each of these three steady states is stable or unstable.

²²See also Exercise 6.7 in Rawlings and Ekerdt (2012)

Parameter
E

f
CA
110


Value
7550
298

Units

kmol/m3

0
-2.09 x 10 8

4.48 x 106
4.19 x 103
103

J/kmol

vs

J/(kg K)
kg/m3

18 x 10-3

m3

60 x 10-6

m3/s

Exercise 2.46: Choosing an ODE solver

You are given the task of modeling the dynamics of a chemical reactor in which a large number of reactions are occurring. The rate constants for the reactions vary between 1 s⁻¹ and 10⁷ s⁻¹. Will you base your code on a fourth-order Runge-Kutta scheme, an explicit Euler scheme, or an Adams-Moulton scheme? Why?

Exercise 2.47: Numerical stability criterion for RK2

Derive the numerical stability criterion for integrating the single equation

    dx/dt = ax

with the second-order Runge-Kutta method. Allow a to be complex. Hint: the general solution to a linear constant-coefficient difference equation a_n x^{(n)} + a_{n−1} x^{(n−1)} + ⋯ = 0 is of the form x^{(n)} = cGⁿ.
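For ẋ = ax, one RK2 step gives x^{(n+1)} = (1 + w + w²/2)x^{(n)} with w = aΔt, so the method is stable when |1 + w + w²/2| ≤ 1; on the negative real axis this means −2 ≤ w ≤ 0. A quick numerical check (Python):

```python
# growth factor of second-order Runge-Kutta applied to dx/dt = a x, with w = a*dt
def growth(w):
    return abs(1 + w + w**2 / 2)

# stable on the negative real axis for -2 <= w <= 0
samples = {w: growth(w) for w in (-0.5, -1.0, -1.9, -2.0, -2.1)}
```

growth(−2.0) equals exactly 1, and the factor exceeds 1 just beyond that point; since `abs` accepts complex arguments, the same function maps out the full stability region in the complex w plane.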

Exercise 2.48: Dynamics of a nonlinear problem

Consider the pair of ODEs

    ẏ₁ = (1 − y₁) − 10 y₁² y₂
    ẏ₂ = −0.05 y₁² y₂

with initial conditions y₁(0) = 0.2, y₂(0) = 1.

(a) Find the Jacobian of the RHS at t = 0. Show, using the eigenvalues of the Jacobian, that you expect the problem to be stiff.

(b) Write a computer program to use the Adams-Moulton second-order method to solve the initial-value problem. Integrate the equations out to t = 20 and plot the solutions. Can you find any stability limit on the time step?

(c) Write a second-order Runge-Kutta program and attempt to use it for the above problem. What time step do you have to use to get a stable result?

(d) Modify your RK code to use variable time steps. Use the criterion that Δt < τ_min/5. Estimate τ_min from the values of yᵢ/ẏᵢ at each time step.

Exercise 2.49: Solutions of difference equations

When examining the numerical stability of integration schemes, as well as in many other situations, we run across the linear constant-coefficient difference equation

    a_M y_{n+M} + a_{M−1} y_{n+M−1} + ⋯ + a₀ y_n = 0     (2.99)

For example, y_n could be the value of y at the nth time step of some process.

(a) Show that this equation can be written in vector form

    x_{n+1} = Gx_n     (2.100)

What are x and G in terms of y and the a coefficients?

(b) Given the initial condition x₀, find the solution to this equation (i.e., x_n in terms of n and x₀) in the situation where G has distinct eigenvalues λᵢ.

(c) Repeat for the case where G has repeated eigenvalues.

(d) What is the general criterion for asymptotic stability of the steady state x = 0?

Exercise2.50: Numerical integration for undamped oscillations


initial-value

x = O?

problems = f (u) are important


second-order
for many applications.
f
(u)
case
q2udo the following:
Forthe specific
(a) Find the exact general solution.
(b) By letting u̇ = v, convert the equation to a pair of first-order equations and show that the forward Euler method is always unstable for integrating these.

(c) Consider the following numerical integration formula

u_{n+1} - 2u_n + u_{n-1} = Δt² f(u_n)

For f(u) = -q²u find a quadratic equation for the growth factor G for this method, i.e., look for solutions of the form u_{n+1} = G u_n. Up to what threshold (qΔt)² are the numerical solutions stable?

(d) By expanding all terms in Taylor series around time step n, find the local truncation error of this formula (the first power of Δt that does not cancel).
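For part (c), substituting u_n ∝ Gⁿ with f(u) = -q²u into the formula gives G² - (2 - s)G + 1 = 0 with s = (qΔt)², which can be probed numerically (Python sketch):

```python
import cmath

def max_growth(s):
    # Larger-magnitude root of G**2 - (2 - s)*G + 1 = 0, with s = (q*dt)**2
    b = -(2 - s)
    root = cmath.sqrt(b*b - 4)
    return max(abs((-b + root)/2), abs((-b - root)/2))
```

For 0 < s < 4 the roots are complex conjugates whose product is one, so |G| = 1 (neutral stability, as appropriate for an undamped oscillator); for s > 4 one root exceeds one in magnitude.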

Exercise 2.51: The velocity Verlet algorithm of molecular dynamics simulation

The VELOCITY VERLET ALGORITHM is very commonly used to perform numerical time integration for molecular dynamics simulations. Consider the numerical stability problem for a very simple case

ẋ = v
v̇ = ax

where a ∈ R.

(a) What property must a satisfy so that the true solution x = 0, v = 0 is stable?

(b) For this problem, the velocity Verlet algorithm becomes

x_{n+1} = x_n + v_n Δt + (1/2) a x_n Δt²
v_{n+1} = v_n + (1/2)(a x_n + a x_{n+1}) Δt

Put this expression in the form

x_{n+1} = G x_n

where x = (x, v)ᵀ.

(c) Find the criteria that aΔt² must satisfy for numerical stability of the method.
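A numerical check of part (c) (Python; the one-step matrix follows from the update formulas in part (b), nondimensionalized so that the only parameter is s = aΔt²):

```python
import cmath

def verlet_radius(s):
    # One-step matrix for velocity Verlet on xdot = v, vdot = a*x,
    # with dt = 1 and s = a*dt**2: [x, v](n+1) = G [x, v](n)
    g = [[1 + s/2, 1.0],
         [s*(1 + s/4), 1 + s/2]]
    tr = g[0][0] + g[1][1]
    det = g[0][0]*g[1][1] - g[0][1]*g[1][0]   # equals 1: the map is area preserving
    root = cmath.sqrt(tr*tr - 4*det)
    return max(abs((tr + root)/2), abs((tr - root)/2))
```

The spectral radius is one exactly for -4 <= aΔt² <= 0 (i.e., a = -ω² with ωΔt <= 2) and exceeds one otherwise.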

Exercise 2.52: Stability of predictor-corrector methods

Denote the general (up to fourth-order) predictor-corrector formulas for ẋ = λx as

x̂(n+1) = x(n) + w [p1 x(n) + p2 x(n-1) + p3 x(n-2) + p4 x(n-3)]
x(n+1) = x(n) + w [c1 x̂(n+1) + c2 x(n) + c3 x(n-1) + c4 x(n-2)]

in which w = λΔt. The coefficient vectors of the first four Adams-Bashforth predictors and Adams-Moulton correctors are

p{1} = [1, 0, 0, 0]               c{1} = [1, 0, 0, 0]
p{2} = (1/2) [3, -1, 0, 0]        c{2} = (1/2) [1, 1, 0, 0]
p{3} = (1/12) [23, -16, 5, 0]     c{3} = (1/12) [5, 8, -1, 0]
p{4} = (1/24) [55, -59, 37, -9]   c{4} = (1/24) [9, 19, -5, 1]

Show that combining the two steps gives

x(n+1) = x(n) [1 + wc1 + w(c1 w p1 + c2)] + x(n-1) [w(c1 w p2 + c3)]
       + x(n-2) [w(c1 w p3 + c4)] + x(n-3) [w c1 w p4]

Let z(n) = (x(n), x(n-1), x(n-2), x(n-3))ᵀ, and find the matrix G such that z(n+1) = G z(n).
The eigenvalues of G(w) then determine the stability of the method.

Exercise 2.53: Stability boundary of predictor-corrector methods


Given G(w) from the previous exercise, to map out the boundary of the stability region, consider ω = e^{iθ} for 0 ≤ θ ≤ 2π, so ω has unit magnitude, and solve the single algebraic equation det(G(w) - ωI) = 0 for the complex value w as a function of parameter θ. The stability boundary of the APC method then comprises these values of w. That is how Figure 2.27 was prepared, for example.

Now consider the class of predictor-corrector methods that use the same order in the predictor and corrector. Recall the methods in Figure 2.27 used a predictor with order one less than the corrector. Find the stability boundaries for first-order through fourth-order methods. Compare your calculated results to Figure 2.34. Contrast the stability results displayed in Figures 2.27 and 2.34. From a stability standpoint, which class of methods do you prefer and why?

You will need to increase the θ interval to [0, 4π] to close the stability boundary. Why do you suppose this increased interval is required? Consider mapping out the square root function on the unit circle using θ ∈ [0, 2π]. Does this boundary close? You will need to clip off some unstable regions made by loops in the boundary to match Figure 2.34.
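The boundary-mapping recipe is easy to illustrate in Python on a one-step method, where no clipping is needed: for second-order Runge-Kutta, G(w) = 1 + w + w²/2, and setting G(w) = e^{iθ} gives a quadratic in w for each θ.

```python
import cmath, math

boundary = []
for k in range(721):
    theta = 2*math.pi*k/720
    # Solve w**2/2 + w + (1 - e^{i*theta}) = 0 for w
    root = cmath.sqrt(1 - 2*(1 - cmath.exp(1j*theta)))
    boundary.extend([-1 + root, -1 - root])

# Every computed point maps back onto the unit circle, and the boundary
# crosses the real axis at w = -2.
worst = max(abs(abs(1 + w + w**2/2) - 1) for w in boundary)
leftmost = min(w.real for w in boundary)
```

For multistep predictor-corrector methods the same idea applies, except that det(G(w) - ωI) = 0 is a higher-degree polynomial in w and the spurious loops must be clipped as described above.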

Figure 2.34: Stability regions for Adams predictor-corrector methods in the Re(λΔt)-Im(λΔt) plane; APCn' uses nth-order predictor and nth-order corrector.

Exercise 2.54: Airy's equation

The equation

y'' + λxy = 0

arises in optics, quantum mechanics, and hydrodynamics, and is known as Airy's equation.

(a) Find an approximate power series solution to this problem (expanding around x = 0), keeping terms up to fourth order.

(b) Use this solution to approximate the first two eigenvalues λ of the equation with boundary conditions y(0) = y(1) = 0. Use Newton's method if necessary.

(c) Use the finite element method to construct an algebraic eigenvalue problem for the Airy equation. Find the approximate eigenvalues and eigenfunctions using six hat functions as the approximate basis and also using 12 functions. Are all of the eigenvalues of the algebraic problem good approximations of the exact eigenvalues? You may use Mathematica or a programming language, whichever you prefer.


Exercise 2.55: Applying Galerkin and collocation methods


Solve the problem

x²y'' + xy' + x²y = 0

using

(a) The Legendre-Galerkin method.

(b) The Chebyshev collocation method. Recall that the Chebyshev collocation points are x_j = cos(jπ/N), j = 0, 1, ..., N.

Exercise 2.56: Modeling a tubular reactor: convection, diffusion, and reaction

The equation

2u' = u'' + 1,  u(0) = 0,  u'(1) = 0

models the temperature profile in a tubular reactor in which an exothermic reaction occurs.

(a) Find the exact solution.

(b) Use the Galerkin tau method to construct an approximate solution. Use the Legendre polynomial basis set: φ0(x) = 1, φ1(x) = x, φ2(x) = (3x² - 1)/2. Sketch the solution and look at u'(0) and u(1) to compare the approximate and exact solutions.

Exercise 2.57: Converting a differential operator to an algebraic operator


Solve the eigenvalue problem

x²y'' + xy' + λx²y = 0,  y'(0) = y(1) = 0

using the Legendre-Galerkin method. You should be able to reduce this problem to a linear algebra problem of the form Ac + λBc = 0. Note that because of the boundary conditions, B will be singular, but A will not. How many basis functions do you need to compute the first three eigenvalues to four-digit accuracy? Plot the first four eigenfunctions. This is the eigenvalue problem for Bessel's equation of order zero. In the chapter, we showed that the eigenvalues of this problem are related to the roots of the Bessel function J0.

Exercise 2.58: An eigenvalue problem with finite elements

Solve the above problem again, using the finite element method with the "hat functions"
described in Section 2.9.1. Study how the approximation converges as the number of node points N increases. Also look at the computation time as a function of N.

Exercise 2.59: Chebyshev collocation for a nonlinear problem

Using the Chebyshev collocation technique, write an Octave or MATLAB program to solve the boundary-value problem (a steady-state reaction-diffusion problem)

εT'' + T - T³ = 0,  T(-1) = T(1) = 0

for ε = 0.05. Use the initial guess T = 1 to find a nontrivial solution. Study how the approximation converges as the number of collocation points N + 1 increases. Also look at the computation time as a function of N.
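A compact collocation-plus-Newton implementation (sketched here in Python rather than Octave, using only the standard library, so the linear solve is spelled out; the differentiation matrix is the standard Chebyshev Gauss-Lobatto construction):

```python
import math

def cheb(N):
    # Chebyshev Gauss-Lobatto points and differentiation matrix
    x = [math.cos(math.pi*j/N) for j in range(N + 1)]
    c = [(2.0 if j in (0, N) else 1.0)*(-1)**j for j in range(N + 1)]
    D = [[0.0]*(N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        for j in range(N + 1):
            if i != j:
                D[i][j] = (c[i]/c[j])/(x[i] - x[j])
        D[i][i] = -sum(D[i][j] for j in range(N + 1) if j != i)
    return D, x

def solve(A, rhs):
    # Gaussian elimination with partial pivoting
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k]/M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= f*M[k][j]
    y = [0.0]*n
    for i in reversed(range(n)):
        y[i] = (M[i][n] - sum(M[i][j]*y[j] for j in range(i + 1, n)))/M[i][i]
    return y

N, eps = 24, 0.05
D, x = cheb(N)
D2 = [[sum(D[i][k]*D[k][j] for k in range(N + 1)) for j in range(N + 1)]
      for i in range(N + 1)]
T = [1.0]*(N + 1)                       # initial guess T = 1
for _ in range(50):                     # Newton iteration on eps*T'' + T - T**3 = 0
    F = [eps*sum(D2[i][j]*T[j] for j in range(N + 1)) + T[i] - T[i]**3
         for i in range(N + 1)]
    J = [[eps*D2[i][j] + ((1 - 3*T[i]**2) if i == j else 0.0)
          for j in range(N + 1)] for i in range(N + 1)]
    for bc in (0, N):                   # boundary conditions T(+-1) = 0
        F[bc] = T[bc]
        J[bc] = [1.0 if j == bc else 0.0 for j in range(N + 1)]
    dT = solve(J, [-f for f in F])
    T = [Ti + di for Ti, di in zip(T, dT)]
    if max(abs(d) for d in dT) < 1e-12:
        break
step = max(abs(d) for d in dT)
```

With ε = 0.05, the converged T is near 1 in the interior and drops to zero through boundary layers of width roughly √ε at x = ±1.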

Exercise 2.60: Attractivity and asymptotic stability for linear systems

Show that asymptotic stability and attractivity are identical for the linear system ẋ = Ax.

Exercise 2.61: Stability and asymptotic stability for linear systems

Consider the linear system

dx/dt = Ax

(a) Is this system asymptotically stable? Why or why not?

(b) Is the system (Lyapunov) stable or unstable? Prove it.

(c) Generalize this example and provide a checkable condition to test for (Lyapunov) stability.

(d) Given this result, characterize the class of linear systems that are stable but not asymptotically stable.

Exercise 2.62: Lyapunov equation and linear systems

Establish the equivalence of (a) and (b) in Theorem 2.24.

Exercise 2.63: Discrete time Lyapunov function for linear systems

State the discrete time version of Theorem 2.24. Show that (a) and (b) are equivalent in the discrete time version.

Exercise 2.64: Nonsymmetric matrices and definition of positive definite

For real, square matrix S, consider redefining S > 0 to mean that xᵀSx > 0 for all x ∈ Rⁿ, x ≠ 0. We are removing the usual requirement that S is symmetric in the definition of positive definite in Section 1.4.4.

(a) Define the matrix B = (S + Sᵀ)/2. Show that B is symmetric and xᵀBx = xᵀSx for all x ∈ Rⁿ. Therefore S > 0 (new definition) if and only if B is positive definite (standard definition).

(b) What happens to the connection between this new definition of S > 0 and the eigenvalues of S? Consider statement 1. from Section 1.4.4

S > 0 if and only if λ > 0, λ ∈ eig(S)

Does this statement remain valid? If so prove it. If not, provide a counterexample.

Exercise2.65: Stabilities of a linear system


Consider the linear, time-invariant system ẋ = Ax. Characterize the class of A matrices for which the systems exhibit the following forms of stability.

(a) Stable (in the sense of Lyapunov).

(b) Attractive.

(c) Asymptotically stable.

(d) Exponentially stable.

(e) Which of these forms of stability are equivalent for the linear, time-invariant system?

Exercise 2.66: Extending a regular perturbation solution to higher order

For the regular perturbation solution of (2.67) presented in Section 2.6.4, compute the next term in the series, c2(r).

Exercise 2.67: QSSA as the outer solution in a two-time-scale singular perturbation

Consider the following simple reaction mechanism taking place in a well-mixed, constant-volume batch reactor

A →(k1) B →(k2) C

and assume k2 ≫ k1 so B is a low-concentration species for which we wish to examine the QSSA.

(a) Solve A's material balance and show

cA = cA0 e^{-k1 t}

Apply the usual QSSA approach, set RB = 0, and show that

cBs = (k1/k2) cAs

The concentration of C is always available if desired from the total species balance

cCs(t) = cA(0) + cB(0) + cC(0) - cAs(t) - cBs(t)

(b) The B species has two-time-scale behavior. On the fast time scale, it changes rapidly from initial concentration cB0 to the quasi-steady-state value for which RB ≈ 0. Divide B's material balance by k2, define the fast time-scale time as τ = k2 t, and obtain for B's material balance

dcB/dτ = ε k1 cA - cB

in which ε = 1/k2. We wish to find an asymptotic solution for small ε. We try a series expansion in powers of ε for the inner solution (fast time scale)

cBi = Y0 + εY1 + ε²Y2 + ...

The initial condition, cBi(0) = cB0, must be valid for all ε, which gives for the initial conditions of the Yn

Y0(0) = cB0,  Yn(0) = 0, n ≥ 1

Substitute the series expansion into B's material balance, collect like powers of ε, and show the following differential equations govern the Yn

ε⁰:  dY0/dτ = -Y0
ε¹:  dY1/dτ = k1 cA - Y1
εⁿ:  dYn/dτ = -Yn,  n ≥ 2

(c) Solve these differential equations and show

Y0 = cB0 e^{-τ}
Y1 = (k1 cA0/(k1/k2 - 1)) (e^{-τ} - e^{-k1 τ/k2})
Yn = 0,  n ≥ 2

Because Yn vanishes for n ≥ 2, show you obtain the exact solution for the B concentration for all ε by using the first two terms.

(d) Next we analyze B's large-time-scale behavior, also called the outer solution. Divide B's material balance by k2 again but do not rescale time and obtain

ε dcB/dt = ε k1 cA - cB

Expand cB again in a power series of ε

cBo = B0 + εB1 + ε²B2 + ...

Substitute the power series into the material balance and collect like powers of ε to obtain the following equations

ε⁰:  B0 = 0
ε¹:  dB0/dt = k1 cA - B1
ε^{n+1}:  dBn/dt = -B_{n+1},  n ≥ 1

Solve these equations and show

B1 = k1 cA

So we see the zero-order outer solution is cBo = 0, which is appropriate for a QSSA species, but a rather rough approximation.

(e) Show that the classic QSSA analysis is the first-order outer solution.

(f) To obtain a uniform solution valid for both short and long times, we add the inner and outer solution and subtract any common terms. Plot the uniform zeroth-order and first-order solutions for the following parameter values: k2 = 10. Compare to the exact solution and the first-order outer solution (QSSA solution).

(g) Show that the infinite-order uniform solution is also the exact solution.
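The comparison requested in part (f) is easy to sketch (Python; the values k1 = 1, k2 = 10, cA0 = 1, cB0 = 0 are illustrative choices, not necessarily the book's). After the fast transient, the exact and QSSA profiles differ by O(k1/k2):

```python
import math

k1, k2 = 1.0, 10.0
cA0, cB0 = 1.0, 0.0

def cB_exact(t):
    # Exact solution of dcB/dt = k1*cA0*exp(-k1*t) - k2*cB, cB(0) = cB0
    return (cB0*math.exp(-k2*t)
            + k1*cA0/(k2 - k1)*(math.exp(-k1*t) - math.exp(-k2*t)))

def cB_qssa(t):
    return (k1/k2)*cA0*math.exp(-k1*t)   # cBs = (k1/k2)*cA

# After the boundary layer, the ratio approaches k2/(k2 - k1)
ratio = cB_exact(1.0)/cB_qssa(1.0)
```

Near t = 0 the QSSA value is badly wrong (it misses the fast transient entirely), which is exactly why the uniform inner-plus-outer solution is needed.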

Exercise 2.68: QSSA and matching conditions in singular perturbation

Consider again Exercise 2.67 with a slightly more complex reaction mechanism

A ⇌(k1, k-1) B →(k2) C

and assume that either k-1 ≫ k1 or k2 ≫ k1 (or both) so B is again a low-concentration species for which we wish to examine the QSSA. Notice that either k-1 or k2 may be large with respect to the other without invalidating the QSSA assumption for B. Only if k-1 ≫ k1, k2 is the reaction equilibrium assumption also valid for this mechanism.

(a) Apply the QSSA on species B and show

cAs = (cA0 + cB0/(1 + K2)) e^{-k1 K2 t/(1+K2)}
cBs = (k1/(k-1 (1 + K2))) cAs

in which K2 = k2/k-1.

(b) With this mechanism, both the A and B species have two-time-scale behavior. On the fast time scale, we use a series expansion for both cA and cB. Let the inner solution be given by

cAi = X0 + εX1 + ε²X2 + ...
cBi = Y0 + εY1 + ε²Y2 + ...

in which the small parameter ε is the inverse of the largest rate constant in the mechanism. In the following we assume k-1 is largest and ε = 1/k-1. Define K2 = k2/k-1 and we assume that K2 is order unity or smaller. If K2 were large we should have chosen ε = 1/k2 as the small parameter. Collect terms of like power of ε (with τ = k-1 t the fast time) and show

ε⁰:  dX0/dτ = Y0,            dY0/dτ = -(1 + K2) Y0
ε¹:  dX1/dτ = -k1 X0 + Y1,   dY1/dτ = k1 X0 - (1 + K2) Y1
εⁿ:  dXn/dτ = -k1 X_{n-1} + Yn,   dYn/dτ = k1 X_{n-1} - (1 + K2) Yn

What are the initial conditions for the Xn and Yn variables?


(c) Solve these for the zero-order inner solution and show

X0 = cA0 + (cB0/(1 + K2)) (1 - e^{-(1+K2)τ})
Y0 = cB0 e^{-(1+K2)τ}

(d) Next we construct the outer solution valid for large times. Postulate a series expansion of the form

cAo = A0 + εA1 + ε²A2 + ...
cBo = B0 + εB1 + ε²B2 + ...

Substitute these into the A and B material balances and show

ε⁰:  B0 = 0
εⁿ:  dA_{n-1}/dt = -k1 A_{n-1} + Bn,   dB_{n-1}/dt = k1 A_{n-1} - (1 + K2) Bn,   n ≥ 1

(e) Solve these and show for zero order

A0(t) = A0(0) e^{-k1 K2 t/(1+K2)},  B0 = 0

Again we see that to zero order, the B concentration is zero after a short time. Note also that, unlike in Exercise 2.67, we require an initial condition for the outer solution An differential equations. By matching the inner and outer solutions, we obtain the missing initial condition

lim_{τ→∞} X0(τ) = lim_{t→0} A0(t)

In other words, the long-time solution (steady state) on the fast time scale is the short-time solution (initial condition) on the slow time scale. Using this matching condition

A0(0) = cA0 + cB0/(1 + K2)

(f) Find also the first-order solution, B1, and show that the QSSA solution corresponds to the zero-order outer solution for cA and the first-order outer solution for cB.

Exercise 2.69: Michaelis-Menten kinetics as QSSA


Consider the enzyme kinetics

E + S ⇌(k1, k-1) ES,  ES →(k2) P + E

in which the free enzyme E binds with substrate S to form bound substrate ES in the first reaction, and the bound substrate is converted to product P and releases free enzyme in the second reaction. This mechanism has become known as Michaelis-Menten kinetics (Michaelis and Menten, 1913), but it was proposed earlier by Henri (1901). If the rates of these two reactions are such that either the free or bound enzyme is present in small concentration, the mechanism is a candidate for model reduction with the QSSA.

Assume k1 ≫ k-1, so E is present in small concentration. Apply the QSSA and show that the slow time scale model reduces to a first-order, irreversible decomposition.

(a) For a well-stirred batch reactor, show the total enzyme concentration satisfies

cE(t) + cES(t) = cE(0) + cES(0)

(b) Find an expression for the QSS concentration of E. What is the corresponding concentration of ES?

(c) Show the rate expression for the reduced model's single reaction is

r = k cS/(1 + K cS),  k = k2 K E0,  K = k1/(k-1 + k2),  E0 = cE(0) + cES(0)     (2.101)

which depends solely on the substrate concentration. The inverse of the constant K is known as the Michaelis constant. The production rates of reactant S and product P in the reduced model are then simply

RS = -r,  RP = r

(d) Plot the concentrations versus time for the full model and QSSA model for the following values of the rate constants and initial conditions

k1 = 5,  k-1 = 1,  k2 = 10
cE(0) = 1,  cES(0) = 0,  cS(0) = 50,  cP(0) = 0

Exercise 2.70: Michaelis-Mentenkinetics as reaction equilibrium


Consider again the enzyme kinetics given in Exercise 2.69.

E+ S

kl

ES

k2

k-1 >
Nowassume the rate constants satisfy 1<1,
scale
of
the
time
the
second
equilibrium on
reaction.

so that the first


reactionis
at

(a) Find the equilibrium concentrations of E and ES

(b) Showthe production rate of P is given by

Rp =

kcs
1 +1<1cs

k = k2KIE()

= kl/k-l

(2.102)

in which 1<1is the equilibrium constant for the first reaction.


Notice
is identical to the production rate of P given in the QSSAapproach. thisform
son, these two assumptions for reducing enzyme kinetics are oftenForthisreamistakenly
labeled as the same approach.
It is interesting to note that in their original work in 1913, Michaelis
and Menten
proposed the reaction equilibrium approximation to describe enzymekinetics,
in
which the second step is slow compared to the first step (MichaelisandMenten,
1913). Michaelisand Menten credit Henri with proposing this mechanismto
explain the experimental observations that (i) production rate of P increaseslin-

early with substrate at low substrate concentration and (ii) production rate ofPis

independent of substrate concentration at high substrate concentration(Henri,


1901).

The QSSAanalysis of enzyme kinetics was introduced by Briggsand Haldane


in 1925, in which the enzyme concentration is assumed small comparedto the
substrate (Briggsand Haldane, 1925). Since that time, the QSSAapproachhas
become the more popular explanation of the observed dependenceof substrate
in the production rate of product Rp in 2.101 and 2.102 (Nelson and Cox,2000).

The reader should be aware that either approximation may be appropriatedepending on the values of the rate constants and initial conditions. Althoughboth

2.10 Exercises
247

reduced models give the same


form for
quite different in other respects.
the production
Finally,
rate of P, they
particular k-l
for some
k2, both the
are often
values
Qss assumption
of rate
constants, in
and the reaction
equilib(c) show that the slow-time-scale
reduced model
for the reaction
equilibriumas-

tri
the following rate expressions
tri =

tr2 =

tr2

kcs

k = k2KIEo

kcs

= kl/k-l

Notice here we have not reduced the number


of reactions;we still have
reactions, but as before we have reduced the
two
number of rate constantsfrom
to
1<2)
1<-1,
two
(k, 1<1).The first rate
three (kl,
expression here depends on Cs
than
rather
CE
only
and
Cs as in the previous QSSA
reduction. Thereforethe
production rates of E, ES, and S depend on CEas well as
cs. Only the production
rate of P (Rp = tr2) loses the CEdependence.
(d) Plot the concentrations versus time for the full model and reaction equilibrium
model for the following values of the rate constants and initialconditions.
kl = 0.5

CE(O)= 20

k 2 = 0.5

CES(O) = 10

cs(0) = so

cp(0) = O

Recall that you must modify the initial conditions for the slow-time-scalemodel

by equilibrating the first reaction from these startingvalues.

Exercise2.71: Asymptotic expansion of an integral


Findan asymptotic expansion of the integral

f(x) =
integration by parts. Showthat the approxfor large positive values of x. Use repeated

imationis asymptotic as x

00.

always power series


Exercise2.72: Asymptotic series are not
to the two solutions of
Find the leading-order approximations

xe X= E
for

x =
1. Seek solutions of the form

where (E)

1.
1 and one where () >

one
find two dominantbalances:

Ordinary Differential
248

eigenvalue problems
Perturbed
Exercise2.73:
problem
eigenvalue
Considerthe

whereA

Hint:

matrix and
is an n x n

Ax + B(x) = Ax

B(x) and x are Il-vectors. Assume that the

and uniqueness
existence
the
roiew

theory for linear equations.

analysis of a problem with a pitchfork


Multiple-scales
bifur
Exercise2.74:
cation

Consider the system

of equations

1/2 2
cy

e l / 2 xy

are both ord( 1). (They have already been scaled by 112.)perform
Assumethat x and y
letting to = t, ti = el/ 2t, t2 = ct. Show that

the solvability

expansion,
a multiple-scales
conditions require that

yo
to

yo
tl
dyo = RYO+ Yo
dt2

solutions of the amplitude equation for yo?


when to > 1. What are the steady-state
1and +1.
between
Sketchthe steady states as varies

Exercise2.75:Degenerate pitchfork bifurcation


Considerthe one-dimensional system

wheref (x; u)

and fxxx

0 at x = 0. Although this equationhas

the correct symmetry to display a pitchfork bifurcation, (2.81) does not hold because
fxxx = O.

(a) Derivethe correct normal form in this case and draw the correspondingbifurcation diagram(s).

(b) Nowlet fxxx be nonzero, but very small. How are the above bifurcationdiagrams modified?

Exercise2.76: Multiplescales to determine stability of a time-periodicsolution

Consider the stability of a periodic orbit of a nonlinear system. Let XP(t) = xp(t + T)
be a time-periodicsolution of the differential
equation
Nowlet x = xp(t) +

1.

2.10

Exercises

(a) show that

the linearized

equation for
z

whereA(t) = A(t + T) is a mat

249
takes the

A(t)z

rix operator

(b) The DAMPEDMATHIEU EQUATION


is a
time-periodic coefficients.

With

form

time-periodic

coefficients.
It is (writtenparticular case
as a single of a linear
+
second-orderequation With
+ (0 2 +
equation)
ecos2t)x =
< < 1,u =
Letting =
0
determine the stability of the ord
point z = O.
1/2. (Although this equation
Showthat
put
in
in second-order
the form z Ois stable
a solution of the form x(to, to form.) Use time scales
easiertowork
= A(tl) COs
to t,
to +
andassume
+
Exercise2.77: Oscillator with slowly

varying

Usethe multiple-scales approach with ti = t,


solutionto the problem of an oscillator with
2 d2Y

frequency

=
to find
slowlyvarying the leading-ordergeneral
frequency

Assumethat w (t) > O in the domain of interest.


Show that a
leading-order
the form yo r(tl)
will not work,
but that a solution solutionof
yo
form
r(tl)
general
more
of the slightly
will. You
see
the
that
quantity r2(k) is independent
from
scalesresult
of tl, to leading the multipleorder:it is a so-

Exercise 2.78: Multiple-scales solution to a nonlinear


oscillator problem
Usethe method of multiple scales to find a leading-order
solutionto the nonlinear
oscillationproblem
k + (x 2

Usetimescales to = t,

+ x = 0,

x(0) = 1,

= 0

= ct.

Exercise2.79: Synchronization of oscillators


Huygenswas the first to observe that two oscillators (mechanicalclocksin his case)

whose natural frequencies (01 and (02 are close but not identical can be synchronized

has since
("phaselocked") if they are coupled to one another. Suchsynchronization
been observed in a diverse range of applications, including coupledchemicalreactors.
Asimplemodel for a pair of coupled oscillators is
I

(01 + Kl sin(02

01)

sin(1- 02)

Thus these equations


where and 02 are the phase variables for the two oscillators.
when the phase difference
describe trajectories on a torus. Synchronization occurs
of to determine

Analyze the dynamics


= 02 - 01 attains a stable steady-state value.
Drawthe bifurcation
synchronized.
are
oscillators
the range of parameters in which the
synchronized
system passes from the
the
as
torus
the
on
happens
diagram. Draw what

to theunsynchronized state.

Bibliography

M. J. Ablowitz and A. S. Fokas. Complex Variables: Introductio


n and
tions. Cambridge University Press, Cambridge, 2003.
AliCQ.
M.Abramowitz and I. A. Stegun. Handbook of Mathematical Fu
nctions.
Bureau of Standards, Washington, D.C., 1970.
National

R. G. Bartle and D. R. Sherbert. Introduction to Real Analysis.


Sons, Inc., New York, third edition, 2000.

John Wiley

C. M. Bender and S. A. Orszag. Advanced Mathematical Methods


for Scientists
and Engineers. I. Asymptotic Methods and Perturbation

Theory. Springer
_

Verlag, New York, 1999.

G. E. Briggs and J. B. S. Haldane. A note on the kinetics of enzyme


action.
Biochem.J., 19:338-339, 1925.

C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang. Spectral Methods


in
Fluid Dynamics. Springer-Verlag, Berlin, 1988.

C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang. SpectralMethods:


Fundamentals in Single Domains. Springer-Verlag, Berlin, 2006.

W. J. Cody. Rational Chebyshev approximations for the error function.Math.


Comp.,

1969.

P. A. M. Dirac. Principles of quantum mechanics. Oxford, ClarendonPress,


fourth edition, 1958.
M. V. Dyke. Perturbation Methods in Fluid Mechanics. Parabolic Press, Stanford,

CA,annotated edition, 1975.

C. Gasquet and P. Witomski. Fourier Analysis and Applications. SpringerVerlag, New York, 1999.

H. Goldstein. Classical Mechanics. Addison-Wesley, Reading, Massachusetts,


second edition, 1980.
D. Gottlieb and S. A. Orszag. Numerical Analysis of Spectral Methods:Theory
and Applications. SIAM,Philadelphia, 1977.
NewJerM. D. Greenberg. Foundations of Applied Mathematics. Prentice-Hall,

sey, 1978.

250

Bibliography

M.Grmela and H.-c. ttinger.

Dynamics
fluids. 1.Development of a
and
general
formalism. thermodynamics
of complex
Phys.Rev.
E,
J. Guckenheimer and P. Holmes.
Nonlinear
and Bifurcations of vector
Oscillations,
Fields.
Springer
Dynamical
Verlag, New
systems
York, New
York,
M.V. Henri. Thorie
gnrale de
l'action de
quelques

E. J. Hinch. Perturbation Methods.

Cambridge

M. W. Hirsch and S. Smale. Differential

diastases.

University Press,

Equations,

Comptes

Cambridge,

Dynamical Systems

and

T. J. R. Hughes. The Finite Element


Method. Dover,
Mineola, New York,
2000.
E. L. Ince. Ordinary Differential Equations.
Dover Publications
Inc., New York,

G. looss and D. D. Joseph. Elementary


stability and Bifurcation
Springer-Verlag, Berlin, second edition,
Theory.
1990.
S. R. Keller and R. Skalak. Motion of a
tank-treading ellipsoidal
particle in a
shear-flow. J. Fluid Mech., 120.27-47,1982.
J. Kevorkian and J. D. Cole. Multiple Scale and
Singular Perturbation Methods.
Springer-Verlag, New York, 1996.
H. K. Khalil. Nonlinear Systems. Prentice-Hall, Upper Saddle
River, NJ, third
edition, 2002.
O. Mangasarian. Nonlinear Programming. SIAM,Philadelphia, PA,
1994.

L. Michaelis and M. L. Menten. Die Kinetik der Invertinwirkung. Biochem.Z,

49:333-369, 1913.

A. H. Nayfeh and D. T. Mook. Nonlinear Oscillations. John Wiley& Sons, New


York, 1979.
A. W. Naylor and G. R. Sell. Linear Operator Theory in Engineering and Science.

Springer-Verlag, New York, 1982.


D. L. Nelson and M. M. Cox. Lehninger Principles of Biochemistry. Worth Pub-

lishers, New York, third edition, 2000.


E. Polak. Optimization: Algorithms and ConsistentApproximations.Springer
Verlag, New York, 1997.

BibliogtQh

252

ess

Cambridge, 1992.

Analysis and
Ekerdt. Chemical Reactor
Design Fund
G.
J.
and
second
J. B. Rawlings
WI,
edition,
Madison,
2012.
Q.
Publishing,
mentals. Nob Hill
Control: Theory
Q. Mayne. Model Predictive
and Design
J. B. Rawlings and D.
2009.
WI,
Madison,
Nob Hill Publishing,
J.-B.Wets. VariationalAnalysis. springer-Verlag,
R. T. Rockafellar and R.
1998

E.D. Sontag. Mathematical Control


edition, 1998.

Theory. Springer-Verlag, New York,


second

I. Stakgold. Green's Functions and Boundary Value Problems. John Wiley&


Sons, New York, second edition, 1998.
G. Strang. Introduction to Applied Mathematics. Wellesley-Cambridge press
Wellesley, MA, 1986.

G. Strang and G. J. Fix. An Analysis of the Finite Element Method.


Wellesley.
Cambridge Press, Cambridge,
MA, 2008.

S. H. Strogatz. Nonlinear Dynamics and Chaos: With


Applications to Physics,
Biology,Chemistryand Engineering. Westview Press,
Cambridge, MA,1994.

3
Vector Calculus and Partial
Differential
Equations

3.1 Vector and Tensor Algebra


3.1.1 Introduction
Manyof the partial differential equations (PDEs)
that we encounter as
biological
chemical and

engineers arise from field


equations such as the
Navier-Stokesequations of fluid

dynamics or the Schrdinger


of quantum mechanics. These equations govern quantities equation
(velocity,
wavefunction) that vary with position in three-dimensional
physical
space. In general, such a quantity is known as a FIELD.
Therefore,this
chapterbegins with a

discussion of the properties of vectorsand


related objects (tensors) in physical space. In general,a TENSOR
is an
objectthat has an intrinsic geometric definition,independentof coordinate system. It may be a velocity vector, a dot product between two
vectors (a scalar) or, as we shall see, even a linear operator.

3.1.2 Vectors in Three Physical Dimensions


In this chapter, we consider only vectors in three-dimensionalphysical
spaceand following convention in the physics and engineeringliterature, represent these vectors using bold type. We begin with a brief
reviewof vectors, tensors and their algebra. For now, let us consider
only a Cartesian basis for the space, with position independent,orthonormal basis vectors el, e2, e3. Any vector u can be represented as

u = Ei=l Cliei,or, using the summation convention,uiei. In CARTESIANTENSOR


notation, we streamline the notation even further, denotingthe vector as ui. The unsummed index i on indicatesthat u is a
253

VectorCalculus and Partial Differentia/

254

EquQti0hs

u; = uati.
a vector is Ilull =
of
length
The
vector. The
two vectors is determined by the dot
degree
between
of alignment

UiVj(ei ej)

U V = UV

Usingsome elementary

utt'i = llull llvll


coso

geometry, it can be shown that


1

(llu112

+ llv112

v can be expressed without referring


This result shows that u
lengths of vectors. Therefore,to a
the
to
only
but
system,
ordinate
thedot

coordinate system; it is a GEOMETRIC


product is independent of
INVARI.
of

ANT.Recallthat the inner product


the dot product.

Chapter 1 is the generalization


of

In Chapter 1 we also introduced the outer product between two

tors, also called the DIRECT PRODUCT or DYADIC PRODUCT. The outer

product between vectors u and v is the DYADIuv. A dyad is a SECONDa quantity that incorporates information regarding two
TENSOR:
ORDER
directions. (Avector, which has one magnitude and one direction,is a

first-ordertensor). A dyad can act as a linear operator

(uv) w = u(v w)
Similarly,

Notethat uv

w (uv) = (w u)v
vu. Based on this definition, we can write uv out,

includingbasis vectors

UV

In Cartesiantensor notation, uv is denoted as uiVj (the presenceof


the basis vectors ei and ej is implied by the presence of the twosub-

scripts). Whena dyad operates on something, the rightmost index(and


basis vector) is involved. An example of a useful dyad is the projection

operator, where is a unit vector. The product () v is the


componentof the vector v in the direction. You can checkthisby

applyingthe definitionof the outer product.


1As noted in
Chapter 1, sometimes

v.
the dyad uv is denoted by uvT or u

3,1 vector

and Tensor Algebra

255

general second-order tensor T can be written as a linear


combina-

of the
0011

basis dyads eiej

Tijetej

tensor notation, the summations and base


vectors are imIn artesian can denote the tensor
by
its
we
component
matrix Tij. The
pliedand
T

v
between
a
=
u
second-order
tensor and a vector is
dotproduct
=
TijVj.
Similarly,
u = v T is, in Cartesian
anothervector:
coordiThe second-order identity tensor is
vjTji.
=
nates:
denoted and
property a = a

= a for all a. In Cartesian


is simply the Kroneckerdelta coorij, or
elel
+
e2e2
+
e3e3.
=
equivalently
Alsoimportant is the cross product, u x v. Recallthat, while the

satisfiesthe
dinateS,the ij component of

dotproduct is a scalar, the cross product is a vector, with magnitude

ullllvllsino and direction orthogonal to both u and v and deter-

minedby the "right-hand rule." The cross product is not commutative:


vx u. Because of the invocation of the right-hand rule in

its definition,the cross product is strictly speakinga PSEUDOVECTOR,


becauseits definition is affected by the handedness of the coordinate
systemin which it is computed.
It is useful to view the cross product as a matrix-vector multiplication.Usingthe Cartesian components
o

113

VI

-112

Wecan write the cross product more compactly if we introduce the


followingoperator, called the LEVI-CIVITA
SYMBOL
1,

Ejk -1,
0,

ijk = 123,231 or 312


ijk = 132,321 or 213
i = j, i = k or j = k

Thisis the Cartesian coordinate representation of the


ALTERNATING

UNITTENSOR
or PERMUTATION
TENSORE. As with the cross-product
itself,qjk is not actually a tensor, but rather a pseudotensor,
because
itsdefinitionis based on the
use of right-handed Cartesian coordinates.
Nowthe operator (ux)
can be written EijkUj. This quantity has two free

and Partial Differe


Vector Ca/CUlUS
ntial

256

indices, so it is a

cross product as

A useful identity

EquQti0hs

second-order pseudotensor. Finally, we can


CijkUjVk

Writethe

involving Eijk is
int jl

ijkklm

whicharises in the computation of double cross products such


(b x c). Sincethe Kronecker delta is not handedness dependent,the
doublecross product between three vectors is a true Vector.

3.2 Vector Calculus: Differential Operators and Integral Theorems

3.2.1 Divergence, Gradient, and Curl
Consider a vector that is a function of position, v(x), a VECTOR FIELD. Physically, this vector field could be a fluid velocity (mass flux) or an electric current (charge flux), for example. An important physical consideration is the total flow into or out of a closed region. We denote this region as V, its boundary surface as S, and the outward unit normal vector to S as n, as illustrated in Figure 3.1. The volume of V is Vol(V) = ∫_V dV. If v is a flux of some quantity, then n·v dS is the amount of that quantity crossing the boundary element dS per unit time, and thus

    (1/Vol(V)) ∫_S n·v dS

is the amount of that quantity leaving V per unit time, per unit volume. Now let the region be centered at a position x₀ and let V shrink to zero around that point. The DIVERGENCE of v at point x₀ is defined by

    div v = lim_{Vol(V)→0} (1/Vol(V)) ∫_S n·v dS    (3.1)

Thus the divergence of v measures the amount per unit volume that leaves the point x₀. This definition is independent of coordinates, so the divergence is a scalar.
For a scalar field φ(x) there is an analogous quantity, the GRADIENT of φ, defined by

    grad φ = lim_{Vol(V)→0} (1/Vol(V)) ∫_S n φ dS    (3.2)

Figure 3.1: Volume V shrinking to zero size around a point x₀.

Given a unit vector s, the quantity s·grad φ is the derivative of φ along the s direction, i.e., the DIRECTIONAL DERIVATIVE. The gradient of φ is thus a vector whose direction shows the direction of the maximum change in φ and whose magnitude is the magnitude of that change.
The final important operation, the CURL, measures the rotation of a vector field v at a point. It is defined by

    curl v = lim_{Vol(V)→0} (1/Vol(V)) ∫_S n × v dS

Because of the cross product involved in its definition, curl v is a pseudovector.
The above definitions of div, grad, and curl are independent of coordinate system and illustrate the concepts underlying them, but to actually work with these operators we need coordinate systems. All three of the above operations can be expressed in terms of the GRADIENT operator, ∇, also called "nabla" or "del," which is also sometimes denoted ∇_x. In Cartesian coordinates, it is given by

    ∇ = e₁ ∂/∂x₁ + e₂ ∂/∂x₂ + e₃ ∂/∂x₃

or in Cartesian tensor notation

    ∇ = e_i ∂/∂x_i

The presence of the basis vector e_i is required because of the otherwise unrepeated index i. The divergence, gradient, and curl operators are then given by

    div v = ∇·v = ∂v_i/∂x_i
    grad φ = ∇φ = e_i ∂φ/∂x_i
    curl v = ∇ × v = e_i ε_ijk ∂v_k/∂x_j

Another extremely important operator is the LAPLACIAN operator div grad, given by

    div grad = ∇·∇ = ∂²/∂x_i∂x_i

The most common notation for the Laplacian operator is ∇². Unfortunately, this notation is somewhat misleading, implying that the operator is grad grad rather than div grad. Some literature uses the symbol Δ for the operator. We follow engineering convention and use ∇².

3.2.2 The Gradient Operator in Non-Cartesian Coordinates

In many applications, Cartesian coordinates are not the most practical for solving a problem.² We are familiar with cylindrical and spherical coordinate systems, but there are many others, including bipolar and parabolic systems. We consider here only orthogonal coordinate systems; the basis vectors may change from point to point, but at each point they are mutually orthogonal. We denote an arbitrary set of orthogonal coordinates by u₁, u₂, u₃ and the (orthonormal) base vectors by e_{u₁}, e_{u₂}, e_{u₃}. The most important distinction between Cartesian and other coordinate systems is the actual distance traversed in moving from one coordinate line to another. For example, in Cartesian coordinates (x₁, x₂, x₃) = (x, y, z), the distance between the coordinate lines y = 1 and y = 2, keeping x and z fixed, is always 1. But in cylindrical coordinates, (u₁, u₂, u₃) = (r, θ, z), the distance traveled going from θ = 1 to θ = 2 (at constant r and z) depends on r! This dependence is quantified in the SCALE FACTORS for a coordinate system, defined by

    h_i = ( Σ_{j=1}^{3} (∂x_j/∂u_i)² )^{1/2}

²Appendix A of Bird, Stewart, and Lightfoot (2002) contains a great deal of useful information about this topic. Tensor analysis is not restricted to orthogonal coordinate systems; if you want to learn about tensor analysis in general coordinates, some good references are Aris (1962); Block (1978); Simmonds (1994); Bird, Armstrong, and Hassager (1987).
This quantity determines the distance traversed in moving along the coordinate curve: the distance covered in moving du_i along coordinate curve i is h_i du_i. For example, in cylindrical coordinates, it is easy to compute that h₁ = 1, h₂ = r, h₃ = 1. If we normalize each coordinate tangent vector by the scale factor, then we can write the basis vectors in terms of the Cartesian unit vectors e_j

    g_i = (1/h_i) Σ_{j=1}^{3} (∂x_j/∂u_i) e_j

Note that despite the notation, the number h_i is not a component of a vector but rather is a property of the particular coordinate system under consideration.
For any orthogonal coordinate system, we can now write the gradient operator as

    ∇ = g_i (1/h_i) ∂/∂u_i

(summation implied). In general, the g_i depend on position. The importance of this fact becomes clear when we consider operators like the Laplacian

    ∇·∇ = (g_i·g_j)/(h_i h_j) ∂²/∂u_i∂u_j + (g_i/h_i)·(∂/∂u_i)(g_j/h_j) ∂/∂u_j

The second term in this expression does not appear in Cartesian coordinates, where the base vectors are independent of position. In terms of the scale factors, the derivative of a basis vector with respect to position can be written as follows

    ∂g_j/∂u_k = g_k (1/h_j) ∂h_k/∂u_j − δ_jk Σ_{i=1}^{3} (g_i/h_i) ∂h_j/∂u_i

Summation is not implied, as u_k is not a component of a tensor.
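The scale-factor definition lends itself to a direct numerical check. The sketch below (our own construction, not from the text) estimates h_i for cylindrical coordinates by central finite differences of the Cartesian coordinates with respect to (r, θ, z), recovering h = (1, r, 1):

```python
import math

def scale_factor(xyz_of_u, u, i, h=1e-6):
    # h_i = sqrt( sum_j (dx_j/du_i)^2 ), by central finite differences
    up, um = list(u), list(u)
    up[i] += h
    um[i] -= h
    dx = [(a - b) / (2 * h) for a, b in zip(xyz_of_u(up), xyz_of_u(um))]
    return math.sqrt(sum(d * d for d in dx))

def cylindrical(u):
    # Cartesian position as a function of (r, theta, z)
    r, th, z = u
    return (r * math.cos(th), r * math.sin(th), z)

r = 1.7
u = (r, 0.6, -0.3)
assert abs(scale_factor(cylindrical, u, 0) - 1.0) < 1e-8   # h_r = 1
assert abs(scale_factor(cylindrical, u, 1) - r) < 1e-8     # h_theta = r
assert abs(scale_factor(cylindrical, u, 2) - 1.0) < 1e-8   # h_z = 1
```

The same function applies unchanged to any orthogonal coordinate system for which the map to Cartesian coordinates is known.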

Example 3.1: Gradient (del) and Laplacian operators in polar (cylindrical) coordinates

(a) Without referring to Cartesian coordinates at all, derive a formula for the gradient (del) operator in polar coordinates, shown in Figure 3.2, so that one obtains for the differential of a scalar function φ

    dφ = ∇φ · dx

in which dx is the differential of the position vector in polar coordinates.

Figure 3.2: Polar coordinates (r, θ) and unit vectors e_r and e_θ.

(b) Using this formula for ∇, derive the formula for the Laplacian in polar coordinates.

(c) Finally, check these two results by relating them to Cartesian coordinates using the h_i and g_i formulas given previously.

Solution

(a) As shown in Figure 3.2 we have for the differential of position

    dx = dr e_r + r dθ e_θ

From the definition of partial derivative, we have the formula for the total differential of an arbitrary function φ(r, θ)

    dφ = (∂φ/∂r) dr + (∂φ/∂θ) dθ

We substitute ∇φ = e_r a₁ + e_θ a₂ and solve for a₁, a₂, the two vector components of ∇φ

    dφ = ∇φ·dx = (e_r a₁ + e_θ a₂)·(dr e_r + r dθ e_θ) = a₁ dr + a₂ r dθ

Comparing the two sides, we have

    a₁ = ∂φ/∂r,  a₂ = (1/r) ∂φ/∂θ

which gives for ∇ in polar coordinates

    ∇ = e_r ∂/∂r + e_θ (1/r) ∂/∂θ    (3.4)

(b) Next we use the definition of the Laplacian to obtain

    ∇² = ∇·∇ = ( e_r ∂/∂r + e_θ (1/r) ∂/∂θ ) · ( e_r ∂/∂r + e_θ (1/r) ∂/∂θ )

Taking the derivatives, and noting the dot product e_r·e_θ = 0 because the unit vectors are orthogonal, gives

    ∇² = ∂²/∂r² + (1/r) e_θ·(∂e_r/∂θ) ∂/∂r + (1/r²) e_θ·(∂e_θ/∂θ) ∂/∂θ + (1/r²) ∂²/∂θ²

Now we require the derivatives of the unit vectors with respect to (r, θ). As shown in Figure 3.2 these are given by (see also Exercise 3.2)

    ∂e_r/∂r = 0,  ∂e_θ/∂r = 0,  ∂e_r/∂θ = e_θ,  ∂e_θ/∂θ = −e_r    (3.5)

Substituting these derivatives into the previous result and collecting the nonzero terms gives

    ∇² = ∂²/∂r² + (1/r) ∂/∂r + (1/r²) ∂²/∂θ²

Note that we can combine the first two terms for an equivalent form

    ∇² = (1/r) ∂/∂r ( r ∂/∂r ) + (1/r²) ∂²/∂θ²    (3.6)

(c) The partial derivatives of the coordinates are

    ∂x/∂r = cos θ,  ∂y/∂r = sin θ,  ∂x/∂θ = −r sin θ,  ∂y/∂θ = r cos θ

Substituting into the previously given formulas for h_i and g_i gives

    h₁ = 1,  h₂ = ( (−r sin θ)² + (r cos θ)² )^{1/2} = r
    g₁ = cos θ e_x + sin θ e_y = e_r
    g₂ = −sin θ e_x + cos θ e_y = e_θ

We then have

    ∇ = g₁ (1/h₁) ∂/∂r + g₂ (1/h₂) ∂/∂θ = e_r ∂/∂r + e_θ (1/r) ∂/∂θ

which agrees with (3.4).
For the Laplacian, we require the derivatives of g₁, g₂

    ∂g₁/∂θ = g₂,  ∂g₂/∂θ = −g₁,  ∂g₁/∂r = ∂g₂/∂r = 0

The g₁ term in the Laplacian formula vanishes upon substituting the various derivatives, and the g₂ term produces the additional term (1/r)∂/∂r, giving

    ∇² = ∂²/∂r² + (1/r) ∂/∂r + (1/r²) ∂²/∂θ²

which agrees with (3.6).
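Result (3.6) can also be checked numerically. The sketch below (our own check, with hypothetical function names) applies the polar Laplacian by central finite differences to φ = x²y, whose Cartesian Laplacian is exactly 2y:

```python
import math

def phi_xy(x, y):
    # test function; its Cartesian Laplacian is exactly 2*y
    return x * x * y

def phi_polar(r, th):
    return phi_xy(r * math.cos(th), r * math.sin(th))

def polar_laplacian(f, r, th, h=1e-4):
    # (1/r) d/dr (r df/dr) + (1/r^2) d2f/dth2, via central differences
    d2r = (f(r + h, th) - 2 * f(r, th) + f(r - h, th)) / h**2
    dr = (f(r + h, th) - f(r - h, th)) / (2 * h)
    d2t = (f(r, th + h) - 2 * f(r, th) + f(r, th - h)) / h**2
    return d2r + dr / r + d2t / r**2

r, th = 1.3, 0.8
exact = 2 * r * math.sin(th)    # Laplacian of x^2 y, with y = r sin(theta)
assert abs(polar_laplacian(phi_polar, r, th) - exact) < 1e-5
```

Any smooth test function works equally well; the finite-difference error is O(h²).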

Table 3.1 collects expressions for the gradient and Laplacian operators in Cartesian, cylindrical, and spherical coordinate systems. The convention used for the angles θ and φ in spherical coordinates is shown in Figure 3.3.

Figure 3.3: The orthonormal unit vectors in spherical coordinates.

Cartesian:
    ∇ = e₁ ∂/∂x₁ + e₂ ∂/∂x₂ + e₃ ∂/∂x₃
    ∇² = ∂²/∂x₁² + ∂²/∂x₂² + ∂²/∂x₃²

Cylindrical:
    ∇ = e_r ∂/∂r + e_θ (1/r) ∂/∂θ + e_z ∂/∂z
    ∇² = (1/r) ∂/∂r (r ∂/∂r) + (1/r²) ∂²/∂θ² + ∂²/∂z²

Spherical:
    ∇ = e_r ∂/∂r + e_θ (1/r) ∂/∂θ + e_φ (1/(r sin θ)) ∂/∂φ
    ∇² = (1/r²) ∂/∂r (r² ∂/∂r) + (1/(r² sin θ)) ∂/∂θ (sin θ ∂/∂θ) + (1/(r² sin²θ)) ∂²/∂φ²

Table 3.1: Gradient and Laplacian operators in Cartesian, cylindrical, and spherical coordinates.

3.2.3 The Divergence Theorem

The divergence theorem concerns the integral of the divergence of a vector field v(x) in a region V. It is central to many aspects of transport phenomena, where conservation laws are written for a control volume, and the divergence theorem plays a key role in their development.
To illustrate the arguments underlying the divergence theorem without digressing too far, we will prove a limited version of it. Consider the two-dimensional "volume" V_A shown in Figure 3.4, whose "surface" consists of three pieces, S₁, S₂, and S₃, and whose outward unit normal is n_A. In this domain

    ∫_{V_A} ∇·v dV = ∫_{V_A} ( ∂v_x/∂x + ∂v_y/∂y ) dV
        = ∫₀^{y₁} ∫₀^{x_c(y)} (∂v_x/∂x) dx dy + ∫₀^{x₁} ∫₀^{y_c(x)} (∂v_y/∂y) dy dx
        = ∫₀^{y₁} ( v_x(x_c, y) − v_x(0, y) ) dy + ∫₀^{x₁} ( v_y(x, y_c) − v_y(x, 0) ) dx

Since n = −e_y on S₁ and n = −e_x on S₂, the first two terms in the last expression can be simplified

    ∫_{V_A} ∇·v dV = ∫_{S₁} n·v dS + ∫_{S₂} n·v dS + ∫₀^{y₁} v_x(x_c, y) dy + ∫₀^{x₁} v_y(x, y_c) dx    (3.7)

To simplify the remaining two terms, observe that they both correspond to integrals along the surface (a curve in two dimensions) S₃. They can be combined into one by converting the first term into an integral over x, noting that dy = (dy_c/dx) dx and changing the limits of integration appropriately

    ∫₀^{y₁} v_x(x_c(y), y) dy = ∫_{x₁}^{0} v_x(x, y_c(x)) (dy_c/dx) dx = −∫₀^{x₁} v_x(x, y_c(x)) (dy_c/dx) dx

so that the two remaining terms combine as

    ∫₀^{x₁} ( −(dy_c/dx) e_x + e_y )·v dx

Figure 3.4: A two-dimensional volume for evaluation of the integral of the divergence. Differential elements dS and dx are shown.

On S₃ the normal vector can be written

    n = ( −(dy_c/dx) e_x + e_y ) / ( 1 + (dy_c/dx)² )^{1/2}

and dS = ( 1 + (dy_c/dx)² )^{1/2} dx (see Figure 3.4), so this integral becomes

    ∫_{S₃} n·v dS

Combining this result with (3.7), we find that

    ∫_{V_A} ∇·v dV = ∫_{S₁} n·v dS + ∫_{S₂} n·v dS + ∫_{S₃} n·v dS    (3.8)

Finally, consider what happens if we extend the integral to the larger domain V that includes both V_A and a contiguous subdomain V_B, with normal vector n_B, as shown in Figure 3.5. By the same arguments given above,

    ∫_{V_B} ∇·v dV = ∫_{S_B} n_B·v dS    (3.9)

where S_B is the boundary of V_B, one piece of which is the shared side S₂.

Figure 3.5: Two contiguous subdomains.

The two domains V_A and V_B share one side S₂; on this side n_A = −n_B. Adding (3.8) and (3.9) and recognizing that the integrals over S₂ cancel — anything leaving V_A via S₂ is entering V_B — we have that

    ∫_V ∇·v dV = ∫_S n·v dS    (3.10)

where S is the boundary of the entire domain and n its outward unit normal. By piecing together domains like these and repeating the arguments used here, (3.10) can be seen to hold for any closed domain on the plane.
Equation (3.10) is the divergence theorem. By extending these elementary arguments it can be shown to be valid for arbitrary bounded domains in an arbitrary number of dimensions. In one dimension it reduces to the FUNDAMENTAL THEOREM OF INTEGRAL CALCULUS: ∫_a^b (df/dx) dx = f(b) − f(a). It is extremely important in a wide variety of contexts as it relates behavior in the interior of a domain to behavior on its boundary. Finally, one can see that the definition of the divergence operator mirrors this result for an infinitesimal domain.
In Cartesian tensor notation, the divergence theorem is

    ∫_V (∂v_i/∂x_i) dV = ∫_S n_i v_i dS
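A small numerical illustration of (3.10) (our own sketch, not from the text): for v = (x², y²) on the unit square, the volume integral of ∇·v = 2x + 2y and the boundary flux integral both equal 2.

```python
def divergence_integral(n=400):
    # Volume integral of div v = 2x + 2y over the unit square (midpoint rule)
    h = 1.0 / n
    return sum((2 * (i + 0.5) * h + 2 * (j + 0.5) * h) * h * h
               for i in range(n) for j in range(n))

def flux_integral(n=400):
    # Boundary integral of n.v for v = (x^2, y^2); v vanishes on x = 0
    # and y = 0, so only the sides x = 1 and y = 1 contribute.
    h = 1.0 / n
    right = sum(1.0 * h for j in range(n))   # n.v = v_x(1, y) = 1
    top = sum(1.0 * h for i in range(n))     # n.v = v_y(x, 1) = 1
    return right + top

assert abs(divergence_integral() - flux_integral()) < 1e-6
```

The midpoint rule is exact for the linear integrand here, so the two sides agree to round-off.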

By replacing the vector by a scalar or by a second-order tensor in this expression, the related results can be found (now expressed in vector notation)

    ∫_V ∇φ dV = ∫_S n φ dS    (3.11)

    ∫_V ∇·T dV = ∫_S n·T dS    (3.12)

A closely related result is the multidimensional version of LEIBNIZ'S RULE, which we state without proof here. Consider the time derivative of an integral over a volume that is moving with time, e.g., a fluid element in a velocity field. If a point x on the boundary is moving with a velocity q(x), then Leibniz's rule states that

    d/dt ∫_{V(t)} m(x, t) dV = ∫_V (∂m/∂t) dV + ∫_S m n·q dS

The second term in this formula appears only if the volume is moving or changing shape with time and represents the net amount that is swept into V because of the motion of its boundaries.

Example 3.2: The divergence theorem and conservation laws

Conservation laws can be written for many quantities. Important examples include mass, energy, chemical species, and probability. Consider a quantity A that satisfies a conservation law in some arbitrary region of space V with boundary S and outward unit normal n. The density (amount per unit volume) of A is ρ_A and the flux (amount per unit area per unit time) is F_A. We allow for the possibility that A is created or destroyed within the volume, with rate R_A having units of amount of A per unit volume per unit time. If A is a chemical species, then R_A is a volumetric reaction rate of production of A. The conservation law for A can thus be written for the domain V as follows

    d/dt ∫_V ρ_A dV = −∫_S n·F_A dS + ∫_V R_A dV    (3.13)

The left-hand side is the rate of accumulation of A in the domain. The first term on the right-hand side is the net rate of entry of A into the domain across its boundary and the final term is the net rate of production of A via sources or sinks of A within the domain. Use the divergence theorem to write a conservation statement for A that is valid at every point in the domain.

Solution

The divergence theorem allows the surface integral to be written as a volume integral

    ∫_S n·F_A dS = ∫_V ∇·F_A dV

Furthermore, because V is time independent,

    d/dt ∫_V ρ_A dV = ∫_V (∂ρ_A/∂t) dV

Substituting these two equations into (3.13) yields

    ∫_V (∂ρ_A/∂t) dV = −∫_V ∇·F_A dV + ∫_V R_A dV

Since all terms in this equation are volume integrals, they can be combined

    ∫_V ( ∂ρ_A/∂t + ∇·F_A − R_A ) dV = 0

Because the volume V is arbitrary, the only way that this equation can be satisfied in general is if its integrand vanishes at every point within V. That is

    ∂ρ_A/∂t = −∇·F_A + R_A    (3.14)

This is the general pointwise statement of the conservation law for A.
To be more specific, let A be a chemical species. Its molar density, or concentration, will be denoted c_A. Chemical species are transported by molecular diffusion and flow; if the species is dilute the flux of A can be written F_A = c_A v − D_A ∇c_A, where v is the velocity field for the fluid in which A is dissolved, and D_A is the diffusivity of the species. Now (3.14) becomes

    ∂c_A/∂t = −∇·(c_A v) + D_A ∇²c_A + R_A    (3.15)

This is a partial differential equation for the spatial and temporal distribution of a chemical species. If U and L are characteristic scales for the fluid velocity v and domain size, respectively, then the relative importance of convection and diffusion is estimated by the PECLET NUMBER Pe = UL/D_A.
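The pointwise balance (3.14) has a useful discrete analog: if a numerical scheme updates a concentration field using differences of fluxes, total A changes only through boundary fluxes and sources. The sketch below (our own illustration, with zero boundary flux, no source, and pure diffusion) shows the resulting exact conservation of the total amount:

```python
def diffuse(c, D=1.0, dx=0.1, dt=0.001, steps=1000):
    # Explicit flux-form update of dc/dt = -dF/dx with F = -D dc/dx.
    # Zero flux is imposed at both ends, so total A must be conserved.
    c = list(c)
    for _ in range(steps):
        flux = [0.0] + [-D * (c[i + 1] - c[i]) / dx
                        for i in range(len(c) - 1)] + [0.0]
        c = [c[i] - dt * (flux[i + 1] - flux[i]) / dx for i in range(len(c))]
    return c

c0 = [1.0 if 4 <= i <= 6 else 0.0 for i in range(20)]
c1 = diffuse(c0)
assert abs(sum(c1) - sum(c0)) < 1e-10   # total amount of A is conserved
assert max(c1) < max(c0)                # diffusion smooths the initial peak
```

The step size satisfies D dt/dx² = 0.1, well within the explicit stability limit of 1/2.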

3.2.4 Further Integral Relations and Adjoints

GREEN'S IDENTITIES are special cases of the divergence theorem that are useful for working with multidimensional integrals over quantities involving differential operators other than the divergence. Green's first identity is the divergence theorem for the case where v is replaced by u∇v, where u and v are now scalars

    ∫_V (∇u·∇v + u∇²v) dV = ∫_S u∇v·n dS    (3.16)

Green's second identity comes from writing Green's first identity with u and v exchanged and subtracting this expression from Green's first identity as written above

    ∫_V (u∇²v − v∇²u) dV = ∫_S (u∇v − v∇u)·n dS    (3.17)

Finally, GREEN'S FORMULA comes from replacing v in the original expression by uv, where u is a scalar and v a vector

    ∫_V (∇u·v + u∇·v) dV = ∫_S uv·n dS    (3.18)

In one dimension, Green's formula reduces to the expression for integration by parts.
The above theorems all deal with the divergence and its closest relatives, the gradient and the Laplacian. The final results are instead for the curl. In two dimensions, ∇ × v reduces to (∂v_y/∂x − ∂v_x/∂y) e₃. GREEN'S THEOREM shows how the integral of this over an area A can be reduced to an integral over the (closed) boundary curve C

    ∫_A (∂v_y/∂x − ∂v_x/∂y) dA = ∮_C (v_x dx + v_y dy)

The proof of this result closely follows what we did above with the divergence theorem. STOKES'S THEOREM is more general, applying to any bounded orientable ("two-sided") curved surface A floating in three dimensions, with boundary curve C

    ∫_A n·(∇ × v) dA = ∮_C v·t dC

Here t is the unit vector tangent to the boundary C, pointing in the direction in which the integration around C is being performed. The orientability condition precludes surfaces like a Möbius strip. The vector n is a unit normal vector to the surface A. Since the surface does not enclose a three-dimensional volume, however, inward and outward normals are not defined, and the choice of the direction of n determines the direction of the integration path for C. For example, if S were a region on a sheet of paper and n pointed up out of the paper, then the integration path around C is counterclockwise.
One important application of the above results is in the determination of the adjoints to div, grad, and curl. First, we define the relevant inner products. Let

    ⟨u, v⟩ = ∫_V uv dV

if u and v are (real) scalars, and

    ⟨u, v⟩ = ∫_V u·v dV

if they are vectors. In our earlier discussion of adjoints, we used integration by parts to help us compute them; in multiple dimensions, Green's formula and identities are the appropriate replacements. For example, using Green's formula, (3.18), we can easily find that, with u(S) = 0 (Dirichlet boundary conditions)

    ⟨∇u, v⟩ = −⟨u, ∇·v⟩

Thus the adjoint of grad (with Dirichlet boundary conditions) is −div. Similarly, rearranging Green's second identity we find that

    ⟨∇²u, v⟩ = ∫_S (u∇v − v∇u)·n dS + ⟨u, ∇²v⟩

If we impose the same homogeneous boundary conditions on u and v, then u∇v = v∇u on the boundary. Thus the boundary term vanishes, leaving

    ⟨∇²u, v⟩ = ⟨u, ∇²v⟩

Therefore, with such boundary conditions the Laplacian operator is self-adjoint. This fact has important implications for the solution of partial differential equations that involve the Laplacian.
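The discrete analog of this self-adjointness is worth seeing once: with homogeneous Dirichlet boundaries, the 1D second-difference matrix is symmetric, so ⟨Lu, v⟩ = ⟨u, Lv⟩ holds exactly. This is our own illustration, not the book's:

```python
import math

def laplacian_matrix(n):
    # 1D second-difference matrix with homogeneous Dirichlet boundaries
    return [[-2.0 if i == j else 1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

n = 8
L = laplacian_matrix(n)
u = [math.sin(i + 1.0) for i in range(n)]
v = [math.cos(2.0 * i) for i in range(n)]
# Symmetry of the matrix ...
assert all(L[i][j] == L[j][i] for i in range(n) for j in range(n))
# ... implies the discrete self-adjointness relation <Lu, v> = <u, Lv>
assert abs(dot(matvec(L, u), v) - dot(u, matvec(L, v))) < 1e-12
```

Self-adjointness is what guarantees real eigenvalues and orthogonal eigenvectors for this matrix, mirroring the Sturm-Liouville theory used throughout the following sections.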

3.3 Linear Partial Differential Equations: Properties and Solution Techniques

3.3.1 Classification and Canonical Forms for Second-Order Partial Differential Equations

Many general properties of partial differential equations can be introduced with this second-order equation in two dimensions

    a u_xx + 2b u_xy + c u_yy = g(x, y)    (3.19)

where x, y ∈ ℝ and u_x = ∂u/∂x, etc. For the moment x and y are not necessarily position variables — they are simply the independent variables for the problem. The coefficients a, b, and c are real and constant, though the latter restriction can be relaxed. Now consider the question of whether there exists a change of independent variables

    ξ = ξ_x x + ξ_y y,  η = η_x x + η_y y

that can simplify the left-hand side of this equation. Here ξ_x, ξ_y, η_x, η_y are constants and ξ_x η_y − ξ_y η_x must be nonzero for the coordinate transformation to be invertible. Applying the chain rule yields that

    (a ξ_x² + 2b ξ_x ξ_y + c ξ_y²) u_ξξ + 2(a ξ_x η_x + b(ξ_x η_y + ξ_y η_x) + c ξ_y η_y) u_ξη
        + (a η_x² + 2b η_x η_y + c η_y²) u_ηη = g(ξ, η)    (3.20)

If b² − ac > 0, then (3.19) is said to be HYPERBOLIC.³ In this case, we can find real constants ξ_x, ξ_y, η_x, η_y such that the coefficients multiplying u_ξξ and u_ηη in (3.20) vanish, leaving the simpler differential equation

    u_ξη = g(ξ, η)    (3.21)

This is the canonical, or simplest, form for a hyperbolic partial differential equation. Lines ξ = constant and η = constant are called CHARACTERISTICS for the equation. The WAVE EQUATION

    u_tt − c² u_xx = 0    (3.22)

has this form, with ξ = x − ct, η = x + ct. We present the general solution to this equation in Section 3.3.6.
If b² − ac < 0, then (3.19) is ELLIPTIC. No real coefficients ξ_x, ξ_y, η_x, η_y exist that will make the coefficients of u_ξξ and u_ηη vanish. All is not lost, however. Instead one finds complex conjugate characteristics ξ = φ + iψ and η = φ − iψ = ξ̄. Using ξ' = (ξ + η)/2 and η' = (ξ − η)/2i as new coordinates, the coefficient of u_ξ'η' can be made to vanish, leading to the form

    u_ξ'ξ' + u_η'η' = g    (3.23)

The left-hand side of this equation is the two-dimensional Laplacian operator. At steady state, (3.15) above reduces to this form in two spatial dimensions. If g is a function only of x and y, this equation is called the POISSON EQUATION. If g = 0, it is called the LAPLACE EQUATION.
The borderline case b² − ac = 0 leads to the PARABOLIC equation

    u_ηη = g    (3.24)

The standard example of a parabolic equation is the transient species conservation equation, (3.15), in one spatial dimension, which we can write

    u_t + v u_x = D u_xx + R_A

The Schrödinger equation is also parabolic. Elliptic and parabolic equations are treated extensively in the sections below.
The classification of partial differential equations into these categories plays an important role in the mathematical theory of existence of solutions for given boundary conditions. Fortunately, the physical settings commonly encountered by engineers generally lead to well-posed mathematical problems for which we do not need to worry about these more abstract issues. Therefore we now proceed to the presentation of classical solution approaches, many of which are insensitive to the type of equation encountered.

³The nomenclature introduced in this section arises from an analogy with conic sections defined by the equation ax² + 2bxy + cy² + dx + ey + f = 0. If they exist, real solutions to this equation are hyperbolas, ellipses, or parabolas, depending on whether b² − ac is positive, negative, or zero.
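The discriminant test is mechanical enough to encode directly. A minimal sketch (our own helper, not from the text) for classifying an equation of the form a u_xx + 2b u_xy + c u_yy = g:

```python
def classify(a, b, c):
    # Discriminant test for a u_xx + 2 b u_xy + c u_yy = g
    disc = b * b - a * c
    if disc > 0:
        return "hyperbolic"
    if disc < 0:
        return "elliptic"
    return "parabolic"

# Wave equation u_tt - u_xx = 0: coefficients (1, 0, -1)
assert classify(1.0, 0.0, -1.0) == "hyperbolic"
# Laplace equation u_xx + u_yy = 0: coefficients (1, 0, 1)
assert classify(1.0, 0.0, 1.0) == "elliptic"
# Heat equation u_t = D u_xx has only one second-order term: (D, 0, 0)
assert classify(1.0, 0.0, 0.0) == "parabolic"
```

For variable coefficients the same test applies pointwise, and an equation may change type across the domain.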

3.3.2 Separation of Variables and Eigenfunction Expansion with Equations Involving ∇²

The technique of SEPARATION OF VARIABLES is perhaps the most familiar classical technique for solving linear partial differential equations. It arises in problems in transport, electrostatics, quantum mechanics, and many other applications. The technique is based on the superposition property of linear problems (Section 2.2.1) as well as the following conditions:

1. We can seek a solution of the form u(x₁, x₂, x₃, ...) = X₁(x₁)X₂(x₂)X₃(x₃)··· to a PDE with independent variables x₁, x₂, x₃, ....

2. The boundaries of the domain are coordinate surfaces, and the boundary conditions for the PDE can also be written in the above form.

3. A distinct ODE can be derived from the original PDE for each function X_i(x_i).

4. Using superposition, a solution satisfying the boundary conditions can be constructed from an infinite series of solutions to these ODEs. This condition implies that separation of variables is useful primarily for equations involving self-adjoint partial differential operators such as the Laplacian, in which case eigenfunctions of various Sturm-Liouville problems provide bases for representing the solutions.

Consider a problem with three independent variables in which two of them, say x₂ and x₃, lead to Sturm-Liouville problems with eigenfunctions Y_k(x₂) and Z_l(x₃), k, l = 0, 1, 2, .... The basis functions for the x₂ and x₃ directions are thus Y_k(x₂)Z_l(x₃). The solutions to the problem in the inhomogeneous direction are then coefficients in the series, and the total solution has this form

    u = Σ_{k=0}^{∞} Σ_{l=0}^{∞} X_kl(x₁) Y_k(x₂) Z_l(x₃)

We illustrate the method with several examples.

Example 3.3: Steady-state temperature distribution in a circular cylinder

Consider a circular cylinder with unit radius and an imposed temperature profile u_s(θ) on its surface. The steady-state temperature profile u(r, θ) is a solution to LAPLACE'S EQUATION ∇²u = 0, in polar coordinates

    (1/r) ∂/∂r ( r ∂u/∂r ) + (1/r²) ∂²u/∂θ² = 0    (3.25)

with u bounded at the origin and satisfying u(1, θ) = u_s(θ). As described above, seek a solution u(r, θ) = R(r)Θ(θ).

Solution

Plugging into the equation and simplifying yields

    r (rR')' / R = −Θ''/Θ

where R' = dR/dr and Θ' = dΘ/dθ. Notice that the LHS of the equation is a function only of r and the RHS a function only of θ. The only way for the two sides to be equal is for them both to equal a constant, c. This observation gives us a pair of ODEs

    r (rR')' − cR = 0    (3.26)

    Θ'' + cΘ = 0    (3.27)

The constant c is as yet unspecified.
Equation (3.27) satisfies periodic boundary conditions Θ(θ) = Θ(θ + 2π), Θ'(θ) = Θ'(θ + 2π); it is a Sturm-Liouville eigenvalue problem with eigenvalue c. This has solutions Θ_k(θ) = e^{ikθ} for all integers k, with the corresponding eigenvalue c = k². So in fact we have found not a single solution, but a family of solutions; a basis for functions in the θ direction.
Now consider the equation for R(r), setting c = k². A little manipulation puts the equation in this form

    r² R'' + r R' − k² R = 0

This is a Cauchy-Euler equation, with k as a parameter and solutions R_k = A_k r^k + B_k r^{−k}. To satisfy the boundedness condition at r = 0, only the solution with a positive exponent can remain, so R_k = a_k r^{|k|}. Since every integer k gives a solution, we can use the superposition principle to write

    u(r, θ) = Σ_{k=−∞}^{∞} a_k r^{|k|} e^{ikθ}

This is a Fourier series, using the Sturm-Liouville eigenfunctions Θ_k(θ) as basis functions. The coefficients a_k come from the boundary condition. At r = 1,

    Σ_{k=−∞}^{∞} a_k e^{ikθ} = u_s(θ)

We can extract the coefficients a_k from this formula by using the orthogonality of the Sturm-Liouville basis functions: take inner products

    ∫₀^{2π} ( Σ_{k=−∞}^{∞} a_k e^{ikθ} ) e^{−ilθ} dθ = ∫₀^{2π} u_s(θ) e^{−ilθ} dθ

Letting c_l = ( ∫₀^{2π} u_s(θ) e^{−ilθ} dθ ) / 2π, this process simply gives us that a_k = c_k. That is, the (known) Fourier coefficients of the boundary temperature determine the Fourier coefficients in the cylinder, so

    u(r, θ) = Σ_{k=−∞}^{∞} c_k r^{|k|} e^{ikθ}
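This series is easy to evaluate numerically. In the sketch below (our own check; the names are not from the text) the Fourier coefficients c_k are computed by quadrature, and the boundary data u_s(θ) = cos θ is used because its exact interior solution is simply r cos θ:

```python
import cmath, math

def fourier_coeff(us, k, n=400):
    # c_k = (1/2pi) * integral of us(theta) e^{-ik theta} dtheta
    return sum(us(2 * math.pi * j / n)
               * cmath.exp(-1j * k * 2 * math.pi * j / n)
               for j in range(n)) / n

def u_disk(us, r, th, kmax=10):
    # u(r, theta) = sum_k c_k r^{|k|} e^{ik theta}
    return sum(fourier_coeff(us, k) * r ** abs(k) * cmath.exp(1j * k * th)
               for k in range(-kmax, kmax + 1)).real

r, th = 0.6, 1.1
assert abs(u_disk(math.cos, r, th) - r * math.cos(th)) < 1e-8
```

The rectangle rule is spectrally accurate for periodic integrands, so even the modest quadrature here reproduces c₁ = c₋₁ = 1/2 essentially exactly.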

Example 3.4: Transient diffusion in a slab

The transient diffusion of heat or a chemical species in one direction is governed by the transient diffusion equation, also called the heat equation

    ∂u/∂t = D ∂²u/∂x²    (3.28)

Consider the initial and boundary conditions u(x, 0) = 0, u(0, t) = 0, u(ℓ, t) = u_c for 0 < x < ℓ, i.e., the initial concentration in the domain is zero and at t = 0 the right end of the domain is exposed to a known concentration u = u_c. Seek a separation of variables solution u(x, t) = X(x)T(t).

Solution

Using the form u(x, t) = X(x)T(t), (3.28) becomes

    X T' = D X'' T

where again ' denotes the derivative of a function with respect to its independent variable. Rearranging yields

    (1/D) T'/T = X''/X

Observing that this expression equates a function of t to a function of x, we again conclude that each side of it must be constant

    T' = cDT    (3.29)

    X'' = cX    (3.30)

A simple change of variable solves this problem. We let u = u_s(x) + v(x, t) and choose u_s to satisfy the inhomogeneous boundary conditions, in which case v satisfies homogeneous boundary conditions v(0, t) = 0, v(ℓ, t) = 0 at x = 0 and x = ℓ. A particularly convenient choice is u_s = u_c x/ℓ, which is the steady-state solution to this problem. Thus v(x, t) is the deviation from the steady state. Substituting into (3.28) and observing that ∂u_s/∂t = 0 and ∂²u_s/∂x² = 0 yields

    ∂v/∂t = D ∂²v/∂x²

with v(x, 0) = −u_s, v(0, t) = 0, v(ℓ, t) = 0. Now letting v(x, t) = X(x)T(t) and repeating the above steps, we find that the problem for X is a true Sturm-Liouville problem, including the homogeneous boundary conditions X(0) = X(ℓ) = 0. The eigenvalues are c = −k², where now k = nπ/ℓ for positive integer n, and the eigenfunctions are sin(nπx/ℓ). Equation (3.29) is an initial-value problem. Its solutions, parametrized by n, are

    T_n(t) = T_n(0) e^{−n²π²Dt/ℓ²}

so the overall solution again has the Fourier series form

    v(x, t) = Σ_{n=1}^{∞} T_n(0) e^{−n²π²Dt/ℓ²} sin(nπx/ℓ)    (3.31)

The initial conditions T_n(0) are determined from the initial condition v(x, 0) = −u_s by setting t = 0 in (3.31) and taking its inner product with basis function sin(mπx/ℓ)

    ⟨−u_s, sin(mπx/ℓ)⟩ = Σ_{n=1}^{∞} T_n(0) ⟨sin(nπx/ℓ), sin(mπx/ℓ)⟩

Thus

    T_m(0) = −( ∫₀^ℓ (u_c x/ℓ) sin(mπx/ℓ) dx ) / ( ∫₀^ℓ sin²(mπx/ℓ) dx ) = (−1)^m (2u_c)/(mπ)

The final exact solution is thus

    u(x, t) = u_c x/ℓ + Σ_{n=1}^{∞} (−1)^n (2u_c)/(nπ) e^{−n²π²Dt/ℓ²} sin(nπx/ℓ)    (3.32)

At short times t ≪ ℓ²/D, this series converges very slowly because of the n⁻¹ decay of the Fourier coefficients T_n(0) of the initial condition. In this situation, alternate approaches that approximate the domain as semi-infinite are more appropriate, because the heat or solute has only had time to spread over a short distance from the boundary; see Exercises 3.23 and 3.36. As t increases, the exponential decay term becomes smaller and the series converges more rapidly.
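Series (3.32) is straightforward to evaluate directly; the sketch below (our own check, with ℓ = D = u_c = 1) confirms the limiting behavior just described:

```python
import math

def u_slab(x, t, D=1.0, ell=1.0, uc=1.0, nterms=200):
    # Eq. (3.32): linear steady profile plus the transient Fourier series
    s = uc * x / ell
    for n in range(1, nterms + 1):
        s += ((-1) ** n * 2 * uc / (n * math.pi)
              * math.exp(-n**2 * math.pi**2 * D * t / ell**2)
              * math.sin(n * math.pi * x / ell))
    return s

# Long times: the solution relaxes to the linear steady profile uc*x/ell
assert abs(u_slab(0.5, 10.0) - 0.5) < 1e-9
# Short times: the interior has barely felt the boundary at x = ell
assert u_slab(0.5, 0.01) < 0.01
# The boundary values hold for t > 0
assert abs(u_slab(0.0, 0.5)) < 1e-12
assert abs(u_slab(1.0, 0.5) - 1.0) < 1e-12
```

At t = 0.01 the exponential factor already truncates the series effectively after a few dozen terms, but at much smaller t many more terms would be needed, which is the slow-convergence issue noted above.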

With these two examples, one can see a pattern emerging. Separation of variables leads to at least one direction that presents a Sturm-Liouville problem whose eigenfunctions are a useful basis for representing the solution. In the second example, a change of variable was required to find a direction with the homogeneous boundary conditions required of a Sturm-Liouville eigenvalue problem. The following example presents a situation where the problem must first be split into subproblems that have homogeneous boundary conditions.

Example 3.5: Steady-state diffusion in a square domain

Solve Laplace's equation ∇²u = 0 in a unit square domain 0 < x < 1, 0 < y < 1, with boundary conditions u = 200 on x = 0 and y = 0, u = 300 on x = 1, and u = 500 on y = 1, as shown in Figure 3.6(a).

Solution

As stated, there are no homogeneous directions. Now we split the solution into three pieces: u(x, y) = U(x, y) + V(x, y) + W(x, y), where U, V, and W all satisfy Laplace's equation, but with conveniently chosen boundary conditions that sum to the boundary conditions for the original problem, as illustrated in Figure 3.6(b). The problem for U is trivial because all the boundaries have the same value of 200; thus U = 200. The problem for V has homogeneous boundary conditions everywhere except x = 1, while that for W has homogeneous boundary conditions at x = 0 and x = 1; aside from a multiplicative constant, it is just a π/2 rotation of the problem for V. The solution to the W problem (to within a multiplicative constant) is Exercise 3.32. From it the solution to the V problem can be found, so the solution for u = U + V + W is complete.
Vproblem can be found so the solution for u = U + V + W is complete,

Figure 3.6: Laplace's equation in a square domain. (a) Original problem. (b) Three subproblems whose solutions sum to the solution of the original problem.
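The decomposition can be checked numerically. A useful target: by superposing four rotated copies of the one-hot-side problem, the value at the center of the square must be the average of the four boundary values, (200 + 200 + 300 + 500)/4 = 300. The sketch below (our own construction; the series for the one-hot-side subproblem is the standard Fourier sine solution, not a formula quoted from the text) verifies this:

```python
import math

def edge_problem(val, x, y, nterms=60):
    # Laplace solution on the unit square equal to `val` on the side y = 1
    # and zero on the other three sides (Fourier sine series in x).
    s = 0.0
    for n in range(1, nterms + 1):
        bn = 2 * val * (1 - (-1) ** n) / (n * math.pi)
        s += bn * math.sin(n * math.pi * x) \
                * math.sinh(n * math.pi * y) / math.sinh(n * math.pi)
    return s

def u_total(x, y):
    # U = 200 everywhere; V is 100 on x = 1; W is 300 on y = 1
    V = edge_problem(100.0, y, x)   # swapped arguments rotate the hot side
    W = edge_problem(300.0, x, y)
    return 200.0 + V + W

assert abs(u_total(0.5, 0.5) - 300.0) < 1e-6
```

The series converges geometrically at interior points, so 60 terms are far more than needed at the center.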

Example 3.6: Eigenfunction expansion for an inhomogeneous problem

Solve the Poisson equation

    u_xx + u_yy = f(x, y)

in a unit square with Dirichlet boundary conditions, which models a steady-state distribution given a source f(x, y) distributed within the domain.

Solution

Separation of variables does not work for this problem (try it), but a version of eigenfunction expansion does. Think of this problem as a linear algebra problem Lu = f. Here L is self-adjoint, so the solutions to the eigenvalue problem Lw + λw = 0 form an orthogonal basis and allow us to diagonalize L. We can express u and f in this basis, and since L becomes diagonal we can easily solve for u.
To perform this procedure in the present case, we need to solve

    w_xx + w_yy + λw = 0

in the unit square with w = 0 on the boundary. We can solve this problem by separation of variables: it gives Sturm-Liouville problems in both x and y, and yields eigenfunctions w_mn(x, y) = sin mπx sin nπy with (real) eigenvalues λ_mn = π²(m² + n²), for all integer pairs mn. Now to solve the Poisson equation, we express u and f in terms of the eigenfunctions

    u = Σ_{mn} u_mn w_mn(x, y),  f = Σ_{mn} f_mn w_mn(x, y)

Since f is known, f_mn = ⟨f, w_mn⟩ / ⟨w_mn, w_mn⟩, where the inner product in this case is just the integral over the square. Now since w_xx + w_yy = −λ_mn w_mn, we can write −λ_mn u_mn = f_mn, which we can solve immediately to give u_mn = −f_mn/λ_mn, so

    u(x, y) = −Σ_{mn} (f_mn / λ_mn) sin mπx sin nπy
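The procedure can be exercised end to end on a source with a known answer: for f = sin 2πx sin 3πy only the (2, 3) mode is excited, so u = −f/(13π²) exactly. The sketch below (our own check; f_mn is computed by midpoint quadrature rather than analytically) reproduces this:

```python
import math

def poisson_square(f, x, y, mmax=4, nq=60):
    # u = -sum_mn f_mn sin(m pi x) sin(n pi y) / (pi^2 (m^2 + n^2)),
    # with f_mn = 4 <f, w_mn> computed by midpoint quadrature
    u = 0.0
    h = 1.0 / nq
    for m in range(1, mmax + 1):
        for n in range(1, mmax + 1):
            fmn = 4 * sum(f((i + .5) * h, (j + .5) * h)
                          * math.sin(m * math.pi * (i + .5) * h)
                          * math.sin(n * math.pi * (j + .5) * h) * h * h
                          for i in range(nq) for j in range(nq))
            u -= fmn * math.sin(m * math.pi * x) * math.sin(n * math.pi * y) \
                     / (math.pi**2 * (m**2 + n**2))
    return u

f = lambda x, y: math.sin(2 * math.pi * x) * math.sin(3 * math.pi * y)
exact = -f(0.3, 0.4) / (math.pi**2 * (2**2 + 3**2))
assert abs(poisson_square(f, 0.3, 0.4) - exact) < 1e-3
```

For a general source, all the f_mn would be nonzero and the series truncation (here mmax = 4) would control the accuracy.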

In some situations, a separation of variables solution can be obtained via multiple approaches. For example, the Laplacian operator in cylindrical coordinates can be written

    ∇² = L_r + (1/r²) L_θ + L_z

where

    L_r = (1/r) ∂/∂r ( r ∂/∂r ),  L_θ = ∂²/∂θ²,  L_z = ∂²/∂z²

Given appropriate homogeneous boundary conditions, all three of these are Sturm-Liouville operators, so depending on the boundary conditions, there may be the possibility of more than one method of solution. The following example illustrates this situation.

Example3.7: Steady diffusion in a cylinder. eigenfunction expansion

andmultiple solution approaches


ConsiderLaplace's equation in a cylindrical domainwithboundary conditionsu(r,z = 0) = 1, u(r = l,z) = 0, u(r,z = 1) = 0. That is, the
bottomis heated, and the top and side are cooled. Solvethis equation
in twodifferent ways:

Vector Calculus and Partial Differential

280

that depend on r.
(a) Usingbasisfunctions
that depend on z.
(b) Usingbasis functions
Solution

We could proceed by seeking a separable solution of the form R(r)Z(z)
as above. Instead we directly impose a series form for the solution.

(a) For the current problem there is no θ-dependence, and we first
seek a solution that uses basis functions in the r-direction, i.e.,
eigenfunctions of Lr. This is a singular Sturm-Liouville operator
(p(r) = r), so only boundedness is required at the origin, and the
boundary condition at r = 1 is homogeneous. Referring back to Example
2.8, we recognize that the eigenfunctions of Lr are the Bessel
functions of order zero, so we can seek a solution

    u(r, z) = Σn un(z) J0(√λn r)

where

    √λn ≈ 2.4, 5.5, 8.7, 11.8, ...  (the zeros of J0)

To simplify notation, let kn = √λn. Substituting this solution form
into Laplace's equation and using the fact that LrJ0(knr) =
−kn²J0(knr) yields

    d²un/dz² − kn²un = 0

Because of the bounded domain, it is convenient to represent the
solution to this problem as

    un(z) = an cosh knz + bn sinh knz

so

    u(r, z) = Σn (an cosh knz + bn sinh knz) J0(knr)             (3.33)

At z = 0, u = 1. Taking the inner product, i.e., the weighted
integral from r = 0 to r = 1 of (3.33) evaluated at z = 0, with
J0(kmr),

    am = (1, J0(kmr)) / (J0(kmr), J0(kmr))                       (3.34)


Evaluation of these and related integrals is facilitated by the
following general results for Bessel functions with integer n and
arbitrary k:

    d/dx [xⁿJn(kx)] = xⁿk Jn−1(kx)                               (3.35)

    d/dx [xⁿYn(kx)] = xⁿk Yn−1(kx)                               (3.36)

    x d/dx [Jn(kx)] + nJn(kx) = xk Jn−1(kx)                      (3.37)

    x d/dx [Yn(kx)] + nYn(kx) = xk Yn−1(kx)                      (3.38)

    x d/dx [Jn(kx)] − nJn(kx) = −xk Jn+1(kx)                     (3.39)

    x d/dx [Yn(kx)] − nYn(kx) = −xk Yn+1(kx)                     (3.40)

    J−n(kx) = (−1)ⁿJn(kx)                                        (3.41)

    Y−n(kx) = (−1)ⁿYn(kx)                                        (3.42)

    ∫0^1 Jn²(kx) x dx = (1/2) Jn+1²(k)   if Jn(k) = 0            (3.43)

Using the first and last of these expressions, one can find that

    ∫0^1 J0(kmr) r dr = (1/km) J1(km)                            (3.44)

    ∫0^1 J0²(kmr) r dr = (1/2) J1²(km)                           (3.45)
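The integral identities (3.44) and (3.45) are easy to confirm by quadrature; this is a sketch using scipy (the choice of the third root of J0 is arbitrary):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import j0, j1, jn_zeros

# The identities require k_m to be a root of J0.
km = jn_zeros(0, 3)[2]   # third root, approximately 8.654

lhs44, _ = quad(lambda r: j0(km*r)*r, 0.0, 1.0)
lhs45, _ = quad(lambda r: j0(km*r)**2*r, 0.0, 1.0)

print(abs(lhs44 - j1(km)/km))        # (3.44): essentially zero
print(abs(lhs45 - 0.5*j1(km)**2))    # (3.45): essentially zero
```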

The boundary condition u = 0 at z = 1 requires that

    bn = −an cosh kn / sinh kn

Using these results, the solution is

    u(r, z) = Σn an (cosh knz − (cosh kn/sinh kn) sinh knz) J0(knr)

with an given by (3.34), (3.44), and (3.45).

(b) Now we seek a solution using basis functions in the z-direction.
To obtain homogeneous boundary conditions in z, let u = (1 − z) + v,
where v satisfies Laplace's equation (because ∇²(1 − z) = 0) with the
homogeneous boundary conditions v(r, 0) = v(r, 1) = 0 and with
v(1, z) = z − 1. We could proceed by seeking a solution
v(r, z) = R(r)Z(z). Instead we will directly impose a Fourier series
form for the solution based on the eigenfunctions sin nπz of Lz

    v(r, z) = Σn vn(r) sin nπz

Substituting this solution form into Laplace's equation leads to

    Σn [ (1/r) d/dr (r dvn/dr) − n²π²vn(r) ] sin nπz = 0

Taking the inner product of this equation with sin mπz, invoking
orthogonality, and changing m to n yields

    (1/r) d/dr (r dvn/dr) − n²π²vn(r) = 0

This is called the MODIFIED BESSEL EQUATION OF ORDER ZERO. It differs
from Bessel's equation by the sign in front of the second term. Its
solution can be found by the method of Frobenius; the general
solution is

    vn(r) = an I0(nπr) + bn K0(nπr)

The functions I0 and K0 are the MODIFIED BESSEL FUNCTIONS of order
zero; they are shown in Table 2.3. The function K0 has a logarithmic
singularity at the origin, so for boundedness we require that bn = 0.
The coefficients an are found by imposing the boundary condition at
r = 1 and again taking the inner product with an eigenfunction

    an = ((z − 1), sin nπz) / (I0(nπ)(sin nπz, sin nπz)) = −2/(nπ I0(nπ))

The solution in final form is

    v(r, z) = Σn an I0(nπr) sin nπz

and u(r, z) = (1 − z) + v(r, z).
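As a numerical sanity check (a sketch, not from the text), both series representations can be evaluated at an arbitrarily chosen interior point. Combining (3.34) with (3.44) and (3.45) gives an = 2/(kn J1(kn)) for part (a), and the two answers agree:

```python
import numpy as np
from scipy.special import j0, j1, jn_zeros, i0

r, z = 0.4, 0.5            # arbitrary interior point
n = np.arange(1, 31)       # truncation level chosen for convenience

# (a) Fourier-Bessel series in r; note that
# cosh(k z) - (cosh k/sinh k) sinh(k z) = sinh(k(1-z))/sinh k.
k = jn_zeros(0, 30)
ua = np.sum(2.0/(k*j1(k)) * np.sinh(k*(1.0 - z))/np.sinh(k) * j0(k*r))

# (b) Fourier series in z: u = (1-z) + sum a_n I0(n pi r) sin(n pi z),
# with a_n = -2/(n pi I0(n pi)).
ub = (1.0 - z) + np.sum(-2.0/(n*np.pi*i0(n*np.pi)) * i0(n*np.pi*r)
                        * np.sin(n*np.pi*z))

print(ua, ub)   # the two representations agree to many digits
```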

In spherical coordinates, the Laplacian operator can be written

    ∇² = Lr + (1/r²)LΩ

where

    Lr = (1/r²) ∂/∂r (r² ∂/∂r)

    LΩ = (1/sin θ) ∂/∂θ (sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂φ²

It often is useful to rewrite the first of these in this form

    Lrf = (1/r) ∂²/∂r² (rf)                                      (3.46)

Accordingly, the introduction of a new variable g = rf often is
useful.

Example 3.8: Transient diffusion from a sphere

Consider the transient diffusion of a chemical species out of a
sphere with radius R into uniform surroundings where the species
concentration vanishes, i.e.,

    ∂u/∂t = DLru

with u(r, 0) given and u(r → ∞, t > 0) = 0.

Solution

Spherically symmetric problems like this can be solved using the
eigenfunctions of Lr. The eigenvalue problem Lry + m²y = 0 is related
to the SPHERICAL BESSEL'S EQUATION

    d/dx (x² dy/dx) + (m²x² − n(n + 1)) y = 0

in the specific case λ = m² and n = 0. Its solutions are the
SPHERICAL BESSEL FUNCTIONS of order zero, which are simply

    y(x) = a sin(mx)/x + b cos(mx)/x

These functions are orthogonal with respect to an inner product with
weight function w(r) = r². This factor arises naturally in the
differential volume element in spherical coordinates. The eigenvalues
m², and coefficients a and b are determined as usual by the
(homogeneous/boundedness) boundary conditions. For example, for
diffusion in a sphere, boundedness at the origin will require that
b = 0.
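A quick numerical illustration (a sketch, not from the text): for a unit sphere with u = 0 imposed at r = 1, the bounded eigenfunctions are sin(nπr)/r, and the r²-weighted orthogonality is easy to confirm:

```python
import numpy as np
from scipy.integrate import quad

# Inner product of sin(m pi r)/r and sin(n pi r)/r with weight r^2,
# integrated over the unit sphere's radial coordinate.
def ip(m, n):
    f = lambda r: (np.sin(m*np.pi*r)/r)*(np.sin(n*np.pi*r)/r)*r**2
    return quad(f, 0.0, 1.0)[0]

print(ip(1, 2), ip(2, 3))   # off-diagonal entries: essentially zero
print(ip(2, 2))             # diagonal entry: 1/2
```

Note that the r² weight exactly cancels the 1/r factors, which is why the diagonal value is the familiar 1/2 of the sine series.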

Example 3.9: Temperature field around a sphere in a linear gradient

Consider the steady-state temperature field T(r, θ) that surrounds a
sphere of radius R, e.g., a spherical inclusion in a solid material,
with no heat flux into it. Thus we are solving

    0 = LrT + (1/r²)LηT

with boundary conditions

    ∇T → G ez as r → ∞,    ∂T/∂r = 0 at r = R

Solution

Axisymmetric diffusion problems involving the Laplacian in spherical
geometries are naturally treated by expansion in the eigenfunctions
of the polar-angle operator. If we make the substitution η = cos θ,
the θ part of the Laplacian becomes

    Lη = d/dη [(1 − η²) d/dη] = (1 − η²) d²/dη² − 2η d/dη

and the eigenvalue problem can be written as

    Lηw + λw = 0

This is Legendre's differential equation, see Example 2.9. Its
eigenvalues are λ = n(n + 1) for nonnegative integers n and its
eigenfunctions are the Legendre polynomials Pn(η). Substituting the
solution form

    T(r, η) = Σn Tn(r)Pn(η)                                      (3.47)

into the governing equation, recalling that LηPn = −n(n + 1)Pn, and
using the orthogonality of the Legendre polynomials yields

    r²LrTn − n(n + 1)Tn = 0


Rewriting this as

    r² d²Tn/dr² + 2r dTn/dr − n(n + 1)Tn = 0                     (3.48)

we recognize it as a Cauchy-Euler equation with solution

    Tn(r) = an rⁿ + bn r^(−(n+1))                                (3.49)

First consider the boundary condition at infinity. We can rewrite
this as T → Gz + T∞ = rGP1(η) + T∞, where T∞ is arbitrary; we have
not specified the temperature anywhere, only its gradient. Comparing
this form to the series solution (3.47), we see that a0 = T∞,
a1 = G, and an = 0 for n > 1.

At r = R,

    dT/dr = Σn [ n an R^(n−1) − (n + 1) bn R^(−(n+1)−1) ] Pn(η) = 0

Because of the orthogonality of the Pn(η), this sum must vanish term
by term

    n an R^(n−1) − (n + 1) bn R^(−(n+1)−1) = 0

Using the known values of the an,

    b0 = 0,    a1 = G ⇒ G − 2b1R^(−3) = 0 ⇒ b1 = GR³/2,    bn = 0 for n > 1

The final result is

    T = T∞ + G (r + R³/(2r²)) cos θ                              (3.50)
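The final result (3.50) can be verified symbolically; this sketch (assuming SymPy is available) checks that it satisfies the axisymmetric Laplace equation and the no-flux condition at r = R:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
R, G, Tinf = sp.symbols('R G T_infty', positive=True)
T = Tinf + G*(r + R**3/(2*r**2))*sp.cos(th)

# Axisymmetric Laplacian in spherical coordinates: Lr + (1/r^2) L_theta.
lap = sp.diff(r**2*sp.diff(T, r), r)/r**2 \
    + sp.diff(sp.sin(th)*sp.diff(T, th), th)/(r**2*sp.sin(th))

print(sp.simplify(lap))                        # 0: T is harmonic
print(sp.simplify(sp.diff(T, r).subs(r, R)))   # 0: no flux at r = R
```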


Example 3.10: Domain perturbation; heat conduction around a nearly
spherical object

Consider the problem of heat conduction outside an object described
in spherical coordinates by

    r = R(θ) = 1 + εP2(cos θ)

where P2(x) = (3x² − 1)/2 is the quadratic Legendre polynomial. The
shape is slightly elongated at the poles and narrower at the equator
than a sphere, but has the same surface area. Use a regular
perturbation approach based on the smallness of the deviation of the
surface from spherical.

Solution

This example illustrates the technique of DOMAIN PERTURBATION. This
approach is applicable to problems where the possibly unknown
boundary shape is a small perturbation from a shape for which a
closed form (e.g., separation of variables or Fourier transform)
solution can be obtained. This approach is sometimes also used in
numerical solution approaches to simplify the domain shape. In the
present example, the choice of the Legendre polynomial simplifies the
calculation, but the solution procedure would be similar, though more
tedious, with a more complicated surface shape, as long as the
deviation from a sphere is uniformly small.

The equation and boundary conditions are

    ∇²T = 0,    T(r = R(θ)) = 1,    T → 0 as r → ∞

Because the boundary is not a constant-coordinate surface, separation
of variables (in spherical coordinates) cannot be used to find an
exact solution. Nevertheless, a perturbation approach can be used to
impose an asymptotically exact boundary condition at r = 1. This is
done by expanding the boundary condition in a Taylor series around
r = 1:

    T|r=1 + (R − 1) ∂T/∂r|r=1 + (1/2)(R − 1)² ∂²T/∂r²|r=1 + ··· = 1

Inserting the particular expression for the boundary shape:

    T|r=1 + εP2(cos θ) ∂T/∂r|r=1 + (1/2)ε²P2²(cos θ) ∂²T/∂r²|r=1 + O(ε³) = 1

Note that this boundary condition is imposed at r = 1, permitting use
of separation of variables. There is no indication that a singular
perturbation approach is necessary, so we posit a regular expansion
T(r, θ) = T0(r, θ) + εT1(r, θ) + ε²T2(r, θ) + ···. The governing
equation at each order is simply Laplace's equation, with boundary
conditions obtained by collecting powers of ε in the expanded
boundary condition.

Using the fact that axisymmetric decaying solutions to Laplace's
equation have the form

    Σi ci Pi(cos θ)/r^(i+1)

we find that the solutions at each order are:

    T0 = 1/r

    T1 = P2(cos θ)/r³

    T2 = (2/5)(1/r) + (4/7)P2(cos θ)/r³ + (36/35)P4(cos θ)/r⁵

Given these solutions, we can find that the dimensionless heat flux
from the object is

    Q = ∫0^2π ∫0^π (−∂T/∂r) r² sin θ dθ dφ = 4π(1 + (2/5)ε² + O(ε³))

where the leading term 4π corresponds to the heat flux from a sphere.
Thus the change in heat flux from the sphere is proportional to the
square of the deviation of the surface from spherical. Notice that
the entire solution procedure is valid, and the heat flux the same,
if ε < 0, so the object is actually a slightly flattened sphere.
Therefore both prolate and oblate deviations from a spherical shape
increase the heat flux.
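The perturbation solution can be verified symbolically. The following sketch (assuming SymPy, and collecting the three orders of the solution into a single expression) checks that the boundary condition T = 1 on r = R(θ) holds through O(ε²):

```python
import sympy as sp

eps, r, x = sp.symbols('epsilon r x')   # x stands for cos(theta)
P2 = sp.legendre(2, x)
P4 = sp.legendre(4, x)

# T0 + eps*T1 + eps^2*T2 from the expansion above.
T = (1 + sp.Rational(2, 5)*eps**2)/r \
    + (eps + sp.Rational(4, 7)*eps**2)*P2/r**3 \
    + sp.Rational(36, 35)*eps**2*P4/r**5

# Evaluate T on the perturbed surface r = 1 + eps*P2 and expand in eps.
bc = sp.series(T.subs(r, 1 + eps*P2), eps, 0, 3).removeO()
print(sp.simplify(bc - 1))   # 0: the O(1), O(eps), O(eps^2) terms all match
```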

3.3.3 Laplace's Equation, Spherical Harmonics, and the Hydrogen Atom

Schrödinger's equation for the wave function Ψ(x, t) of a particle
exposed to a potential energy field V(x) is, in dimensionless form,

    i ∂Ψ/∂t = −∇²Ψ + V(x)Ψ                                       (3.51)

We will consider the case of a spherically symmetric potential V(r),
whose form we will specify later, so it is natural to work in
spherical coordinates. The solutions of this equation have a very
rich structure that encompasses many features of systems with
spherical symmetry.

In contrast to the previous couple of examples, we will allow the
separation of variables procedure to again guide us. To begin, let
Ψ(x, t) = f(t)Φ(x), where the temporal and spatial variables are
separated but not (yet) the individual coordinate directions.
Inserting this form into (3.51) and rearranging yields

    i (1/f) df/dt = (−∇²Φ + VΦ)/Φ = E                            (3.52)

where E is a constant. Thus

    −∇²Φ + V(x)Φ = EΦ                                            (3.53)

The solution to (3.52) is

    f(t) = f0 e^(−iEt)

Equation (3.53) has the form of an eigenvalue problem where the
eigenvalue E is a dimensionless energy. This must be real so that Ψ
does not vanish at past or future times. Now, since Lφ = ∂²/∂φ² with
periodic boundary conditions has eigenfunctions e^(imφ), we let
Φ(r, η, φ) = u(r, η)e^(imφ) for any integer m. As above, we have let
η = cos θ. Equation (3.53) becomes

    Lru + (1/r²)Lηu − (m²/(r²(1 − η²)))u + (E − V(r))u = 0       (3.54)

We now write u(r, η) = R(r)P(η). Substitution into (3.54) and
rearrangement to group terms dependent only on r and η yields

    (1/R) r²LrR + r²(E − V(r)) = −(1/P)LηP + m²/(1 − η²) = c

Therefore

    r²LrR + r²(E − V(r))R = cR                                   (3.55)

    −LηP + (m²/(1 − η²))P = cP                                   (3.56)

Equation (3.56) describes the angular behavior of the solutions. For
m = 0, it reduces to Legendre's differential equation, whose bounded
solutions we know to be the Legendre polynomials. For m ≠ 0 it is the
ASSOCIATED LEGENDRE DIFFERENTIAL EQUATION. Seeking a power series
solution reveals that this equation has bounded solutions in
−1 ≤ η ≤ 1 only if c = l(l + 1), where l ≥ |m| is an integer. For
m = 0, 1, ..., l, these solutions are the ASSOCIATED LEGENDRE
POLYNOMIALS Plm(η) and Pl,−m(η), where

    Plm(η) = (1 − η²)^(m/2) (d^m/dη^m) Pl(η),    Pl,−m = Plm     (3.57)

Recapitulating, the products Plm(cos θ)e^(imφ) describe the angular
dependence of the solution. Suitably normalized and denoted
Ylm(θ, φ), these products are called SURFACE SPHERICAL HARMONICS, or
sometimes just SPHERICAL HARMONICS; they are the eigenfunctions of
the angular part of the Laplacian

    LΩYlm = −l(l + 1)Ylm                                         (3.58)

Each eigenvalue l has 2l + 1 corresponding eigenfunctions Ylm with
m = −l, ..., l. The normalized functions have the form

    Ylm(θ, φ) = sqrt( ((2l + 1)/(4π)) (l − m)!/(l + m)! ) Plm(cos θ) e^(imφ)   (3.59)

and satisfy orthonormality with respect to integration over the
surface of the unit sphere

    ∫0^2π ∫0^π Ylm(θ, φ) Y*np(θ, φ) sin θ dθ dφ = δln δmp        (3.60)

The functions Ylm for l = 4 are shown in Figure 3.7. Surface
spherical harmonics are widely used to represent functions on the
surface of a sphere.
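The normalization (3.59) and orthonormality (3.60) are easy to confirm numerically. This sketch builds Ylm directly from scipy's associated Legendre routine (the quantum numbers chosen below are arbitrary):

```python
import numpy as np
from scipy.special import lpmv, factorial
from scipy.integrate import dblquad

# Surface spherical harmonic built from (3.59); lpmv(m, l, x) is the
# associated Legendre function P_l^m(x).
def Y(l, m, theta, phi):
    norm = np.sqrt((2*l + 1)/(4*np.pi) * factorial(l - m)/factorial(l + m))
    return norm * lpmv(m, l, np.cos(theta)) * np.exp(1j*m*phi)

# Inner product over the unit sphere, as in (3.60).  The imaginary
# part integrates to zero, so only the real part is kept.
def inner(l1, m1, l2, m2):
    f = lambda phi, theta: np.real(Y(l1, m1, theta, phi)
                                   * np.conj(Y(l2, m2, theta, phi))
                                   * np.sin(theta))
    return dblquad(f, 0.0, np.pi, 0.0, 2*np.pi)[0]

print(inner(2, 1, 2, 1))   # ≈ 1 (normalization)
print(inner(2, 1, 3, 1))   # ≈ 0 (orthogonality)
```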

Returning to (3.55) for the r-dependence, consider first the case
E = V(r) = 0, in which (3.53) becomes the Laplace equation ∇²Φ = 0.
Equation (3.55) and its solution reduce to (3.48) and (3.49),
respectively, with n replaced by l. Thus the general solution to
∇²Φ = 0, expressed in spherical coordinates, is

    Φ = Σ(l=0..∞) Σ(m=−l..l) (alm r^l + blm r^(−(l+1))) Ylm(θ, φ)   (3.61)


[Figure 3.7 appears here; panels labeled m = 0, 1, 2, ...]

Figure 3.7: From left to right, real parts of the surface spherical
harmonics for l = 4.

Equation (3.50) is a particular case of this solution. Terms
r^l Ylm(θ, φ) and r^(−(l+1)) Ylm(θ, φ) are called the growing and
decaying SOLID SPHERICAL HARMONICS, respectively.

Now consider the case of an electron "orbiting" a proton, i.e., a
hydrogen atom, where the potential energy is the Coulomb potential

    V(r) = −1/r

As boundary conditions, we require that Ψ is bounded at r = 0 and
that it vanishes as r → ∞. If the latter condition is not satisfied,
the electron is not bound to the proton and we do not have an atom.
Equation (3.46) motivates the substitution w(r) = rR(r) into (3.55),
yielding

    d²w/dr² + (E − V(r) − l(l + 1)/r²) w = 0

As r → ∞ we can approximate this as w″ + Ew = 0, suggesting that we
seek a solution w(r) = F(r)e^(−βr) where β = √(−E). This result
indicates that E < 0 for a bound electron. Without going into the
details (with which we are now largely familiar), seeking a Frobenius
solution F(r) = r^α g(r) and requiring that Fe^(−βr) → 0 as r → ∞
leads to α = l + 1 and requires that g(r) be a truncated power
series, i.e., a polynomial. Inspecting the recursion relation for the
power series, one finds, in close analogy to the results in Chapter 2
regarding Legendre and other orthogonal polynomials, that it will
truncate at degree n′ if β = 1/(2(l + n′ + 1)). The solutions, which
we denote Rln′, can be written in terms of ASSOCIATED LAGUERRE
POLYNOMIALS (Merzbacher, 1970; Winter, 1979).

Defining the PRINCIPAL QUANTUM NUMBER n = l + n′ + 1, this expression
becomes

    E = −1/(4n²)

This determines very well the energy levels of a hydrogen atom. The
eigenfunctions of (3.53) are Φnlm(x) = Ylm(θ, φ)Rln′(r) and are
characterized by the QUANTUM NUMBERS n, l, and m; l is called the
ANGULAR MOMENTUM QUANTUM NUMBER, and n′ the RADIAL QUANTUM NUMBER.
Since the eigenvalues E depend only on l + n′, various combinations
of l and n′ have the same energy. The same is true for m: all
eigenfunctions with the same n have the same energy. The s, p, d, and
f atomic orbitals correspond to l = 0, 1, 2, 3, respectively. Since
E < 0, when n = 1 only l = 0 states, s orbitals, can exist. This is
the ground state or lowest-energy state of the hydrogen atom. When
n = 2, states with l = 1 (p orbitals) can also exist, and so on. Thus
we see in this analysis the basic features of the electronic
structure of the hydrogen atom.

3.3.4 Applications of the Fourier Transform to PDEs

In Section 2.4.1 we saw that functions in a finite domain could be
represented as

    f(x) = Σk ck e^(ikx),    ck = (f, e^(ikx)) / (e^(ikx), e^(ikx))

The FOURIER TRANSFORM generalizes this idea to an unbounded domain.
First some definitions: the Fourier transform f̂(k) of a function
f(x) is given by

    f̂(k) = ∫−∞^∞ f(x)e^(−ikx) dx = F{f(x)}

This is the analogue of the expression for ck in a bounded domain;
because periodicity is no longer required over a finite interval, k
can be any real number rather than needing to be an integer. The
INVERSE FOURIER TRANSFORM is the analogue of the Fourier series
representation of f

    f(x) = (1/2π) ∫−∞^∞ f̂(k)e^(ikx) dk

These operations are mappings from "x-space" to "k-space" and vice
versa. Here are some useful properties of Fourier transforms, which
are easily derived from the definition:

1. Derivative property

       F{df(x)/dx} = ik F{f(x)} = ik f̂(k)                       (3.62)

2. Integral property

       F{∫x0^x f(x′) dx′} = f̂(k)/(ik) + c δ(k)                  (3.63)

   where c depends on the lower limit x0 of the integration.

3. Shift in x

       F{f(x − a)} = e^(−ika) f̂(k)                              (3.64)

4. Shift in k

       F{e^(iax) f(x)} = f̂(k − a)                               (3.65)

5. Scaling

       F{f(αx)} = (1/|α|) f̂(k/α)                                (3.66)

   where α is a real scalar.

6. Behavior upon exchanging variables: if f̂(k) is the Fourier
   transform of f(x), then F{f̂(x)} = 2πf(−k). This property is
   useful for extending the usefulness of lists or tables of
   transforms, like the one in the following paragraph.

7. Convolution theorem: the CONVOLUTION of two functions G and h is

       u(x) = ∫−∞^∞ G(x − x′)h(x′) dx′

   This is often written u = G * h. The CONVOLUTION THEOREM states
   that

       û(k) = Ĝ(k)ĥ(k)                                          (3.67)

   A convolution in x-space is a product in k-space.

These properties help us to solve PDEs.
Fourier transforms of some important functions are:

1. f(x) = δ(x): f̂(k) = 1. A spike localized in space has equal
   components at every wavelength.

2. f(x) = 1: f̂(k) = 2πδ(k). Conversely, a spike located at
   wavenumber zero is smeared all over space.

3. f(x) = e^(ilx): f̂(k) = 2πδ(k − l). A spike at k = l corresponds
   to a sinusoid of wavenumber l.

4. f(x) = 1 for −L < x < L and zero elsewhere: f̂(k) = (2 sin kL)/k.

5. f(x) = e^(−b|x|), b > 0: f̂(k) = 2b/(b² + k²).

6. f(x) = e^(−x²/4a): f̂(k) = √(4πa) e^(−ak²).

   The Fourier transform of a Gaussian is a Gaussian. If a is large,
   the function decays very quickly as |k| increases, so the Gaussian
   in k-space is very localized. Because a appears in the denominator
   in x-space, however, the function is very spread out in x. The
   opposite is true if a is small, with the balance holding at
   a = 1/2. Here and here only is the spread of the function the same
   in k and x. As a → 0, (1/(2√(πa))) e^(−x²/4a) → δ(x), in which
   case this property reduces to the first result on the list:
   f(x) = δ(x), f̂(k) = 1.

Example 3.11: Derivation of a Fourier transform formula

Let f(x) = e^(−b|x|) with b > 0. Find its Fourier transform.

Solution

    f̂(k) = ∫−∞^∞ e^(−b|x|) e^(−ikx) dx
         = ∫−∞^0 e^((b−ik)x) dx + ∫0^∞ e^(−(b+ik)x) dx
         = 1/(b − ik) + 1/(b + ik)
         = 2b/(b² + k²)

We illustrate the use of Fourier transforms to solve PDEs with some
examples.

Example 3.12: Transient diffusion in an unbounded domain; one and
multiple dimensions

(a) Consider transient diffusion

        ut = Duxx                                                (3.68)

    in the one-dimensional infinite domain (−∞, ∞) with initial
    condition u(x, 0) = u0(x), where u0(x) is known but otherwise
    arbitrary. Use the Fourier transform in x to find the solution.

(b) Extend this result to three dimensions, with initial condition
    u(x, 0) = u0(x). Do so by first considering a δ-function initial
    condition u0(x) = δ(x) = δ(x)δ(y)δ(z) and noting that it can be
    incorporated into the governing equation as a point source in
    space and time

        ut = D∇²u + δ(t)δ(x)                                     (3.69)

Solution

(a) Taking the Fourier transform of the equation and applying the
derivative property yields

    F{ut = Duxx}  ⇒  ût(k, t) = −k²Dû(k, t)

This gives us an ODE for each value of k, with initial condition
û0(k). The solution is simply û(k, t) = û0(k)e^(−Dk²t). The inverse
Fourier transform puts this back in physical space. Consider the
evolution of a delta function, whose Fourier transform is simply
û0(k) = 1. Now

    u(x, t) = F⁻¹{e^(−Dtk²)} = (1/(2√(πDt))) e^(−x²/4Dt)         (3.70)

Thus at any time, the temperature field that starts as a delta
function is a Gaussian distribution, with height 1/(2√(πDt)) and
width √(4Dt). An important extension of this result comes from the
observation that any function can be written as a superposition of
delta functions

    u0(x) = ∫−∞^∞ u0(x′)δ(x − x′) dx′                            (3.71)

Thus the solution can be written as the superposition

    u(x, t) = ∫−∞^∞ u0(x′) (1/(2√(πDt))) e^(−(x−x′)²/4Dt) dx′    (3.72)
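Equation (3.72) can be exercised numerically. The sketch below (assuming NumPy; the Gaussian initial condition and parameter values are arbitrary choices) evolves u0 both by multiplying by e^(−Dk²t) in k-space and by quadrature of the superposition integral, and compares both with the exact self-similar solution:

```python
import numpy as np

D, t = 0.7, 0.05
x = np.linspace(-10.0, 10.0, 1024, endpoint=False)
dx = x[1] - x[0]
u0 = np.exp(-x**2)       # Gaussian initial condition

# Spectral evolution: multiply each Fourier mode by exp(-D k^2 t).
k = 2*np.pi*np.fft.fftfreq(x.size, d=dx)
u_fft = np.real(np.fft.ifft(np.fft.fft(u0)*np.exp(-D*k**2*t)))

# Direct quadrature of the superposition integral (3.72).
kernel = np.exp(-(x[:, None] - x[None, :])**2/(4*D*t))/(2*np.sqrt(np.pi*D*t))
u_conv = kernel @ u0 * dx

# A Gaussian stays Gaussian: u = exp(-x^2/(1+4Dt))/sqrt(1+4Dt).
u_exact = np.exp(-x**2/(1 + 4*D*t))/np.sqrt(1 + 4*D*t)
print(np.max(np.abs(u_fft - u_exact)), np.max(np.abs(u_conv - u_exact)))
```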

(b) Now the THREE-DIMENSIONAL FOURIER TRANSFORM will be introduced:

    f̂(kx, ky, kz) = F3D{f(x, y, z)}
                  = ∫∫∫ f(x, y, z) e^(−ikxx) e^(−ikyy) e^(−ikzz) dx dy dz

Here Fourier transforms have been applied in all three spatial
coordinate directions. Defining the WAVEVECTOR k = (kx, ky, kz),

    f̂(k) = ∫ f(x) e^(−ik·x) dx

Similarly

    f(x) = (1/(2π)³) ∫ f̂(k) e^(ik·x) dk

The results presented above for one-dimensional transforms can be
used to generate formulas for multidimensional transforms. For
example

    F3D{∇f} = ik f̂(k)

    F3D{∇ · v} = ik · v̂(k)

    F3D{∇²f} = −k²f̂(k),    k² = kx² + ky² + kz²

Taking the three-dimensional Fourier transform of (3.69) yields

    ût + k²Dû = δ(t)

which is easily solved (by Laplace transform in time, for example) to
yield

    û(t) = e^(−k²Dt)

Applying the inverse three-dimensional transform formula yields

    u(x, t) = (1/(2π)³) ∫ e^(−k²Dt) e^(ik·x) dk
            = [ (1/2π) ∫−∞^∞ e^(−kx²Dt) e^(ikxx) dkx ]
              [ (1/2π) ∫−∞^∞ e^(−ky²Dt) e^(ikyy) dky ]
              [ (1/2π) ∫−∞^∞ e^(−kz²Dt) e^(ikzz) dkz ]
            = (1/(2√(πDt))) e^(−x²/4Dt) · (1/(2√(πDt))) e^(−y²/4Dt)
              · (1/(2√(πDt))) e^(−z²/4Dt)
            = (1/(4πDt)^(3/2)) e^(−r²/4Dt)

where r = |x|. Using this result, (3.72) generalizes to an arbitrary
three-dimensional initial condition u0(x):

    u(x, t) = ∫ u0(x′) (1/(4πDt)^(3/2)) e^(−|x−x′|²/4Dt) dx′

Example 3.13: Steady diffusion from a wall with an imposed
concentration profile

Solve the steady-state diffusion or heat conduction problem

    uxx + uyy = 0

in the half-plane −∞ < x < ∞, 0 < y < ∞, with boundary conditions
u(x, 0) = u0(x) and u(x, y) bounded as y → ∞.

Solution

Based on our experience with the previous example, we begin by
considering the boundary condition u(x, 0) = δ(x). Taking the Fourier
transform of the equation and boundary condition in the x-direction
(the problem is not unbounded in y) yields

    −k²û(y) + ûyy(y) = 0,    û(0) = F{δ(x)} = 1

Requiring that the solution be bounded as y → ∞, this has the
solution

    û(y) = e^(−|k|y)

Now the inverse transform of this solution must be found. Recall that
from the point of view of the Fourier transform and its inverse, the
variable y is a constant (we are only transforming in the
x-coordinate). Combining the transform of e^(−b|x|) with the
exchange-of-variables and scaling properties, we have that

    F⁻¹{e^(−|k|y)} = (1/π) y/(y² + x²)

Given this solution, we can use (3.71) and the superposition
principle for linear problems to determine that, given an arbitrary
boundary condition u0(x),

    u(x, y) = ∫−∞^∞ u0(x′) (1/π) y/(y² + (x − x′)²) dx′          (3.73)

Observe that (3.73) has the form of a convolution, i.e.,
û(k) = Ĝ(k)û0(k) with Ĝ(k) = e^(−|k|y). Thus the solution arises
directly from the convolution theorem.
3.3.5 Green's Functions and Boundary-Value Problems

Overview

The transient diffusion problem we solved in Example 3.12 gave us an
example of a GREEN'S FUNCTION, a solution to a differential equation
with a point source forcing.⁴ We saw in that example that the
solution for an arbitrary initial distribution u(x, 0) = u0(x) could
be written as a convolution of u0 and the Green's function. Exercise
3.36 extends that result. In the present section we will develop the
basic theory of Green's functions, with a particular focus on
boundary value problems.

Consider a linear boundary value problem

    Lu = f(x)                                                    (3.74)

⁴In quantum mechanics in particular, a Green's function for a
transient problem like this one is called a PROPAGATOR, since it
propagates a δ-function initial condition forward in time.

with boundary conditions that may be inhomogeneous, specified in
general. The Green's function G(x, x0) for the operator L is the
solution to

    LG(x, x0) = δ(x − x0)                                        (3.75)

That is, G(x, x0) is the solution for a point source placed at
arbitrary position x0 within the domain of interest. The discussion
below reveals what boundary conditions G should satisfy. For the
present we will consider Green's functions for self-adjoint problems
and Sturm-Liouville operators. As a specific initial example, recall
(2.33) from Section 2.4.2:

    ∫a^b [ (1/w(x)) d/dx (p du/dx) + r(x)u ] v w(x) dx
        − ∫a^b [ (1/w(x)) d/dx (p dv/dx) + r(x)v ] u w(x) dx
        = p(b)(u′(b)v(b) − u(b)v′(b)) − p(a)(u′(a)v(a) − u(a)v′(a))

Letting v(x) = G(x, x0), this becomes

    (Lu, G) − (u, LG) = p(b)(u′(b)G(b, x0) − u(b)G′(b, x0))
        − p(a)(u′(a)G(a, x0) − u(a)G′(a, x0))

Applying (3.74) and (3.75) in the two inner products gives us that

    (f, G) − (u, δ(x − x0)) = p(b)(u′(b)G(b, x0) − u(b)G′(b, x0))
        − p(a)(u′(a)G(a, x0) − u(a)G′(a, x0))

The inner product (u, δ(x − x0)) evaluates to u(x0)w(x0), so
rearranging leads to

    u(x0) = (1/w(x0)) [ ∫a^b f(x)G(x, x0)w(x) dx
        − p(b)(u′(b)G(b, x0) − u(b)G′(b, x0))
        + p(a)(u′(a)G(a, x0) − u(a)G′(a, x0)) ]

Finally, we specify boundary conditions. For example, we can set
inhomogeneous Dirichlet boundary conditions u(a) = ua, u(b) = ub.
Because we are not specifying homogeneous boundary conditions here,
the operator L is said to be FORMALLY self-adjoint; true
self-adjointness for a differential operator requires that we impose
homogeneous boundary conditions such that the boundary terms vanish.
In this case the boundary values u′(a) and u′(b) are not specified.
If we require G to satisfy homogeneous Dirichlet boundary conditions
G(a, x0) = G(b, x0) = 0, however, then the unknown boundary values u′
do not appear and we arrive at a solution for u in terms of f, G and
the boundary conditions

    u(x0) = (1/w(x0)) [ ∫a^b f(x)G(x, x0)w(x) dx
        + ub p(b)G′(b, x0) − ua p(a)G′(a, x0) ]                  (3.76)

Therefore, given the solution G(x, x0) to the problem
LG = δ(x − x0), G(a, x0) = G(b, x0) = 0, we can find the solution to
Lu = f for any f through (3.76). Note that (3.76) is closely
analogous to the solution A⁻¹b of the algebraic problem Ax = b, with
G playing the role of A⁻¹. Example 2.15 shows a derivation of this
formula for a specific problem. Because that example already imposes
homogeneous Dirichlet boundary conditions, reworking it with
u(x) = G(x, x0) and f(x) = δ(x − x0) would directly yield the Green's
function for the Dirichlet problem.
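A minimal one-dimensional illustration of (3.76) (a sketch, not from the text): take L = d²/dx² on [0, 1] with w = p = 1, r = 0 and homogeneous Dirichlet conditions, for which the Green's function is piecewise linear:

```python
import numpy as np

# G satisfies G'' = delta(x - x0) with G(0, x0) = G(1, x0) = 0; its
# slope jumps by 1 at x = x0.
def G(x, x0):
    return np.where(x < x0, x*(x0 - 1.0), x0*(x - 1.0))

# Solve u'' = f for f = 1; with ua = ub = 0 the boundary terms in
# (3.76) vanish and u(x0) is just the integral of f G.  The exact
# solution is u(x) = x(x - 1)/2.
x, dx = np.linspace(0.0, 1.0, 20001, retstep=True)
x0 = 0.3
g = G(x, x0)                                 # f(x) G(x, x0) with f = 1
u_x0 = dx*(g.sum() - 0.5*(g[0] + g[-1]))     # trapezoidal quadrature
print(u_x0, 0.3*(0.3 - 1.0)/2)               # both ≈ -0.105
```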

The above discussion focused on a Sturm-Liouville problem, which is
formally self-adjoint. For a non-self-adjoint operator, the Green's
function for the adjoint operator satisfies

    L*G*(x, x1) = δ(x − x1)                                      (3.77)

along with appropriate homogeneous boundary conditions. In general,
the position of the source is arbitrary, which is why we let its
position here be x1, which is generally distinct from x0. From the
definition of the adjoint

    (LG(x, x0), G*(x, x1)) = (G(x, x0), L*G*(x, x1))

(we have chosen homogeneous boundary conditions on G and G* so the
boundary terms vanish), and inserting (3.75) and (3.77), yields

    (δ(x − x0), G*(x, x1)) = (G(x, x0), δ(x − x1))

This reduces to simply

    G*(x0, x1) = G(x1, x0)                                       (3.78)

This result is the analog of the matrix adjoint result: the inverse
of the adjoint is the adjoint of the inverse.

Green's Function Solution to the Poisson Equation

Green's identities provide the foundation for developing solutions
based on Green's functions in multiple dimensions. We focus here on
the solution of the Poisson equation

    −∇²u = f(x)

with boundary conditions specified below. The Green's function of
interest here satisfies⁵

    −∇²G(x, x0) = δ(x − x0)                                      (3.79)

Green's second identity, (3.17), with v replaced by G, is

    ∫V (u(x)∇²G(x, x0) − G(x, x0)∇²u(x)) dV(x)
        = ∮S (u(x)∇G(x, x0) − G(x, x0)∇u(x)) · n dS(x)

where we have written the differential volume and surface elements as
explicit functions of x to remind us that it is the independent
variable. Inserting the Poisson equation and (3.79), and evaluating
the integral containing the δ-function, yields

    u(x0) = ∫V G(x, x0)f(x) dV(x)
        + ∮S (G(x, x0)∇u(x) − u(x)∇G(x, x0)) · n dS(x)           (3.80)

If u satisfies Dirichlet boundary conditions u = us on S, then
requiring that G = 0 on S yields a solution for u

    u(x0) = ∫V G(x, x0)f(x) dV(x) − ∮S us(x) ∇G(x, x0) · n dS(x)   (3.81)

⁵We put a negative sign in front of the Laplacian here so that,
physically, the term f(x) represents a source of heat, chemical
species, etc., and thus the Green's function represents a point
source. Some authors do not use the negative sign.

where ∂G/∂n ≡ n · ∇G. A Green's function satisfying homogeneous
Dirichlet boundary conditions is sometimes called a GREEN'S FUNCTION
OF THE FIRST KIND. If u satisfies Neumann boundary conditions
∂u/∂n = js, then we apply homogeneous Neumann boundary conditions
∂G/∂n = 0 to the Green's function, in which case the solution for u
is

    u(x0) = ∫V G(x, x0)f(x) dV(x) + ∮S G(x, x0)js(x) dS(x)       (3.82)

and G is a GREEN'S FUNCTION OF THE SECOND KIND.

Evaluating the solutions (3.81) or (3.82) requires us to determine
the solution to −∇²G = δ(x − x0) with the appropriate boundary
conditions. To do this, it is useful to let G be written as the sum
of two parts: G = G∞ + GB. In this sum, G∞ is called the FREE-SPACE
GREEN'S FUNCTION. It is a solution to the equation LG∞ = δ(x − x0)
in an unbounded domain, and contains the singular behavior induced by
the presence of the point source. The boundary correction GB
satisfies LGB = 0 (the singular behavior is contained in G∞), and is
determined by the requirement that G satisfy specific boundary
conditions on S. We will find G∞ and GB for L = −∇² in two
dimensions.

For the purpose of obtaining the free-space Green's function, we
place the source at the origin: x0 = 0. Because the δ-function has no
angular dependence, we will seek a two-dimensional solution to
−∇²G∞(x) = δ(x) that is only a function of r. Therefore, at every
point in the domain except the origin, G∞(r) satisfies the equation

    (1/r) d/dr (r dG∞/dr) = 0

The solution to this is simple

    G∞(r) = c1 ln r + c2

We set c2 = 0; any constant component of the solution can be
incorporated into GB. To find c1 we first integrate the equation
−∇²G∞ = δ(x) over any volume V (area in this case) containing the
origin

    −∫V ∇²G∞ dV = ∫V δ(x) dV = 1

Recalling that ∇² = ∇ · ∇ and applying the divergence theorem to the
left-hand side of this expression yields that

    −∮S n · ∇G∞ dS = 1

The integral is simple to evaluate if we let V be a circle of radius
r surrounding the origin, in which case

    −∮S n · ∇G∞ dS = −2πr (dG∞/dr) = −2πc1 = 1

Therefore

    c1 = −1/(2π)

Letting r = |x − x0|, the free-space Green's function for −∇² in two
dimensions becomes

    G∞(x − x0) = −(1/(2π)) ln |x − x0|                           (3.83)

To determine GB, the shape of the domain and the boundary conditions
must be specified. We will take the domain to be the half-plane
−∞ < x < ∞, 0 < y < ∞ and seek a solution that vanishes as y → ∞.
In the case of Dirichlet boundary conditions, GB satisfies ∇²GB = 0
with GB = −G∞ on y = 0. We can solve this problem using the "method
of images." Since G∞ represents the field due to a point source at
the position x0 = (x0, y0), if we place a point sink (an "image" or
"reflection" of the source) at x0′ = (x0, −y0), symmetry shows us
that the field due to the source-sink combination will be zero at
y = 0 (Figure 3.8). Therefore we set

    GB = (1/(2π)) ln |x − x0′|

This satisfies ∇²GB = 0 in y > 0 because the sink is in the image
region y < 0. Thus the total Green's function is given by

    G = G∞ + GB = −(1/(2π)) ln |x − x0| + (1/(2π)) ln |x − x0′|
      = −(1/(2π)) ln ( |x − x0| / |x − x0′| )
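The image construction can be verified numerically; this sketch (assuming NumPy, with an arbitrarily placed source) checks that G vanishes on y = 0 and is harmonic away from the source:

```python
import numpy as np

# Half-plane Green's function with source at (x0, y0) and image sink
# at (x0, -y0): G = -(1/2pi) ln(|x - x0|/|x - x0'|).
x0, y0 = 0.4, 0.7
def Gfun(x, y):
    d  = np.hypot(x - x0, y - y0)   # distance to the source
    di = np.hypot(x - x0, y + y0)   # distance to the image sink
    return -np.log(d/di)/(2*np.pi)

xb = np.linspace(-5, 5, 11)
print(np.max(np.abs(Gfun(xb, 0.0*xb))))   # 0 on the boundary y = 0

# Five-point Laplacian at a point away from the source: should vanish.
h = 1e-3
xp, yp = 1.2, 1.5
lap = (Gfun(xp + h, yp) + Gfun(xp - h, yp) + Gfun(xp, yp + h)
       + Gfun(xp, yp - h) - 4*Gfun(xp, yp))/h**2
print(abs(lap))                            # ≈ 0 (to truncation error)
```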

Finally, the solution, (3.81), becomes

    u(x0, y0) = −(1/(2π)) ∫0^∞ ∫−∞^∞ ln ( |x − x0| / |x − x0′| ) f(x, y) dx dy
        + (1/π) ∫−∞^∞ (y0 us(x) / ((x − x0)² + y0²)) dx

If f(x, y) = 0, this solution reduces to what we found using Fourier
transforms in Example 3.13. For the solution with Neumann boundary


Figure 3.8: A source (as indicated by the +) in the physical domain
at position (x0, y0) and an image sink (−) at (x0, −y0). The shaded
region is outside the physical domain. Because the source and sink
have equal magnitude and opposite sign, and are the same distance
from the plane y = 0, the fields due to them cancel out on that line.

conditions, (3.82), GB would have to satisfy ∂GB/∂n = −∂G∞/∂n on
y = 0. In this case GB is the field due to an image source, rather
than a sink, at position x0′.

The simple geometry used here required only one "image point" to
satisfy the boundary conditions. Nevertheless, the geometry does not
need to be much more complicated to require many or even an infinite
number of image points. The infinite strip, −∞ < x < ∞, 0 < y < 1,
requires an infinite number of image points, since the image point we
use to satisfy, say, the boundary condition at y = 0 will change the
field at y = 1, which must be compensated by another image point, and
so on ad infinitum. As a practical matter, often using one image for
each of the two boundaries provides an adequate approximation.

Boundary Integral Formulation of the Laplace Equation

Equations (3.81) and (3.82) require the availability of the solution
for the Green's function with the appropriate boundary conditions and
in the domain of interest. In some cases, as we saw above, this
solution is available in closed form, but often it is not. To address
this situation, we step back to (3.80). In developing this equation,
the boundary conditions on G have not yet been specified. For
example, it is valid if we let G = G∞, which has a simple closed-form
solution, (3.83). Using this choice and letting f(x) = 0 so that we
are considering the Laplace equation, (3.80) becomes

    u(x0) = ∮S [ G∞(x, x0) ∂u(x)/∂n − u(x) ∂G∞(x, x0)/∂n ] dS(x)   (3.84)

Vector Calculus and Partial Differe


ntial

304

EquQti0hs

Above,we have taken xo to be a point within the domain


solution to this equation could then be inserted into (3.84)t
solution at any point within the domain. We will derive
for the case where the domain is the interior of a bounded volume
Thereis an important subtlety in doing this, which arises

fact that GN/n changes sign as xo crosses from one Sidfrom the
boundaryto the other. Considera vertical boundary define
line x = 0 (with the outward normal pointing to the right) and
let
xo = (xo,y). Takingthe limit xo 0 corresponds to approaching
the point (0,y) on the boundary
G00(x, xo)

xo-0

lim

G00(x, xo)

1
xo-0

xo

27Txo + Y2

sgn(xo)(y)
2

where the last step is accomplished by recognizing (1/π)|x0|/(x0² + y²) as a delta family; see Section 2.5. Thus this term is singular as x0 approaches the boundary, and the sign depends on the side from which it approaches. Using this result, and recalling that here x0 is approaching the boundary from the left (interior),

\[ \lim_{x_0 \to S} \int_S u(x)\,\frac{\partial G_\infty(x, x_0)}{\partial n}\, dS(x) = \mathrm{PV}\!\int_S u(x)\,\frac{\partial G_\infty(x, x_0)}{\partial n}\, dS(x) - \frac{u(x_0)}{2} \tag{3.85} \]

Here the integral on the boundary must be evaluated in the sense of its CAUCHY PRINCIPAL VALUE
\[ \mathrm{PV}\!\int_S u(x)\,\frac{\partial G_\infty(x, x_0)}{\partial n}\, dS(x) = \lim_{\epsilon \to 0} \int_{S - S_\epsilon} u(x)\,\frac{\partial G_\infty(x, x_0)}{\partial n}\, dS(x) \]
where S_ε is the portion of S within a tiny radius ε of x0.


Finally, inserting (3.85) into (3.84) yields that for points x0 on the boundary
\[ \frac{1}{2}\,u(x_0) = \int_S G_\infty(x, x_0)\,\frac{\partial u(x)}{\partial n}\, dS(x) - \mathrm{PV}\!\int_S u(x)\,\frac{\partial G_\infty(x, x_0)}{\partial n}\, dS(x) \tag{3.86} \]
3.3 Linear Partial Differential Equations: Properties and Solution Techniques 305

If Dirichlet boundary conditions u = g are imposed, then the left-hand side and the second integral are known, and the boundary values of ∂u/∂n are determined by the solution of this integral equation. If u is imposed on some part of the boundary and ∂u/∂n on the remainder, then ∂u/∂n is an unknown on the part of the boundary where u is imposed, and vice versa. Closed-form solutions to (3.86) can be obtained in special cases, but its importance goes beyond these. On a fundamental level, it shows that Laplace's equation, a partial differential equation, can be reformulated as an integral equation whose domain is the boundary of the original domain. On a practical level, it forms the basis of an important computational approach to solving the Laplace equation and related problems, the BOUNDARY ELEMENT METHOD. In this approach, the integrals in (3.86) are discretized, leading to a system of linear algebraic equations whose unknowns are the values of u and ∂u/∂n at points on the boundary.
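As a quick numerical check of the boundary integral identity (3.84), we can evaluate the integral for a known harmonic function on the unit circle and compare with its interior value. The test function u = x, the interior point, and the quadrature resolution are illustrative choices, not taken from the text; a sketch in Python:

```python
# Check of (3.84) on the unit circle using the 2-D free-space Green's
# function G = -ln(r)/(2*pi). The harmonic test function u = x and the
# interior point x0 are illustrative choices.
import numpy as np

N = 400                               # boundary quadrature points
theta = 2 * np.pi * np.arange(N) / N  # periodic trapezoid rule
xb, yb = np.cos(theta), np.sin(theta) # boundary points (unit circle)
nx, ny = xb, yb                       # outward unit normal on the circle

u = xb                                # u = x is harmonic
dudn = xb                             # du/dn = du/dr = cos(theta) at r = 1

x0, y0 = 0.3, 0.2                     # interior evaluation point
rx, ry = xb - x0, yb - y0
r2 = rx**2 + ry**2
G = -np.log(np.sqrt(r2)) / (2 * np.pi)
dGdn = -(rx * nx + ry * ny) / (2 * np.pi * r2)

dS = 2 * np.pi / N                    # arc length element
u_at_x0 = np.sum(G * dudn - u * dGdn) * dS
print(u_at_x0)                        # should be close to u(x0) = 0.3
```

Because the integrand is smooth and periodic, the trapezoid rule converges rapidly here; this is the continuous analog of the discretization used in the boundary element method.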

3.3.6 Characteristics and D'Alembert's Solution to the Wave Equation

The wave equation
\[ u_{tt} = c^2 \nabla^2 u \tag{3.87} \]
governs wave propagation in many physical contexts, including electromagnetic waves (light), vibrations of strings and membranes, and sound propagation. In one spatial dimension, the equation is

\[ u_{tt} = c^2 u_{xx} \tag{3.88} \]
which was introduced in Section 3.3.1 as an archetypal hyperbolic equation. Following the change of variable procedure introduced there, we find that ξ = x − ct and η = x + ct. Rewriting (3.88) in these coordinates yields (3.21) with g = 0. We can easily integrate this twice to find the general solution of the wave equation
\[ u(x,t) = F_1(\xi) + F_2(\eta) = F_1(x - ct) + F_2(x + ct) \]

It says that any solution is a superposition of a right-moving and a left-moving wave. Usually, we want to understand the wave equation as an initial-value problem, so we look at two cases of initial conditions and then combine them to get a general result.


First, consider the initial condition u(x, 0) = u0(x), but no initial velocity. At t = 0 the above general solution and its time derivative become
\[ u(x, 0) = u_0(x) = F_1(x) + F_2(x) \]
\[ u_t(x, 0) = 0 = -c F_1' + c F_2' \]
The latter equation integrates to yield F1 = F2, and using this fact in the first equation gives F1 = F2 = u0/2. Thus the solution for these initial conditions is
\[ u(x, t) = \frac{1}{2}\left[ u_0(x - ct) + u_0(x + ct) \right] \]
The initial condition splits immediately into two identical waves, one traveling to the right and one to the left. These waves have the same shape, but half the amplitude, of the initial condition. In contrast to the parabolic heat equation u_t = u_xx, which smooths discontinuous initial conditions as illustrated in Example 3.12, no smoothing occurs in the wave equation. If an initial condition contains a discontinuity at a point x, this will simply propagate along the characteristic directions ξ = constant, η = constant.
Now consider a struck string rather than a plucked one. The initial condition is u(x, 0) = 0, u_t(x, 0) = v0(x) ≠ 0. There is no initial deformation, but there is an initial velocity. Now at t = 0 we have
\[ u(x, 0) = 0 = F_1(x) + F_2(x) \]
\[ u_t(x, 0) = v_0(x) = -c F_1' + c F_2' \]
This tells us that F1' = −F2' and that v0 = 2cF2'. We can integrate this to find that
\[ F_2(x + ct) = \frac{1}{2c} \int_0^{x+ct} v_0(s)\, ds \]
Similarly
\[ F_1(x - ct) = -\frac{1}{2c} \int_0^{x-ct} v_0(s)\, ds \]
The solution is F1 + F2, which is
\[ \frac{1}{2c} \int_{x-ct}^{x+ct} v_0(s)\, ds \]

Figure 3.9: An initially right-traveling wave in the domain x < 0 reflecting across a wall where u = 0, as solved using superposition of a left-traveling "image" with opposite sign. (The figure shows the physical domain x < 0, with the pulse at x0, and the image domain, with the image pulse at -x0.)

The complete solution to the initial-value problem is the sum of the above two cases. This is D'Alembert's solution
\[ u(x, t) = \frac{1}{2}\left[ u_0(x - ct) + u_0(x + ct) \right] + \frac{1}{2c} \int_{x-ct}^{x+ct} v_0(s)\, ds \]

We have only considered the very simplest hyperbolic equation here. For example, if the coefficients a, b, c depend on position, then the characteristics are curved. The references contain extensive information about more complex hyperbolic problems.
Because the wave equation is linear, we can superpose multiple solutions to form another solution. As an application of this fact, imagine a pulse traveling rightward toward a boundary at x = 0, at which the boundary condition is u = 0. At time t = 0, the pulse is centered at x = x0. To understand this situation, recall Figure 3.8 and the "method of images" analysis of Section 3.3.5. Applying the same idea here, we place an "image" pulse of the same shape but opposite sign at the position x = -x0 (which is outside the physical domain) and make it move leftward as shown in Figure 3.9. Now the real and image pulses will eventually overlap, and by symmetry they will satisfy u = 0 at x = 0. Once the "image" pulse enters the physical domain, it is no longer an image, but a component of the true solution. The implication of this construction is that when a wave hits a boundary where no deformation is allowed, it reflects but with a change of sign. What happens if the boundary condition is u_x = 0?
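The image construction just described can be sketched numerically; the Gaussian pulse shape and the starting position below are illustrative choices, not from the text:

```python
# Sketch of the image construction for wave reflection at a wall where
# u = 0 at x = 0. A right-going pulse starting at x0 < 0 is superposed
# with an opposite-sign, left-going image starting at -x0.
import numpy as np

c, x0 = 1.0, -3.0
pulse = lambda s: np.exp(-s**2)          # illustrative pulse shape

def u(x, t):
    # real pulse moving right plus negative image moving left
    return pulse(x - x0 - c * t) - pulse(x + x0 + c * t)

# By symmetry, u vanishes at the wall x = 0 for all times:
t = np.linspace(0.0, 8.0, 50)
wall = np.max(np.abs(u(0.0, t)))
print(wall)                              # at round-off level
```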


3.3.7 Laplace Transform Methods

Next we illustrate the solution of several linear PDEs with Laplace transforms. For a user with some experience, Laplace transforms are possibly the most powerful method for solving linear, low-dimensional PDEs in closed form. After taking the Laplace transform of a PDE, usually with respect to the time variable, the result is a linear ODE in the transform function. We can often solve this ODE. To perform the inverse transform, we then require some inverse formulas for transform functions with singularities.⁶ We develop these inverse formulas next and then solve some example PDEs.

Let the transform function
\[ \bar{f}(s) = \frac{p(s)}{q(s)} \]
have singularities at the zeros of q(s), which is assumed to have m simple zeros
\[ q(s_n) = 0, \qquad n = 1, 2, \ldots, m \]
The inverse of this Laplace transform is given by the following formula
\[ f(t) = \sum_{n=1}^{m} a_n e^{s_n t}, \qquad a_n = \frac{p(s_n)}{q'(s_n)} \tag{3.89} \]

which is usually called the Heaviside expansion theorem. When p(s) and q(s) are polynomials, the coefficients a_n can be derived using partial fractions. But the result applies to more general cases, as we require in the two examples below, where q(s) = s sinh√(s + k) and q(s) = sinh s.⁷ When the zeros of q(s) are higher than first order, f(t) is a linear combination of products of polynomials and exponentials of time, and the coefficients are more complex. Let the zero s_n have order r_n, n = 1, 2, ..., m. Then the inverse is given by
\[ f(t) = \sum_{n=1}^{m} \left( \sum_{i=0}^{r_n - 1} a_{ni}\, t^i \right) e^{s_n t} \tag{3.90} \]

⁶The singularities of complex-valued functions are poles, branch points, and essential singularities (Levinson and Redheffer, 1970). The order of a zero s_n is the smallest integer i such that q(s)/(s − s_n)^i is nonzero in the limit s → s_n, and a simple zero is a first-order zero. So we are assuming here that the function f̄(s) has m simple poles.
⁷We are in good company. Heaviside also used the expansion for the case of q(s) = sinh xs (Vallarta, 1926) (Heaviside, 1899, p. 88).

The coefficients a_{ni}, for i = 0, 1, ..., r_n − 1, n = 1, 2, ..., m, are given by
\[ a_{ni} = \frac{\phi_n^{(r_n-1-i)}(s_n)}{i!\,(r_n-1-i)!}, \qquad \phi_n(s) = (s - s_n)^{r_n}\, \bar{f}(s) \]
in which \phi_n^{(i)}(s_n) denotes the ith derivative of \phi_n(s) evaluated at s = s_n. For students with a background in complex variables, Exercise A.2 provides some hints to establish (3.90) (and hence also (3.89)), which requires inverting the Laplace transform by performing the contour integral (2.7).
Next we use Laplace transforms to solve the reaction-diffusion equation and the wave equation. We will see that the transform in both problems has only simple zeros, and we will use (3.89) for calculating the inverse.
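A minimal numerical illustration of (3.89): for a rational transform with simple poles (an example chosen here for illustration, not taken from the text), the Heaviside expansion reproduces the known inverse:

```python
# Heaviside expansion (3.89) for fbar(s) = 1/((s+1)(s+2)), i.e.
# p(s) = 1 and q(s) = s^2 + 3s + 2 with simple zeros at -1 and -2.
import numpy as np

poles = np.array([-1.0, -2.0])           # simple zeros of q(s)
p = lambda s: 1.0
dq = lambda s: 2 * s + 3                 # q'(s)

def f(t):
    # f(t) = sum_n p(s_n)/q'(s_n) * exp(s_n t)
    return sum(p(s) / dq(s) * np.exp(s * t) for s in poles)

# compare with the known inverse exp(-t) - exp(-2t)
t = np.linspace(0.0, 5.0, 11)
err = np.max(np.abs(f(t) - (np.exp(-t) - np.exp(-2 * t))))
print(err)
```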

Example 3.14: Reaction and diffusion in a membrane
The following model describes diffusion through a membrane in which component A decomposes by a first-order reaction. The membrane initially has zero concentration of A. At t = 0 the concentration at the side of the membrane at x = 0 is abruptly raised to concentration c̄_A and the other side is maintained at zero concentration.
\[ \text{PDE:}\quad \frac{\partial c_A}{\partial t} = D_A \frac{\partial^2 c_A}{\partial x^2} - K c_A, \qquad 0 \le x \le L, \quad t \ge 0 \]
\[ \text{IC:}\quad c_A(x, 0) = 0 \]
\[ \text{BC1:}\quad c_A(0, t) = \bar{c}_A \qquad \text{BC2:}\quad c_A(L, t) = 0 \]

(a) Define the dimensionless variables
\[ c = \frac{c_A}{\bar{c}_A}, \qquad z = \frac{x}{L}, \qquad \tau = \frac{t D_A}{L^2} \]
and show that the model reduces to
\[ \text{PDE:}\quad \frac{\partial c}{\partial \tau} = \frac{\partial^2 c}{\partial z^2} - kc \]
\[ \text{IC:}\quad c(z, 0) = 0 \qquad \text{BC1:}\quad c(0, \tau) = 1 \qquad \text{BC2:}\quad c(1, \tau) = 0 \]


in which k = KL²/D_A is the only dimensionless parameter appearing in the problem. This dimensionless parameter is known as the Thiele number or the Thiele modulus in the chemical engineering literature (Rawlings and Ekerdt, 2012, p. 363). It indicates the ratio of the reaction rate to the diffusion rate.
(b) Take the Laplace transform of your model (also the boundary conditions). Solve the resulting differential equation and boundary conditions to show that
\[ \bar{c}(z, s) = \frac{\sinh\!\left( \sqrt{s + k}\,(1 - z) \right)}{s \sinh\sqrt{s + k}} \]
(c) Apply the final-value theorem to c̄(z, s) to find the steady-state solution c_s(z).

(d) Take the limit of this solution as k → 0 for the zero-reaction case. Does your solution satisfy the diffusion equation?

(e) Sketch the solution c_s(z) for a range of k values and show the effect of reaction on the steady-state concentration profile.

(f) Let p(s) = sinh(√(s + k)(1 − z)) and q(s) = s sinh√(s + k), and find the zeros s_n of q(s). Also find the value of q'(s_n) at the zeros of q(s). The following formulas may be helpful: cosh(iu) = cos u, sinh(iu) = i sin u.

(g) Invert the transform and find c(z, τ). Check that the solution satisfies the PDE and boundary conditions.

Solution
(a) Inserting the defined dimensionless variables in the PDE gives
\[ \frac{\bar{c}_A D_A}{L^2} \frac{\partial c}{\partial \tau} = \frac{D_A \bar{c}_A}{L^2} \frac{\partial^2 c}{\partial z^2} - K \bar{c}_A c \]
and rearranging gives
\[ \frac{\partial c}{\partial \tau} = \frac{\partial^2 c}{\partial z^2} - \frac{K L^2}{D_A} c = \frac{\partial^2 c}{\partial z^2} - kc \]

Inserting the dimensionless variables in the boundary and initial conditions and simplifying these expressions gives
\[ c(0, \tau) = 1, \qquad c(1, \tau) = 0, \qquad c(z, 0) = 0, \qquad 0 \le z \le 1, \quad \tau \ge 0 \]

(b) Taking the Laplace transform of the PDE and BCs gives
\[ \frac{d^2 \bar{c}}{dz^2} = (s + k)\,\bar{c}, \qquad \bar{c}(0, s) = \frac{1}{s}, \qquad \bar{c}(1, s) = 0 \]
The solution of the ODE can be written
\[ \bar{c}(z, s) = a \cosh\!\left( \sqrt{s + k}\,(1 - z) \right) + b \sinh\!\left( \sqrt{s + k}\,(1 - z) \right) \]
and we use the two BCs to find the constants a and b. The condition at z = 1 gives a = 0, and the condition at z = 0 gives
\[ b \sinh\sqrt{s + k} = \frac{1}{s} \]
so we have
\[ b = \frac{1}{s \sinh\sqrt{s + k}} \]
which gives for the Laplace transform of the solution
\[ \bar{c}(z, s) = \frac{\sinh\!\left( \sqrt{s + k}\,(1 - z) \right)}{s \sinh\sqrt{s + k}} \]


(c) Applying the final-value theorem gives
\[ c_s(z) = \lim_{s \to 0} s\,\bar{c}(z, s) = \lim_{s \to 0} \frac{\sinh\!\left(\sqrt{s + k}\,(1 - z)\right)}{\sinh\sqrt{s + k}} = \frac{\sinh\!\left(\sqrt{k}\,(1 - z)\right)}{\sinh\sqrt{k}} \]

(d) Using the fact that sinh x ≈ x for small x gives
\[ \lim_{k \to 0} c_s(z) = 1 - z \]
Yes, the solution satisfies the steady-state diffusion equation and boundary conditions
\[ \frac{d^2 c_s(z)}{dz^2} = 0, \qquad c_s(0) = 1, \qquad c_s(1) = 0 \]

(e) The concentration profiles c_s(z) versus z for a variety of rate constants k are given in Figure 3.10. We see that a large reaction rate constant prevents species A from diffusing very far into the membrane.

(f) Since the zeros of sin u are u = ±nπ, n = 0, 1, 2, ..., the zeros of sinh u are u = ±nπi, n = 0, 1, 2, ....⁸ The zeros of sinh√(s + k) are given by s_n = −(n²π² + k), and for these roots we have that √(s_n + k) = nπi, in which we choose the positive square root. Therefore the zeros of the denominator q(s) are given by
\[ s = \{0, \; -(n^2\pi^2 + k)\}, \qquad n = 1, 2, \ldots \]
These are simple zeros, so the inversion formula in (3.89) is applicable. Differentiating q(s) and evaluating q'(s) at the zeros
⁸See Exercise 3.48 for a proof that these are the only zeros of sin u for u ∈ ℂ.


Figure 3.10: Concentration versus membrane penetration distance for different reaction rate constants (curves for k = 2, 10, 30, 100).

gives
\[ q'(0) = \sinh\sqrt{k} \]
\[ q'\!\left( -(n^2\pi^2 + k) \right) = \frac{(-1)^{n+1}(n^2\pi^2 + k)}{2n\pi i} \]
Evaluating p(s) at the zeros gives
\[ p(0) = \sinh\!\left( \sqrt{k}\,(1 - z) \right) \]
\[ p\!\left( -(n^2\pi^2 + k) \right) = i \sin\!\left( n\pi(1 - z) \right) \]

(g) Putting these terms together in (3.89) gives
\[ c(z, \tau) = \frac{\sinh\!\left(\sqrt{k}\,(1 - z)\right)}{\sinh\sqrt{k}} + \sum_{n=0}^{\infty} (-1)^n\, \frac{2\pi n \sin\!\left(n\pi(1 - z)\right)}{n^2\pi^2 + k}\, e^{-(n^2\pi^2 + k)\tau} \]


Noticing the n = 0 term vanishes, we can rewrite the solution as
\[ c(z, \tau) = \frac{\sinh\!\left(\sqrt{k}\,(1 - z)\right)}{\sinh\sqrt{k}} - 2\pi \sum_{n=1}^{\infty} (-1)^{n+1}\, \frac{n \sin\!\left(n\pi(1 - z)\right)}{n^2\pi^2 + k}\, e^{-(n^2\pi^2 + k)\tau} \]
Compare also to entry 34 in Table A.1.
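The series solution can be checked numerically: at τ = 0 the truncated series should return (nearly) zero concentration, and for large τ it should approach the steady-state profile from part (c). The value k = 10, the truncation level, and the grid below are illustrative choices:

```python
# Numerical check of the series solution from Example 3.14.
import numpy as np

k = 10.0
z = np.linspace(0.05, 0.95, 19)
n = np.arange(1, 2001)[:, None]          # truncate at 2000 terms

def c(z, tau):
    steady = np.sinh(np.sqrt(k) * (1 - z)) / np.sinh(np.sqrt(k))
    lam = n**2 * np.pi**2 + k
    series = ((-1.0)**(n + 1) * n * np.sin(n * np.pi * (1 - z)) / lam
              * np.exp(-lam * tau))
    return steady - 2 * np.pi * np.sum(series, axis=0)

steady = np.sinh(np.sqrt(k) * (1 - z)) / np.sinh(np.sqrt(k))
err0 = np.max(np.abs(c(z, 0.0)))         # initial condition: near zero
err_ss = np.max(np.abs(c(z, 1.0) - steady))  # long time: steady state
print(err0, err_ss)
```

Convergence at τ = 0 is slow (the Fourier coefficients decay like 1/n because of the concentration jump at z = 0), while for τ > 0 the exponential factors make the series converge extremely fast.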

Example 3.15: Solving the wave equation
Revisit the wave equation u_tt = c²u_xx on x ∈ [0, 1] for a string with fixed ends, u(0, t) = u(1, t) = 0, and the plucked string initial condition, u(x, 0) = u0(x), u_t(x, 0) = 0. Solve this equation using the Laplace transform. Compare the solution to D'Alembert's solution. Which form do you prefer and why?

Solution

First we define τ = ct to remove the velocity c and simplify our work. The problem is now
\[ u_{\tau\tau} = u_{xx}, \qquad x \in (0, 1), \quad \tau \ge 0 \]
\[ u(x, 0) = u_0(x), \qquad u_\tau(x, 0) = 0 \]
\[ u(0, \tau) = 0, \qquad u(1, \tau) = 0 \]

Taking the Laplace transform with respect to the time variable gives
\[ \bar{u}_{xx}(x, s) - s^2 \bar{u}(x, s) = -s\, u_0(x) \]
with transformed boundary conditions ū(0, s) = ū(1, s) = 0. We obtain a second-order nonhomogeneous differential equation for the transform. We already have solved this problem in Chapter 2, and obtained the Green's function. The solution is therefore
\[ \bar{u}(x, s) = \int_0^1 G(x, \xi; s)\, u_0(\xi)\, d\xi \]
in which
\[ G(x, \xi; s) = \begin{cases} \dfrac{\sinh(s\xi)\sinh(s(1 - x))}{\sinh s}, & \xi < x \\[6pt] \dfrac{\sinh(sx)\sinh(s(1 - \xi))}{\sinh s}, & \xi > x \end{cases} \]

Notice that, as we expect, G(x, ξ; s) is symmetric in (x, ξ) because the second-order boundary-value problem is self-adjoint. Next we require a Laplace inverse for the following form
\[ \bar{f}(s) = \frac{\sinh(as)\sinh(bs)}{\sinh s} \]
Notice that sinh s has simple zeros at s_n = nπi with n = ±1, ±2, .... We use the formula given in (3.89) to obtain
\[ p(s_n) = \sinh(n\pi a i)\sinh(n\pi b i) = -\sin(n\pi a)\sin(n\pi b) \]
\[ q'(s_n) = \cosh(n\pi i) = (-1)^n \]
Therefore the inverse is
\[ f(\tau) = \sum_{n=-\infty,\, n \neq 0}^{\infty} (-1)^{n+1} \sin(n\pi a)\sin(n\pi b)\, e^{i n\pi\tau} \]
Substituting e^{inπτ} = cos(nπτ) + i sin(nπτ) and combining terms gives
\[ f(\tau) = 2 \sum_{n=1}^{\infty} (-1)^{n+1} \sin(n\pi a)\sin(n\pi b)\cos(n\pi\tau) \]

Notice that the function is now real valued, as it must be. Using this result to invert the Green's function gives
\[ G(x, \xi, \tau) = \begin{cases} 2\sum_{n=1}^{\infty} (-1)^{n+1} \sin(n\pi\xi)\sin(n\pi(1 - x))\cos(n\pi\tau), & \xi < x \\[4pt] 2\sum_{n=1}^{\infty} (-1)^{n+1} \sin(n\pi x)\sin(n\pi(1 - \xi))\cos(n\pi\tau), & \xi > x \end{cases} \]
But noticing that (−1)^{n+1} sin(nπ(1 − ξ)) = sin(nπξ) reduces this to
\[ G(x, \xi, \tau) = 2\sum_{n=1}^{\infty} \sin(n\pi x)\sin(n\pi\xi)\cos(n\pi\tau) \]

Substituting this into the solution gives
\[ u(x, \tau) = 2\sum_{n=1}^{\infty} \sin(n\pi x)\cos(n\pi\tau) \int_0^1 \sin(n\pi\xi)\, u_0(\xi)\, d\xi \]
Defining the Fourier coefficients representing the initial condition
\[ a_n = 2\int_0^1 \sin(n\pi\xi)\, u_0(\xi)\, d\xi \]

we have finally
\[ u(x, \tau) = \sum_{n=1}^{\infty} a_n \sin(n\pi x)\cos(n\pi\tau) \tag{3.91} \]
Returning to the original time variable with the substitution τ = ct gives
\[ u(x, t) = \sum_{n=1}^{\infty} a_n \sin(n\pi x)\cos(n\pi c t) \]

Notice that this solution does not resemble D'Alembert's solution.⁹ The Laplace transform has provided a Fourier series representation of the solution to the wave equation. It is easy to see that the solution satisfies the wave equation. Taking two x derivatives gives u_xx = −(nπ)²u term by term; similarly, taking two τ derivatives gives u_ττ = −(nπ)²u, so u_tt = c²u_xx and the solution satisfies the wave equation. The zero boundary conditions are satisfied because all the sine terms vanish at x = 0, 1. The initial condition is satisfied because of the Fourier series representation of u0(x). We see immediately that the solution is periodic (in time) with period T = 2/c since all the cosine terms have this period. The Fourier series solution is also convenient if we wish to analyze the frequency content of the solution, which is often a quantity of interest when modeling sound propagation.
D'Alembert's solution, on the other hand, provides the nice structural insight that the solution splits into two waves traveling in opposite directions. But then we also require the additional insight from the method of images to enforce zero boundary conditions and extend the solution to the (x, t) values where x − ct < 0 or x + ct > 1, for which u0(x − ct) or u0(x + ct) is not defined.
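The equivalence of the two solution forms can be verified numerically. The initial shape below, a sum of two sine modes, is an illustrative choice for which the odd periodic extension needed by D'Alembert's formula is just u0 itself:

```python
# Fourier-series solution (3.91) vs. d'Alembert's solution for a string
# with fixed ends. With u0 a sum of sine modes, both forms are exact.
import numpy as np

c = 1.0
u0 = lambda x: np.sin(np.pi * x) + 0.5 * np.sin(3 * np.pi * x)

def u_fourier(x, t):
    # series (3.91): only the n = 1 and n = 3 coefficients are nonzero
    return (np.sin(np.pi * x) * np.cos(np.pi * c * t)
            + 0.5 * np.sin(3 * np.pi * x) * np.cos(3 * np.pi * c * t))

def u_dalembert(x, t):
    # the odd periodic extension of this u0 is u0 itself
    return 0.5 * (u0(x - c * t) + u0(x + c * t))

x = np.linspace(0.0, 1.0, 101)
t = 0.37                                  # arbitrary comparison time
diff = np.max(np.abs(u_fourier(x, t) - u_dalembert(x, t)))
print(diff)                               # round-off level
```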

3.4 Numerical Solution of Initial-Boundary-Value Problems: Discretization and Numerical Stability

Chapter 2 introduced numerical methods for solving initial-value and boundary-value problems. These approaches will be combined here to solve initial-boundary-value problems
\[ \frac{\partial u}{\partial t} + Lu = f(x), \qquad Bu(S, t) = h, \qquad u(x, 0) = u_0(x) \]
⁹If we knew enough about the problem to propose a solution of this form, we could arrive at this answer more quickly. The value of the Laplace transform here is that it is prescriptive. You do not have to know (or guess) the structure of the solution to apply the method.


where L is a differential operator that contains all the spatial derivatives, S is the boundary of the domain, and B is an operator that describes the boundary conditions. To treat this problem, the spatial dependence will be discretized using the approaches of Chapter 2 to yield a set of ordinary differential equations in the form of a normal initial-value problem. Then the time-integration approaches also introduced in Chapter 2 can be used. This approach is sometimes called the METHOD OF LINES. We will see that a central issue in this approach is the numerical stability of time integration, which is now closely coupled to the spatial discretization (Press, Teukolsky, Vetterling, and Flannery, 1986; Strang, 1992).
Any of the methods introduced in Section 2.9 can be used for spatial discretization. In the weighted residual formulation for one spatial dimension, we look for an approximate solution u_N(x, t) as a truncated (discretized) series of basis (trial) functions φ_j(x); the difference now is that we allow the coefficients in the series to depend on time. That is
\[ u_N(x, t) = \sum_{j=1}^{N} c_j(t)\, \phi_j(x) \]
Note the similarity of this expression to those arising in the separation of variables technique. We assume for the moment that the basis functions satisfy the boundary conditions and define the residual or error by
\[ R = \frac{\partial u_N}{\partial t} + L u_N - f(x) \]
The residual is now forced to be orthogonal to the set of N test functions ψ_i; that is, (R, ψ_i) = 0, i = 1, 2, ..., N. In the Galerkin method the test functions equal the trial functions, so this condition becomes
\[ \sum_j \left[ (\phi_i, \phi_j)\,\frac{dc_j}{dt} + (\phi_i, L\phi_j)\, c_j \right] = (f, \phi_i) \]
If we let M_ij = (φ_i, φ_j), A_ij = (φ_i, Lφ_j), and b_i = (f, φ_i), we can write the weighted residual conditions as
\[ \frac{dc}{dt} = -M^{-1} A\, c + M^{-1} b \]
This is a set of linear ODEs (an initial-value problem) for the vector of coefficients c in the series for u_N. We have reduced a partial differential equation to a system of ordinary differential equations.
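A minimal method-of-lines sketch along these lines, for u_t = u_xx on (0, 1) with u(0, t) = u(1, t) = 0: the polynomial trial functions, the explicit Euler time integration, and all parameter values are illustrative choices; the stiffness matrix is assembled as (φ_i', φ_j'), which equals (φ_i, Lφ_j) for L = −d²/dx² after integration by parts:

```python
# Galerkin method-of-lines sketch for u_t = u_xx, u(0,t) = u(1,t) = 0,
# with trial functions phi_j = x^j (1 - x) that satisfy the BCs.
import numpy as np

x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
inner = lambda f, g: dx * (np.sum(f * g) - 0.5 * (f * g)[0] - 0.5 * (f * g)[-1])

phi = np.array([x**j * (1 - x) for j in range(1, 5)])          # trial functions
dphi = np.array([j * x**(j - 1) - (j + 1) * x**j for j in range(1, 5)])

M = np.array([[inner(pi, pj) for pj in phi] for pi in phi])    # mass matrix
A = np.array([[inner(di, dj) for dj in dphi] for di in dphi])  # stiffness

u0 = np.sin(np.pi * x)
cvec = np.linalg.solve(M, np.array([inner(p, u0) for p in phi]))  # project IC

MA = np.linalg.solve(M, A)        # M^{-1} A, precomputed
dt, T = 1e-5, 0.1
for _ in range(int(T / dt)):      # explicit Euler on dc/dt = -M^{-1} A c
    cvec = cvec - dt * (MA @ cvec)

uN = phi.T @ cvec
exact = np.exp(-np.pi**2 * T) * np.sin(np.pi * x)
err = np.max(np.abs(uN - exact))
print(err)                        # small; four modes are adequate here
```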

If the Galerkin tau method is used, then there are only N − N_bc ordinary differential equations, where N_bc is the number of boundary conditions. The boundary conditions add N_bc algebraic equations. Typically, these can be explicitly solved for the last N_bc values of c and the formulas substituted into the ODEs.
A similar result arises if we use the collocation approach. Now we replace the spatial derivative operators in L by their matrix approximations, the collocation differentiation operators. This yields, in the interior of the domain,
\[ \frac{du(x_i)}{dt} + L_N u(x_i) = f(x_i) \quad \text{for } x_i \text{ in the interior} \]
with the boundary conditions imposed at the collocation points on the boundaries. Here L_N is the matrix operator obtained by inserting the collocation differentiation operators.
In both Galerkin and collocation approaches, the PDE has been reduced to a system of ODEs. In principle, we know how to solve these. In practice, though, there are numerical stability considerations that arise because the matrices derive from the approximation of derivative operators.

3.4.1 Numerical Stability Analysis for the Diffusion Equation

To get an initial idea of the stability issues we face when numerically solving PDEs, we look at the diffusion equation in one dimension,
\[ u_t = D u_{xx} \]
in an unbounded domain. Taking the Fourier transform of this equation gives dû(k)/dt = −k²D û(k), for all real values of k. This is a system of linear ODEs with eigenvalues λ = −Dk². If we want spatial resolution of wavelengths as short as 2π/k_max, an explicit Euler scheme would require Δt < 2/|λ_max| = 2/(Dk²_max) to ensure stability. Defining ℓ_min = 2π/k_max as the smallest wavelength resolved, we can rewrite this stability limit as
\[ \frac{\Delta t\, D}{\ell_{\min}^2} < (2\pi^2)^{-1} \]
This result shows that, to within a numerical constant, the time step for explicit Euler must be shorter than the time scale for diffusion over a distance ℓ_min.
A similar result holds when finite element or finite difference methods are applied. For simplicity, we will consider a finite difference approximation to the diffusion equation, using the central difference formula
\[ \frac{du_j}{dt} = D\, \frac{u_{j-1} - 2u_j + u_{j+1}}{h^2} \]
where h is the spacing between mesh points x_j and x_{j+1}. Recall from Chapter 2 that the finite element discretization using hat functions leads to an identical form for the second derivative. The forward Euler approximation to this ODE is

\[ u_j^{(n+1)} = u_j^{(n)} + \frac{D\Delta t}{h^2}\left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) \]
Following an approach initially developed by von Neumann, we will seek a spatially periodic solution to this equation: u_j^{(n)} = Gⁿ e^{ikx_j} = Gⁿ e^{ikjh}, where k is arbitrary. (The full solution is a superposition over all k.) In an unbounded or periodic domain, this yields an exact solution; for the discretized problem in a bounded domain it works very well when kL ≫ 1, where L is the domain size. Substituting into the equation above gives

\[ u_j^{(n+1)} = \left[ 1 - \frac{2D\Delta t}{h^2}\left( 1 - \cos kh \right) \right] u_j^{(n)} \]
Here G = 1 − (2DΔt/h²)(1 − cos kh) is the growth factor, which for numerical stability must satisfy |G| ≤ 1. When k = 0, G = 1, which makes physical sense because k = 0 corresponds to a constant function, which does not decay by diffusion (there are no gradients). As k increases, G decreases, taking on its most negative value when kh = π. To maintain stability at this value of k requires that
\[ \frac{2D\Delta t}{h^2} \le 1 \tag{3.92} \]

Indeed, one common indication of numerical instability in a solution is the observation of "sawtooth" patterns with a length scale close to h.
Equation (3.92) is the key result of numerical stability theory for parabolic differential equations and is sometimes called the diffusive Courant-Friedrichs-Lewy (CFL) condition.¹⁰ It mirrors the result we found above using Fourier transforms: to within a constant, the time step Δt must be smaller than the time h²/D required for diffusion over one mesh spacing. This is a very severe restriction on the time step if high spatial resolution is required, as in problems with boundary layers.
¹⁰The true CFL condition was derived for convection problems and is given in the following section.
This severe stability restriction means that for problems where diffusion is important (Peclet number not high), implicit integration techniques are almost always used. The second-order Adams-Moulton method (AM2) is popular. For the finite difference approach used here, AM2 becomes
\[ u_j^{(n+1)} = u_j^{(n)} + \frac{D\Delta t}{2h^2}\left[ \left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) + \left( u_{j-1}^{(n+1)} - 2u_j^{(n+1)} + u_{j+1}^{(n+1)} \right) \right] \]
This is called the CRANK-NICHOLSON method. The linear system that must be solved at each time step is tridiagonal, so it can be factored quickly.
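The contrast between the explicit and implicit schemes can be demonstrated directly; the grid, time step, and initial data below are illustrative choices, with the time step deliberately violating (3.92):

```python
# Forward Euler blows up when D*dt/h**2 > 1/2 for the discretized
# diffusion equation, while Crank-Nicholson stays bounded at the same dt.
import numpy as np

D, N = 1.0, 50
h = 1.0 / N
x = np.linspace(0.0, 1.0, N + 1)
rng = np.random.default_rng(0)
u0 = np.sin(np.pi * x) + 0.01 * rng.random(N + 1)   # noisy initial data

# second-difference matrix for interior points, u = 0 at both ends
L = (np.diag(-2.0 * np.ones(N - 1)) + np.diag(np.ones(N - 2), 1)
     + np.diag(np.ones(N - 2), -1)) / h**2

dt = 0.6 * h**2 / D                  # violates D*dt/h**2 <= 1/2
I = np.eye(N - 1)
B = np.linalg.solve(I - 0.5 * D * dt * L, I + 0.5 * D * dt * L)

ue = u0[1:-1].copy()
uc = u0[1:-1].copy()
for _ in range(500):
    ue = ue + D * dt * (L @ ue)      # forward Euler
    uc = B @ uc                      # Crank-Nicholson

e_max, c_max = np.max(np.abs(ue)), np.max(np.abs(uc))
print(e_max, c_max)                  # Euler enormous, CN bounded
```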

3.4.2 Numerical Stability Analysis for the Convection Equation

We just considered diffusion, so it makes sense now to look at convection. The transient convection equation in one dimension (also called the FIRST-ORDER WAVE EQUATION) is
\[ u_t + v u_x = 0 \tag{3.93} \]
where v is a constant velocity. The Fourier transform of this is dû/dt = −ikv û. Now the eigenvalue is purely imaginary. Recall that imaginary eigenvalues pose a problem for many time-integration schemes; many, including forward Euler and RK2, are never stable for problems with imaginary eigenvalues.
Using the central difference formula
\[ \frac{\partial u}{\partial x} \approx \frac{u(x_{j+1}) - u(x_{j-1})}{2h} \]
Equation (3.93) becomes
\[ \frac{du_j}{dt} = -\frac{v}{2h}\left( u_{j+1} - u_{j-1} \right) \]
The same right-hand side arises in the finite element approximation with hat functions. The forward Euler approximation is
\[ u_j^{(n+1)} = u_j^{(n)} - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_{j-1}^{(n)} \right) \tag{3.94} \]

which is sometimes called the forward-time center-space (FTCS) discretization. It is first-order accurate in time and second-order accurate in space. To analyze stability, we again seek a solution u_j^{(n)} = Gⁿ e^{ikx_j}, which gives
\[ G = 1 - i\,\frac{v\Delta t}{h}\sin kh \]
The magnitude of G is greater than one whenever sin kh ≠ 0, so the FTCS discretization is unconditionally unstable for the convection equation, as we might have guessed from the Fourier analysis above, which revealed that the eigenvalues of the convection operator are purely imaginary.
This suggests that we should use a different approximation for the spatial derivative. In particular, when computing the solution at the next time step, we should only use information from upstream: after all, in the physical problem, convection carries the value of u downstream, so the solution at a point should ideally only be determined by values upstream of it. Applying this idea, we replace the central difference above by a one-sided or "upwind" difference. For v > 0 the upwind (backward) difference gives
\[ u_j^{(n+1)} = u_j^{(n)} - \frac{v\Delta t}{h}\left( u_j^{(n)} - u_{j-1}^{(n)} \right) \]
This gives the growth factor
\[ G = 1 - \frac{v\Delta t}{h}\left( 1 - e^{-ikh} \right) \]
The stability condition |G| ≤ 1 will hold if
\[ \frac{v\Delta t}{h} \le 1 \]
Defining C = vΔt/h as the COURANT NUMBER, the stability condition becomes
\[ C \le C_{\max} \tag{3.95} \]
where in this case C_max = 1. This is the COURANT-FRIEDRICHS-LEWY CONDITION, often simply called the COURANT CONDITION. Physically, it tells us that the time step must be smaller than the time it takes for convection at speed v over one mesh unit h. By replacing the central difference, which has second-order accuracy in space, with an upwind difference, we have lost an order in spatial accuracy but have gained stability. And anyway, the method is still first order in time. One complication of this method is that for problems where the velocity can change sign, it is necessary to take care that the appropriate upwind difference is used. If downwind differencing is used, the approximation is always unstable.
Stability also can be gained without use of upwind differences. The LAX-FRIEDRICHS method is a simple modification to the FTCS discretization where the present value at point x_j is replaced by the average of the values at points j + 1 and j − 1
\[ u_j^{(n+1)} = \frac{1}{2}\left( u_{j+1}^{(n)} + u_{j-1}^{(n)} \right) - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_{j-1}^{(n)} \right) \tag{3.96} \]

By applying the average, this change effectively introduces a small amount of smoothing or "numerical diffusion" into the time-integration process. This can be seen explicitly by rewriting (3.96) so that it has the form of (3.94) with an additional remainder term that indicates the difference between the two methods
\[ u_j^{(n+1)} = u_j^{(n)} - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_{j-1}^{(n)} \right) + \frac{1}{2}\left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) \tag{3.97} \]

The remainder term has very nearly the form of a central difference approximation to the second-derivative operator, and in fact this expression is precisely the FTCS approximation to a convection-diffusion equation with artificial or numerical diffusivity h²/(2Δt)
\[ u_t + v u_x = \frac{h^2}{2\Delta t}\, u_{xx} \]
This diffusion term is enough to stabilize the method: using the von Neumann analysis, the stability criterion is found to be very similar to what we found for the upwind scheme but is now insensitive to the sign of v
\[ |C| = \frac{|v|\Delta t}{h} \le 1 \]
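The Courant condition for these first-order schemes can be demonstrated with the upwind discretization on a periodic domain; the pulse, grid, and step counts below are illustrative choices:

```python
# First-order upwind scheme for u_t + v u_x = 0 on a periodic domain:
# stable for Courant number C <= 1, unstable for C > 1.
import numpy as np

v, N = 1.0, 200
h = 1.0 / N
x = np.arange(N) * h
rng = np.random.default_rng(1)
u0 = np.exp(-100 * (x - 0.5)**2) + 1e-6 * rng.random(N)  # pulse + tiny noise

def upwind(u, C, steps):
    for _ in range(steps):
        u = u - C * (u - np.roll(u, 1))   # backward difference for v > 0
    return u

stable_max = np.max(np.abs(upwind(u0, 0.9, 400)))
unstable_max = np.max(np.abs(upwind(u0, 1.1, 400)))
print(stable_max)     # stays order one
print(unstable_max)   # grows enormously
```

The small noise added to the initial condition seeds the short-wavelength modes whose growth factor exceeds one when C > 1; the resulting instability appears as the sawtooth pattern mentioned earlier.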
All the methods developed so far for the convection equation are first order in time, so even if the stability condition is satisfied, the solution may not be very accurate. The LAX-WENDROFF scheme builds on the Lax-Friedrichs scheme to yield second-order accuracy. Let u_{j±1/2}^{(n+1/2)} be intermediate values at the midpoint t + Δt/2 of the time step and "half-mesh points" x_j ± h/2. Lax-Friedrichs, (3.96), is used to generate these intermediate values
\[ u_{j+1/2}^{(n+1/2)} = \frac{1}{2}\left( u_{j+1}^{(n)} + u_j^{(n)} \right) - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_j^{(n)} \right) \]
and similarly for u_{j−1/2}^{(n+1/2)}.

This solution is used in a modified FTCS step to generate the solution at time level n + 1
\[ u_j^{(n+1)} = u_j^{(n)} - \frac{v\Delta t}{h}\left( u_{j+1/2}^{(n+1/2)} - u_{j-1/2}^{(n+1/2)} \right) \]
Eliminating the intermediate values, this can be rewritten in the more familiar form
\[ u_j^{(n+1)} = u_j^{(n)} - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_{j-1}^{(n)} \right) + \frac{(v\Delta t)^2}{2h^2}\left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) \]
This is almost identical to (3.97); the difference is that now the artificial diffusivity has the value v²Δt/2. The stability condition is again |C| ≤ 1, and since the method is now second order in time, the time step can be set very close to the stability limit and still yield enough accuracy for many purposes. Lax-Wendroff and related methods are thus widely used.
In the exact solution, pure convection does not change the amplitude of an initial condition; convection only carries the initial condition downstream. In all of the methods described here, some amplitude damping occurs (|G| < 1) except precisely when k = 0 or C = 1. We care most about this damping when kh is small, corresponding to length scales that are large compared to the grid size; i.e., |G| should be very close to unity for all length scales of interest. (If we care about length scales close to h, then we have made h too big; h should always be chosen to be much smaller than the length scales over which the true solution varies.) Taylor-expanding |G|² around kh = 0 yields
\[ |G|^2 = 1 - (1 - C^2)(kh)^2 \]
and
\[ |G|^2 = 1 - \frac{C^2(1 - C^2)(kh)^4}{4} \]
for Lax-Friedrichs and Lax-Wendroff, respectively. The latter is substantially better, since the deviation from |G|² = 1 scales as (kh)⁴ rather than (kh)².
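These expansions follow from the exact growth factors of the two schemes and can be checked numerically; the Courant number and wavenumbers below are illustrative choices:

```python
# Growth factors for Lax-Friedrichs and Lax-Wendroff, compared with the
# closed-form damping expressions from the von Neumann analysis.
import numpy as np

C = 0.5
kh = np.array([0.2, 0.1, 0.05])

G_lf = np.cos(kh) - 1j * C * np.sin(kh)                    # Lax-Friedrichs
G_lw = 1 - 1j * C * np.sin(kh) - C**2 * (1 - np.cos(kh))   # Lax-Wendroff

d_lf = 1 - np.abs(G_lf)**2     # equals (1 - C^2) sin^2(kh) ~ (1-C^2)(kh)^2
d_lw = 1 - np.abs(G_lw)**2     # equals C^2 (1 - C^2)(1 - cos kh)^2 ~ (kh)^4
print(d_lf)
print(d_lw)
```

For small kh, d_lf shrinks by a factor of 4 when kh is halved, while d_lw shrinks by a factor of 16, which is the (kh)² versus (kh)⁴ scaling quoted above.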

3.4.3 Operator Splitting for Convection-Diffusion Problems

The cases above represent the low and high Peclet number limits of the general convection-diffusion equation
\[ u_t + v u_x = D u_{xx} \]

A simple explicit scheme for this equation would use central differences for the diffusion term and Lax-Wendroff for the convection
\[ u_j^{(n+1)} = u_j^{(n)} + \frac{D\Delta t}{h^2}\left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_{j-1}^{(n)} \right) + \frac{(v\Delta t)^2}{2h^2}\left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) \]

Unless the Peclet number is very large, the diffusive term controls the stability, because it leads to a growth factor that scales as Δt/h² rather than Δt/h from the convective term. We could use an implicit method on the whole problem, but this entails solution of a large non-self-adjoint matrix problem at every time step (at least if the problem is nonlinear). It would be preferable to use an implicit method only on the diffusive piece, which is self-adjoint. A popular solution is called OPERATOR SPLITTING; an explicit method is used for the convective terms and an implicit method for the diffusive ones. For example, Lax-Wendroff can be used for the convective terms and Crank-Nicholson for the diffusive. This is often executed in two steps:

1. The convective terms are applied, to give an intermediate solution u*
\[ u_j^* = u_j^{(n)} - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_{j-1}^{(n)} \right) + \frac{(v\Delta t)^2}{2h^2}\left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) \]

2. Crank-Nicholson is applied, using the intermediate values instead of the values at step n
\[ u_j^{(n+1)} = u_j^* + \frac{D\Delta t}{2h^2}\left( u_{j-1}^* - 2u_j^* + u_{j+1}^* \right) + \frac{D\Delta t}{2h^2}\left( u_{j-1}^{(n+1)} - 2u_j^{(n+1)} + u_{j+1}^{(n+1)} \right) \]

In methods like this, because the diffusion terms are evaluated implicitly, the stability limit is set by a Courant condition on the convective terms. In fact, one might also get away with an unstable (e.g., FTCS) method for the convection term, relying on the implicit treatment of the diffusion term to stabilize the overall result. There is not generally a good reason to do this.
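The two-step scheme above can be sketched on a periodic domain (an illustrative simplification that replaces the boundary conditions of a real problem) and compared with the exact solution for a single Fourier mode:

```python
# Operator splitting: explicit Lax-Wendroff for convection followed by an
# implicit Crank-Nicholson step for diffusion, on a periodic domain. The
# time step satisfies the Courant condition but badly violates the
# explicit diffusive limit dt <= h^2/(2D).
import numpy as np

v, D, N = 1.0, 0.05, 200
h = 1.0 / N
x = np.arange(N) * h
C = 0.8
dt = C * h / v
ratio = dt / (h**2 / (2 * D))
print(ratio)                              # >> 1: diffusion must be implicit

# periodic second-difference matrix and Crank-Nicholson update matrix
L2 = -2.0 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)
L2[0, -1] = L2[-1, 0] = 1.0
L2 /= h**2
I = np.eye(N)
B = np.linalg.solve(I - 0.5 * D * dt * L2, I + 0.5 * D * dt * L2)

u = np.sin(2 * np.pi * x)
for _ in range(round(1.0 / dt)):
    up1, um1 = np.roll(u, -1), np.roll(u, 1)
    ustar = u - 0.5 * C * (up1 - um1) + 0.5 * C**2 * (um1 - 2 * u + up1)
    u = B @ ustar                         # implicit diffusion step

exact = np.exp(-D * (2 * np.pi)**2) * np.sin(2 * np.pi * (x - v * 1.0))
err = np.max(np.abs(u - exact))
print(err)                                # modest discretization error
```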

3.5 Exercises

Exercise 3.1: Gradient formula from gradient definition
Consider a cubic volume with one corner at the origin and the opposite corner at (x, y, z) = (Δx, Δy, Δz). In this case the integral definition of the gradient is
\[ \operatorname{grad} c = \lim_{V \to 0} \frac{1}{V} \int_S c\, \mathbf{n}\, dS \]
Because we are going to shrink the volume to zero, we can make use of the truncated Taylor expansion
\[ c(x, y, z) \approx c(0, 0, 0) + \frac{\partial c}{\partial x}\, x + \frac{\partial c}{\partial y}\, y + \frac{\partial c}{\partial z}\, z \]
where the derivatives are evaluated at the origin. Combine these to derive the formula; the same arguments hold for the other two terms, so
\[ \operatorname{grad} c = \mathbf{e}_x \frac{\partial c}{\partial x} + \mathbf{e}_y \frac{\partial c}{\partial y} + \mathbf{e}_z \frac{\partial c}{\partial z} = \nabla c \]

Exercise 3.2: Derivatives of unit vectors in polar (cylindrical) coordinates
By taking limits in polar coordinates, derive the formulas
\[ \frac{\partial \mathbf{e}_r}{\partial r} = 0, \qquad \frac{\partial \mathbf{e}_\theta}{\partial r} = 0, \qquad \frac{\partial \mathbf{e}_r}{\partial \theta} = \mathbf{e}_\theta, \qquad \frac{\partial \mathbf{e}_\theta}{\partial \theta} = -\mathbf{e}_r \]
for the derivatives of the unit vectors. Do not refer to Cartesian coordinates in your derivation.

Exercise 3.3: Divergence of the flux in polar coordinates
Derive an expression for the divergence of a flux in polar coordinates, ∇·q, in which q is an arbitrary vector. Do not use Cartesian coordinates in your derivation.
Hint: the answer is
\[ \nabla \cdot \mathbf{q} = \frac{1}{r}\frac{\partial}{\partial r}(r q_r) + \frac{1}{r}\frac{\partial q_\theta}{\partial \theta} \]

Exercise 3.4: Gradient and Laplacian in spherical coordinates
Repeat Example 3.1 and find expressions for ∇ and ∇² for the spherical coordinates shown in Figure 3.3. Do not refer to Cartesian coordinates in your derivation. Then derive the result using the h and g formulas provided in the text. Which method do you prefer and why?
Hint: the answers are in Table 3.1.

Exercise 3.5: Fundamental identities in vector calculus
Using Cartesian tensor notation, derive the following identities (here u, v, and w are vectors).
(b)
(c) ∇·(uu) = u·∇u + u(∇·u)

Exercise 3.6: Cross-product identities
(a) Verify that ε_{ijk} ε_{klm} = δ_{il} δ_{jm} − δ_{im} δ_{jl}. Now use this to derive the following results.
(b) ∇ × (∇ × u) = ∇(∇·u) − ∇·∇u
(d) (u × v) × w = (u·w)v − (v·w)u
(e) (u × v) × w = (vu − uv)·w
(f) v × (∇ × v) = ∇(½‖v‖²) − (v·∇)v

Exercise 3.7: A special case of Leibniz's rule

DeriveLeibniz's rule for the special case where the volume V is a cube whosesize
is constant but is moving with velocity q. In other words, explicitly show that the
contribution from the motion of V becomes Js mq ndA.

Exercise 3.8: Adjoint of curl


Find the adjoint of the curl operator with Dirichlet boundary conditions.

Exercise 3.9: Volume as surface integral

(a) If A is a constant vector and r = ‖x‖, then show using Cartesian tensor notation that ... and ...

(b) Show that ...

(c) Use this result and the divergence theorem to derive a formula for the total volume T = ∫_V dV of a region V in terms of an integral over the surface S of the volume.
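A standard answer to part (c) is the identity T = (1/3) ∮_S x·n dS (stated here as an assumption, since the intermediate results in (a) and (b) are not reproduced above). A quick sanity check on a sphere of radius R, where x·n = R at every surface point:

```python
import math

# On a sphere of radius R centered at the origin, x·n = R everywhere on the
# surface, so (1/3) * ∮ x·n dS = (1/3) * R * (4*pi*R**2), which should equal
# the volume (4/3)*pi*R**3.
R = 2.0
surface_integral = R * 4.0 * math.pi * R**2
volume_from_surface = surface_integral / 3.0
volume_exact = 4.0 / 3.0 * math.pi * R**3
print(volume_from_surface, volume_exact)
```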

Exercise 3.10: Curl theorem

Use the divergence theorem and results of vector algebra to show that

∫_V ∇ × v dV = ∫_S n × v dS

Exercise 3.11: Poisson equation in a no-flux domain

Consider the Poisson equation

∇²u = f

in a volume T with (no-flux) boundary condition n·∇u = 0 on the boundary S of T; n is the outward unit normal vector on the boundary.


(a) Use the divergence theorem to show that a necessary condition for the existence of a solution is ∫_T f dV = 0.

(b) If f = ∇·v for some vector v, what must v satisfy for the condition of (a) to hold?

Exercise 3.12: Helmholtz decomposition

Under rather general conditions it is possible to write a vector field q(x) as q = ∇φ + ∇ × v. Using the results of Problem 3.5, find independent equations for φ and v in terms of q.

Exercise 3.13: The Stokes equations for viscous flow

The Stokes equations for the velocity u and pressure p in a viscous flow driven by a ...

These equations can be written in matrix-vector form as ... If u = 0 on the boundary S of the flow domain V, show that the Stokes operator A satisfies

(Au, v) = (u, Av)

where the inner product between (u, p) and (v, q) is given by

∫_V u·v dV + ∫_V p q dV

Exercise 3.14: Differentiating functions of a matrix and matrix determinant

Derive the following two differentiation formulas.

(a) Use the polynomial expansion of a matrix function to show that

(d/dt) tr f(A) = tr( g(A) dA/dt )

in which g(·) = df(·)/d(·) is the usual derivative of the scalar function f(·). For the special case of f(A) = ln A for A nonsingular, we obtain

(d/dt) tr ln A = tr( A⁻¹ dA/dt )

(b) For nonsingular A, differentiate (1.35) with respect to scalar t and use the result of the previous part to show that

(d/dt) det A = det(A) tr( A⁻¹ dA/dt )

Exercise 3.15: Euler expansion formula

Let x represent the reference coordinates and u the position of a point in a deformable continuum. The expansion of the volume element in the two sets of coordinates is given by

dV_u = det(∂u/∂x) dV_x    (3.99)

in which det(∂u/∂x) is the determinant of the Jacobian matrix of the transformation. Assuming the Jacobian is nonsingular, use the matrix differentiation formula of the previous exercise to establish

(d/dt) det(∂u/∂x) = det(∂u/∂x) ∇·v

which is known as the Euler expansion (or dilation) formula.

Exercise 3.16: Temperature profile in tube flow

Read Example 12.2-2 in (Bird et al., 2002, p. 384). Check the following points.

(a) Substitute Y from Equation 12.2-21 into 12.2-23. Then exchange the order of integration, and show that the inner integral can be performed. Then, make a change of variable to obtain an integral of the form

∫ t^{−1/3} e^{−t} dt

which is equivalent to 12.2-24.

(b) Evaluate the derivatives ...

(c) Verify that the temperature profile in (a) satisfies the differential equation in Equation 12.2-13. Use the chain rule and the results from (b).

(d) What is the numerical value of Γ(2/3)?
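For part (d), the value can be checked directly with the standard library (a quick numerical aid, not part of the exercise itself):

```python
import math

# Gamma function value needed in part (d) of Exercise 3.16.
print(math.gamma(2.0 / 3.0))  # ≈ 1.35412
```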

Exercise 3.17: The error function and some useful integrals

The error function is defined by

erf(z) = (2/√π) ∫₀^z e^{−t²} dt

Note that

∫₀^∞ e^{−t²} dt = √π/2

The complementary error function is defined by

erfc(z) = 1 − erf(z)

so that

erfc(z) = (2/√π) ∫_z^∞ e^{−t²} dt


(a) Sketch the error function and the complementary error function.

(b) Consider the function

f(x) = ∫₀^∞ e^{−t²} cos(2tx) dt

Differentiate f(x) and then integrate by parts to show that f satisfies the differential equation

df/dx + 2x f(x) = 0

What is the initial condition for this ODE?

(c) Solve the ODE and show that

f(x) = (√π/2) e^{−x²}

(d) Let t = au and x = b/(2a) and show that

∫₀^∞ e^{−a²u²} cos(bu) du = (√π/(2a)) exp(−b²/(4a²))

(e) Integrate the previous equation with respect to b on the interval [0, m]. Change the order of integration and show finally that

∫₀^∞ e^{−a²u²} (sin(mu)/u) du = (π/2) erf(m/(2a))
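The closed form in part (c), f(x) = (√π/2)e^{−x²} for f(x) = ∫₀^∞ e^{−t²} cos(2tx) dt, can be spot-checked numerically; this is a sketch using a simple midpoint rule, not part of the exercise itself:

```python
import math

def f(x, tmax=10.0, n=100000):
    # Midpoint-rule approximation of f(x) = ∫_0^∞ exp(-t²) cos(2 t x) dt,
    # truncated at t = tmax (the tail beyond 10 is of order e^{-100}).
    h = tmax / n
    return h * sum(math.exp(-((i + 0.5) * h) ** 2) * math.cos(2.0 * (i + 0.5) * h * x)
                   for i in range(n))

for x in (0.0, 0.5, 1.0):
    exact = 0.5 * math.sqrt(math.pi) * math.exp(-x * x)
    print(f"x = {x}: numeric = {f(x):.8f}, exact = {exact:.8f}")
```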

Exercise 3.18: Other useful integrals

Differentiate the following function with respect to x

e^{2ab} erf(ax + b/x) + e^{−2ab} erf(ax − b/x)

and derive the indefinite integral (Abramowitz and Stegun, 1970, p. 304)

∫ e^{−a²x² − b²/x²} dx = (√π/(4a)) [ e^{2ab} erf(ax + b/x) + e^{−2ab} erf(ax − b/x) ] + const,  a ≠ 0

Use the indefinite integral to derive the definite integral

∫_h^∞ e^{−a²x² − b²/x²} dx = (√π/(4a)) [ e^{−2ab} erfc(ah − b/h) + e^{2ab} erfc(ah + b/h) ]    (3.100)

From this result, show that

∫₀^∞ e^{−a²x² − b²/x²} dx = (√π/(2a)) e^{−2ab}    (3.101)

This integral arises in transport problems in semi-infinite domains (see Exercises 3.19 and 3.23).
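A numerical spot check of the definite integral ∫₀^∞ exp(−a²x² − b²/x²) dx = (√π/(2a)) e^{−2ab}, here for the sample values a = b = 1 (midpoint rule; the integrand vanishes rapidly at both ends of the range):

```python
import math

# Midpoint-rule check of ∫_0^∞ exp(-a²x² - b²/x²) dx = (√π/(2a)) e^{-2ab}
# for a = b = 1; the integrand decays like e^{-1/x²} near 0 and e^{-x²} at ∞.
a, b = 1.0, 1.0
n, xmax = 100000, 10.0
h = xmax / n
numeric = h * sum(math.exp(-(a * (i + 0.5) * h) ** 2 - (b / ((i + 0.5) * h)) ** 2)
                  for i in range(n))
exact = math.sqrt(math.pi) / (2.0 * a) * math.exp(-2.0 * a * b)
print(numeric, exact)
```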


Exercise 3.19: Some useful Laplace transforms

The following Laplace transform pairs are useful for solving transient diffusion problems.

f̄(s)                        f(t)
(1/s) e^{−k√s}, k > 0        erfc( k/(2√t) )
e^{−k√s}                     (k/(2√π t^{3/2})) e^{−k²/(4t)}
(1/√s) e^{−k√s}              (1/√(πt)) e^{−k²/(4t)}

(a) Establish the first entry by taking the Laplace transform of the function

f(t) = erfc( k/(2√t) )

Use the definition of the Laplace transform, switch the order of integration, and use Equation 3.101.

(b) Establish the second entry by differentiating the first f(t) with respect to t.

(c) Establish the third entry by differentiating the second f̄(s) with respect to s.
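The first pair can be spot-checked by numerical quadrature of the defining integral ∫₀^∞ e^{−st} erfc(k/(2√t)) dt against e^{−k√s}/s (a sketch, with sample values s = k = 1):

```python
import math

# Midpoint-rule check that ∫_0^∞ e^{-s t} erfc(k/(2√t)) dt = e^{-k√s}/s
# for the sample values s = 1, k = 1 (truncating the integral at t = 40).
s, k = 1.0, 1.0
n, tmax = 200000, 40.0
h = tmax / n
numeric = h * sum(math.exp(-s * (i + 0.5) * h)
                  * math.erfc(k / (2.0 * math.sqrt((i + 0.5) * h)))
                  for i in range(n))
exact = math.exp(-k * math.sqrt(s)) / s
print(numeric, exact)
```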

Exercise 3.20: A transform pair for reaction-diffusion problems

The following Laplace transform pair is useful in solving problems with simultaneous diffusion and first-order reaction (Carslaw and Jaeger, 1959, p. 496)

f̄(s) = e^{−k√s}/(s − α)

f(t) = (1/2) e^{αt} [ e^{−k√α} erfc( k/(2√t) − √(αt) ) + e^{k√α} erfc( k/(2√t) + √(αt) ) ]

Derive this result by using the convolution theorem and the last entry in the table in Exercise 3.19. You will also require the integral (3.100).

Exercise 3.21: Integral representations of K₀

The following integral representation of K₀ proves useful in applying Laplace transforms to solve the diffusion equation

K₀(x) = (1/2) ∫₀^∞ (1/t) e^{−t − x²/(4t)} dt    (3.102)

The following argument provides a derivation.

(a) Denote the integral by

f₀(x) = (1/2) ∫₀^∞ (1/t) e^{−t − x²/(4t)} dt

Differentiate with respect to x and show

(1/x) (d/dx)( x df₀/dx ) = ...

(b) Next use integration by parts to show

...

Substitute this result into the equation and show that f₀ satisfies the modified Bessel equation. Therefore f₀ is of the form

f₀(x) = a₁ I₀(x) + a₂ K₀(x)

with some constants a₁, a₂.

(c) Given the integral defining f₀, what value does f₀ approach for large x?

(d) Next use l'Hôpital's rule to show that

lim_{x→0} f₀(x)/(−ln(x)) = 1

It is known that K₀(x) ≈ −ln(x) as x → 0 (see (Abramowitz and Stegun, 1970, p. 375)), so we conclude that a₁ = 0, a₂ = 1, and f₀(x) = K₀(x).

Exercise 3.22: More useful Laplace transforms

Use the integral representations of the modified Bessel function K₀ derived in Exercise 3.21 to derive the following Laplace transform pairs.

f̄(s)                       f(t)
K₀(k√s), k > 0              (1/(2t)) e^{−k²/(4t)}
(1/√s) K₁(k√s), k > 0       (1/k) e^{−k²/(4t)}

These transforms are also useful in solving transient heat-conduction and diffusion equations (see Exercise 5.9).

Exercise3.23: Time-dependent heating of a semi-infinite slab


Considera slab of infinite thickness, density p, heat capacity Cp, and thermalconduc-

tivityk with a surface at x = 0. The boundary conditionsare

(a) Definethe following scaled variables

VectorCalculusand Partial Differential

332

energy equation
Show that the

reduces to
x 2

with boundary

Equations

conditions

T>0

9(0, T) = 1

parameters in the problem, but there is also no


Notice that there are no
natural
problem.
length scale for this
transform of the PDEand show that
(b) Take the Laplace

e-xv
What assumptions did you make?
using Exercise 3.19 to obtain
(c) Take the inverse transform

(x, T) = erfc
Plot 6(x, T) as a function of x on 0

10 for T = [0.01, 0.1, 1, 10, 100, 1000].

(d) Show that the proposed solution satisfies the PDE and BCS.
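Part (d) can also be spot-checked numerically: the solution θ(x, τ) = erfc(x/(2√τ)) should satisfy θ_τ = θ_xx, which central finite differences confirm to several digits (a sketch, not a proof):

```python
import math

def theta(x, tau):
    # Solution from part (c): theta(x, tau) = erfc(x / (2 sqrt(tau)))
    return math.erfc(x / (2.0 * math.sqrt(tau)))

# Central-difference check of theta_tau = theta_xx at a sample interior point.
x, tau, h = 1.0, 0.5, 1e-3
d_tau = (theta(x, tau + h) - theta(x, tau - h)) / (2.0 * h)
d_xx = (theta(x + h, tau) - 2.0 * theta(x, tau) + theta(x - h, tau)) / h**2
print(d_tau, d_xx)
```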

Exercise 3.24: Partial fraction expansion

We often teach inversion of Laplace transforms by so-called partial fraction expansion. For example, given

f̄(s) = 1/((s − a)(s − b)(s − c))

Note that a ≠ b ≠ c ensures a, b, and c are simple zeros of the denominator polynomial. The function f̄(s) is first written as a summation of simpler fractions

f̄(s) = A/(s − a) + B/(s − b) + C/(s − c)    (3.103)

and the coefficients A, B, and C are determined. Then the inverse is simply

f(t) = A e^{at} + B e^{bt} + C e^{ct}

(a) Determine A, B, and C in the partial expansion approach and determine f(t).

(b) Apply (3.89) with P(s) = 1 and Q(s) = (s − a)(s − b)(s − c), and find f(t) using (3.89). Which method do you prefer and why? Notice that (3.89) can be applied when the denominator Q(s) is more general than a polynomial as shown in Example 3.14.
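For part (a), the cover-up (residue) rule gives A = 1/((a − b)(a − c)) and cyclic analogs; the expansion can be checked by recombining the fractions at any non-pole point (the sample values a, b, c below are arbitrary):

```python
# Partial fraction coefficients for f(s) = 1/((s-a)(s-b)(s-c)) with simple
# poles, by the cover-up (residue) rule; checked by recombining the fractions.
a, b, c = 1.0, 2.0, 3.0          # arbitrary distinct sample values
A = 1.0 / ((a - b) * (a - c))
B = 1.0 / ((b - a) * (b - c))
C = 1.0 / ((c - a) * (c - b))

s = 0.37                          # any point that is not a pole
lhs = 1.0 / ((s - a) * (s - b) * (s - c))
rhs = A / (s - a) + B / (s - b) + C / (s - c)
print(lhs, rhs)
```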

Exercise 3.25: Transient heat conduction in a finite slab

We have a one-dimensional slab with ends located at x = L and x = −L, at uniform temperature T₀. Just after t = 0, the two ends are immediately raised to temperature T₁ and held at this temperature. We wish to find the transient solution.

(a) Write the PDE and (three) boundary conditions for this situation, i.e., conditions at x = L, x = −L, and t = 0. How many parameters appear in this problem?

(b) Choose nondimensional temperature, spatial position, and time variables as follows

θ = (T − T₀)/(T₁ − T₀),   z = x/L,   τ = tk/(ρĈ_P L²)

Express the PDE and BCs in these nondimensional variables. How many parameters appear in this problem?

(c) Take the Laplace transform of the PDE, apply the boundary conditions, and show

θ̄(z, s) = cosh(√s z)/(s cosh √s)

(d) For what s values in the complex plane is θ̄(z, s) singular?

(e) Invert the transform and find θ(z, τ).

Hint: the answer is

θ(z, τ) = 1 − Σ_{n=0}^∞ [2(−1)ⁿ/((n + 1/2)π)] cos((n + 1/2)πz) e^{−(n+1/2)²π²τ}    (3.104)

(f) Show that θ(z, τ) satisfies the PDE and boundary conditions at z = ±1. Does the solution satisfy the initial condition? How would you check this?

(g) What is the steady state, θ_s(z), i.e., take the limit of θ(z, τ) as τ → ∞.
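The series (3.104) can be evaluated directly to check the boundary and initial conditions in part (f); a sketch (the partial sum converges rapidly for τ > 0):

```python
import math

def theta(z, tau, nterms=2000):
    # Partial sum of (3.104):
    # theta = 1 - sum_n 2(-1)^n/((n+1/2)pi) cos((n+1/2)pi z) e^{-(n+1/2)^2 pi^2 tau}
    s = 0.0
    for n in range(nterms):
        lam = (n + 0.5) * math.pi
        s += 2.0 * (-1) ** n / lam * math.cos(lam * z) * math.exp(-(lam ** 2) * tau)
    return 1.0 - s

print(theta(1.0, 0.5))    # boundary z = 1: equals 1 (every cosine term vanishes)
print(theta(0.0, 1e-4))   # center just after t = 0: still essentially 0
```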

Exercise 3.26: Heat conduction in a cylinder and a sphere

Let's change the body in Exercise 3.25 from a slab to a cylinder and a sphere and see what happens. Again assume the body is initially at uniform temperature T₀. Just after t = 0, the outer boundary at r = R is immediately raised to temperature T₁ and held at this temperature. We wish to find the transient solution T(r, t) for these problems.

(a) Write the PDE and (three) boundary conditions for the cylindrical body, i.e., conditions at r = R, r = 0, and t = 0. How many parameters appear in this problem?

(b) Choose nondimensional temperature, radial position, and time variables as follows

θ = (T − T₀)/(T₁ − T₀),   ξ = r/R,   τ = tk/(ρĈ_P R²)

Express the PDE and BCs in these nondimensional variables. How many parameters appear in this problem?

Figure 3.11: Transient heating of slab, cylinder, and sphere from (3.104), (3.105), and (3.106). Dimensionless temperature θ(ξ, τ) versus ξ at τ = 10⁻⁴, 10⁻³, 10⁻², 0.1, 0.5.

(c) Take the Laplace transform of the PDE, apply the boundary conditions, and find θ̄(ξ, s) for the cylinder. You do not need to invert this transform.

(d) Write the PDE and (three) boundary conditions for the spherical body, i.e., conditions at r = R, r = 0, and t = 0. How many parameters appear in this problem?

(e) Choose the same nondimensional temperature, radial position, and time variables as follows

θ = (T − T₀)/(T₁ − T₀),   ξ = r/R,   τ = tk/(ρĈ_P R²)

Express the PDE and BCs in these nondimensional variables. How many parameters appear in this problem?

(f) Take the Laplace transform of the PDE, apply the boundary conditions, and find θ̄(ξ, s) for the sphere. You do not need to invert this transform.

Exercise 3.27: Transient solutions for slab, cylinder, and sphere

We wish to plot and compare the temperature profile θ(ξ, τ) versus ξ at different τ for the slab, cylinder, and sphere geometries.

(a) The transform for the cylinder is given by

θ̄(ξ, s) = I₀(√s ξ)/(s I₀(√s))

Find the zeros of the denominator. Hint: you may want to use the Bessel function relations I₀(r) = J₀(ir) and I₁(r) = J₁(ir)/i (Abramowitz and Stegun, 1970, p. 375, 9.6.3).

(b) Use (3.89) and show that the inverse is given by

θ(ξ, τ) = 1 − Σ_{n=1}^∞ [2/(b_n J₁(b_n))] J₀(b_n ξ) e^{−b_n²τ}    (3.105)

in which b_n are the zeros of J₀.

(c) The transform for the sphere is

θ̄(ξ, s) = sinh(√s ξ)/(ξ s sinh √s)

Find the zeros of the denominator ξ s sinh √s. Note that the denominator has a double zero at s = 0 because both s and sinh √s vanish at s = 0.

(d) Because of the double zero, we cannot use the inversion formula (3.89), which assumes simple zeros. But notice the following fact. If the Laplace transforms f̄(s) and ḡ(s) satisfy

ḡ(s) = f̄(s)/s

then their inverse transforms satisfy

g(t) = ∫₀^t f(t′) dt′

Therefore define

f̄(s) = s θ̄(ξ, s) = sinh(√s ξ)/(ξ sinh √s)

Use (3.89) to invert this transform and show

f(τ) = Σ_{n=1}^∞ (−1)^{n+1} (2nπ/ξ) sin(nπξ) e^{−n²π²τ}

(e) Perform the time integral and show that

θ(ξ, τ) = Σ_{n=1}^∞ (−1)^{n+1} (2/(nπξ)) sin(nπξ) (1 − e^{−n²π²τ})

Notice that the following series is the Fourier sine series of the linear function (Selby, 1973, p. 480)

Σ_{n=1}^∞ (−1)^{n+1} (2/(nπ)) sin(nπξ) = ξ

so we have

θ(ξ, τ) = 1 − Σ_{n=1}^∞ (−1)^{n+1} (2/(nπξ)) sin(nπξ) e^{−n²π²τ}    (3.106)


(f) Make plots of the temperature profile θ(ξ, τ) versus ξ at several τ for the slab, cylinder, and sphere geometries. Your results should resemble Figure 3.11. Which geometry heats up the quickest? The slowest? Give a physical explanation for these results.
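The comparison in part (f) can also be made without plotting by evaluating the slab and sphere series at the same dimensionless position and time; a minimal sketch (the cylinder case needs Bessel functions and is omitted here):

```python
import math

def theta_slab(z, tau, nterms=500):
    # (3.104): slab with boundaries at z = ±1
    s = 0.0
    for n in range(nterms):
        lam = (n + 0.5) * math.pi
        s += 2.0 * (-1) ** n / lam * math.cos(lam * z) * math.exp(-(lam ** 2) * tau)
    return 1.0 - s

def theta_sphere(xi, tau, nterms=500):
    # (3.106): sphere with boundary at xi = 1
    s = 0.0
    for n in range(1, nterms + 1):
        s += ((-1) ** (n + 1)) * 2.0 / (n * math.pi * xi) \
             * math.sin(n * math.pi * xi) * math.exp(-((n * math.pi) ** 2) * tau)
    return 1.0 - s

# At the same dimensionless position and time, the sphere is hotter, i.e., it
# heats up faster than the slab (more boundary per unit volume):
print(theta_slab(0.5, 0.1), theta_sphere(0.5, 0.1))
```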

Exercise 3.28: Transient diffusion in a sphere by separation of variables

Solve the transient diffusion problem in the spherical geometry described in Exercises 3.26 and 3.27 by separation of variables, using the information in Example 3.8.

Exercise 3.29: Fourier series

Find the Fourier coefficients for the function f(x) = 1 on the interval x ∈ [−π/2, π/2] using the odd cosine terms {cos x, cos 3x, cos 5x, ...}

f(x) = Σ_{n=0}^∞ a_n cos(2n + 1)x

Use this result to check the initial condition of (3.104) in Exercise 3.25.

Hint: first establish the orthogonality property ∫_{−π/2}^{π/2} cos nx cos mx dx = (π/2)δ_{nm} for odd n, m = 1, 3, 5, .... Then obtain the a_n by taking inner products as discussed in Section 2.4.1.

Exercise 3.30: Plancherel's formula

Plancherel's formula states that

∫_{−∞}^∞ |f(x)|² dx = (1/(2π)) ∫_{−∞}^∞ |f̂(k)|² dk

Begin with the expression on the left and from it derive the expression on the right. In general, both f(x) and f̂(k) can be complex.
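The formula can be spot-checked for a Gaussian using the known transform pair F{e^{−x²}} = √π e^{−k²/4} (with the e^{−ikx} transform convention assumed here); both sides should equal √(π/2):

```python
import math

def midpoint(g, lo, hi, n=100000):
    # Simple midpoint-rule quadrature.
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

# Left side: ∫ |f|² dx for f(x) = exp(-x²).
left = midpoint(lambda x: math.exp(-x * x) ** 2, -10.0, 10.0)
# Right side: (1/2π) ∫ |fhat|² dk with fhat(k) = √π exp(-k²/4).
right = midpoint(lambda k: (math.sqrt(math.pi) * math.exp(-k * k / 4.0)) ** 2,
                 -20.0, 20.0) / (2.0 * math.pi)
print(left, right)   # both ≈ √(π/2) ≈ 1.2533
```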

Exercise 3.31: Green's function for a fourth-order problem

(a) Use the Fourier transform technique to solve the ordinary differential equation

d⁴G/dx⁴ − 2 d²G/dx² + G = δ(x)

Use a computer algebra program or a math handbook to perform the integral that is required to find G(x).

(b) The function G from the previous problem is the Green's function for the ordinary differential operator L = d⁴/dx⁴ − 2 d²/dx² + 1. Use this Green's function to solve Lu = f(x), u(x) → 0 as x → ±∞, where f(x) = 1 if 0 < x < 1 and 0 elsewhere. Use numerical integration to approximate the solution for |x| < 10.

Exercise 3.32: A square with one heated wall

Solve ∇²u = 0 in a unit square domain 0 < x < 1, 0 < y < 1, with boundary conditions u = 0 on x = 0, x = 1, and y = 0, and u = 1 on y = 1.


Exercise 3.33: Separation of variables for the wave equation

Use separation of variables to solve

u_tt = c² u_xx

with the following boundary and initial conditions

u(x, 0) = { x, x ≤ 1/2;  1 − x, x > 1/2 },   u(0, t) = u(1, t) = 0

Show that your solution can be put in the D'Alembert form u = F₁(x − ct) + F₂(x + ct).

Exercise 3.34: Separation of variables for a partially heated sphere

Use separation of variables to solve for the steady-state temperature distribution in a sphere whose bottom half surface temperature is kept at T = 0, and whose top half is at T = 1. Use the transformation X = cos(φ) to convert the equation in the polar angle direction to Legendre's equation. Note that the eigenvalues of Legendre's equation are n(n + 1) for positive integers n. The corresponding eigenfunction is the Legendre polynomial P_n(X). Explicitly find the first four terms in the expansion.

Laplace's equation in spherical coordinates (r, θ, φ) is

(1/r²) ∂/∂r( r² ∂T/∂r ) + (1/(r² sin φ)) ∂/∂φ( sin φ ∂T/∂φ ) + (1/(r² sin²φ)) ∂²T/∂θ² = 0

Exercise 3.35: The Helmholtz equation

Consider the wave equation

u_tt = c² ∇²u

in the domain y > 0, −∞ < x < ∞, with boundary condition u(x, y = 0, t) = f(x)e^{iωt}. This equation governs sound emanating from a vibrating wall.

1. By assuming a solution of the form u(x, y, t) = v(x, y)e^{iωt}, show that the equation can be reduced to

−ω²v − c² ∇²v = 0

with boundary condition v(x, 0) = f(x). This is the HELMHOLTZ EQUATION.

2. Find the Green's function G = G∞ + G_B for this operator with the appropriate boundary conditions, using the fact that the Bessel function Y₀(r) ~ (2/π) ln r as r → 0.

3. Use the Green's function to solve for u(x, y, t).

Exercise 3.36: Transient diffusion via Green's function and similarity solution approaches

In Example 3.12 we found that the transient diffusion problem

G_t = DG_xx + δ(x − ξ)δ(t − τ)

has solution

G(x, t; ξ, τ) = (1/√(4πD(t − τ))) e^{−(x−ξ)²/(4D(t−τ))}

We have changed notation here to emphasize that this solution is the Green's function for transient diffusion with delta function source term f(x, t) = δ(x − ξ)δ(t − τ).

(a) Use this result, along with a result analogous to (3.71), to find the solution of the initial-value problem

u_t = Du_xx + f(x, t)

in the domain −∞ < x < ∞ with initial condition u(x, 0) = u₀(x).

(b) Now consider the case f = 0 in the domain x > 0 with boundary condition u(0, t) = 0 and initial condition u(x > 0, 0) = 1. Use an image or symmetry argument to convert this into a problem in the unbounded domain, where you can apply (3.72). The information in Exercise 3.17 may be useful. This solution is found by Laplace transforms in Exercise 3.23.

(c) Solve this problem again by the method of SIMILARITY SOLUTION. That is, observe that the only length scale in the problem is √(Dt); define the combination η = x/(2√(Dt)) (the factor of 2 is arbitrary but convenient), and seek a solution of the form

u(x, t) = u(η)

Substitution of this form into the governing equation and application of the chain rule leads to an ordinary differential equation.

Exercise 3.37: Schrödinger equation in a circular domain

The wavefunction for the "quantum corral," an arrangement of atoms on a surface designed to localize electrons, is governed by the Schrödinger equation

i ∂Ψ/∂t = −∇²Ψ

Use separation of variables to find the general (bounded) axisymmetric solution to this problem in a circular domain with Ψ = 0 at r = 1. Hint: if you assume exponential growth or decay in time, the spatial dependence will be determined by the so-called modified Bessel equation. Use the properties of solutions to this equation to show that there are no nontrivial solutions that are exponentially growing or decaying in time, thus concluding that the time dependence must be oscillatory.

Exercise 3.38: Temperature profile with wavy boundary temperature

Solve the steady-state heat-conduction problem

u_xx + u_yy = 0

in the half-plane −∞ < x < ∞, 0 < y < ∞, with boundary conditions

u(x, 0) = A + B cos αx = A + (B/2)(e^{iαx} + e^{−iαx})

and u(x, y) bounded as y → ∞. Use the Fourier transform in the x direction. How far does the wavy temperature variation imposed at the boundary penetrate out into the material?

Exercise 3.39: Domain perturbation analysis of diffusion in a wavy-walled slab

Solve ∇²T = 0 in the wavy-walled domain shown in Figure 3.12. The top surface is at y = 1, the left and right boundaries are x = 0 and x = L, respectively, and the bottom surface is y = ε cos 2πx/L, where ε ≪ 1. Find the solution to O(ε) using domain perturbation.

Figure 3.12: Wavy-walled domain.
Exercise 3.40: Fourier transform for solving heat conduction in a strip

Solve the steady-state heat-conduction problem

u_xx + u_yy = 0

in the infinite strip −∞ < x < ∞, 0 < y < 1, with boundary conditions u(x, 0) = u₀(x), u(x, 1) = u₁(x). Use the Fourier transform in the x direction to get an ordinary differential equation and boundary conditions for û(k, y).

Exercise3.41: Separation of variables and Laplace's equation for a


wedge

Useseparation of variables to solve Laplace's equation in the wedge 0 < e < a, 0 <
r <
boundary conditions u(r, 0) = 0, u(r,

1,with

= 50,u(l, O)= 0.

Exercise 3.42: Laplace's equation in a wedge

Again consider Laplace's equation in a wedge, but now fix the wedge angle at α = π/4. Use the method of images to find the Green's function for this domain: where should the images be, and what should their signs be? A well-drawn picture showing the positions and signs of the images is sufficient. The first two images are shown. They don't completely solve the problem because each messes up the boundary condition on the side of the wedge further from it.

+ position of point source


Exercise 3.43: D'Alembert form of the wave equation

(a) By looking for solutions of the form u(x, t) = U(x − at), where a is to be determined, show that the general solution of the WAVE EQUATION

∂²u/∂t² = c² ∂²u/∂x²

is

u(x, t) = f(x − ct) + g(x + ct)

where f and g are arbitrary.

(b) Use this solution to find the solution with initial condition u(x, 0) = w(x), ∂u/∂t(x, 0) = 0 in an unbounded domain. Pick a shape for w(x) and sketch the solution u(x, t).

Exercise 3.44: Heat equation in a semi-infinite domain

The solution to the heat equation

u_t = Du_xx

subject to the initial condition u(x, 0) = u₀(x) is

u(x, t) = (1/√(4πDt)) ∫_{−∞}^∞ e^{−(x−ξ)²/(4Dt)} u₀(ξ) dξ

Use this solution and an argument based on images to find the analogous solution for the same problem, but in the domain x > 0, with boundary condition u(0, t) = 0 and with initial condition u(x > 0, t = 0) = u₀(x).

Exercise 3.45: Vibrating beam

The transverse vibrations of an elastic beam satisfy the equation

u_tt + Ku_xxxx = 0

where K > 0. Use separation of variables to find u(x, t) subject to initial conditions u(x, 0) = u₀(x), u_t(x, 0) = 0 and boundary conditions u(0, t) = u(L, t) = 0, u_xx(0, t) = u_xx(L, t) = 0. These conditions correspond to a beam given an initial deformation and held fixed at its ends x = 0 and x = L. Hint: the equation λ⁴ = c has solutions λ = ±c^{1/4} and λ = ±ic^{1/4}, where c^{1/4} is the real positive fourth root of c.

Exercise 3.46: Convection and reaction with a point source

Use the Fourier transform and its properties to find a solution that vanishes at x → ±∞ for the ordinary differential equation

du/dx = au + δ(x)

where a > 0. Recall that F⁻¹{1/(k² + a²)} = (1/(2a)) e^{−a|x|}.

Exercise 3.47: Green's function for Laplacian operator

(a) Find the free space Green's function G∞ for the Laplacian operator in three dimensions. It is spherically symmetric.

(b) Show that if u is a solution to Laplace's equation, then so is v·∇u, for any constant vector v.

(c) Show that E_ij ∂²u/(∂x_i ∂x_j) is also a solution, for any constant tensor E.

Exercise 3.48: Zeros of sine, cosine, and exponential in the complex plane

We extend the definition of the exponential to complex argument z ∈ C as follows

e^{x+iy} = e^x (cos y + i sin y)

in which we take the usual definitions of real-valued e^x, cos y, sin y for x, y ∈ R. We extend cosine and sine to complex arguments in terms of the exponential

cos z = (e^{iz} + e^{−iz})/2,   sin z = (e^{iz} − e^{−iz})/(2i)

Given these definitions, find all the zeros of the following functions in the complex plane.

(a) The function sin z.
Hint: using the definition of sine, convert the zeros of sin z to solutions of the equation e^{2iz} = 1. Substitute z = x + iy and find all solutions x, y ∈ R. Notice that all the zeros in the complex plane are only the usual ones on the real axis.

(b) The function cos z. (Answer: only the usual ones on the real axis.)

(c) The function e^z. (Answer: no zeros in C.)
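A quick numerical illustration of part (a) using Python's complex math module: since |sin(x + iy)|² = sin²x + sinh²y, the function vanishes only on the real axis (the sample points below are arbitrary):

```python
import cmath
import math

# sin z has its only zeros at z = n*pi on the real axis:
print(abs(cmath.sin(cmath.pi)), abs(cmath.sin(-3 * cmath.pi)))   # both ≈ 0

# Off the real axis, |sin(x + iy)| = sqrt(sin(x)**2 + sinh(y)**2) > 0:
z = cmath.pi + 1j
print(abs(cmath.sin(z)), math.sinh(1.0))   # equal: sinh(1) ≈ 1.1752
```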

Exercise 3.49: A Laplace transform inverse

The Laplace inverse for the following transform has been used in solving the wave equation

f̄(s) = sinh(as) sinh(bs)/sinh(s),   a, b ∈ R

Find f(t), and note that your solution should be real valued, i.e., the imaginary number i should not appear anywhere in your final expression for f(t).

Exercise 3.50: Wave equation with struck string

Revisit Example 3.15 and the wave equation u_ττ = u_xx on x ∈ [0, 1] for a string with fixed ends u(0, τ) = u(1, τ) = 0, but replace the plucked string initial condition, u(x, 0) = u₀(x), with the struck string initial condition, u(x, 0) = 0, u_τ(x, 0) = v(x). Here there is zero initial deformation, but a nonzero initial velocity.

(a) Solve this equation using the Laplace transform.

(b) Note that this initial condition requires an inverse Laplace transform for

sinh(as) sinh(bs)/(s sinh s)

Show that this inverse is given by

Σ_{n=1}^∞ (−1)^{n+1} (2/(nπ)) sin(nπa) sin(nπb) sin(nπτ)    (3.107)

(c) Denote the Fourier coefficients for the initial velocity v(x) as

v(x) = Σ_{n=1}^∞ b_n sin(nπx)

(d) Next consider the mixed initial condition u(x, 0) = u₀(x) and u_τ(x, 0) = v(x). Let a_n denote the Fourier coefficients of u₀(x) as in Example 3.15. Show that the solution for the mixed initial condition is

u(x, τ) = Σ_{n=1}^∞ sin(nπx) [ a_n cos(nπτ) + (b_n/(nπ)) sin(nπτ) ]

Exercise 3.51: Wave equation with triangle wave initial condition

The wave equation is useful to describe propagation of sound and vibration of strings and membranes. Consider the wave equation u_tt = c²u_zz on z ∈ [0, L] for a string with fixed ends u(0, t) = u(L, t) = 0, and the "plucked" string initial condition, i.e., fixed arbitrary position u(z, 0) = u₀(z), and zero velocity at t = 0, u_t(z, 0) = 0. The constant c is known as the wave speed.

(a) First rescale time and position as τ = (c/L)t, x = z/L, to remove the parameters c and L and simplify your work. Show that the rescaled problem is

u_ττ = u_xx

u(x, 0) = u₀(x),   u_τ(x, 0) = 0,   x ∈ (0, 1)

(b) Consider the solution (3.91) given in Example 3.15. Establish that the solution u(x, τ) satisfies the wave equation, both boundary conditions, and the initial condition. Establish that the solution u(x, τ) is periodic in time. What is the period?

(c) Consider the string's initial condition to be the triangle function depicted in Figure 2.32 with a = 0.1. Given the Fourier coefficients for this triangle function from Exercise 2.10, plot your solution at the following times on separate plots:

1. τ = ...
2. τ = 0.45, 0.48, 0.49, 0.495, 0.4975, 0.50
3. τ = 0.50, ...
4. τ = 0.90, 0.95, 1.00, 1.05, 1.10
5. τ = 1.90, ...

Provide a physical description (comprehensible to the general public) of what is happening as the wave equation evolves forward in time. In particular, explain what the initial condition does just after τ = 0. Explain what happens when waves arrive at the boundaries x = 0, 1?


Exercise 3.52: Numerical solution of the heat equation

(a) Write (and run) a program to use Chebyshev collocation to solve the heat equation

u_t = u_xx

with boundary conditions u(0, t) = 0, u(1, t) = 1. Use the AM2 method and compare your solutions for different values of N at a number of different times. Approximately how long does it take for the temperature at x = 0.9 to reach 0.5?

(b) How many terms in the exact solution are needed to find the time at which u(0.9) = 0.5 (to five percent precision)?
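A quick cross-check of part (a) can be done with a simple explicit finite-difference scheme instead of the Chebyshev/AM2 method the exercise specifies; this is a sketch, and the initial condition u(x, 0) = 0 is an assumption, since it is not stated above:

```python
# Explicit FTCS solve of u_t = u_xx with u(0,t) = 0, u(1,t) = 1 and the
# assumed initial condition u(x,0) = 0, marching until u(0.9, t) reaches 0.5.
n = 100                       # grid spacing h = 1/n
h = 1.0 / n
dt = 0.4 * h * h              # FTCS stability requires dt <= h²/2
u = [0.0] * (n + 1)
u[n] = 1.0                    # boundary value at x = 1
t = 0.0
while u[90] < 0.5:            # index 90 corresponds to x = 0.9
    u = ([0.0]
         + [u[i] + dt / h**2 * (u[i + 1] - 2.0 * u[i] + u[i - 1])
            for i in range(1, n)]
         + [1.0])
    t += dt
print(f"u(0.9, t) first reaches 0.5 near t = {t:.4f}")
```

For short times the semi-infinite solution u(0.9, t) ≈ erfc(0.1/(2√t)) applies, which puts the crossing near t ≈ 0.011, consistent with the march above.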

Exercise 3.53: Propagation of a reaction front

Using the Chebyshev collocation technique for spatial discretization and the Adams-Moulton technique for time integration, write an Octave or MATLAB program to solve the transient reaction-diffusion problem

∂T/∂t = ∂²T/∂x² + ...

using ... = 0.05. Perform simulations for a long enough time that the solution reaches a steady state, and perform convergence checks to verify that your spatial and temporal discretizations are adequate (i.e., that the solution does not change much when the resolution is increased).
resolutionis increased).

Exercise 3.54: Stability of the Lax-Wendroff method

Use von Neumann stability analysis to find the growth factor and the (Courant) stability condition for the Lax-Wendroff method, (3.98).
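For reference, the growth factor of the standard Lax-Wendroff scheme for the model advection equation u_t + a u_x = 0 is G(θ) = 1 − ν²(1 − cos θ) − iν sin θ, with Courant number ν = aΔt/Δx (this assumes the standard scheme, which may differ in detail from (3.98)). A quick numerical scan confirms |G| ≤ 1 exactly when ν ≤ 1:

```python
import math

def growth(nu, theta):
    # Von Neumann growth factor of standard Lax-Wendroff for u_t + a u_x = 0;
    # |G|² works out to 1 - nu²(1 - nu²)(1 - cos θ)².
    return complex(1.0 - nu**2 * (1.0 - math.cos(theta)), -nu * math.sin(theta))

thetas = [k * math.pi / 100.0 for k in range(201)]
g_stable = max(abs(growth(0.9, th)) for th in thetas)
g_unstable = max(abs(growth(1.1, th)) for th in thetas)
print(g_stable, g_unstable)   # <= 1 for nu = 0.9, > 1 for nu = 1.1
```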

Bibliography

M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. National Bureau of Standards, Washington, D.C., 1970.

R. Aris. Vectors, Tensors, and the Basic Equations of Fluid Mechanics. Dover Publications Inc., New York, 1962.

R. B. Bird, R. C. Armstrong, and O. Hassager. Dynamics of Polymeric Liquids, Vol. 1, Fluid Dynamics. Wiley, New York, second edition, 1987.

R. B. Bird, W. E. Stewart, and E. N. Lightfoot. Transport Phenomena. John Wiley & Sons, New York, second edition, 2002.

H. D. Block. Tensor Analysis. Charles E. Merrill Books, Inc., Columbus, Ohio, 1978.

C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang. Spectral Methods. Fundamentals in Single Domains. Springer-Verlag, Berlin, 2006.

H. S. Carslaw and J. C. Jaeger. Conduction of Heat in Solids. Oxford University Press, Oxford, second edition, 1959.

R. Courant. Methods of Mathematical Physics. Volume II. Partial Differential Equations. Wiley, New York, 1962.

W. M. Deen. Analysis of Transport Phenomena. Topics in Chemical Engineering. Oxford University Press, Inc., New York, second edition, 2011.

M. D. Greenberg. Foundations of Applied Mathematics. Prentice-Hall, New Jersey, 1978.

O. Heaviside. Electromagnetic Theory, volume II. The Electrician Printing and Publishing Company, London, 1899.

N. Levinson and R. M. Redheffer. Complex Variables. Holden Day, Oakland, CA, 1970.

E. Merzbacher. Quantum Mechanics. John Wiley and Sons, New York, second edition, 1970.

J. Ockendon, S. Howison, A. Lacey, and A. Movchan. Applied Partial Differential Equations, Revised Edition. Cambridge University Press, Cambridge, 2003.

T. A. Osswald and J. P. Hernandez-Ortiz. Polymer Processing: Modeling and Simulation. Hanser, Munich, 2006.

C. Pozrikidis. Introduction to Theoretical and Computational Fluid Dynamics. Oxford University Press, New York, 1997.

W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 1992.

A. Prosperetti. Advanced Mathematics for Applications. Cambridge University Press, Cambridge, 2011.

J. B. Rawlings and J. G. Ekerdt. Chemical Reactor Analysis and Design Fundamentals. Nob Hill Publishing, Madison, WI, second edition, 2012.

M. Renardy and R. C. Rogers. An Introduction to Partial Differential Equations. Springer-Verlag, New York, 1992.

S. M. Selby. CRC Standard Mathematical Tables. CRC Press, twenty-first edition, 1973.

J. G. Simmonds. A Brief on Tensor Analysis. Springer, New York, second edition, 1994.

I. Stakgold. Green's Functions and Boundary Value Problems. John Wiley & Sons, New York, second edition, 1998.

G. Strang. Introduction to Applied Mathematics. Wellesley-Cambridge Press, Wellesley, MA, 1986.

M. S. Vallarta. Heaviside's proof of his expansion theorem. Trans. A. I. E. E., pages 429-434, February 1926.

R. Winter. Quantum Physics. Wadsworth, Belmont, CA, 1979.

Probability, Random Variables, and Estimation

4.1 Introduction and the Axioms of Probability

For those engineers familiar with only deterministic models, we now make a big transition to random or stochastic models in the final two chapters of the text. Why? The motivation for including stochastic models is simple: they have proven highly useful in many fields of science and engineering. Moreover, even basic scientific literacy demands reasonable familiarity with stochastic methods. Students who have been exposed to primarily deterministic descriptions of physical processes sometimes initially regard stochastic methods as mysterious, vague, and difficult. We hope to change this perception, remove any mystery, and perhaps even make these methods easy to understand and enjoyable to use. To achieve this goal, we must maintain a clear separation between the physical process, the stochastic model we choose to represent it, and the mathematical reasoning we use to make deductions about the stochastic model. Ignoring this separation and calling upon physical intuition in place of mathematical deduction invariably creates the confusion and mystery that we are trying to avoid.

Probability is the branch of mathematics that provides the inference engine that allows us to derive correct consequences from our starting assumptions. The starting assumptions are stated in terms of undefinable notions, such as outcomes and events. This should not cause any alarm, because this is the same pattern in all fields of mathematics, such as geometry, where the undefinable notions are point, line, plane, and so forth. Since human intuition about geometry is quite strong, however, the undefinable starting notions of geometry are taken in stride without much thought. Exposure to games of chance may provide the same human intuition about probability's undefinable starting terms.

We start with the set or space of possible outcomes, which we denote by Ω. Let A and B be events, which are subsets of Ω. We use the empty set ∅ to denote an impossible event. Let A ∪ B denote the event "either A or B," and let A ∩ B denote the event "both A and B." The close analogy with the set operations of union and intersection is intentional and helpful. To each event A ⊆ Ω, we can assign a probability to that event, denoted Pr(A). The three axioms of probability can then be stated as follows.

I. (Nonnegativity) Pr(A) ≥ 0 for all A ⊆ Ω

II. (Normalization) Pr(Ω) = 1

III. (Finite additivity) Pr(A ∪ B) = Pr(A) + Pr(B) for all A, B ⊆ Ω satisfying A ∩ B = ∅
These three axioms, due to Kolmogorov (1933), are the source from which all probabilistic deductions follow. It may seem surprising at first that these three axioms are sufficient. In fact, we'll see soon that we do require a modified third axiom to handle infinitely many sets. First we state a few immediate consequences of these axioms. Exercise 4.1 provides several more. When A ∩ B = ∅, we say that events A and B are mutually exclusive, or pairwise disjoint. We use the symbol A \ B to denote the events in set A that are not events in set B, or, equivalently, the events in set A with the events in B removed. The set Ā is then defined to be Ω \ A, i.e., Ā is the set of all events that are not events in A. We say that two events A and B are independent if Pr(A ∩ B) = Pr(A) Pr(B).

Some of the important immediate consequences of the axioms are the following

Pr(∅) = 0
Pr(A) + Pr(Ā) = 1
Pr(A) ≤ 1
If B ⊆ A, then Pr(B) ≤ Pr(A)
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)

Proof. To establish the first result, note that A ∪ ∅ = A and A ∩ ∅ = ∅ for all A ⊆ Ω, and apply the third axiom to obtain Pr(A ∪ ∅) = Pr(A) = Pr(A) + Pr(∅). Rearranging this last equality gives the first result.


To establish the second result, note from the definition of Ā that A ∪ Ā = Ω and A ∩ Ā = ∅. Applying the third axiom and then the second axiom gives the second result. Using the first axiom, this second result then gives the third result.¹ To obtain the fourth result, note that if B ⊆ A, then A can be expressed as A = B ∪ (A ∩ B̄), with B ∩ (A ∩ B̄) = ∅. Applying the third axiom gives Pr(A) = Pr(B) + Pr(A ∩ B̄), and applying the first axiom then gives Pr(A) ≥ Pr(B). To obtain the fifth result, we express both A ∪ B and B as the union of mutually exclusive events: A ∪ B = A ∪ (Ā ∩ B) with A ∩ (Ā ∩ B) = ∅, and B = (A ∩ B) ∪ (Ā ∩ B) with (A ∩ B) ∩ (Ā ∩ B) = ∅. Applying the third axiom to both gives

Pr(A ∪ B) = Pr(A) + Pr(Ā ∩ B)
Pr(B) = Pr(A ∩ B) + Pr(Ā ∩ B)

Solving the second equation for Pr(Ā ∩ B) and substituting into the first gives the fifth result, which is known as the addition law of probability. Also note that, due to the first result, the probability of the intersection of two mutually exclusive events is zero.
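The addition law just derived can also be checked by simulation. The following is a minimal sketch (not from the text) using Python's standard library; the events A ("even number shows") and B ("number greater than 3 shows") on the die-roll space are illustrative choices.

```python
import random

# Monte Carlo check of the addition law
#   Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
# on the die-roll sample space Ω = {1, ..., 6}.
random.seed(0)
n = 100_000
rolls = [random.randint(1, 6) for _ in range(n)]

A = {2, 4, 6}   # even number shows (illustrative event)
B = {4, 5, 6}   # number greater than 3 shows (illustrative event)

def pr(event):
    # empirical probability of an event (a subset of Ω)
    return sum(r in event for r in rolls) / n

lhs = pr(A | B)
rhs = pr(A) + pr(B) - pr(A & B)
print(lhs, rhs)
```

Because the same samples are counted on both sides, the two estimates agree to floating-point precision, while each individual probability converges to its exact value as n grows.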

4.2 Random Variables and the Probability Density Function

Next we introduce the concept of an experiment and a random variable. An experiment is the set of all outcomes Ω, the subsets F ⊆ Ω that are the events of interest, and the probabilities assigned to these events. A random variable is a function that assigns a number to the possible outcomes of the experiment, X(ω), ω ∈ Ω. For an experiment with a finite number of outcomes, such as rolling a die, the situation is simple. We can enumerate all outcomes to obtain Ω = {1, 2, 3, 4, 5, 6}, and the events, F, can be taken as all subsets of Ω. The set F obviously contains the six different possible outcomes of the die roll, {1}, {2}, {3}, {4}, {5}, {6} ∈ F. But the random variable is a different idea. We may choose to assign the integers 1, 2, ..., 6 to the different events. But we may choose instead to assign the value 1 to the events corresponding to an even number showing on the die, and 0 to the events corresponding to an odd number showing on the die. In the first case we have the simple assignment X(ω) = ω, ω = 1, ..., 6,

¹Notice that we have used all three axioms to reach this point.


and in the second case, we have the assignment X(ω) = 1 for ω even and X(ω) = 0 for ω odd. The experiment is the same in both cases, but we have chosen different random variables to reflect potentially different goals in our modeling of the physical process that led to this random experiment.

The situation becomes considerably more complex when we have an experiment with uncountably many outcomes, which is the case when we require real-valued random variables. For example, if we measure the temperature in a reactor, and want to model the reactor as a random process, the random variable of interest X(ω) assigns a (positive, real) value to each outcome ω ∈ Ω. If we let Ω = R, for example, it's not immediately clear what we should allow for the subsets F. If we allow only the individual points on the real number line, we do not obtain a rich enough set of events to be useful, i.e., the probability of achieving exactly some real-valued temperature T is zero for all T ∈ R. The events corresponding to infinite sets of points, e.g., a ≤ T ≤ b with a < b ∈ R, are the ones that have nonzero probability. If we try to allow all subsets of the real number line, however, we obtain a set that is so large that we cannot satisfy the axioms of probability. Probabilists have found a satisfactory resolution to this issue in which the events are chosen as all intervals [a, b], for all a, b ∈ R, and all countable intersections and unions of all such intervals. Moreover, we modify the third axiom of probability to cover additivity of countably infinitely many sets

III′. (Countable additivity) Let Aᵢ ⊆ Ω, i = 1, 2, 3, ..., be a countable set of mutually exclusive events. Then Pr(A₁ ∪ A₂ ∪ ···) = Pr(A₁) + Pr(A₂) + ···

We can then assign probabilities to these events, satisfying the axioms. The random variable X(ω) is then a mapping from ω ∈ Ω to R, and we have well-defined probabilities for the events {ω : X(ω) ≤ x} for all x ∈ R. At this point we have all the foundational elements that we require to develop the stochastic methods of most use in science and engineering. The interested reader may wish to consult Papoulis (1984, pp. 22–27) and Thomasian (1969, pp. 320–322) for further discussion of these issues.


We define the DISTRIBUTION FUNCTION of the random variable, F_ξ(x) = Pr(ξ ≤ x), so that F_ξ(x) is the probability that the random variable takes a value less than or equal to x. The function F_ξ is a nonnegative, nondecreasing function and, due to the axioms of probability, has the following properties

F_ξ(x₁) ≤ F_ξ(x₂) if x₁ ≤ x₂
lim_{x→−∞} F_ξ(x) = 0    lim_{x→∞} F_ξ(x) = 1

We next define the PROBABILITY DENSITY FUNCTION, denoted p_ξ(x), such that

F_ξ(x) = ∫_{−∞}^{x} p_ξ(x′) dx′,  −∞ < x < ∞   (4.1)

We can allow discontinuous F_ξ if we are willing to accept generalized functions (delta functions and the like) for p_ξ. Also, we can define the density function for discrete as well as continuous random variables if we allow delta functions. Alternatively, we can replace the integral in (4.1) with a sum over a discrete density function. The random variable may be a coin toss or a dice game, which takes on values from a discrete set, contrasted to a temperature or concentration measurement, which takes on values from a continuous set. The density function has the following properties

p_ξ(x) ≥ 0    ∫_{−∞}^{∞} p_ξ(x) dx = 1

and the interpretation in terms of probability

Pr(x₁ ≤ ξ ≤ x₂) = ∫_{x₁}^{x₂} p_ξ(x) dx

The MEAN or EXPECTATION of a random variable ξ is defined as

E(ξ) = ∫_{−∞}^{∞} x p_ξ(x) dx   (4.2)

The MOMENTS of a random variable are defined by

E(ξⁿ) = ∫_{−∞}^{∞} xⁿ p_ξ(x) dx


and the mean is the first moment. Moments of ξ about the mean are defined by

E((ξ − E(ξ))ⁿ) = ∫_{−∞}^{∞} (x − E(ξ))ⁿ p_ξ(x) dx

The VARIANCE is defined as the second moment about the mean

var(ξ) = E((ξ − E(ξ))²) = E(ξ²) − 2E(ξ)E(ξ) + E²(ξ) = E(ξ²) − E²(ξ)

The standard deviation is the square root of the variance

σ(ξ) = √(var(ξ))

Normal distribution. The normal or Gaussian distribution is ubiquitous in applications. It is characterized by its mean, m, and variance, σ², and is given by

p_ξ(x) = (1/√(2πσ²)) exp(−(1/2)(x − m)²/σ²)   (4.3)

We proceed to check that the mean of this distribution is indeed m and the variance is σ² as claimed, and that the density is normalized so that its integral is one. We require the definite integral formulas

∫_{−∞}^{∞} e^{−x²} dx = √π   (4.4)
∫_{−∞}^{∞} x e^{−x²} dx = 0   (4.5)
∫_{−∞}^{∞} x² e^{−x²} dx = √π/2   (4.6)

The first formula may also be familiar from the error function in transport phenomena

erf(x) = (2/√π) ∫₀ˣ e^{−u²} du    erf(∞) = 1

The second integral follows because the function e^{−x²} is even and the function x is odd. The third formula may also be familiar from the gamma function, defined by (Abramowitz and Stegun, 1970, pp. 255–260)

Γ(n) = ∫₀^∞ t^{n−1} e^{−t} dt    Γ(n) = (n − 1)! for integer n

Changing the variable of integration using t = x² gives

∫_{−∞}^{∞} x² e^{−x²} dx = 2 ∫₀^∞ x² e^{−x²} dx = ∫₀^∞ t^{1/2} e^{−t} dt = Γ(3/2) = √π/2
We calculate the integral of the normal density as follows

∫_{−∞}^{∞} p_ξ(x) dx = (1/√(2πσ²)) ∫_{−∞}^{∞} exp(−(1/2)(x − m)²/σ²) dx

Define the change of variable

u = (1/√2)((x − m)/σ)

which gives

∫_{−∞}^{∞} p_ξ(x) dx = (1/√π) ∫_{−∞}^{∞} e^{−u²} du = 1

from (4.4), and the proposed normal density does have unit area. Computing the mean gives

E(ξ) = (1/√(2πσ²)) ∫_{−∞}^{∞} x exp(−(1/2)(x − m)²/σ²) dx

Using the same change of variables as before yields

E(ξ) = (1/√π) ∫_{−∞}^{∞} (√2 σu + m) e^{−u²} du

The first term in the integral is zero from (4.5), and the second term produces E(ξ) = m, as claimed. Finally, the definition of the variance of ξ gives

var(ξ) = (1/√(2πσ²)) ∫_{−∞}^{∞} (x − m)² exp(−(1/2)(x − m)²/σ²) dx

Changing the variable of integration as before gives

var(ξ) = (2σ²/√π) ∫_{−∞}^{∞} u² e^{−u²} du

and from (4.6), var(ξ) = σ².
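The three facts just derived — unit area, mean m, and variance σ² — can be verified numerically. A minimal sketch (not from the text), using a simple rectangle-rule quadrature over a range wide enough that the truncated tails are negligible; the values m = 1 and σ = 0.7 are illustrative choices.

```python
import numpy as np

# Numerical check that the N(m, sigma^2) density of (4.3) has
# unit area, mean m, and variance sigma^2.
m, sigma = 1.0, 0.7
x = np.linspace(m - 10 * sigma, m + 10 * sigma, 20001)
dx = x[1] - x[0]
p = np.exp(-0.5 * (x - m) ** 2 / sigma**2) / np.sqrt(2 * np.pi * sigma**2)

area = np.sum(p) * dx                    # should be 1
mean = np.sum(x * p) * dx                # should be m
var = np.sum((x - mean) ** 2 * p) * dx   # should be sigma^2
print(area, mean, var)
```

Because the Gaussian decays so quickly, even this crude quadrature reproduces all three moments to many digits.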

Shorthand notation for the random variable ξ having a normal distribution with mean m and variance σ² is

ξ ~ N(m, σ²)

In order to collect a more useful set of integration facts for manipulating normal distributions, we can derive the following integrals by changing the variable of integration in (4.4)–(4.6). For x, a ∈ R, a > 0

∫_{−∞}^{∞} e^{−(1/2)x²/a} dx = √(2πa)
∫_{−∞}^{∞} x e^{−(1/2)x²/a} dx = 0
∫_{−∞}^{∞} x² e^{−(1/2)x²/a} dx = √(2π) a^{3/2}

Figure 4.1 shows the normal distribution with a mean of one and variances of 1/2, 1, and 2. Notice that a large variance implies that the random variable is likely to take on large values. As the variance shrinks to zero, the probability density becomes a delta function and the random variable approaches a deterministic value.
Characteristic function. It is often convenient to handle the algebra of density functions, particularly normal densities, by using a close relative of the Fourier transform of the density function rather than the density itself. The transform, which we denote as φ_ξ(t), is known as the characteristic function in the probability and statistics literature. It is defined by

φ_ξ(t) = E(e^{itξ})

where we again assume that any random variable of interest has a density p_ξ(x). Note the sign convention with a positive sign chosen on the imaginary unit i. Hence, under this convention, the conjugate of the characteristic function, φ̄_ξ(t), is the Fourier transform of the density. The characteristic function has a one-to-one correspondence with the density function, which can be seen from the inverse transform formula

p_ξ(x) = (1/(2π)) ∫_{−∞}^{∞} e^{−itx} φ_ξ(t) dt


Figure 4.1: Normal distribution, with probability density p_ξ(x) = (1/√(2πσ²)) exp(−(x − m)²/(2σ²)). Mean is one and variances are 1/2, 1, and 2.

Again note the sign difference from the usual inverse Fourier transform. Note that multiplying a random variable by a constant, η = aξ, gives

φ_η(t) = E(e^{itaξ}) = φ_ξ(at)   (4.7)

Adding two independent random variables, η = ξ₁ + ξ₂, gives

φ_η(t) = E(e^{it(ξ₁+ξ₂)}) = ∫_{−∞}^{∞} e^{itx₁} p_{ξ₁}(x₁) dx₁ ∫_{−∞}^{∞} e^{itx₂} p_{ξ₂}(x₂) dx₂ = φ_{ξ₁}(t) φ_{ξ₂}(t)   (4.8)

We next compute the characteristic function of the normal distribution.


Example 4.1: Characteristic function of the normal density
Show that the characteristic function of the normal density is

φ_ξ(t) = exp(itm − (1/2)t²σ²)

Solution
The definition of the characteristic function and the normal density give

φ_ξ(t) = (1/√(2πσ²)) ∫_{−∞}^{∞} e^{itx} exp(−(1/2)(x − m)²/σ²) dx

Changing the variable of integration to z = x − m gives

φ_ξ(t) = e^{itm} (1/√(2πσ²)) ∫_{−∞}^{∞} e^{itz} e^{−(1/2)z²/σ²} dz
       = e^{itm} (2/√(2πσ²)) ∫₀^∞ e^{−(1/2)z²/σ²} cos(tz) dz
       = e^{itm} e^{−t²σ²/2}

in which we used the definite integral

∫₀^∞ e^{−a²x²} cos(bx) dx = (√π/(2a)) e^{−b²/(4a²)}

Exercise 4.49 discusses how to derive this definite integral. Note also that the integral with the sin(tz) term vanished because sine is an odd function.
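The result of Example 4.1 can be checked numerically by evaluating E(e^{itξ}) as a quadrature over the density and comparing with exp(itm − t²σ²/2). A sketch (not from the text); the values m = 0.5, σ = 1.3, and t = 0.8 are illustrative.

```python
import numpy as np

# Numerical check of Example 4.1: the characteristic function of
# N(m, sigma^2) is exp(i t m - t^2 sigma^2 / 2).
m, sigma = 0.5, 1.3
x = np.linspace(m - 12 * sigma, m + 12 * sigma, 40001)
dx = x[1] - x[0]
p = np.exp(-0.5 * (x - m) ** 2 / sigma**2) / np.sqrt(2 * np.pi * sigma**2)

t = 0.8
phi_numeric = np.sum(np.exp(1j * t * x) * p) * dx   # E(e^{i t xi})
phi_exact = np.exp(1j * t * m - 0.5 * t**2 * sigma**2)
print(phi_numeric, phi_exact)
```

The complex quadrature agrees with the closed form essentially to quadrature precision, confirming both the real (cosine) and the vanishing odd (sine) contributions discussed above.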

4.3 Multivariate Density Functions

In applications we usually do not have a single random variable but a collection of them. We group these variables together in a vector and let the random variable now take on values in Rⁿ. Proceeding analogously to the single variable case, the JOINT DISTRIBUTION FUNCTION F_ξ(x) is defined so that

F_ξ(x) = Pr(ξ ≤ x)

in which the vector inequality is defined to be the n corresponding scalar inequalities for the components. Note that F_ξ(x) remains a scalar-valued function taking values in the interval [0, 1].

As in the single variable case, we define the JOINT DENSITY FUNCTION p_ξ(x) such that

F_ξ(x) = ∫_{−∞}^{x₁} ··· ∫_{−∞}^{xₙ} p_ξ(x′) dx′₁ ··· dx′ₙ

or, provided the derivatives exist,

p_ξ(x) = ∂ⁿF_ξ(x)/(∂x₁ ∂x₂ ··· ∂xₙ)   (4.9)

As in the scalar case, the probability that the n-dimensional random variable takes on values between a and b is given by

Pr(a ≤ ξ ≤ b) = ∫_{a₁}^{b₁} ··· ∫_{aₙ}^{bₙ} p_ξ(x) dx₁ ··· dxₙ

Mean and covariance. The mean of the vector-valued random variable ξ is simply the vector-valued integral

E(ξ) = ∫ x p_ξ(x) dx   (4.10)

Writing out this integral in terms of its components we have

E(ξᵢ) = ∫∫···∫ xᵢ p_ξ(x) dx₁ dx₂ ··· dxₙ    i = 1, ..., n

The covariance of two scalar random variables ξ, η is defined as

cov(ξ, η) = E((ξ − E(ξ))(η − E(η)))

The covariance matrix, C, of the vector-valued random variable ξ with components ξᵢ, i = 1, ..., n, is defined as

Cᵢⱼ = cov(ξᵢ, ξⱼ)

C = [ var(ξ₁)      cov(ξ₁, ξ₂)  ···  cov(ξ₁, ξₙ)
      cov(ξ₂, ξ₁)  var(ξ₂)           ⋮
      ⋮                              ⋮
      cov(ξₙ, ξ₁)  cov(ξₙ, ξ₂)  ···  var(ξₙ)   ]

Probability,Random Variables, and

358

EstithQti0h

Again, writing out the integrals in terms of the components gives

Cᵢⱼ = ∫···∫ (xᵢ − E(ξᵢ))(xⱼ − E(ξⱼ)) p_ξ(x) dx₁ ··· dxₙ   (4.11)

Notice that Cᵢⱼ = Cⱼᵢ, so C is symmetric and has positive elements on the diagonal. We often express this definition of the variance with the matrix formula

C = E((ξ − E(ξ))(ξ − E(ξ))ᵀ)

Notice that the vector outer product x xᵀ appears here, which is an n × n matrix, rather than the usual inner or dot product xᵀx, which is a scalar.
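The outer-product form of the covariance matrix is easy to exercise on sampled data. The following sketch (not from the text) draws correlated 2-vectors by linearly transforming independent unit normals through an illustrative matrix A, then averages the outer products of the deviations; the result should approach A Aᵀ, and is symmetric by construction.

```python
import numpy as np

# Estimate the covariance matrix C of (4.11) from samples using
# the outer-product form C = E[(xi - E xi)(xi - E xi)^T].
rng = np.random.default_rng(0)
n_samples = 200_000

A = np.array([[1.0, 0.0], [0.8, 0.5]])            # illustrative mixing matrix
samples = rng.standard_normal((n_samples, 2)) @ A.T

mean = samples.mean(axis=0)
dev = samples - mean
# average of the n x n outer products dev dev^T over all samples
C = (dev[:, :, None] * dev[:, None, :]).mean(axis=0)
print(C)           # approaches A @ A.T
```

Note that the outer product of each sample's deviation is a 2 × 2 matrix, while the inner product dev·dev would be a scalar — exactly the distinction made in the text.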

Marginal density functions. We often are interested in only some subset of the random variables in a problem. Consider two vectors of random variables, ξ ∈ Rⁿ and η ∈ Rᵐ. We can consider the joint distribution of both of these random variables, p_{ξ,η}(x, y), or we may only be interested in the ξ variables, in which case we can integrate out the m η variables to obtain the marginal density of ξ

p_ξ(x) = ∫···∫ p_{ξ,η}(x, y) dy₁ ··· dyₘ

Analogously, to produce the marginal density of η we use

p_η(y) = ∫···∫ p_{ξ,η}(x, y) dx₁ ··· dxₙ

4.3.1 Multivariate normal density

We define the multivariate normal density of the random variable ξ ∈ Rⁿ as

p_ξ(x) = (1/((2π)^{n/2}(det P)^{1/2})) exp(−(1/2)(x − m)ᵀ P⁻¹ (x − m))   (4.12)

in which m ∈ Rⁿ is the mean and P ∈ Rⁿˣⁿ is a real, symmetric, positive definite matrix. We show subsequently that P is the covariance matrix of ξ. The notation det P denotes the determinant of P. The multivariate normal density is well defined only for P > 0. The singular, or degenerate, case P ≥ 0 is discussed subsequently. Shorthand notation for the random variable ξ having a normal distribution with mean m and covariance P is

ξ ~ N(m, P)

We also find it convenient to define the notation n(x, m, P), so that we can write compactly, for the normal with mean m and covariance P,

n(x, m, P) = (1/((2π)^{n/2}(det P)^{1/2})) exp(−(1/2)(x − m)ᵀ P⁻¹ (x − m))   (4.13)

Note that the matrix P⁻¹ is real and symmetric. Figure 4.2 displays contours of the normal density for

P = [ 3.5  2.5
      2.5  4.0 ]

As displayed in Figure 4.2, lines of constant probability in the multivariate normal are lines of constant

(x − m)ᵀ P⁻¹ (x − m)

To understand the geometry of lines of constant probability (ellipses in two dimensions, ellipsoids or hyperellipsoids in three or more dimensions) we examine the eigenvalues and eigenvectors of a positive definite matrix A as shown in Figure 4.3. Each eigenvector of A points along one of the axes of the ellipse. The eigenvalues show us how stretched the ellipse is in each eigenvector direction. If we want to put simple bounds on the ellipse, then we draw a box around it as shown in Figure 4.3. Notice that the box contains much more area than the corresponding ellipse and we have lost the correlation between the elements of x. This loss of information means we can put different tangent ellipses of quite different shapes inside the same box. The size of the bounding box is given by

length of ith side = 2√(b Ãᵢᵢ)

in which Ãᵢᵢ = (i, i) element of A⁻¹. See Exercise 4.15 for a derivation of the size of the bounding box.


Figure 4.2: Multivariate normal for n = 2. The contour lines show ellipses containing 95, 75, and 50 percent probability.

Figure 4.3: The geometry of the quadratic form xᵀAx = b; the figure labels the eigenvectors, Avᵢ = λᵢvᵢ, and the tangent-box dimensions √(bÃ₁₁) and √(bÃ₂₂).

Figure 4.3 displays these results: the eigenvectors are aligned with the ellipse axes and the eigenvalues scale the lengths. The lengths of the sides of the box that is tangent to the ellipse are proportional to the square roots of the diagonal elements of A⁻¹.
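The bounding-box result can be confirmed numerically: parametrize the boundary of the ellipse xᵀAx = b through the eigendecomposition of A, and compare the largest |xᵢ| attained on the boundary with √(b(A⁻¹)ᵢᵢ). A sketch (not from the text), using the matrix shown for Figure 4.2 and b = 1 as illustrative values.

```python
import numpy as np

# Check that for the ellipse x^T A x = b, the tangent box half-width
# along coordinate i is sqrt(b * (A^{-1})_{ii}).
A = np.array([[3.5, 2.5], [2.5, 4.0]])
b = 1.0

lam, Q = np.linalg.eigh(A)                 # A = Q diag(lam) Q^T
theta = np.linspace(0.0, 2 * np.pi, 100001)
u = np.vstack([np.cos(theta), np.sin(theta)])       # unit circle
# map u to the ellipse boundary: x = sqrt(b) Q Lambda^{-1/2} u
x = np.sqrt(b) * Q @ (u / np.sqrt(lam)[:, None])

half_width = np.abs(x).max(axis=1)                  # max |x_i| on boundary
predicted = np.sqrt(b * np.diag(np.linalg.inv(A)))
print(half_width, predicted)
```

The mapping satisfies xᵀAx = b exactly (substitute and use QᵀQ = I), so the agreement of the two vectors verifies the half-width formula, and doubling it gives the side lengths quoted above.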
Example 4.2: The mean and covariance of the multivariate normal
Consider the random variable ξ, which is distributed multivariate normally as in (4.12)

p_ξ(x) = (1/((2π)^{n/2}(det P)^{1/2})) exp(−(1/2)(x − m)ᵀ P⁻¹ (x − m))

1. Establish the following facts of integration. For z ∈ Rⁿ, with A ∈ Rⁿˣⁿ, A > 0

∫ exp(−(1/2) zᵀA⁻¹z) dz = (2π)^{n/2}(det A)^{1/2}   (scalar)   (4.14)
∫ z exp(−(1/2) zᵀA⁻¹z) dz = 0   (n-vector)   (4.15)
∫ z zᵀ exp(−(1/2) zᵀA⁻¹z) dz = (2π)^{n/2}(det A)^{1/2} A   (n × n-matrix)   (4.16)

2. Show that the first and second integrals, and the definition of mean, (4.10), lead to

E(ξ) = m

Show that the second and third integrals, and the definition of covariance, (4.11), lead to

cov(ξ) = P

So we have established that vector m and matrix P in (4.12) are the mean and covariance, respectively, of the normally distributed random variable ξ.
Solution

1. To compute the integrals, we first note that because A is real and symmetric, there exists a factorization

A = QΛQᵀ    A⁻¹ = QΛ⁻¹Qᵀ

in which Λ is a diagonal matrix containing the eigenvalues of A and Q is real and orthogonal. To establish the first integral, use the variable transformation z = Qx and change the variable of integration in (4.14)

∫ exp(−(1/2) zᵀA⁻¹z) dz = ∫ exp(−(1/2) xᵀΛ⁻¹x) |det Q| dx = ∏ᵢ ∫ e^{−(1/2)xᵢ²/λᵢ} dxᵢ

in which |det Q| = 1 because QQᵀ = I, which makes det(QQᵀ) = (det Q)² = 1 so det Q = ±1. Performing the integrals gives

∏ᵢ₌₁ⁿ √(2πλᵢ) = (2π)^{n/2}(det A)^{1/2}

and we have established the first result.


To establish the second integral, use the variable transformation z = Qx to obtain

∫ z exp(−(1/2) zᵀA⁻¹z) dz = Q ∫ x exp(−(1/2) xᵀΛ⁻¹x) dx

Notice that the ith element of this vector equation is of the form

∫ xᵢ exp(−(1/2) xᵀΛ⁻¹x) dx = (∫ xᵢ e^{−(1/2)xᵢ²/λᵢ} dxᵢ) ∏_{k≠i} ∫ e^{−(1/2)xₖ²/λₖ} dxₖ = 0

This integral vanishes because of the first term in the product. Since the integral vanishes for each element i, the vector of integrals is therefore zero.


To establish the third integral, we again use the variable transformation z = Qx and change the variable of integration in (4.16)

∫ z zᵀ exp(−(1/2) zᵀA⁻¹z) dz = Q [∫ x xᵀ exp(−(1/2) xᵀΛ⁻¹x) dx] Qᵀ = QVQᵀ   (4.17)

in which, again, |det Q| = 1, and the V matrix is defined to be the integral on the right-hand side. Examining the components of V, we note that if i ≠ j then the integral is of the form

Vᵢⱼ = (∫ xᵢ e^{−(1/2)xᵢ²/λᵢ} dxᵢ)(∫ xⱼ e^{−(1/2)xⱼ²/λⱼ} dxⱼ) ∏_{k≠i,j} ∫ e^{−(1/2)xₖ²/λₖ} dxₖ

The off-diagonal integrals vanish because of the odd functions in the integrands for the xᵢ and xⱼ integrals. The diagonal terms, on the other hand, contain even integrands and they do not vanish

Vᵢᵢ = (∫ xᵢ² e^{−(1/2)xᵢ²/λᵢ} dxᵢ) ∏_{k≠i} ∫ e^{−(1/2)xₖ²/λₖ} dxₖ

Evaluating these integrals gives

Vᵢᵢ = (2π)^{n/2}(det Λ)^{1/2} λᵢ

so that V = (2π)^{n/2}(det Λ)^{1/2} Λ. Substituting this result into (4.17) gives

∫ z zᵀ exp(−(1/2) zᵀA⁻¹z) dz = QVQᵀ = (2π)^{n/2}(det A)^{1/2} QΛQᵀ = (2π)^{n/2}(det A)^{1/2} A


and we have established the integral result of interest.

2. Using the probability density of the multivariate normal and the definition of the mean give

E(ξ) = ∫ x p_ξ(x) dx = (1/((2π)^{n/2}(det P)^{1/2})) ∫ x exp(−(1/2)(x − m)ᵀ P⁻¹ (x − m)) dx

Changing the variable of integration to z = x − m gives

E(ξ) = (1/((2π)^{n/2}(det P)^{1/2})) ∫ (m + z) exp(−(1/2) zᵀP⁻¹z) dz = m

in which the integral with m produces unity by (4.14) and the integral involving z vanishes because the integrand is odd.

Next, using the probability density of the multivariate normal, the definition of the covariance, and changing the variable of integration give

cov(ξ) = ∫ (x − E(ξ))(x − E(ξ))ᵀ p_ξ(x) dx
       = (1/((2π)^{n/2}(det P)^{1/2})) ∫ z zᵀ exp(−(1/2) zᵀP⁻¹z) dz
       = (1/((2π)^{n/2}(det P)^{1/2})) (2π)^{n/2}(det P)^{1/2} P = P
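The conclusion of Example 4.2 — that m and P in (4.12) really are the mean and covariance — can also be seen by sampling. A sketch (not from the text); the particular m and P below are illustrative choices.

```python
import numpy as np

# Sampling check of Example 4.2: for xi ~ N(m, P), the sample mean
# and sample covariance converge to m and P.
rng = np.random.default_rng(1)
m = np.array([1.0, -2.0])
P = np.array([[2.0, 0.6], [0.6, 1.0]])

samples = rng.multivariate_normal(m, P, size=500_000)
sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples.T)
print(sample_mean)   # approaches m
print(sample_cov)    # approaches P
```

The sampling errors shrink like 1/√n, so with half a million samples both estimates agree with m and P to roughly two decimal places.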

Characteristic function of a multivariate density. The characteristic function of an n-dimensional random variable ξ is defined as

φ_ξ(t) = E(e^{itᵀξ}) = ∫ e^{itᵀx} p_ξ(x) dx

in which t is now an n-dimensional variable. The inverse transform is

p_ξ(x) = (1/(2π)ⁿ) ∫ e^{−itᵀx} φ_ξ(t) dt
Note that if one has the characteristic function of the entire random variable vector available, one can easily compute the characteristic function of any marginal distribution. We simply set to zero the components of the t vector for any variables we wish to integrate over. To see this, consider the joint density p_{ξ,η}(x, y) and its characteristic function φ(t_x, t_y)

φ(t_x, t_y) = ∫∫ exp(i [t_xᵀ t_yᵀ] [x; y]) p_{ξ,η}(x, y) dx dy

If we are interested in the characteristic function of η's marginal, φ_η(t_y), we set t_x = 0 in the joint characteristic function to obtain it

φ(0, t_y) = ∫∫ exp(i [0ᵀ t_yᵀ] [x; y]) p_{ξ,η}(x, y) dx dy = ∫ e^{it_yᵀy} p_η(y) dy = φ_η(t_y)
Example 4.3: Characteristic function of the multivariate normal
Show that the characteristic function of the multivariate normal N(m, P) is given by

φ_ξ(t) = exp(itᵀm − (1/2) tᵀPt)

Solution
From the definition of the characteristic function we are required to evaluate the integral

φ_ξ(t) = (1/((2π)^{n/2}(det P)^{1/2})) ∫ e^{itᵀx} exp(−(1/2)(x − m)ᵀ P⁻¹ (x − m)) dx

Changing the variable of integration to z = x − m gives

φ_ξ(t) = e^{itᵀm} (1/((2π)^{n/2}(det P)^{1/2})) ∫ e^{itᵀz} e^{−(1/2) zᵀP⁻¹z} dz

Since P is positive definite, by Theorem 1.16 it can be factored as P = QΛQᵀ, so P⁻¹ = QΛ⁻¹Qᵀ, and changing the variable of integration to z = Qw in the integral gives, after noting that det Q = ±1 since Q is orthogonal, and denoting tᵀQ = vᵀ,

φ_ξ(t) = e^{itᵀm} (1/((2π)^{n/2}(det Λ)^{1/2})) ∏ⱼ₌₁ⁿ ∫ e^{ivⱼwⱼ} e^{−(1/2) wⱼ²/λⱼ} dwⱼ
       = e^{itᵀm} (1/((2π)^{n/2}(det Λ)^{1/2})) ∏ⱼ₌₁ⁿ √(2πλⱼ) exp(−(1/2) λⱼvⱼ²)
       = e^{itᵀm} exp(−(1/2) Σⱼ λⱼvⱼ²)

in which we used (4.95) to evaluate the integral. Noting that

Σⱼ λⱼvⱼ² = tᵀQΛQᵀt = tᵀPt   (4.18)

Substituting this result into the characteristic function gives

φ_ξ(t) = e^{itᵀm − (1/2)tᵀPt}

which is the desired result.

Example 4.4: Marginal normal density
Given that ξ and η are jointly, normally distributed with mean and covariance

[ξ; η] ~ N( [m_x; m_y], [P_x  P_xy; P_yx  P_y] )

show that the marginal density of ξ is normal with the following parameters

ξ ~ N(m_x, P_x)   (4.19)


Solution
Method 1. As a first approach to establish (4.19), we could directly integrate the y variables. Let x̄ = x − m_x and ȳ = y − m_y, let n_x and n_y be the dimensions of the ξ and η variables, respectively, and let n = n_x + n_y. Then the definition of the marginal density gives

p_ξ(x) = (1/((2π)^{n/2}(det P)^{1/2})) ∫ exp(−(1/2) [x̄ᵀ ȳᵀ] [P_x  P_xy; P_yx  P_y]⁻¹ [x̄; ȳ]) dȳ

To follow this approach, we also need to use the matrix inversion formula for the partitioned matrix. This is left as an exercise for the interested reader.

Method 2. In the second approach, we use the previously derived results of the characteristic function of the multivariate normal and its marginals. First, the characteristic function of the joint density is given by

φ(t_x, t_y) = exp( i [t_xᵀ t_yᵀ] [m_x; m_y] − (1/2) [t_xᵀ t_yᵀ] [P_x  P_xy; P_yx  P_y] [t_x; t_y] )

Setting t_y = 0 to compute the characteristic function of ξ's marginal gives

φ_ξ(t_x) = exp( i t_xᵀ m_x − (1/2) t_xᵀ P_x t_x )

But notice that this last expression is the characteristic function of a normal with mean m_x and covariance P_x, so inverting this result back to the densities gives

p_ξ(x) = (1/((2π)^{n_x/2}(det P_x)^{1/2})) e^{−(1/2)(x − m_x)ᵀ P_x⁻¹ (x − m_x)}

Summarizing, since we have already performed the required integrals to derive the characteristic function of the normal, the second approach saves significant time and algebraic manipulation. It pays off to do the required integrals one time, "store" them in the characteristic function, and then reuse them whenever possible, such as here when deriving marginals.

4.3.2 Functions of random variables

In many applications we need to know how the density of a random variable is related to the density of a function of that random variable. Let f : Rⁿ → Rⁿ be a mapping of the random variable ξ into the random variable η, η = f(ξ), and assume that the inverse mapping also exists, ξ = f⁻¹(η). Given the density of ξ, p_ξ(x), we wish to compute the density of η, p_η(y), induced by the function f. Let X denote an arbitrary region of the field of the random variable ξ, and define the set Y as the transform of this set under the function f

Y = {y | y = f(x), x ∈ X}

Then we seek a function p_η(y) such that

∫_X p_ξ(x) dx = ∫_Y p_η(y) dy   (4.20)

for every admissible set X. Using the rules of calculus for transforming a variable of integration we can write²

∫_X p_ξ(x) dx = ∫_Y p_ξ(f⁻¹(y)) |det(∂f⁻¹(y)/∂y)| dy   (4.21)

in which |det(∂f⁻¹(y)/∂y)| is the absolute value of the determinant of the Jacobian matrix of the transformation from η to ξ. Subtracting (4.21) from (4.20) gives

∫_Y [ p_η(y) − p_ξ(f⁻¹(y)) |det(∂f⁻¹(y)/∂y)| ] dy = 0   (4.22)

Because (4.22) must be true for any set Y, we conclude (a proof by contradiction is immediate)³

p_η(y) = p_ξ(f⁻¹(y)) |det(∂f⁻¹(y)/∂y)|   (4.23)

²See Appendix A for various notations for derivatives with respect to vectors.
³Some care should be exercised if one has generalized functions in mind for the probability density.


Example 4.5: Nonlinear transformation
Consider the scalar random variable ξ, normally distributed, ξ ~ N(m, σ²), and find the density of the random variable under the transformation η = ξ³.

Solution
The transformation is invertible and we have that ξ = f⁻¹(η) = η^{1/3}. Taking the derivative gives dξ/dη = (1/3)η^{−2/3}, and (4.23) gives

p_η(y) = (1/(3√(2πσ²))) |y|^{−2/3} exp(−(y^{1/3} − m)²/(2σ²))
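Because the transformation in Example 4.5 is invertible and monotone, an equivalent statement in terms of distributions is Pr(η ≤ c) = Pr(ξ ≤ c^{1/3}), which is easy to test by simulation. A sketch (not from the text) using the standard library; m = 0.5, σ = 1, and c = 2 are illustrative values (c > 0 keeps the real cube root simple).

```python
import math
import random

# Monte Carlo check of Example 4.5: if xi ~ N(m, sigma^2) and
# eta = xi^3, then Pr(eta <= c) = Phi((c^{1/3} - m) / sigma).
random.seed(2)
m, sigma = 0.5, 1.0
n = 200_000
etas = [random.gauss(m, sigma) ** 3 for _ in range(n)]

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

c = 2.0
empirical = sum(e <= c for e in etas) / n
exact = Phi((c ** (1.0 / 3.0) - m) / sigma)
print(empirical, exact)
```

Differentiating this distribution-function identity with respect to c recovers exactly the density formula derived above.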

Noninvertible transformations. Given n random variables ξ having density p_ξ, and the random variables η with components η_k defined by the noninvertible transformation η = f(ξ), η_k = f_k(ξ), we wish to find p_η in terms of p_ξ. Consider the region generated in Rⁿ by the vector inequality f(x) ≤ c. Call this region X(c), which is by definition

X(c) = {x | f(x) ≤ c}

Note that X is not necessarily simply connected. The (cumulative) probability distribution (not density) for η then satisfies

F_η(c) = ∫_{X(c)} p_ξ(x) dx   (4.24)

If the density p_η is of interest, it can be obtained by differentiating F_η.

Example 4.6: Maximum of two random variables
Given two independent random variables, ξ₁ and ξ₂, and the new random variable η defined by the noninvertible, nonlinear transformation η = max(ξ₁, ξ₂), show that η's density is given by

p_η(y) = p_{ξ₁}(y) ∫_{−∞}^{y} p_{ξ₂}(x) dx + p_{ξ₂}(y) ∫_{−∞}^{y} p_{ξ₁}(x) dx


Figure 4.4: The region X(c) for y = max(x₁, x₂) ≤ c.

Solution
The region X(c) generated by the inequality y ≤ c is sketched in Figure 4.4. Applying (4.24) then gives

F_η(c) = ∫_{−∞}^{c} ∫_{−∞}^{c} p_ξ(x₁, x₂) dx₁ dx₂ = F_{ξ₁}(c) F_{ξ₂}(c)

which has a clear physical interpretation. It says the probability that the maximum of two independent random variables is less than some value is equal to the probability that both random variables are less than that value. To obtain the density, we differentiate

p_η(y) = p_{ξ₁}(y) F_{ξ₂}(y) + F_{ξ₁}(y) p_{ξ₂}(y)
       = p_{ξ₁}(y) ∫_{−∞}^{y} p_{ξ₂}(x) dx + p_{ξ₂}(y) ∫_{−∞}^{y} p_{ξ₁}(x) dx
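The distribution-function identity F_η(c) = F_{ξ₁}(c) F_{ξ₂}(c) in Example 4.6 is simple to check by simulation. A sketch (not from the text) using uniform random variables, which keep the distribution functions trivial: F(c) = c on [0, 1], so F_η(c) = c².

```python
import random

# Monte Carlo check of Example 4.6: for independent xi_1, xi_2,
# Pr(max(xi_1, xi_2) <= c) = F_{xi_1}(c) * F_{xi_2}(c).
random.seed(3)
n = 200_000
maxima = [max(random.random(), random.random()) for _ in range(n)]

c = 0.7
empirical = sum(x <= c for x in maxima) / n
exact = c * c        # F_eta(c) = c^2 for two independent Uniform(0, 1)
print(empirical, exact)
```

Any other pair of independent distributions would do as well; only the product form of the distribution function is being exercised.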

4.3.3 Statistical Independence and Correlation

From the definition of independence, two events A and B are independent if Pr(A ∩ B) = Pr(A) Pr(B). We translate this definition into an equivalent statement about probability distributions as follows. Given random variables ξ and η, let event A be ξ ≤ x and event B be η ≤ y. By the definitions of joint and marginal distribution, these events have probabilities

Pr(A) = F_ξ(x)    Pr(B) = F_η(y)    Pr(A ∩ B) = F_{ξ,η}(x, y)

We say that the two random variables ξ and η are STATISTICALLY INDEPENDENT, or simply independent, if this relation holds for all x, y

F_{ξ,η}(x, y) = F_ξ(x) F_η(y)    all x, y   (4.25)

See Exercise 4.2 for the proof that an equivalent condition for statistical independence can be stated in terms of the probability densities instead of the distributions

p_{ξ,η}(x, y) = p_ξ(x) p_η(y)    all x, y

provided that the densities are defined. We say two random variables, ξ and η, are UNCORRELATED if

cov(ξ, η) = 0   (4.26)

Example 4.7: Independent implies uncorrelated
Prove that if ξ and η are statistically independent, then they are uncorrelated.

Solution
The definition of covariance and statistical independence gives

cov(ξ, η) = E((ξ − E(ξ))(η − E(η))) = E(ξ − E(ξ)) E(η − E(η)) = 0

Example 4.8: Does uncorrelated imply independent?
Let ξ and η be jointly distributed random variables with probability density function

p_{ξ,η}(x, y) = (1/4)[1 + xy(x² − y²)]  for |x| < 1, |y| < 1, and 0 otherwise


Figure 4.5: A joint density function for the two uncorrelated random variables in Example 4.8.

(a) Compute the marginals p_ξ(x) and p_η(y). Are ξ and η independent?

(b) Compute cov(ξ, η). Are ξ and η uncorrelated?

(c) What is the relationship between independent and uncorrelated? Are your results on this example consistent with this relationship? Why or why not?

Solution
The joint density is shown in Figure 4.5.

(a) Direct integration of the joint density produces the marginals

p_ξ(x) = 1/2,  |x| < 1
p_η(y) = 1/2,  |y| < 1

so p_{ξ,η}(x, y) ≠ p_ξ(x) p_η(y), and ξ and η are not independent.

(b) Performing the double integral for the expectation of the product term gives

E(ξη) = (1/4) ∫_{−1}^{1} ∫_{−1}^{1} xy [1 + xy(x² − y²)] dx dy = 0

and the covariance of ξ and η is therefore

cov(ξ, η) = E(ξη) − E(ξ)E(η) = 0

and ξ and η are uncorrelated.

(c) We know that independent implies uncorrelated. This example does not contradict that relationship. This example shows uncorrelated does not imply independent, in general, but see the next example for normals.
examplefor normals.
Example 4.9: Independent and uncorrelated are equivalent for normals
If two random variables ξ and η are jointly normally distributed,

[ξ; η] ~ N( [m_x; m_y], [P_x  P_xy; P_yx  P_y] )

prove ξ and η are statistically independent if and only if ξ and η are uncorrelated, or, equivalently, P is block diagonal.

Solution
We have shown already that independent implies uncorrelated for any density, so we now show that, for normals, uncorrelated implies independent. Given cov(ξ, η) = 0, we have P_xy = 0 and P_yx = 0, so

det P = det P_x det P_y

and the density can be written, with x̄ = x − m_x and ȳ = y − m_y,

p_{ξ,η}(x, y) = (1/((2π)^{(n_x+n_y)/2}(det P_x det P_y)^{1/2})) exp(−(1/2)[ x̄ᵀ P_x⁻¹ x̄ + ȳᵀ P_y⁻¹ ȳ ])   (4.27)

For any joint normal, we know that the marginals are simply

p_ξ(x) = (1/((2π)^{n_x/2}(det P_x)^{1/2})) exp(−(1/2) x̄ᵀ P_x⁻¹ x̄)
p_η(y) = (1/((2π)^{n_y/2}(det P_y)^{1/2})) exp(−(1/2) ȳᵀ P_y⁻¹ ȳ)

Forming the product and combining terms gives

p_ξ(x) p_η(y) = (1/((2π)^{(n_x+n_y)/2}(det P_x det P_y)^{1/2})) exp(−(1/2)[ x̄ᵀ P_x⁻¹ x̄ + ȳᵀ P_y⁻¹ ȳ ])

Comparing this equation to (4.27), and using the inverse of a block-diagonal matrix, we have shown that ξ and η are statistically independent.

4.4 Sampling

Let scalar random variable ξ have density p_ξ with mean m and variance P, and consider n independent samples of ξ, denoted x₁, x₂, ..., xₙ. By independent samples, we mean that the joint density of the samples is the product of the marginals, which all are identical and equal to p_ξ

p_{x₁,···,xₙ}(z₁, ..., zₙ) = p_{x₁}(z₁) ··· p_{xₙ}(zₙ) = p_ξ(z₁) ··· p_ξ(zₙ)


4.4.1 Linear Transformation

We establish the following facts about linear transformations of random variables. Consider the random variable ξ ∈ Rⁿ with density p_ξ, and the linear transformation η = Aξ. The mean and variance of the random variable η are

E(η) = A E(ξ)    var(η) = A var(ξ) Aᵀ   (4.28)

We establish these formulas as follows. Using the definition of expectation, we have that

E(η) = E(Aξ) = A E(ξ)

Using the definition of variance, we have that

var(η) = var(Aξ) = E((Aξ − E(Aξ))(Aξ − E(Aξ))ᵀ) = A E((ξ − E(ξ))(ξ − E(ξ))ᵀ) Aᵀ = A var(ξ) Aᵀ

With normals, we often wish to check if the variance is positive definite after a linear transformation. Let P ∈ Rⁿˣⁿ be positive definite and A be an arbitrary matrix. The following result is often useful: P > 0 and A's rows linearly independent imply APAᵀ > 0. See also statement 5 in Section 1.4.4.
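Both facts — the transformation rule (4.28) and the loss of positive definiteness when A's rows are dependent — can be exercised numerically. A sketch (not from the text); the matrices m, P, and A below are illustrative choices, and the rank-deficient map is the one used in the singular-normal discussion that follows.

```python
import numpy as np

# Check of (4.28): for eta = A xi, E(eta) = A E(xi) and
# var(eta) = A var(xi) A^T.
rng = np.random.default_rng(4)
m = np.array([1.0, 2.0])
P = np.array([[1.5, 0.3], [0.3, 0.8]])
xi = rng.multivariate_normal(m, P, size=400_000)

A = np.array([[1.0, 1.0], [2.0, 0.0]])   # full-rank illustrative map
eta = xi @ A.T
print(eta.mean(axis=0), A @ m)           # agree
print(np.cov(eta.T), A @ P @ A.T)        # agree

# rank-deficient map eta = (xi, xi) for scalar xi with P_x = 1:
A_def = np.array([[1.0], [1.0]])
P_y = A_def @ np.array([[1.0]]) @ A_def.T    # = [[1, 1], [1, 1]]
print(np.linalg.eigvalsh(P_y))               # one eigenvalue is zero
```

The zero eigenvalue of P_y confirms that A P Aᵀ is only positive semidefinite when the rows of A are linearly dependent.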

Singular or degenerate normal distributions. It is often convenient to extend the definition of the normal distribution to admit positive semidefinite covariance matrices. The distribution with a semidefinite covariance is known as a singular or degenerate normal distribution (Anderson, 2003, p. 30). Figure 4.6 shows a nearly singular normal distribution.

To see how the singular normal arises, let the scalar random variable ξ be distributed normally with zero mean and positive definite covariance,


Figure 4.6: A nearly singular normal density in two dimensions, p(x) = exp(−(1/2)(27.2x₁² + ··· + 73.8x₂²)).

ξ ~ N(0, P_x), and consider the simple linear transformation

η = Aξ    A = [1; 1]

in which we have created two identical copies of ξ for the two components η₁ and η₂ of η. Now consider the density of η. If we try to use the standard formulas for transformation of a normal, we would have P_y = A P_x Aᵀ, and P_y is singular since its rows are linearly dependent. Therefore one of the eigenvalues of P_y is zero, and P_y is positive semidefinite and not positive definite. Obviously we cannot use (4.12) for the density in this case because the inverse of P_y does not exist. To handle these cases, we first provide an interpretation that remains valid when the covariance matrix is singular and semidefinite.

Definition 4.10 (Density of a singular normal). A singular joint normal density of the random variables ξ₁ ∈ Rⁿ¹, ξ₂ ∈ Rⁿ², is denoted

[ξ₁; ξ₂] ~ N( [m₁; m₂], [Λ₁  0; 0  0] )

with Λ₁ > 0. The density is defined by

p_ξ(x₁, x₂) = (1/((2π)^{n₁/2}(det Λ₁)^{1/2})) exp(−(1/2)(x₁ − m₁)ᵀ Λ₁⁻¹ (x₁ − m₁)) δ(x₂ − m₂)   (4.29)

In this limit, the "random" variable ξ₂ becomes deterministic and equal to its mean m₂. For the case n₁ = 0, we have the completely degenerate case p_ξ(x₂) = δ(x₂ − m₂), which describes the completely deterministic case in which ξ = m₂, and there is no random component. Notice that by performing the required integrals, the two marginal densities of (4.29) are found to be

p_{ξ₁}(x₁) = (1/((2π)^{n₁/2}(det Λ₁)^{1/2})) exp(−(1/2)(x₁ − m₁)ᵀ Λ₁⁻¹ (x₁ − m₁))
p_{ξ₂}(x₂) = δ(x₂ − m₂)

Example 4.11: Computing a singular density

Consider again the motivating example with the unit normal scalar random variable ξ ~ N(0, 1), P_x = 1, and the linear transformation

    A = [1  1]ᵀ

Use Definition 4.10 to express the density p_η for this case, and draw a figure showing the appearance of p_η.

Solution

We first compute the eigenvalue decomposition of the semidefinite covariance P_y and obtain

    P_y = QΛQᵀ,   Q = (1/√2) [1 1; 1 -1],   Λ = [2 0; 0 0]

Figure 4.7: The singular normal p_η(y) resulting from y = Ax with rank-deficient A.

Next we define the invertible variable transformation

    ζ = Qᵀη

and we can write the covariance of ζ, P_z, as

    P_z = QᵀP_yQ = Λ = [2 0; 0 0]

which is in the form of Definition 4.10. Using that definition gives the density for ζ

    p_ζ(z1, z2) = (1/(2√π)) exp(-(1/4)z1²) δ(z2)

Finally, transforming back to the variable η using

    z1 = (1/√2)(y1 + y2),   z2 = (1/√2)(y1 - y2)

and noting δ(ax) = (1/a)δ(x) gives

    p_η(y) = (1/√(2π)) exp(-(1/8)(y1 + y2)²) δ(y1 - y2)

To draw a sketch, first we note that p_η(y1, y2) = 0 for y1 ≠ y2 because of the delta function. So we have a singular normal defined in the plane, and the density is nonzero only on the line y1 = y2. Therefore take a zero-mean normal density defined on the y1 axis, rotate it by 45° so that it lies along the line y1 = y2, and that is the resulting joint density for η, as shown in Figure 4.7.
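The construction in this example is easy to check numerically. The following sketch (assuming NumPy is available; it is not part of the text) samples η = Aξ and confirms that P_y is singular and that all the probability mass lies on the line y1 = y2:

```python
import numpy as np

rng = np.random.default_rng(0)

# eta = A xi with A = [1 1]^T creates two identical copies of xi.
A = np.array([[1.0], [1.0]])
xi = rng.standard_normal((1, 10000))        # xi ~ N(0, 1)
eta = A @ xi                                # shape (2, 10000)

# Py = A Px A^T is singular: eigenvalues are 0 and 2.
Py = A @ np.array([[1.0]]) @ A.T
evals = np.linalg.eigvalsh(Py)              # ascending order

# All samples satisfy y1 = y2, so the density lives on that line.
on_line = bool(np.allclose(eta[0], eta[1]))
```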

The expanded definition of the normal distribution enables us to generalize the important result that the linear transformation of a normal is normal, so that it holds for any linear transformation, including rank-deficient transformations such as the A matrix given above in which the rows are not independent. We state this result as the following theorem and defer the proof to Exercise 4.24.
Theorem 4.12 (Normal distributions under linear transformation). Consider a normally distributed random variable ξ ∈ R^n with semidefinite covariance P_x ≥ 0, and an arbitrary linear transformation A ∈ R^(m×n) with transformed random variable η ∈ R^m, η = Aξ. Then η is also normally distributed, with P_y = AP_xAᵀ ≥ 0.

4.4.2 Sample Mean, Sample Variance, and Standard Error

Usually in applications we do not obtain nearly enough samples to obtain convergence to the entire density, and we settle for convergence to a few low-order moments of the distribution, such as the mean and variance. The SAMPLE MEAN is defined as

    X̄_n = (1/n) Σ_{i=1}^n X_i

and we expect this quantity to converge to ξ's mean as the number of samples increases. Indeed if we take expectations

    E(X̄_n) = (1/n) Σ_{i=1}^n E(X_i) = m

which means that the sample mean is an unbiased estimate of the mean of random variable ξ for all values of n. An estimator's BIAS is defined to be the difference between the expectation of the estimator and the true value, and an estimator is termed UNBIASED if the bias is zero.
Next, toward defining an appropriate sample variance, we consider the sum of squares of the samples' differences from the sample mean, S_n = Σ_{i=1}^n (X_i - X̄_n)², which can be rearranged as follows

    S_n = Σ_{i=1}^n ((X_i - m) - (X̄_n - m))²
        = Σ_{i=1}^n (X_i - m)² - 2(X̄_n - m) Σ_{i=1}^n (X_i - m) + n(X̄_n - m)²
        = Σ_{i=1}^n (X_i - m)² - 2n(X̄_n - m)² + n(X̄_n - m)²
        = Σ_{i=1}^n (X_i - m)² - n(X̄_n - m)²

Taking the expectation gives

    E(S_n) = Σ_{i=1}^n var(X_i) - n var(X̄_n)

We know var(X_i) = P for all i = 1,..., n, and to compute the variance of X̄_n, it is convenient to first determine the variance of vector X, obtained by stacking the samples together in a column vector

    X = [X_1  X_2  ···  X_n]ᵀ

Since the X_i are mutually independent, we have that cov(X_i, X_j) = P δ_ij, i, j = 1,..., n, or in matrix form

    var(X) = diag(P, P, ..., P)

Using X̄_n = AX with A = (1/n)[I  I  ···  I] and the second part of (4.28) gives

    var(X̄_n) = A var(X) Aᵀ = (1/n) P

Substituting these into the equation for the expectation of S_n gives

    E(S_n) = nP - P = (n - 1)P

So here we notice an interesting outcome; if we want to obtain an unbiased estimate of the variance, we should define the SAMPLE VARIANCE as

s_n² = S_n/(n - 1) to obtain

    E(s_n²) = P

This explains the somewhat mysterious definition of the sample variance: division of the sum of squares by n - 1 instead of n, which one might have anticipated. We show later that division by n gives the maximum-likelihood estimate of the variance, which is also a good estimate because it converges to P as n → ∞. Although the maximum-likelihood estimate is not an unbiased estimate for finite n, the bias decreases to zero as n → ∞.
The STANDARD ERROR of an estimator is the standard deviation of the estimator's sampling distribution. For example, in the scalar case, if we consider the sample mean above to be an estimator of the mean, we have worked out that the variance of the sample mean is var(X̄_n) = (1/n)σ², and therefore the STANDARD ERROR OF THE MEAN is

    SE(X̄_n) = σ/√n

When the standard deviation of the random variable being sampled is also unknown, people sometimes replace σ in the previous expression with an estimate of it, such as the square root of the sample variance, s_n. We then have

    SE(X̄_n) ≈ s_n/√n

This quantity does provide a rough measure of the uncertainty in X̄_n due to the finite sample size. But if we want to say something precise about the uncertainty in the sample mean as an estimate of ξ's mean, we must calculate a true confidence interval for that estimate. We show how to calculate confidence intervals in the discussion of maximum-likelihood estimation in Section 4.7.
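These bias and standard-error results are easy to confirm by simulation; a minimal sketch (assuming NumPy; the sample sizes are arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 10, 50000
# 50000 replicate experiments, each with n = 10 samples; true mean 2, variance 9.
x = rng.normal(loc=2.0, scale=3.0, size=(trials, n))

xbar = x.mean(axis=1)                      # sample mean of each experiment
S = ((x - xbar[:, None])**2).sum(axis=1)   # sum of squares about the sample mean

var_unbiased = (S / (n - 1)).mean()        # dividing by n-1: expectation 9 (unbiased)
var_mle = (S / n).mean()                   # dividing by n: expectation 9(n-1)/n = 8.1
se = xbar.std()                            # standard error of the mean: 3/sqrt(10)
```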

4.5 Central Limit Theorems


Central limit theorems are concerned with the following remarkable observation: if we have a set of n independent random variables X_i, i = 1, 2,..., n, then, under fairly general conditions, the density p_Y of their sum

    Y = X_1 + X_2 + ··· + X_n

is approximately normal. It is perhaps best to illustrate this observation with an example.

Example 4.13: Sum of 10 uniformly distributed random variables

Consider 10 uniformly and independently distributed random variables x_1,..., x_10. Consider a new random variable y, which is the sum of the 10 x random variables

    y = x_1 + x_2 + ··· + x_10

What is y's mean and variance? Draw samples of the 10 x_i random variables, and compute samples of y. Plot frequency distributions of x and y. Even though the 10 x random variables are uniformly distributed, and their probability distribution looks nothing like a normal distribution, discuss how well y is approximated by a normal.
Solution

The x random variables are distributed as x ~ U(0, 1), which means

    p_x(x) = 1 for 0 ≤ x ≤ 1,   p_x(x) = 0 otherwise    (4.30)

Computing the mean and variance gives

    E(x) = ∫₀¹ x dx = 1/2
    var(x) = ∫₀¹ (x - (1/2))² dx = 1/12

If we stack the x variables in a vector

    x = [x_1  x_2  ···  x_10]ᵀ

we can write the y random variable as the linear transformation of x

    y = Ax,   A = [1  1  ···  1]

so we have that y's mean and variance are given by

    E(y) = 10 · (1/2) = 5,   var(y) = A var(x) Aᵀ = 10/12 = 5/6

If the central limit theorem is in force with only 10 random variables, y should be distributed approximately as N(5, 5/6). Histograms of 10,000 samples of x and y are shown in Figures 4.8 and 4.9. It is clear that even 10 uniformly distributed x random variables produce nearly a normal distribution for their sum y.
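The computation in this example can be reproduced in a few lines; a sketch (assuming NumPy) that checks the first two moments of y against the values 5 and 5/6:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=(10000, 10))   # 10,000 samples of 10 U(0,1) variables
y = x.sum(axis=1)                             # 10,000 samples of the sum y

mean_y = y.mean()      # should be near 10 * 1/2 = 5
var_y = y.var()        # should be near 10 * 1/12 = 5/6
```

A histogram of y at this point reproduces the bell shape of Figure 4.9.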

4.5.1 Identically distributed random variables

Consider n independent random variables X_i, i = 1,..., n, having identical distribution with mean μ and variance σ². We are interested in the distribution of the sum S_n = X_1 + X_2 + ··· + X_n as n becomes large. Since the X_i are independent, the mean and variance of S_n are given by

    E(S_n) = Σ_{i=1}^n E(X_i) = nμ
    var(S_n) = Σ_{i=1}^n var(X_i) = nσ²

Since we want to take the limit as n → ∞, we first rescale the sum to keep the mean and variance finite. Given the formulas for shifting the mean and variance, we choose Z_n = (S_n - nμ)/(σ√n) and obtain

    E(Z_n) = 0,   var(Z_n) = (1/(nσ²)) var(S_n) = 1

Theorem 4.14 (De Moivre-Laplace central limit theorem). Let X_i, i = 1, 2,..., n be independent and identically distributed with mean μ and variance σ². Then Z_n tends to the standard normal N(0, 1) as n → ∞.


Proof. In keeping with Laplace's approach to the problem, we shall use characteristic functions to establish this result. We shall find useful the following bound on the error in the Taylor series approximation of the exponential with a purely imaginary argument

    |e^(ix) - Σ_{m=0}^n (ix)^m/m!| ≤ min( |x|^(n+1)/(n+1)!, 2|x|^n/n! )    (4.31)

Figure 4.8: Histogram of 10,000 samples of uniformly distributed x.

Figure 4.9: Histogram of 10,000 samples of y = Σ_{i=1}^{10} x_i.

This bound is simple to establish (see Exercise 4.53). We will use it with n = 2

    e^(ix) = 1 + ix - x²/2 + O(|x|³)    (4.32)

in which O(|x|³) denotes that the size of the error term in (4.31) is bounded by some constant times |x|³. We first consider variables with zero mean and unit variance, Y_i = (X_i - μ)/σ. Writing the series expansion for Y_i's characteristic function and using (4.32) gives

    φ_Yi(t) = ∫ e^(itx) p_Y(x) dx
            = ∫ (1 + itx - (1/2)t²x² + O(|tx|³)) p_Y(x) dx
            = 1 + it E(Y_i) - (1/2)t² E(Y_i²) + O(|t|³)
            = 1 - (1/2)t² + O(|t|³)

Notice that here we have assumed E(|Y_i|³) is finite, so that it can be absorbed into the O(|t|³) term. Next, since Z_n = (1/√n) Σ_i Y_i, we have from (4.7) and (4.8) that

    φ_Zn(t) = (φ_Yi(t/√n))ⁿ = (1 - (1/2)t²/n + O(|t/√n|³))ⁿ

In taking the limit as n → ∞, the last term is negligible and can be dropped to obtain

    lim_{n→∞} φ_Zn(t) = lim_{n→∞} (1 - (1/2)t²/n)ⁿ

Using the calculus result that lim_{x→0} (1 + ax)^(1/x) = e^a with n = 1/x gives

    lim_{n→∞} φ_Zn(t) = e^(-(1/2)t²)

The final step, which unfortunately requires the most effort, is to show that if the characteristic function converges, then the random variable also converges (in distribution). Assuming this is true, we then have

    lim_{n→∞} Z_n ~ N(0, 1)

and the result is established.

We have assumed here that the absolute moment E(|Y_i|³) is finite, and we have not justified the claim that convergence of the characteristic function implies convergence in distribution (Durrett, 2010, pp. 114-116). But the argument does nicely illustrate why characteristic functions prove so useful. In the next section we pursue a much more general approach that is not based on the characteristic function, so we content ourselves to leave this proof here.
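The convergence φ_Zn(t) → e^(-t²/2) can also be observed numerically for a decidedly non-normal distribution; a sketch (assuming NumPy; the sample sizes are arbitrary) using standardized exponential variables:

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials, t = 400, 20000, 1.0

# Standardized exponential variables: zero mean, unit variance, skewed density.
Y = rng.exponential(1.0, size=(trials, n)) - 1.0
Zn = Y.sum(axis=1) / np.sqrt(n)

# Monte Carlo estimate of the characteristic function phi_Zn(t) = E[exp(i t Zn)].
phi = np.exp(1j * t * Zn).mean()
target = np.exp(-t**2 / 2)       # characteristic function of N(0, 1)
```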

4.5.2 Random variables with different distributions

The central limit theorem of de Moivre and Laplace is already a spectacular mathematical result. But as it stands, it is not a compelling reason to assume that unmodeled noise in a physical system would be well represented by a normal distribution. After all, how would we deduce that some unmodeled random effect in a physical system is the result of many different independent random causes, all of which have identical distributions? But the central limit theorem runs deeper. We next remove the assumption that the X_i are identically distributed. This version of the central limit theorem was developed by Lindeberg (1922). We consider the following conditions on the X_i variables.
Weconsider the following conditions on the Xi variables.

Assumption 4.15 (Lindeberg conditions). Consider independent random variables X_i, i = 1, 2,..., n satisfying E(X_i) = 0 and var(X_i) = σ_i², and let s_n² = Σ_{i=1}^n σ_i². The following two conditions hold as n → ∞

(a) s_n → ∞

(b) For every ε > 0,  (1/s_n²) Σ_{k=1}^n E(X_k²; |X_k| > ε s_n) → 0

The notation E(X_k²; |X_k| > ε s_n) is shorthand for taking expectations of the truncated random variable

    E(X²; |X| > a) = ∫_{|x| > a} x² p_X(x) dx

Notice that the definition implies that E(X²; |X| > a) + E(X²; |X| ≤ a) = var(X). Many sufficient conditions for the central limit theorem have been proposed over the years, but all were superseded by the Lindeberg conditions, which were also shown to be necessary (Feller, 1935; Lévy, 1935).

We have the following theorem.

Theorem 4.16 (Lindeberg-Feller central limit theorem). Consider independent random variables X_i satisfying Assumption 4.15, and the normalized sum Z_n = S_n/s_n, with F_Zn denoting its distribution function and F the standard normal distribution function. Then

    lim_{n→∞} sup_x |F_Zn(x) - F(x)| = 0

The proof of this theorem is given in Section 4.9.

4.5.3 Multidimensional central limit theorems

The central limit theorem (CLT) can be extended to vector-valued random variables, X_i ∈ R^d. Consider first independent, identically distributed (IID) random variables X_i, i = 1, 2,..., n with E(X_i) = μ and var(X_i) = Σ. We assume that Σ > 0 is positive definite. We have the following result.

Theorem 4.17 (Multivariate CLT IID). Let vector-valued random variables X_i, i = 1, 2,..., n be independent and identically distributed with E(X_i) = μ and var(X_i) = Σ. The normalized sum Z_n = (1/√n) Σ_{i=1}^n (X_i - μ) converges in distribution to the normal N(0, Σ).
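A quick numerical check of the IID vector version, with uniformly distributed components (a sketch assuming NumPy; the dimensions and sample counts are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
trials, n, d = 20000, 100, 2

# IID random vectors with independent U(0,1) components: mu = 0.5, Sigma = I/12.
X = rng.uniform(0.0, 1.0, size=(trials, n, d))
Zn = (X - 0.5).sum(axis=1) / np.sqrt(n)

cov = np.cov(Zn.T)     # should approach Sigma = I/12
```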

Again, the IID version is a special case of a more general version that assumes a generalization of the Lindeberg condition.

Theorem 4.18 (Multivariate CLT Lindeberg-Feller). Consider independent vector-valued random variables X_i, i = 1, 2,..., n with E(X_i) = μ_i and var(X_i) = Σ_i > 0, and satisfying the following conditions

(a) (1/n) Σ_{i=1}^n Σ_i → Σ > 0 as n → ∞

(b) For every ε > 0,  (1/n) Σ_{i=1}^n E(||X_i||²; ||X_i|| > ε√n) → 0

Then the sum Z_n = (1/√n) Σ_{i=1}^n (X_i - μ_i) converges in distribution to the normal N(0, Σ).

See van der Vaart (1998, pp. 20-21) for further discussion of this case. Theorem 4.18 is the mathematical basis for the common physical assumption that noise in process measurements is often well modeled by a zero-mean normal distribution. The variance σ² can often be determined by examining samples of the measurement, which is an important part of the process modeling task that is often overlooked.

Finally, the history of the term "central limit theorem" is also interesting. Apparently coined by Pólya in 1920 (in German, zentraler Grenzwertsatz), the "central" refers to the center of the distribution, where the distribution converges quickly as n increases, compared to the tails of the distribution, where the convergence is much slower (Le Cam, 1986). Le Cam's article is highly recommended reading for anyone interested in the fascinating history of the central limit theorem.

4.6 Conditional Density Function and Bayes's Theorem

Let ξ and η be jointly distributed random variables with density p_{ξ,η}(x, y). We seek the conditional probability density function of ξ given that a specific value y of η has been observed. We define the conditional density as

    p_{ξ|η}(x|y) = p_{ξ,η}(x, y) / p_η(y)

We explore this definition with a simple example. Consider a roll of a single die in which η takes on values E or O to denote whether the outcome is even or odd, and ξ takes on the integer value of the die. The 12 values of the joint density function are simply computed

    p_{ξ,η}(1, O) = 1/6   p_{ξ,η}(2, E) = 1/6   p_{ξ,η}(3, O) = 1/6
    p_{ξ,η}(4, E) = 1/6   p_{ξ,η}(5, O) = 1/6   p_{ξ,η}(6, E) = 1/6    (4.33)

and the remaining six values, for the impossible value-parity pairs, are zero. The marginal densities are then easily computed; summing across the rows of (4.33) we have for ξ

    p_ξ(x) = 1/6,   x = 1,..., 6

Similarly for η, summing down the columns of (4.33) gives

    p_η(E) = 1/2,   p_η(O) = 1/2

These are both in accordance with our intuition on the rolling of the die: equal probability for each value 1 to 6, and equal probability for an even or an odd outcome.

The conditional density is a different concept. The conditional density p_{ξ|η}(x|y) tells us the density of x given that η = y has been observed. So consider the value of this function p_{ξ|η}(1|O), which tells us the probability that the die shows a 1 given that we know that it is odd. We expect that the additional information on the die being odd causes us to revise our probability that it is 1 from 1/6 to 1/3. Applying the defining formula for conditional density indeed gives

    p_{ξ|η}(1|O) = p_{ξ,η}(1, O) / p_η(O) = (1/6)/(1/2) = 1/3

Next consider the reverse question, the probability that we have an odd given that we observe a 1. The definition of conditional density gives

    p_{η|ξ}(O|1) = p_{ξ,η}(1, O) / p_ξ(1) = (1/6)/(1/6) = 1

i.e., we are sure the die is odd if it shows a 1. Notice that the arguments to the conditional density do not commute as they do in the joint density.
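The die calculations above can be organized in a few lines of code; a sketch (not from the text) that builds the joint density of (ξ, η) as a table and applies the defining formulas:

```python
from fractions import Fraction

sixth = Fraction(1, 6)
# Joint density over the 12 (value, parity) pairs; impossible pairs get zero.
joint = {(x, par): (sixth if (par == 'E') == (x % 2 == 0) else Fraction(0))
         for x in range(1, 7) for par in ('E', 'O')}

p_eta_O = sum(p for (x, par), p in joint.items() if par == 'O')  # marginal: 1/2
p_xi_1 = sum(p for (x, par), p in joint.items() if x == 1)       # marginal: 1/6

p_1_given_O = joint[(1, 'O')] / p_eta_O   # p(xi = 1 | eta = O) = 1/3
p_O_given_1 = joint[(1, 'O')] / p_xi_1    # p(eta = O | xi = 1) = 1
```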
This fact leads to a famous result. Consider the definition of conditional density, which can be expressed as

    p_{ξ,η}(x, y) = p_{ξ|η}(x|y) p_η(y)   or   p_{η,ξ}(y, x) = p_{η|ξ}(y|x) p_ξ(x)

Because p_{ξ,η}(x, y) = p_{η,ξ}(y, x), we can equate the right-hand sides and deduce

    p_{ξ|η}(x|y) = p_{η|ξ}(y|x) p_ξ(x) / p_η(y)    (4.34)

which is known as Bayes's theorem (Bayes, 1763). Notice that this result comes in handy whenever we wish to switch the variable that is known in the conditional density, which we will see is a key step in state estimation problems.

Example 4.19: Conditional normal density

Show that if ξ and η are jointly normally distributed as

    (ξ, η) ~ N( [m_x  m_y]ᵀ, [P_x P_xy; P_yx P_y] )

then the conditional density of ξ given η is also normal

    p_{ξ|η}(x|y) = n(x, m, P)

in which the mean and covariance are

    m = m_x + P_xy P_y⁻¹ (y - m_y)    (4.35)
    P = P_x - P_xy P_y⁻¹ P_yx    (4.36)

Solution

The definition of conditional density gives

    p_{ξ|η}(x|y) = p_{ξ,η}(x, y) / p_η(y)

Because (ξ, η) is jointly normal, we know from Example 4.4 that

    p_η(y) = n(y, m_y, P_y)

and therefore

    p_{ξ|η}(x|y) = p_{ξ,η}(x, y) / n(y, m_y, P_y)

Substituting in the definition of the normal density from (4.13) gives

    p_{ξ|η}(x|y) = (1/(2π)^(n_x/2)) ( det P_y / det [P_x P_xy; P_yx P_y] )^(1/2) exp(-(1/2)a)    (4.37)

in which the argument of the exponent is

    a = [x - m_x; y - m_y]ᵀ [P_x P_xy; P_yx P_y]⁻¹ [x - m_x; y - m_y] - (y - m_y)ᵀ P_y⁻¹ (y - m_y)    (4.38)

If we use P = P_x - P_xy P_y⁻¹ P_yx as defined in (4.36), then we can use the partitioned matrix inversion formula to express the matrix inverse in (4.38) as

    [P_x P_xy; P_yx P_y]⁻¹ = [ P⁻¹,  -P⁻¹ P_xy P_y⁻¹;  -P_y⁻¹ P_yx P⁻¹,  P_y⁻¹ + P_y⁻¹ P_yx P⁻¹ P_xy P_y⁻¹ ]

Substituting this expression into (4.38) and multiplying out terms yields

    a = (x - m_x)ᵀ P⁻¹ (x - m_x) - 2(x - m_x)ᵀ P⁻¹ P_xy P_y⁻¹ (y - m_y)
        + (y - m_y)ᵀ P_y⁻¹ P_yx P⁻¹ P_xy P_y⁻¹ (y - m_y)

which is the expansion of the following quadratic term

    a = [(x - m_x) - P_xy P_y⁻¹ (y - m_y)]ᵀ P⁻¹ [(x - m_x) - P_xy P_y⁻¹ (y - m_y)]

in which we use the fact that P_xy = P_yxᵀ. Substituting (4.35) into this expression yields

    a = (x - m)ᵀ P⁻¹ (x - m)

Finally, noting that for the partitioned matrix

    det [P_x P_xy; P_yx P_y] = det P_y det P

and substituting the two previous equations into (4.37) yields

    n( (x, y), (m_x, m_y), [P_x P_xy; P_yx P_y] ) / n(y, m_y, P_y) = n(x, m, P)    (4.39)

or p_{ξ|η}(x|y) = n(x, m, P), which is the desired result.
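The result can be spot-checked numerically by comparing n(x, m, P) against the ratio p_{ξ,η}(x, y)/p_η(y) at arbitrary points; a sketch assuming NumPy and SciPy, with an illustrative (not from the text) choice of means and covariances:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

# Illustrative scalar blocks; any positive definite joint covariance works.
mx, my = np.array([1.0]), np.array([2.0])
Px, Pxy, Py = np.array([[2.0]]), np.array([[0.5]]), np.array([[1.0]])
Pjoint = np.block([[Px, Pxy], [Pxy.T, Py]])

y = np.array([2.8])
m = mx + Pxy @ np.linalg.inv(Py) @ (y - my)   # conditional mean (4.35)
P = Px - Pxy @ np.linalg.inv(Py) @ Pxy.T      # conditional covariance (4.36)

x = np.array([0.3])
lhs = mvn.pdf(x, mean=m, cov=P)               # n(x, m, P)
rhs = (mvn.pdf(np.concatenate([x, y]), mean=np.concatenate([mx, my]), cov=Pjoint)
       / mvn.pdf(y, mean=my, cov=Py))         # joint density over marginal
```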

Example 4.20: More normal conditional densities

Let the joint conditional density of random variables (A, B) given C be a normal distribution with the following mean and variance

    p_{A,B|C}(a, b|c) = n((a, b), m, P),   m = [m_a  m_b]ᵀ,   P = [P_a P_ab; P_ba P_b]    (4.40)

Show that the conditional density of A given B and C is also normal

    p_{A|B,C}(a|b, c) = n(a, m̄, P̄)

with mean and variance given by

    m̄ = m_a + P_ab P_b⁻¹ (b - m_b),   P̄ = P_a - P_ab P_b⁻¹ P_ba

Solution

From the definition of joint density we have that

    p_{A|B,C}(a|b, c) = p_{A,B,C}(a, b, c) / p_{B,C}(b, c)

Multiplying the top and bottom of the fraction by p_C(c) yields

    p_{A|B,C}(a|b, c) = ( p_{A,B,C}(a, b, c)/p_C(c) ) / ( p_{B,C}(b, c)/p_C(c) )

or

    p_{A|B,C}(a|b, c) = p_{A,B|C}(a, b|c) / p_{B|C}(b|c)

Substituting the distribution given in (4.40) and using the result in Example 4.4 to integrate over a to obtain the marginal p_{B|C}(b|c) = ∫ p_{A,B|C}(a, b|c) da yields

    p_{A|B,C}(a|b, c) = n( (a, b), m, P ) / n(b, m_b, P_b)

Now using (4.39) and (4.36) gives

    m̄ = m_a + P_ab P_b⁻¹ (b - m_b),   P̄ = P_a - P_ab P_b⁻¹ P_ba

and the result

    p_{A|B,C}(a|b, c) = n(a, m̄, P̄)

is established.

4.7 Maximum-Likelihood Estimation

We now turn to one of the most basic problems in modeling: how to determine model parameters from experimental measurements. Finding methods to solve parameter estimation problems has had a significant impact on the development of mathematics and statistics.

Consider a linear model that we wish to use to explain some response variable y. A linear model means simply that y = θᵀx, in which θ is a set of parameters that we wish to determine from measurements of y for given values of the environmental variables x. We often intend to use the identified model to optimize over the x variables to find the conditions that maximize the response y. This approach may save considerable time and expense compared to the alternative of trial-and-error experimental adjustment of the x variables.
In addition to finding the "best" parameter estimate, we would also like to quantify our uncertainty in the estimate. Modeling the data as a random variable with some fixed probability density is one of the key methods that we can use to solve this problem. The uncertainty in measurement leads to uncertainty in the estimate, and stipulating the structure of the measurement uncertainty allows us to find (exactly in some cases) the uncertainty in the estimate. Because of the central limit theorem, our first choice for modeling uncertainty in measurement is the normal distribution. We then have the model

    y = θᵀx + e    (4.42)

in which e is assumed normal and zero mean. The effect of nonzero mean is assumed to be included in θ as additional parameters to be estimated.

The six canonical linear estimation problems. We next look at the six versions of this problem that result from assuming (i) y is a scalar or vector, (ii) θ is a vector or matrix, and (iii) whether we know the measurement error variance, or if it has to be estimated from the data. The variable x will be a vector throughout. The goal in each problem is the same: find the optimal parameter estimate by maximizing the probability of the data, and quantify the estimate's uncertainty, for example, by determining confidence intervals. The first five estimation problems have analytical, closed-form solutions. Number six requires iterative, numerical solution for both the optimal parameter estimate and the measurement error covariance estimate.

4.7.1 Scalar Measurement y, Known Measurement Variance

Consider n samples of the model (4.42)

    y_i = x_iᵀθ + e_i,   e_i ~ N(0, σ²),   i = 1,..., n

in which x_i ∈ R^(n_p) is the n_p-vector of environmental conditions for the ith sample. Assuming an independent, identically distributed normal distribution for the measurement errors, the probability density of the set of samples is

    p(y|θ, σ) = (1/((2π)^(n/2) σⁿ)) exp( -(1/(2σ²)) Σ_{i=1}^n (y_i - x_iᵀθ)² )

Taking the logarithm gives

    ln p(y|θ, σ) = -( (n/2) ln 2π + n ln σ + (1/(2σ²)) Σ_{i=1}^n (y_i - x_iᵀθ)² )

This equation is easier to express if we first stack the measurements and environmental conditions in a vector and matrix

    y = [y_1  y_2  ···  y_n]ᵀ,   X = [x_1ᵀ; x_2ᵀ; ···; x_nᵀ]

giving

    ln p(y|θ, σ) = -( (n/2) ln 2π + n ln σ + (1/(2σ²)) (y - Xθ)ᵀ(y - Xθ) )

We define the log of the likelihood as a function of the parameters θ and σ with the data y regarded as fixed values

    L(θ, σ) = -( (n/2) ln 2π + n ln σ + (1/(2σ²)) (y - Xθ)ᵀ(y - Xθ) )    (4.43)

Because we assume that we know the measurement error variance σ², the only unknown in this first estimation problem is θ. Therefore, to find the maximum-likelihood estimate, we maximize L(θ, σ) by differentiating with respect to θ and setting the result to zero

    ∂L/∂θ = (1/σ²) Xᵀ(y - Xθ)
    0 = Xᵀ(y - Xθ̂)    (4.44)

Solving for the estimate gives

    θ̂ = (XᵀX)⁻¹ Xᵀ y    (4.45)

We delay a discussion of what to do when X does not have full column rank until Section 4.8. But we know from Chapter 1 that in that case the optimal estimate is not unique, and we have a linear subspace of estimates that are all optimal. This situation is depicted in Figure 1.6(b).

Probability density of parameters and parameter confidence intervals. The next item of interest is the probability density of the parameter estimates. Let θ_0 be the parameter generating the measurements, so the model is y = Xθ_0 + e. Then we have

    θ̂ = (XᵀX)⁻¹ Xᵀ y,   e ~ N(0, σ² I_n)

Using the result on linear transformation of a normal, we have

    θ̂ ~ N(θ_0, σ² (XᵀX)⁻¹)    (4.46)

As shown in Exercise 4.21, for a random variable ξ distributed as a multivariate normal with mean m and covariance P, the probability α that ξ takes on a value x inside the ellipse

    (x - m)ᵀ P⁻¹ (x - m) ≤ b

is given by

    α = γ(n_p/2, b/2) / Γ(n_p/2)

in which the complete and incomplete gamma functions are defined by (Abramowitz and Stegun, 1970, pp. 255-260)

    Γ(n_p) = ∫₀^∞ t^(n_p - 1) e^(-t) dt = (n_p - 1)!
    γ(n_p, x) = ∫₀^x t^(n_p - 1) e^(-t) dt

The function χ²(n_p, α) inverts this relationship, so we have b = χ²(n_p, α). Substituting this into the equation defining the ellipse gives

    (x - m)ᵀ P⁻¹ (x - m) ≤ χ²(n_p, α)

Finally, substituting in the values of the mean and covariance gives the following α-level elliptical confidence region for the maximum-likelihood estimate

    (θ̂ - θ_0)ᵀ (XᵀX/σ²) (θ̂ - θ_0) ≤ χ²(n_p, α)    (4.47)

parameter vector, the elliptical region is


For a large-dimensional these cases we may wish to approximate

the
bersometo present. In
box
bounding
that
smallest
the
contains
regionwith
the
confidence
4.15, this box is given by

ellipse. As shown in

Exercise

|-oo/

1/2

(x 2(np,

limits with the following


whichis commonlyreported as plus/minus
notation
in which

= (x2(np,

1/2

Note that the parameter uncertainty interval does not depend on the measurement samples when we know the measurement error variance. We can compute c before we do the experiment, based solely on the chosen x_i. Only θ̂ depends on the experiment. And if we do an increasing number of experiments, XᵀX = Σ_{i=1}^n x_i x_iᵀ increases linearly with the number of samples n, so the confidence interval c decreases as n^(-1/2). So one method to reduce uncertainty in parameter estimates is to replicate experiments.
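The estimate (4.45) and the χ²-based bounding box can be computed directly; a sketch assuming NumPy and SciPy, with simulated data and arbitrary "true" parameters (the specific numbers are illustrative, not from the text):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
n, npar, sigma = 50, 2, 0.5
theta0 = np.array([1.0, -2.0])               # "true" parameters for the simulation

X = rng.uniform(-1.0, 1.0, size=(n, npar))   # chosen environmental conditions
y = X @ theta0 + sigma * rng.standard_normal(n)

theta = np.linalg.solve(X.T @ X, X.T @ y)    # maximum-likelihood estimate (4.45)

# 95% bounding-box half-widths; sigma is known, so c depends only on X.
b = chi2.ppf(0.95, npar)                     # chi-squared value, about 5.99
c = np.sqrt(b) * sigma * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))
```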

Marginal parameter estimates. Another way to condense the multivariate density is to compute its marginals. Since θ̂ is distributed as a normal in (4.46), we compute marginals as in (4.19), giving

    θ̂_i ~ N(θ_{0,i}, σ² ((XᵀX)⁻¹)_ii)
We then compute α-level confidence intervals on each of the n_p univariate normals, giving θ̂ = θ_0 ± c̄, in which

    c̄_i = (χ²(1, α))^(1/2) σ ( ((XᵀX)⁻¹)_ii )^(1/2)

Note that the c and c̄ formulas are different. The first is the bounding box for the true multivariate α-level confidence region; the second is simply a collection of the α-level confidence intervals for all the marginals of the multivariate estimate. Let's call this latter region the "marginal box" to distinguish it from the bounding box. Students often ask, "Since it is difficult to present a high-dimensional ellipse, which of these two plus/minus results should be reported as the confidence interval in a research presentation?" This question has no satisfactory answer. The important point is to know and communicate what you are reporting. The bounding box certainly contains more than probability level α since it contains the true confidence region in its interior. The marginal box does not have this property. The interpretation of the marginal box is the same as the interpretation of any marginal density. If you obtained many samples of the parameter estimates from many datasets, the ith interval of the marginal box would contain an α-level fraction of all the different samples of the ith parameter estimate. No statement about the probability of the jointly distributed parameter estimate follows from this characterization. We include the following example to help clarify these distinctions.
confidence region, bounding box, and marginal
Example 4.21: The confidence region, bounding box, and marginal box

Assume that the two-dimensional random variable ξ is distributed as N(m, P).

(a) Plot the multivariate density.

(b) Compute and plot the two marginal densities, and their 95% confidence intervals.

(c) Compute the bounding box and the marginal box, and plot them along with the joint density and its 95% confidence ellipse.

Figure 4.10: The multivariate normal density (top right). The two marginal densities and marginal 95% confidence regions (shaded) (top left and bottom right). The joint elliptical 95% confidence region (shaded), bounding box (outer), and the marginal box (inner) (bottom left).
(d) Take 1000 independent samples of ξ, and determine the number inside the ellipse, the bounding box, and the marginal box. Approximately what confidence levels can you assign to the bounding box and marginal box?

Solution

(a) The multivariate density is shown in the top right of Figure 4.10. The 95% confidence ellipse is given by

    (x - m)ᵀ P⁻¹ (x - m) ≤ χ²(2, 0.95) = 5.99

This ellipse is shown in the bottom left of Figure 4.10.

(b) The two marginals are

    ξ_1 ~ N(m_1, P_11),   ξ_2 ~ N(m_2, P_22)

The marginal densities of ξ_1 and ξ_2 are shown in the bottom right and top left of Figure 4.10, respectively. The 95% confidence intervals for the marginals follow from χ²(1, 0.95) = 3.84, giving

    x_1 ∈ [-1.77, 3.77],   x_2 ∈ [0.614, 3.39]

These intervals are shown as the shaded regions in the bottom right and top left of Figure 4.10.

(c) The ellipse's bounding box follows from χ²(2, 0.95) = 5.99, giving

    x_1 ∈ [-2.46, 4.46],   x_2 ∈ [0.269, 3.73]

The ellipse, bounding box, and marginal box are shown in the bottom left of Figure 4.10.
(d) Generating 1000 samples of ξ and counting the fraction of samples within each of the three regions gives

    ellipse: 0.956,   bounding box: 0.981,   marginal box: 0.920
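The counting in part (d) is easily reproduced for any choice of m and P; a sketch assuming NumPy and SciPy, with an illustrative covariance (not necessarily the one used in the figure):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
m = np.array([1.0, 2.0])
P = np.array([[2.0, 0.6], [0.6, 0.5]])       # illustrative positive definite choice
Pinv = np.linalg.inv(P)

x = rng.multivariate_normal(m, P, size=100000)
d = x - m
sd = np.sqrt(np.diag(P))

b2 = chi2.ppf(0.95, 2)                       # joint level, about 5.99
b1 = chi2.ppf(0.95, 1)                       # marginal level, about 3.84

in_ellipse = (np.einsum('ij,jk,ik->i', d, Pinv, d) <= b2).mean()
in_bound = np.all(np.abs(d) <= np.sqrt(b2) * sd, axis=1).mean()
in_marg = np.all(np.abs(d) <= np.sqrt(b1) * sd, axis=1).mean()
```

As in the example, the ellipse fraction is close to 0.95, the bounding box exceeds it, and the marginal box falls short.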

4.7.2 Scalar Measurement y, Unknown Measurement Variance


We now consider the measurement error variance σ² to be unknown. We have the same model as in the previous section

    y_i = x_iᵀθ + e_i,   e_i ~ N(0, σ²)

When the measurement variance is unknown, we maximize the likelihood function given in (4.43) over both σ and θ, and estimate both quantities from the data. The derivative with respect to θ is the same as in (4.44), and differentiating (4.43) with respect to σ gives

    ∂L/∂θ = (1/σ²) Xᵀ(y - Xθ)
    ∂L/∂σ = -n/σ + (1/σ³) (y - Xθ)ᵀ(y - Xθ)

Equating the derivatives to zero and solving simultaneously gives

    θ̂ = (XᵀX)⁻¹ Xᵀ y    (4.48)
    σ̂² = (1/n) (y - Xθ̂)ᵀ(y - Xθ̂)    (4.49)

We see that the maximum-likelihood parameter estimate is unchanged from the known variance case, and the maximum-likelihood estimate of the variance is the mean of the square of the residual over the samples. Notice that the maximum-likelihood estimate of variance is close to but not equal to the sample variance s² given by the formula (for n > n_p)

    s² = (1/(n - n_p)) (y - Xθ̂)ᵀ(y - Xθ̂)

so that σ̂² = ((n - n_p)/n) s². We show subsequently that the sample variance is an unbiased estimate of σ², so the maximum-likelihood estimate of σ² is biased. But this bias is small for a large number of samples compared to parameters, n >> n_p.
Given the same result for θ̂ as in the previous problem, the probability density of θ̂ is unchanged from the previous problem. We next determine the probability density of σ̂². For this it is convenient to first consider the singular value decomposition of the X matrix. We assume that this n × n_p matrix has independent columns so its rank is n_p. As discussed in Chapter 1, a real n × n_p matrix with independent columns can be written as the product of orthogonal n × n matrix U, orthogonal n_p × n_p matrix V, and diagonal n_p × n_p matrix Σ

    X = [U_1  U_2] [Σ; 0] Vᵀ

The following relationships result from the orthogonality

    U_1ᵀU_1 = I_{n_p},   U_1ᵀU_2 = 0,   U_1U_1ᵀ + U_2U_2ᵀ = I_n,   VᵀV = VVᵀ = I_{n_p}

Using the singular value decomposition (SVD) for X, we find by substitution

    (XᵀX)⁻¹Xᵀ = VΣ⁻¹U_1ᵀ
    X(XᵀX)⁻¹Xᵀ = U_1U_1ᵀ
    I_n - X(XᵀX)⁻¹Xᵀ = U_2U_2ᵀ

These relations allow us to express the estimate and residual in terms of the measurement errors as

    θ̂ - θ_0 = VΣ⁻¹U_1ᵀ e,   y - Xθ̂ = U_2U_2ᵀ e    (4.50)

Using these relations we can express the following quadratic terms as

    (θ̂ - θ_0)ᵀ (XᵀX) (θ̂ - θ_0) = eᵀU_1U_1ᵀe
    (y - Xθ̂)ᵀ (y - Xθ̂) = eᵀU_2U_2ᵀe

These relations provide an essential insight. The error e obviously affects both quadratic terms, but its effect in the sum of squares of the residual (the sample variance) is through U_2, and its effect in the parameter estimate's distance from the true value is through U_1. Because these two matrices are orthogonal to each other, the effect of the measurement error is independently distributed in these two quadratic terms. We make this statement precise subsequently. First it is helpful to establish that the following two random variables z_1, z_2 are statistically independent

    z_1 = U_1ᵀ e,   z_2 = U_2ᵀ e

Given that e ~ N(0, σ²I) and the result on the linear transformation of a normal, the pair is distributed as

    [z_1; z_2] ~ N( 0, σ² [I_{n_p} 0; 0 I_{n-n_p}] )

Since the pair is jointly normal and the covariance is diagonal, z_1 and z_2 are statistically independent. We also know that their quadratic products are distributed as chi-squared

    z_1ᵀz_1 / σ² ~ χ²_{n_p},   z_2ᵀz_2 / σ² ~ χ²_{n - n_p}

Exercise 4.33 discusses the chi-squared and chi densities, and also shows that the mean of χ²_n is n. From that fact we can deduce quickly the earlier claim that the sample variance is an unbiased estimate. Summarizing our results on the sample variance thus far

    s² = (1/(n - n_p)) (y - Xθ̂)ᵀ(y - Xθ̂) = (1/(n - n_p)) z_2ᵀz_2

Taking expectation gives

    E(s²) = (1/(n - n_p)) E(z_2ᵀz_2) = (σ²/(n - n_p)) (n - n_p) = σ²

and the result is established.

As shown in Exercise 4.3, if two random variables are statistically independent, then all functions of the two random variables are also statistically independent. Therefore we know that z_1ᵀz_1 and z_2ᵀz_2 are statistically independent. The ratio of two chi-squared, statistically independent random variables is defined as the F-distribution

    (z_1ᵀz_1 / n_p) / (z_2ᵀz_2 / (n - n_p)) ~ F(n_p, n - n_p)

The F-distribution F(n, m) can be shown to have density

    p_F(z) = ( Γ((n + m)/2) / (Γ(n/2) Γ(m/2)) ) (1/z) √( (zn)ⁿ mᵐ / (zn + m)^(n+m) )

Exercises 4.35 and 4.45 provide further discussion of the F-distribution. The definitions of z_1, z_2 in terms of θ̂ and s² give

    (θ̂ - θ_0)ᵀ (XᵀX) (θ̂ - θ_0) / σ² ~ χ²_{n_p}
    (n - n_p) s² / σ² ~ χ²_{n - n_p}

and therefore

    ( (θ̂ - θ_0)ᵀ (XᵀX) (θ̂ - θ_0) / n_p ) / s² ~ F(n_p, n - n_p)    (4.51)

This last distribution provides the basis for the confidence intervals on the parameter estimates. Summarizing our results so far, the densities for the parameter estimate and the measurement variance estimate are

    θ̂ ~ N(θ_0, σ² (XᵀX)⁻¹)    (4.52)
    (n - n_p) s² / σ² ~ χ²_{n - n_p}    (4.53)

Notice that these distributions are inadequate to construct confidence levels on the estimated parameter θ̂ because they both depend on the unknown measurement variance σ². One might be tempted to replace the unknown σ² in the normal density for θ̂ with the maximum-likelihood estimate and obtain the confidence intervals for θ̂ from that density. That idea is in the right spirit, but is not quite correct. We obtain the correct confidence region by considering the distribution in (4.51). Notice that the ratio of the two quadratic terms has divided out the common unknown term σ². We define the function F̄(n, m, α) to return the argument of the cumulative F-distribution that achieves probability value α

    ∫₀^(F̄(n,m,α)) p_F(z) dz = α    (4.54)

Then the ellipsoidal confidence intervals for the parameter estimates follow from (4.51)

    (θ̂ - θ_0)ᵀ (XᵀX) (θ̂ - θ_0) ≤ n_p s² F̄(n_p, n - n_p, α)

in which the sample variance s² appears in place of the unknown σ².

Given the normal density, we can obtain the bounding box intervals as was done in the previous section

    \theta_i = \hat{\theta}_i \pm \left(c\,\tilde{A}_{ii}\right)^{1/2}, \qquad \tilde{A} = (X^T X)^{-1}

in which c = n_p\, s^2\, F(n_p, n - n_p, \alpha). The significant difference between this and the previous case is that here the size c of the confidence interval also depends on the measurements. The statistical interpretation remains the same; given many replicated experiments, the center \hat{\theta} as well as the size of the generated confidence ellipse are random. But the true parameter \theta_0, which is not a random variable, lies within the generated confidence ellipse for the stated fraction of the replicated experiments. As in the previous case, the confidence interval c decreases as the number of samples n increases.
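These F-based confidence calculations are easy to carry out numerically. The sketch below is a minimal illustration, assuming NumPy and SciPy are available; the data, noise level, and variable names are hypothetical, not taken from the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_p = 50, 2                               # samples and parameters (synthetic)
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
theta0 = np.array([1.0, 2.0])                # hypothetical "true" parameters
y = X @ theta0 + 0.1 * rng.standard_normal(n)

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # least-squares estimate
resid = y - X @ theta_hat
s2 = resid @ resid / (n - n_p)                   # sample variance

alpha = 0.95
Fval = stats.f.ppf(alpha, n_p, n - n_p)          # F(n_p, n - n_p, alpha)
c = n_p * s2 * Fval                              # ellipse level as in (4.55)

# bounding box half-widths: (c * A_ii)^(1/2) with A = (X'X)^{-1}
A = np.linalg.inv(X.T @ X)
halfwidth = np.sqrt(c * np.diag(A))
print(theta_hat, halfwidth)
```

With 95% confidence each \theta_i lies in \hat{\theta}_i \pm halfwidth[i]; the ellipse (\theta - \hat{\theta})^T X^T X (\theta - \hat{\theta}) \leq c is the joint region.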

4.7.3 Vector of Measurements y, Different Parameters Corresponding to Different Measurements, Known Measurement Covariance R

We next consider the vector measurement case. This case arises frequently when identifying empirical linear models between a vector of input variables x and a vector of output or response variables y. We consider first the case in which each measurement type has its own vector of parameters

    y_i = \Theta x_i + e_i, \qquad e_i \sim N(0, R), \qquad i = 1, \ldots, n   (4.56)

The environmental variable x_i is assumed to have q components, x_i \in R^q, y_i, e_i \in R^p, and \Theta \in R^{p \times q}, and we assume q < n. In this model we have n_p = pq model parameters to estimate. Notice that this model is not restricted to only p independent versions of the model given by (4.42). The generalization allowed here comes from the covariance matrix R. To reduce this case to (4.42), we would add the further restriction that R = \sigma^2 I. We will see that allowing the different measurements

to be correlated does not prevent us from solving this estimation problem in closed form. Consider n samples, i = 1, \ldots, n, with the e_i independent across samples (although the components within each e_i are correlated). Given the deterministic variables \Theta and the n x_i, we have for the probability density of the measurements

    p(y_1, \ldots, y_n) = \frac{1}{(2\pi)^{np/2}(\det R)^{n/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{n}(y_i - \Theta x_i)^T R^{-1}(y_i - \Theta x_i)\right)

or, by taking the logarithm,

    \ln p = -\frac{1}{2}\left(np \ln 2\pi + n \ln \det R + \sum_{i=1}^{n}(y_i - \Theta x_i)^T R^{-1}(y_i - \Theta x_i)\right)

We again define the log-likelihood as a function of the parameters \Theta and R with the data y_i, i = 1, 2, \ldots, n, regarded as fixed values

    L(\Theta, R) = -\frac{1}{2}\left(np \ln 2\pi + n \ln \det R + \sum_{i=1}^{n}(y_i - \Theta x_i)^T R^{-1}(y_i - \Theta x_i)\right)

Since R is known, we take the derivative of L with respect to the matrix \Theta. It is perhaps easiest to perform this derivative using component notation. Rewriting the expression for L in components gives

    L = -\frac{1}{2}\left(np \ln 2\pi + n \ln \det R + (y_{ir} - \Theta_{rj} x_{ij})\, R^{-1}_{rs}\, (y_{is} - \Theta_{sl} x_{il})\right)   (4.57)

in which we use the Einstein summation convention for repeated indices. Taking the derivative of the scalar-valued function L with respect to \Theta_{mn} gives a matrix derivative. Performing the sums over the deltas, noting R is symmetric, and collecting terms gives

    \left(\frac{\partial L}{\partial \Theta}\right)_{mn} = R^{-1}_{ms}\,(y_{is} - \Theta_{sl} x_{il})\, x_{in}

If we convert this back to the vector/matrix notation of the problem statement we have

    \frac{\partial L}{\partial \Theta} = R^{-1} \sum_{i=1}^{n} (y_i - \Theta x_i)\, x_i^T

Setting this matrix to zero and solving gives the maximum-likelihood estimate for the parameters \Theta

    \hat{\Theta} = \left(\sum_i y_i x_i^T\right)\left(\sum_i x_i x_i^T\right)^{-1}

in which we assume that the matrix \sum_i x_i x_i^T has full rank. Again, we discuss what to do when this rank condition fails later in Section 4.8. Notice that the value of the measurement error covariance R is irrelevant in the estimation of \Theta in this problem also. It is often convenient to arrange the variables so that the summation is performed by matrix operations. Arranging the data vectors in the following matrices

    Y = [y_1\ y_2\ \cdots\ y_n], \qquad X = [x_1\ x_2\ \cdots\ x_n]

allows us to express the maximum-likelihood estimate as

    \hat{\Theta} = Y X^T (X X^T)^{-1}   (4.58)
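A quick numerical check of (4.58) on synthetic data (NumPy; all names and values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 3, 2, 500                     # outputs, inputs, samples (synthetic)
Theta_true = rng.standard_normal((p, q))
X = rng.standard_normal((q, n))         # columns are the x_i
E = 0.01 * rng.standard_normal((p, n))  # small measurement errors
Y = Theta_true @ X + E                  # columns are the y_i

# maximum-likelihood estimate (4.58): Theta_hat = Y X' (X X')^{-1}
Theta_hat = Y @ X.T @ np.linalg.inv(X @ X.T)
print(np.max(np.abs(Theta_hat - Theta_true)))
```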

Next we determine the probability density of the estimated parameter \hat{\Theta}. We denote the parameter value generating the data as \Theta_0, so the measurements are given by

    Y = \Theta_0 X + E, \qquad E = [e_1\ e_2\ \cdots\ e_n]

The estimate and its transpose are therefore

    \hat{\Theta} = \Theta_0 + E X^T (X X^T)^{-1}, \qquad \hat{\Theta}^T = \Theta_0^T + (X X^T)^{-1} X E^T

We find the transpose convenient because we now wish to stack the matrix \hat{\Theta}^T in a vector. Applying the vec operator to both sides of the transposed estimate equation gives

    \mathrm{vec}\,\hat{\Theta}^T = \mathrm{vec}\,\Theta_0^T + \left(I_p \otimes (X X^T)^{-1} X\right)\,\mathrm{vec}\,E^T

From the definition of E we see

    \mathrm{vec}\,E^T = \begin{bmatrix} e_{1,1} & e_{1,2} & \cdots & e_{1,n} & e_{2,1} & \cdots & e_{p,1} & e_{p,2} & \cdots & e_{p,n} \end{bmatrix}^T

in which e_{j,i} denotes the jth measurement in the ith sample.
of these normally distributed random variables,


Giventhis arrangement
the density
wehavefor
vecET
in which

RII

RIP

RII

RIP

RPI

Rpp
RPI

Rpp

Usingthe result on linear transformation of a normal, we have


vec T vec{
in which

N(O, S)

S = Re (XXT) 1

(4.59)

Probability,Random Variables, and


408
Using the

plify this

Kronecker

from Section 1.5.3, we can


product formulas

follows
covariance as

Sims

result for S, is the matrix analog of the vector


this
with
Equation(4.59),
result in (4.46).
the elliptical confidence region for vecT
density,
normal
Giventhe
4.7.1

in Section
can be found as

S -1 (vecT vec(T)

(vecT

x2(np,

a)

(4.60)
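The Kronecker-product simplification leading to (4.59) can be verified numerically; a small sketch (NumPy, with synthetic dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, n = 2, 3, 8
X = rng.standard_normal((q, n))
M = rng.standard_normal((p, p))
R = M @ M.T                                   # a positive definite covariance

A = np.linalg.inv(X @ X.T) @ X                # (XX')^{-1} X
G = np.kron(np.eye(p), A)                     # I_p (x) (XX')^{-1} X
S = G @ np.kron(R, np.eye(n)) @ G.T           # covariance of vec(Theta_hat')
S_kron = np.kron(R, np.linalg.inv(X @ X.T))   # claimed simplification (4.59)
print(np.allclose(S, S_kron))
```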

Interlude

Let's put the tools of orthogonality and Kronecker products to good use and prove a fundamental result in statistics, namely that the sample mean and sample variance from a normal distribution are statistically independent.

Theorem 4.22 (Mean and variance of samples from a normal). Let X_i \in R^p, i = 1, \ldots, n be n independent samples from N(\mu, \Sigma). Define the sample mean and the maximum-likelihood estimate of the variance as

    \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T

Then \bar{X} is distributed as N(\mu, (1/n)\Sigma) and independently of \hat{\Sigma}, and n\hat{\Sigma} is distributed as \sum_{i=1}^{n-1} Z_i Z_i^T in which the Z_i are distributed independently and identically as N(0, \Sigma).

Proof. Stack the n vectors next to each other in a matrix

    X = [X_1\ X_2\ \cdots\ X_n]

We next construct an orthogonal transformation of this matrix. Let l be 1/\sqrt{n} times an n-vector of ones so that X l = \sqrt{n}\,\bar{X}. Next consider the null space of l^T. From the fundamental theorem of linear algebra, that is an n-1 dimensional space. Collect an orthonormal basis for it in the columns of the matrix B_{n-1}. Then construct the following orthogonal matrix B

    B = [B_{n-1}\ \ l], \qquad B B^T = B^T B = I

Define the transformed random variables

    Z = X B

The samples X_i are distributed as X_i \sim N(\mu, \Sigma), or in more compact notation

    \mathrm{vec}\,X \sim N(1_n \otimes \mu,\ I_n \otimes \Sigma)

The transformation gives for Z

    \mathrm{vec}\,Z = (B^T \otimes I_p)\,\mathrm{vec}\,X \sim N\left((B^T 1_n) \otimes \mu,\ I_n \otimes \Sigma\right)

From the orthogonality relations we have B_{n-1}^T 1_n = 0 and l^T 1_n = \sqrt{n}, so

    \mathrm{vec}\,Z \sim N\left(\begin{bmatrix} 0 & \cdots & 0 & \sqrt{n}\,\mu^T \end{bmatrix}^T,\ I_n \otimes \Sigma\right)

that is, Z_1, \ldots, Z_{n-1} are distributed identically as N(0, \Sigma), and Z_n \sim N(\sqrt{n}\,\mu, \Sigma).

From the covariance we conclude that the variables Z_1, \ldots, Z_n are statistically independent. Computing \hat{\Sigma} gives

    n\hat{\Sigma} = \sum_i X_i X_i^T - n\bar{X}\bar{X}^T = X X^T - n\bar{X}\bar{X}^T = Z B^T B Z^T - Z_n Z_n^T = Z Z^T - Z_n Z_n^T = \sum_{i=1}^{n-1} Z_i Z_i^T

which establishes the stated distribution for \hat{\Sigma}. Since \hat{\Sigma} is a function of only Z_1, \ldots, Z_{n-1}, and \bar{X} is a function of only Z_n, \hat{\Sigma} and \bar{X} are independent. Since \bar{X} = Z_n/\sqrt{n}, we have that \bar{X} \sim N(\mu, (1/n)\Sigma), and the theorem is proved.

This result is established in many statistical texts using a dazzling variety of arguments, some bordering on the mystical. The proof given above is a compact expression of a standard method given by Anderson (2003, p. 77).
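Theorem 4.22 is also easy to check by simulation. The scalar sketch below (NumPy, synthetic) estimates the correlation between the sample mean and the maximum-likelihood variance over many replicated experiments; independence implies it should be near zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 5, 200_000
X = rng.standard_normal((reps, n))   # each row: n samples from N(0, 1)

xbar = X.mean(axis=1)                # sample means
sig2 = X.var(axis=1)                 # maximum-likelihood variances (1/n factor)

# independence implies zero correlation between the mean and the variance
corr = np.corrcoef(xbar, sig2)[0, 1]
print(corr)
```

For a nonnormal parent distribution (an exponential, say) the same experiment produces a clearly nonzero correlation; the independence is special to the normal.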

4.7.4 Vector of Measurements y, Different Parameters Corresponding to Different Measurements, Unknown Measurement Covariance R

When R is also unknown, we maximize L in (4.57) with respect to both \Theta and R. The \Theta derivative has been given previously. Differentiating (4.57) with respect to R is facilitated by using the following fact about the trace of a matrix product

    \mathrm{tr}(AB) = \mathrm{tr}(BA)

which follows immediately from the definition of trace and expressing the matrix product in components

    \mathrm{tr}(AB) = A_{ij} B_{ji} = B_{ji} A_{ij} = \mathrm{tr}(BA)

Applying this result twice on a product of three matrices gives

    \mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)

This result allows us to rewrite the following scalar term

    (y_i - \Theta x_i)^T R^{-1} (y_i - \Theta x_i) = \mathrm{tr}\left(R^{-1}(y_i - \Theta x_i)(y_i - \Theta x_i)^T\right)

Next we use the following fact in differentiating the trace of a function of a matrix

    \frac{d\,\mathrm{tr}(f(A))}{dA} = g(A^T), \qquad g(x) = \frac{df(x)}{dx}

in which g is the usual scalar derivative of the scalar function f. See Exercise 4.4 for a derivation of this fact. Applying this result and using the fact that R is symmetric gives

    \frac{d}{dR}\,\mathrm{tr}(R^{-1} C) = -R^{-2} C

The derivatives of the determinant and the log of the determinant are (see Exercise 4.5 for a derivation)

    \frac{d \det A}{dA} = \det A\,(A^{-T}), \qquad \frac{d \ln \det A}{dA} = (A^{-T})

The R derivative of (4.57) is therefore

    \frac{\partial L}{\partial R} = -\frac{n}{2} R^{-1} + \frac{1}{2} R^{-2} \sum_{i=1}^{n} (y_i - \Theta x_i)(y_i - \Theta x_i)^T

Setting this matrix equation to zero, using the estimate of \hat{\Theta}, and solving gives the maximum-likelihood estimates for this problem
    \hat{\Theta} = \left(\sum_i y_i x_i^T\right)\left(\sum_i x_i x_i^T\right)^{-1}, \qquad \hat{R} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{\Theta} x_i)(y_i - \hat{\Theta} x_i)^T

The estimate \hat{R} is an unbiased estimate of the measurement variance R. The distribution for n\hat{R} can be shown to be a Wishart distribution (see Exercise 4.51), which is a generalization of the \chi^2 distribution to the multivariate case (Wishart, 1928). The Wishart density can be shown to be (Anderson, 2003, pp. 252-255)

    p_W(W) = \frac{(\det W)^{(n-p-1)/2}\,\exp\left(-\frac{1}{2}\mathrm{tr}(R^{-1} W)\right)}{2^{np/2}\,(\det R)^{n/2}\,\Gamma_p(n/2)}   (4.61)

in which \Gamma_p is the multivariate gamma function defined by

    \Gamma_p(z) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\left(z + \frac{1-j}{2}\right)

Note that the argument of the probability density p_W(\cdot) is a positive definite matrix W. The probability is zero for W not positive definite.
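Both estimates can be computed in a few lines; the following sketch (NumPy, on a synthetic problem) recovers \Theta and R from simulated data.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q, n = 2, 3, 2000
Theta_true = rng.standard_normal((p, q))
R_true = np.array([[0.5, 0.2],
                   [0.2, 0.4]])                # synthetic measurement covariance
L = np.linalg.cholesky(R_true)

X = rng.standard_normal((q, n))
Y = Theta_true @ X + L @ rng.standard_normal((p, n))

Theta_hat = Y @ X.T @ np.linalg.inv(X @ X.T)   # same formula as the known-R case
Resid = Y - Theta_hat @ X
R_hat = Resid @ Resid.T / n                    # (1/n) sum of residual outer products
print(np.max(np.abs(R_hat - R_true)))
```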
4.7.5 Vector of Measurements y, Same Parameters for all Measurements, Known Measurement Covariance R

Next we consider the case in which the different measurement types are affected by the same set of parameters. The model is

    y_i = X_i \theta + e_i, \qquad e_i \sim N(0, R)

In this model, all of the different components of the measurement vector y_i \in R^p are affected by the same, single vector of parameters \theta \in R^{n_p}, and X_i is a p \times n_p matrix. Consider n \geq 1 samples, i = 1, \ldots, n, and, given the deterministic variables \theta and the n X_i, we have for the probability density of the measurements

    p(y_1, \ldots, y_n) = \frac{1}{(2\pi)^{np/2}(\det R)^{n/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{n}(y_i - X_i\theta)^T R^{-1}(y_i - X_i\theta)\right)   (4.62)

Taking the derivative of the log-likelihood with respect to \theta gives

    \frac{\partial L}{\partial \theta} = -\frac{1}{2}\sum_{i=1}^{n}\left(-2 X_i^T R^{-1} y_i + 2 X_i^T R^{-1} X_i \theta\right)

Setting this equation to zero and solving for \theta gives the maximum-likelihood estimate

    \hat{\theta} = \left(\sum_{i=1}^{n} X_i^T R^{-1} X_i\right)^{-1} \sum_{i=1}^{n} X_i^T R^{-1} y_i   (4.63)

In this problem, it can make sense to estimate \theta with a single sample (n = 1) if we choose the number of measurements p significantly larger than the number of parameters n_p. For a single sample, the parameter estimate formula is

    \hat{\theta} = (X^T R^{-1} X)^{-1} X^T R^{-1} y   (4.64)

which is the solution of a weighted least-squares problem using R^{-1} as the weight. Compare this expression to (4.45).
Notice also that this is the first estimation problem for which the maximum-likelihood estimate of the parameter \theta depends on the covariance of the measurement error R. We see next that this dependence prevents us from solving the final estimation problem in closed form.

We next calculate the probability density of the estimate. We denote the parameter value generating the data as \theta_0, so the measurements are given by

    y_i = X_i \theta_0 + e_i

and substituting this result into the estimate equation gives

    \hat{\theta} = \theta_0 + \left(\sum_i X_i^T R^{-1} X_i\right)^{-1} \begin{bmatrix} X_1^T R^{-1} & \cdots & X_n^T R^{-1} \end{bmatrix} \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix}

Using the result on linear transformation of a normal, we have

    \hat{\theta} - \theta_0 \sim N(0, S)   (4.65)
in which

    S = \left(\sum_i X_i^T R^{-1} X_i\right)^{-1}\left(\sum_i X_i^T R^{-1} R\, R^{-1} X_i\right)\left(\sum_i X_i^T R^{-1} X_i\right)^{-1} = \left(\sum_{i=1}^{n} X_i^T R^{-1} X_i\right)^{-1}

Given the normal density, we can compute the elliptical confidence region as in Section 4.7.1

    (\hat{\theta} - \theta_0)^T S^{-1} (\hat{\theta} - \theta_0) \leq \chi^2(n_p, \alpha)   (4.66)

The bounding box intervals follow as in Section 4.7.1. Notice that whenever the variance of the measurement errors is known, the maximum-likelihood estimate is normally distributed and the elliptical confidence intervals are given by \chi^2(n_p, \alpha).
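The single-sample weighted least-squares estimate (4.64) and its covariance S are straightforward to evaluate; a hedged sketch with synthetic values (names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
p, n_p = 40, 3                       # many measurements, few parameters
X = rng.standard_normal((p, n_p))
theta0 = np.array([1.0, -2.0, 0.5])  # hypothetical true parameters
R = np.diag(rng.uniform(0.01, 1.0, p))           # known measurement covariance
y = X @ theta0 + np.sqrt(np.diag(R)) * rng.standard_normal(p)

Rinv = np.linalg.inv(R)
# weighted least squares (4.64): theta_hat = (X' R^{-1} X)^{-1} X' R^{-1} y
theta_hat = np.linalg.solve(X.T @ Rinv @ X, X.T @ Rinv @ y)
S = np.linalg.inv(X.T @ Rinv @ X)    # covariance of the estimate, as in (4.65)
print(theta_hat, np.sqrt(np.diag(S)))
```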

4.7.6 Vector of Measurements y, Same Parameters for all Measurements, Unknown Measurement Covariance R

The final case is the one that arises most often in mechanistic modeling of chemical and biological experiments. To determine the unknown R, we maximize L(\theta, R) over R in addition to \theta. Using the results of Section 4.7.4 we can take the derivative of (4.62) with respect to R giving

    \frac{\partial L}{\partial R} = -\frac{n}{2} R^{-1} + \frac{1}{2} R^{-2} \sum_{i=1}^{n} (y_i - X_i\theta)(y_i - X_i\theta)^T

Setting this result to zero and using the result of the previous section gives the following set of necessary conditions for the maximum-likelihood estimates

    \hat{\theta} = \left(\sum_i X_i^T \hat{R}^{-1} X_i\right)^{-1} \sum_i X_i^T \hat{R}^{-1} y_i   (4.67)

    \hat{R} = \frac{1}{n}\sum_{i=1}^{n} (y_i - X_i\hat{\theta})(y_i - X_i\hat{\theta})^T   (4.68)

These conditions are coupled and generally cannot be solved in closed form. A natural approach is iteration, starting with an initial guess for the measurement error covariance such as R_0 = I. One then estimates \hat{\theta} and updates the iterate \hat{R} by solving a sequence of standard estimation problems. But there is no guarantee that this procedure converges. One may find that a crude initial guess like R_0 = I lies outside the region of convergence.
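One version of this iteration, alternating (4.67) and (4.68) from R_0 = I, is sketched below (NumPy; the problem data are synthetic and, as the text warns, convergence is not guaranteed in general).

```python
import numpy as np

rng = np.random.default_rng(6)
p, n_p, n = 3, 2, 400
theta0 = np.array([0.7, -1.2])
R_true = np.diag([0.2, 0.5, 1.0])
Xs = rng.standard_normal((n, p, n_p))            # the X_i matrices
ys = np.einsum('ipk,k->ip', Xs, theta0)
ys = ys + np.sqrt(np.diag(R_true)) * rng.standard_normal((n, p))

R = np.eye(p)                                    # initial guess R_0 = I
for _ in range(20):                              # alternate (4.67) and (4.68)
    Rinv = np.linalg.inv(R)
    A = sum(Xs[i].T @ Rinv @ Xs[i] for i in range(n))
    b = sum(Xs[i].T @ Rinv @ ys[i] for i in range(n))
    theta = np.linalg.solve(A, b)
    resid = ys - np.einsum('ipk,k->ip', Xs, theta)
    R = resid.T @ resid / n
print(theta, np.diag(R))
```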
Maximum-Likelihood and Bayesian Estimation

With this background in maximum-likelihood estimation, we would like to compare the approach to another class of popular methods known as Bayesian estimation. As we saw in the previous sections, in the maximum-likelihood approach, we maximize the probability of the observed data over the parameters

    \hat{\theta}_{MLE} = \arg\max_{\theta}\ p(y; \theta)   (4.69)

Here we write p(y; \theta) to emphasize that \theta is an unknown parameter, not a random variable. In the MLE approach, \hat{\theta} is the random variable, not \theta, and we assess the confidence intervals for \hat{\theta}. In Bayesian estimation, on the other hand, \theta itself is modeled as a random variable. The information that we have about \theta before the experiment is denoted by p(\theta). In the experiment, we imagine drawing a value of \theta as well as the measurement errors to create the data y_i = x_i^T \theta + e_i, i = 1, \ldots, n. With the measured y available, we then maximize p(\theta|y) over \theta to obtain the estimate

    \hat{\theta}_{BE} = \arg\max_{\theta}\ p(\theta|y)

The conditional density p(\theta|y) is known as the posterior density, i.e., the density for \theta after the experiment, and the density p(\theta) is known as the prior, i.e., the density before the experiment. In Bayesian estimation, we assess how much the measurement of y has changed our knowledge about \theta. From Bayes's theorem we can express the posterior as

    p(\theta|y) = \frac{p(y|\theta)\,p(\theta)}{p(y)}

Notice that p(y|\theta) is exactly the same functional form as p(y; \theta) in the MLE approach. Since the denominator does not depend on \theta, in Bayesian estimation we estimate \theta by the following equivalent maximization

    \hat{\theta}_{BE} = \arg\max_{\theta}\ p(y|\theta)\,p(\theta)   (4.70)

The only difference in the estimators (4.69) and (4.70) is the presence of the prior p(\theta) in the Bayesian approach. In the absence of knowledge about \theta, we often assume that p(\theta) is a uniform distribution. This is called the noninformative prior. Since p(\theta) does not depend on \theta with the noninformative prior, the MLE and BE estimates are identical in this case.

The posterior density of Bayesian estimation is a useful way to summarize the state of knowledge about the parameter \theta given the available experiments. Since one has available the posterior density, confidence levels on the random variable \theta are determined directly from p(\theta|y). Box and Tiao (1973) provide further discussion of Bayesian estimation. In Chapter 5 when we address the problem of state estimation, we use the Bayesian approach.
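A scalar conjugate example makes the prior-to-posterior update concrete; the sketch below (NumPy, synthetic numbers) compares the MLE with the Bayesian estimate under a normal prior.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2 = 0.5 ** 2                        # known measurement variance
n = 10
y = 2.0 + 0.5 * rng.standard_normal(n)   # data generated with theta = 2

theta_mle = y.mean()                     # maximizes p(y; theta)

# normal prior p(theta) = N(m0, v0); the posterior is also normal
m0, v0 = 0.0, 1.0
v_post = 1.0 / (1.0 / v0 + n / sigma2)
m_post = v_post * (m0 / v0 + y.sum() / sigma2)   # posterior mean (= MAP here)
print(theta_mle, m_post, v_post)
```

The posterior mean sits between the prior mean m0 and the MLE, and as v0 grows (the noninformative limit) it approaches the MLE, consistent with the discussion above.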

4.8 PCA and PLS regression


Principal components analysis (PCA) and projection onto latent structures (also known as partial least squares) (PLS) are two methods used to develop empirical linear models between a vector of predictor or environmental variables x, and a vector of responses y. This is the same linear model discussed in Sections 4.7.3 and 4.7.4, so we can view these methods as alternatives to the maximum-likelihood estimation approach presented in those sections. The focus of these methods is on determining estimates of the linear model that can handle situations with possible collinearities in the x variables, and missing or erroneous information, such as unknown error structure. Collinearities in the data can make the maximum-likelihood estimator highly sensitive to outliers and nonnormal errors. Because the measurement error structure is regarded as unknown or at least unreliable, robustness of the estimated model to unmodeled effects is the goal, rather than statistical optimality as in the maximum-likelihood methods.

As in Section 4.7.3, let p-vector y and q-vector x be related by the linear model y = \Theta x + e, and we wish to determine the parameter matrix \Theta \in R^{p \times q} given data on y and x. We use x_i, y_i, i = 1, 2, \ldots, n to denote the available samples. We assume n > q (often n \gg q) so

that we have more equations than unknowns, which is necessary for a well-conditioned estimation problem. It is customary to define the data matrices

    Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_n^T \end{bmatrix} \in R^{n \times p}, \qquad X = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix} \in R^{n \times q}

and the model is Y = X\Theta^T + E. In order to use a more standard notation for the linear model, we let B = \Theta^T \in R^{q \times p}, and we have

    Y = X B + E

We wish to estimate parameters B from measurements X and Y without knowledge of the statistical structure of E. Given what we already know about least squares from Chapter 1, a natural approach would be to minimize some measure of the size of the residual matrix E over all choices of B. If we choose the sum of the squares of all the elements of matrix E as our measure, we have (the square of) the so-called Frobenius norm of the matrix

    \|E\|_F^2 = \sum_{i,j} E_{ij}^2
So our first candidate for estimating matrix B is

    \min_B \|Y - XB\|_F^2

It is not difficult to show that the solution to this problem is the following

    \hat{B}_{LS} = (X^T X)^{-1} X^T Y = X^{\dagger} Y

with the usual pseudoinverse that we have seen in the standard vector least-squares problem in Chapter 1. Notice that by taking the transpose, this is also the maximum-likelihood estimate given in (4.58) for the case in which the measurement error in y is assumed normally distributed with covariance R, whether the covariance is known, or unknown and must be estimated from the data.

Also, we already know that X^T X has an inverse if and only if the columns of X are linearly independent; see Proposition 1.19. Since we may not have control over the experimental conditions, we often must contend with datasets in which X has dependent or nearly dependent columns, i.e., we have near collinearity in the columns of X. In such cases, the maximum-likelihood estimate \hat{B}_{LS} is unreliable and sensitive to small changes in the data or small errors in the assumed model structure.

But we also have a clear idea what to do about this issue given our background with the singular value decomposition (SVD). We first replace X with its (real) SVD X = U\Sigma V^T, and since X has more rows than columns (n > q), we obtain

    X = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \tilde{\Sigma} \\ 0 \end{bmatrix} V^T, \qquad \tilde{\Sigma} = \mathrm{diag}(\sigma_1, \ldots, \sigma_q)

in which U_1 contains the first q columns of U, and U_2 contains the remaining n - q columns. Multiplying the partitioned matrices gives

    X = U_1 \tilde{\Sigma} V^T

Next, to handle the case in which \tilde{\Sigma} has several small singular values corresponding to a matrix X with columns that are nearly collinear, we approximate X by setting any small singular values to zero. Assume we have \ell large singular values, and q - \ell small singular values that are nearly zero. In this case, the rank of X may be q, but with small perturbations to the data in matrix X, it can easily drop to rank \ell. We have that

    X = U_{\ell} \Sigma_{\ell} V_{\ell}^T + U_{q-\ell} \Sigma_{q-\ell} V_{q-\ell}^T \approx U_{\ell} \Sigma_{\ell} V_{\ell}^T   (4.71)

Using this lower-rank SVD in place of X then gives the following more robust least-squares estimate

    \hat{B}_{SVD} = V_{\ell} \Sigma_{\ell}^{-1} U_{\ell}^T Y   (4.72)

The ill-conditioning caused by inverting \tilde{\Sigma} with all q singular values is overcome by inverting only the largest \ell singular values. Thus the SVD estimate is less sensitive to errors in the data than the least-squares or maximum-likelihood estimate. Realize also that only the maximum-likelihood estimate is unbiased. By suppressing the small singular values, we introduce a small bias in \hat{B}_{SVD}, but greatly reduce the variance in the estimate.
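A sketch of the truncated-SVD estimate (4.72) on synthetic, nearly collinear data (NumPy; all names and values are hypothetical):

```python
import numpy as np

def svd_regress(X, Y, ell):
    """Truncated-SVD estimate (4.72): invert only the ell largest singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:ell].T @ np.diag(1.0 / s[:ell]) @ U[:, :ell].T @ Y

rng = np.random.default_rng(8)
n, q, p = 100, 4, 2
X = rng.standard_normal((n, q))
X[:, 3] = X[:, 2] + 1e-6 * rng.standard_normal(n)   # nearly collinear columns
B_true = rng.standard_normal((q, p))
Y = X @ B_true + 0.1 * rng.standard_normal((n, p))

B3 = svd_regress(X, Y, 3)   # drop the near-zero singular value
B4 = svd_regress(X, Y, 4)   # keep all: ordinary least squares
print(np.linalg.norm(Y - X @ B3), np.linalg.norm(Y - X @ B4))
```

B4 fits slightly better but typically has much larger coefficients along the nearly collinear direction; B3 trades a small bias for a large variance reduction.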

PCR. In principal components analysis, the data matrix is factored as X = T P^T, in which the matrix T is known as the scores and the matrix P is known as the loadings; only the first \ell principal components are retained as the columns of T and P, respectively. The principal component regression (PCR) estimate is then the least-squares estimate built from these retained components

    \hat{B}_{PCR} = P_{\ell} (T_{\ell}^T T_{\ell})^{-1} T_{\ell}^T Y

so the correspondence with the SVD approach is as follows. The scores in PCR are the product of the singular values and the left singular vectors, T_{\ell} = U_{\ell} \Sigma_{\ell}. The loadings are the right singular vectors, P_{\ell} = V_{\ell}. Substituting these relationships into the formula for \hat{B}_{PCR} shows that

    \hat{B}_{PCR} = \hat{B}_{SVD}

and the two approaches are equivalent. So one advantage of learning the SVD as part of linear algebra is that you have also learned PCR.
PLSR. A potential drawback of the PCR approach is that only the predictor variables are evaluated. The principal components are selected to maximize the information about matrix X. But there is no guarantee that these components can represent the responses Y. To improve the predictive capability of the model, the PLS regression (PLSR) adds a very interesting wrinkle. In this approach, one does not start with the SVD of X but with the SVD of X^T Y, which includes information about both X and Y and the correlation between them. Note that X^T Y \in R^{q \times p}, which is a small matrix regardless of the number of samples, n. So computing the SVD of a matrix of the dimension of X^T Y, which is done repeatedly in PLSR, is a fast computation. The components, called latent variables, are obtained recursively as follows (Mevik and Wehrens, 2007). The first left and right singular vectors, u_1 and v_1, are used to obtain the scores t_1 and w_1, respectively, via

    t_1 = X u_1 = E_1 u_1, \qquad w_1 = Y v_1 = F_1 v_1

in which the matrices E_1 and F_1 are initialized as X and Y, respectively. The X scores are then usually normalized, t_1 \leftarrow t_1 / \sqrt{t_1^T t_1}. We now define the two loadings, p_1 and q_1, using the same score t_1

    p_1 = E_1^T t_1, \qquad q_1 = F_1^T t_1

Next the data matrices are deflated by subtracting the information in the current latent variable via

    E_{i+1} = E_i - t_i p_i^T, \qquad F_{i+1} = F_i - t_i q_i^T

The next iterate starts with the SVD of E_{i+1}^T F_{i+1} in place of X^T Y, and the process is repeated. As in PCR, the number of latent variables \ell \leq q is chosen as the number of iterations of the algorithm. The left singular vectors u_i, the scores t_i, and the loadings p_i and q_i for i = 1, 2, \ldots, \ell are stored as the columns of the four matrices U, T, P, and Q. We do not require the right singular vectors v_i. Finally we compute the R matrix from

    R = U (P^T U)^{-1}

which gives the scores from the data, T = XR, and we use the low-rank approximation X \approx T P^T. The PLS solution is then the least-squares solution of Y = TC over C, giving

    \hat{B}_{PLS} = R (T^T T)^{-1} T^T Y = R Q^T
Cross validation. In both PCR and PLS we need to decide how many principal components or latent variables to retain in the model. The most widely accepted method to make this decision is known as cross validation. In cross validation the dataset is divided into two or more sets; one set is used for fitting the parameters, and the other set is used to evaluate the predictive power of the model using the remaining data that have not been used in the fitting process. The validation error is defined as E_v = Y_v - X_v \hat{B}_{\ell}, in which \hat{B}_{\ell} is the estimated model parameter matrix using the fitting dataset (X, Y) and the chosen number of principal components or latent variables, \ell. To determine the best value of \ell to use for estimating B, one finds the \ell that minimizes \|E_v\|_F^2. This value of \ell is large enough that the model fits the data accurately, but not so large that the model has been fit to the noise in the data. We demonstrate the cross validation technique with the following example.
Example 4.23: Comparing PCR and PLSR

Consider a dataset with five predictor variables, x \in R^5, to model a vector of two responses, y \in R^2. The dataset has 200 samples. The data are available in file pca-pls-data.dat on the website www.che.wisc.edu/~jbraw/principles.

[Figure 4.11: The sum of squares fitting error (top) and validation error (bottom) for PCR versus the number of principal components \ell; cross validation indicates that four principal components are best.]

We would like to estimate the coefficient B in the model Y = XB. Compare the results using PCR and PLSR for the regression. Show the prediction error in Y for the number of principal components or latent variables ranging from one to five (full least squares). Which regression method provides the best fit with the smallest number of principal components/latent variables?
Solution

First we divide the 200 samples into two sets, and use the first 100 samples for estimating the parameter matrix B, and the second 100 samples for cross validation. For principal component analysis, we compute the SVD of the 100 x 5 matrix X. The five singular values are

    \tilde{\Sigma} = \mathrm{diag}(15.1,\ 3.26,\ 2.72,\ 2.67,\ 0.0226)

[Figure 4.12: The sum of squares validation error for PCR and PLSR versus the number of principal components/latent variables \ell; note that only two latent variables are required versus four principal components.]

We see that X has four large singular values and one near zero, indicating that the rank of X is nearly four. Next we estimate \hat{B}_{PCR} using (4.72) for \ell = 1, 2, 3, 4, 5 and calculate the sum of squares of the fitting error, \|Y - X\hat{B}_{PCR}\|_F^2. The results are shown in the top of Figure 4.11. It is not surprising that the fitting error decreases with increasing number of principal components. As we see, the fitting error contains little information about how many principal components to use. After estimating the parameters, we then compute the output responses for the validation data and compute \|Y_v - X_v\hat{B}_{PCR}\|_F^2, in which X_v, Y_v are the predictor and response variables in the validation dataset. This validation error is plotted in the bottom of Figure 4.11. Here we see that we should use four principal components in the model, in agreement with the SVD analysis of X. Using the unreliable smallest singular value in the regression causes a large error when trying to predict response data that have not been used in the fitting process.

[Figure 4.13: Predicted versus measured outputs for the validation dataset. Top: PCR using four principal components. Bottom: PLSR using two latent variables. Left: first output.]

Next we implement the PLS regression algorithm as described above for \ell = 1, 2, 3, 4, 5 latent variables. The validation error is shown in Figure 4.12 along with the validation error of PCR. Notice that only two latent variables are required to obtain the same error as four principal components. This reduction in model order is the primary benefit of the PLSR approach. By evaluating the SVD of X^T Y instead of only X, we obtain the latent variables that can explain the responses Y, not just the variables with independent information in X, which is what PCR provides.

[Figure 4.14: Effect of undermodeling. Top: PCR using three principal components. Bottom: PLSR using one latent variable.]

Next, in Figure 4.13 we present the predicted responses versus the measured responses for the validation dataset. A perfect prediction would be a straight line with a slope of 45 degrees. Note that these data were not used in the fitting process, so this plot displays the predictive capability of the model. We see that the PLS model with two latent variables has roughly the same predictive capability as the PCR model

with four principal components. Finally, in Figure 4.14 we make the same comparison if we use only three principal components and one latent variable. Notice that we obtain significantly worse predictions of the validation dataset, indicating that we have undermodeled the data by choosing too few variables for the regression.

By now there is an extensive literature including many books and research monographs on model regression with PCR and PLSR. Many researchers have documented the usefulness and robustness of these techniques to identify linear empirical models in numerous applications. The understanding of PCR is reasonably complete, since it is based on the SVD of the single matrix, X. By contrast, the understanding of PLSR is not as complete. PLS was introduced by H. Wold in the 1960s in the field of econometrics (Wold, 1966). The use of PLS in the fields of analytical chemistry and chemometrics was pioneered by S. Wold, Martens, and Kowalski. The tutorial by Geladi and Kowalski (1986) and historical reviews by S. Wold (2001) and Martens (2001) summarize the approach and the many software tools available, which make it easy for the user to try out these approaches. As demonstrated in the example, starting with the SVD of X^T Y rather than X is useful for finding the smallest number of latent variables that have the most predictive capability. But research has not yet provided a complete analysis of the PLSR method, and we do not know, for example, in what sense PLSR is an optimal estimator, or whether there might be, as yet undiscovered, better alternative methods. Adding to the complexity, several different PLSR algorithms have been developed. The appearance of many different algorithms has in turn generated some confusion and controversy. To clarify matters, connections between the properties of several different algorithms have been established. But until some of the optimality properties of PLSR are uncovered, research on the PLSR approach will likely continue. In any field, a valuable technique that also defies easy explanation is a prime target for further research.

4.9 Appendix: Proof of the Central Limit Theorem

In this appendix we provide a complete proof of Theorem 4.16. We follow the basic approach outlined in the stimulating papers by Le Cam (1986) and Pollard (1986). Moreover, in this version of the central limit theorem we not only establish convergence to the normal distribution as n \to \infty, but also develop an approach that leads to bounds valid for finite n on the distance of the sum's distribution from the normal distribution. This version of the central limit theorem and, more importantly, the techniques used to establish it are wide ranging and worth knowing for researchers making extensive use of random variables. As you will see, the proof is elementary, by which we mean that none of the steps require any advanced techniques that are not already familiar to the reader. But the proof is rather long. Note also that this material can be skipped without affecting the understanding of any other section in the text.
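Before diving into the proof, the theorem's content is easy to see numerically: standardized sums of, for example, uniform random variables rapidly approach the standard normal. A small Monte Carlo sketch (NumPy; the parameters are arbitrary):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(10)
n, reps = 50, 100_000
U = rng.uniform(0.0, 1.0, (reps, n))   # iid uniforms: mean 1/2, variance 1/12
S = (U.sum(axis=1) - n / 2.0) / np.sqrt(n / 12.0)   # standardized sums

# compare the empirical cdf with the standard normal cdf at a few points
Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
for x in (-1.0, 0.0, 1.0):
    print(x, (S <= x).mean(), Phi(x))
```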

Proof. We start by considering sums of two collections of independent random variables, S_n = X_1 + \cdots + X_n and T_n = Y_1 + \cdots + Y_n, in which E(X_k) = E(Y_k) = 0 and var(X_k) = var(Y_k) = \sigma_k^2. The zero mean assumption is not restrictive. If the original variables have nonzero mean, consider instead the shifted variables \tilde{X}_k = X_k - \mu_k. Next define R_k as follows

    R_k = \sum_{j<k} X_j + \sum_{j>k} Y_j

so that

    R_n + X_n = X_1 + \cdots + X_n = S_n, \qquad R_1 + Y_1 = Y_1 + \cdots + Y_n = T_n

Notice from this definition that R_k and X_k as well as R_k and Y_k are also independent for k = 1, 2, \ldots, n. We see shortly why the R_k variables are useful.

We also require an approximation theorem; the form we choose here is motivated by a nice, unpublished note of F.W. Scholz (2011).

Theorem 4.24 (Taylor's theorem with bound on remainder). Let f be a bounded function on R with three continuous, bounded derivatives. Consider the second-order Taylor series with remainder

    f(x + h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + r(x, h)

The remainder satisfies the following bound for all h \in R

    \sup_{x \in R} |r(x, h)| \leq K_f \min(h^2, |h|^3)   (4.73)

The term |h|^3 is expected from the standard Taylor expansion, but including the term h^2 gives a better bound for large h, which we shall find useful subsequently. Exercise 4.54 discusses how to prove this theorem, which is not difficult.

4.9 Appendix

Proof of the

central

so we assume that f has


three

Limit
Theorem

continuous,
bounded
derivatives

$$f(X_k + R_k) = f(R_k) + X_k f'(R_k) + \frac{X_k^2}{2} f''(R_k) + r(R_k, X_k)$$

Performing a similar expansion for $f(Y_k + R_k)$, taking expectations, and subtracting gives

$$\mathcal{E}\big(f(X_k + R_k)\big) - \mathcal{E}\big(f(Y_k + R_k)\big) = \big(\mathcal{E}(X_k) - \mathcal{E}(Y_k)\big)\,\mathcal{E}\big(f'(R_k)\big) + \frac{1}{2}\big(\mathcal{E}(X_k^2) - \mathcal{E}(Y_k^2)\big)\,\mathcal{E}\big(f''(R_k)\big) + \mathcal{E}\big(r(R_k, X_k) - r(R_k, Y_k)\big)$$

where we have used the fact that $\mathcal{E}(AB) = \mathcal{E}(A)\,\mathcal{E}(B)$ for $A$ and $B$ independent random variables. Noting that the first two terms cancel, and using (4.73),

$$\big|\mathcal{E}\big(f(X_k + R_k) - f(Y_k + R_k)\big)\big| \le K_f\, \mathcal{E}\big(g(X_k) + g(Y_k)\big) \tag{4.74}$$

where we used the fact$^4$ that $|\mathcal{E}(f(X))| \le \mathcal{E}(|f(X)|)$, and defined $g(X) = \min(X^2, |X|^3)$ to compress the notation.

Next comes the reason for introducing the $R_k$ variables. Notice that differencing the sum of $f(R_k + X_k)$ and $f(R_k + Y_k)$ leaves only two terms

$$\sum_{k=1}^{n} \big[f(R_k + X_k) - f(R_k + Y_k)\big] = f(R_n + X_n) - f(R_1 + Y_1) = f(S_n) - f(T_n)$$

Taking expectations and then absolute values and using (4.74) then gives

$$\big|\mathcal{E}\big(f(S_n)\big) - \mathcal{E}\big(f(T_n)\big)\big| \le K_f \sum_{k=1}^{n} \mathcal{E}\big(g(X_k) + g(Y_k)\big) \tag{4.75}$$

Establishing this inequality is the first major step.

But we wish to bound the distance between the two cumulative distributions $F_{S_n}$ and $F_{T_n}$, so we next choose an appropriate function $f(\cdot)$ to achieve this goal. Consider the step function $f_1(w; x)$ depicted in Figure 4.15, in which $w$ is the argument to the function and $x$ is considered a fixed parameter. Using $f_1(w; x)$ we have immediately

$$\mathcal{E}\big(f_1(S_n)\big) = F_{S_n}(x)$$

The function $f_1(\cdot)$ is known as an indicator function, because it indicates when the random variable $S_n$ satisfies $S_n \le x$. So this is
$^4$Since $f(x) \le |f(x)|$ for all $x$, multiply by the density $p_X(x)$ and integrate.

Figure 4.15: The indicator (step) function $f_1(w; x)$ and its smooth approximation, $f(w; x)$. A piecewise fifth-order polynomial gives continuous derivatives up to third order; see Exercise 4.57 for details.

the kind of function we seek, but, of course, $f_1$ does not have even a bounded first derivative, let alone three bounded derivatives as required in our development. So we first smooth out this function as depicted in Figure 4.15. Exercise 4.57 gives an example of a piecewise polynomial function $f$ with the required smoothness. Moreover, there exists an $L_0 > 0$ such that $K_f = 20 L^{-3}$ is a valid upper bound in (4.73) for every $L$ satisfying $0 < L \le L_0$; see (4.97). We will require this bound shortly.

Computing $\mathcal{E}(f(S_n))$ gives

$$\mathcal{E}\big(f(S_n)\big) = \int p_{S_n}(w)\, f(w; x)\, dw = \int_{-\infty}^{x} p_{S_n}(w)\, dw + \int_{x}^{x+L} p_{S_n}(w)\, f(w; x)\, dw = F_{S_n}(x) + \int_{x}^{x+L} p_{S_n}(w)\, f(w; x)\, dw$$

and, subtracting the analogous expression for $T_n$ and rearranging, gives

$$F_{S_n}(x) - F_{T_n}(x) = \Big(\mathcal{E}\big(f(S_n)\big) - \mathcal{E}\big(f(T_n)\big)\Big) + \int_{x}^{x+L} \big(p_{T_n}(w) - p_{S_n}(w)\big)\, f(w; x)\, dw$$

Since $0 \le f(w; x) \le 1$, the second term is bounded by $b_n L$ with $b_n = \max_w p_{T_n}(w)$, and we choose the $Y_k$ so that we have control over the constant $b_n$. Taking absolute values and using (4.75) and (4.73) then gives the following bound

$$\big|F_{S_n}(x) - F_{T_n}(x)\big| \le K_f \sum_{k=1}^{n} \mathcal{E}\big(g(X_k) + g(Y_k)\big) + b_n L \tag{4.76}$$

Establishing this inequality is the second major step. Note that making $L$ small makes $K_f$ large, so making the sum on the right-hand side small will require a judicious choice of $L$.

We choose the $Y_k$ to be $N(0, \sigma_k^2)$, and the scaled sum $T_n/s_n = (1/s_n)\sum_{k=1}^{n} Y_k$ (with $s_n^2 = \sum_{k=1}^{n} \sigma_k^2$) then has zero mean and unit variance for all $n$, i.e., it is a standard normal, with distribution function denoted $F_Z$. This gives for (4.76) the value $b_n = \max_w p_Z(w) = 1/\sqrt{2\pi}$, independent of $n$ for this choice of $Y_k$. The variable $Z_n = S_n/s_n = (1/s_n)\sum_{k=1}^{n} X_k$ is also a sum of independent, scaled $X_k$, and also has zero mean and unit variance for all $n$. Applying (4.76) to these variables gives

$$\sup_x \big|F_{Z_n}(x) - F_Z(x)\big| \le K_f \sum_{k=1}^{n} \mathcal{E}\big(g(X_k/s_n) + g(Y_k/s_n)\big) + b_n L \tag{4.77}$$

To evaluate the sum on the right-hand side, we split the interval of integration as discussed before: for $|X_k| \le \epsilon s_n$ we use $g(X_k/s_n) \le |X_k/s_n|^3 \le \epsilon X_k^2/s_n^2$, and for $|X_k| > \epsilon s_n$ we use $g(X_k/s_n) \le X_k^2/s_n^2$. Taking the sum over $k$, the first contribution is bounded by $\epsilon$ independent of $n$, and the Lindeberg condition implies that for large enough $n$ the second contribution is also smaller than $\epsilon$. So as $n \to \infty$ we have that

$$\sum_{k=1}^{n} \mathcal{E}\big(g(X_k/s_n)\big) \le 2\epsilon$$

We also can show that the normally distributed $Y_k$ variables satisfy the Lindeberg conditions if the $X_k$ do. See Exercise 4.56 for the steps. So we have that as $n \to \infty$

$$\sum_{k=1}^{n} \mathcal{E}\big(g(X_k/s_n) + g(Y_k/s_n)\big) \le 4\epsilon \tag{4.78}$$

Next we choose $L$, and therefore $K_f$, as follows

$$L = (4\epsilon)^{1/4}$$

To use the bound in (4.97), we require $L \le L_0$. Therefore setting $\delta = L_0^4/4 > 0$, we have from the previous inequality that for every $\epsilon < \delta$,

$$K_f = 20 L^{-3} = 20\, (4\epsilon)^{-3/4}$$

Substituting these values for $K_f$ and $L$ into (4.77) and using (4.78) gives

$$\sup_x \big|F_{Z_n}(x) - F_Z(x)\big| \le c\, \epsilon^{1/4}$$

with $c = (1/2)\, 5^{-3/4} + 1/\sqrt{\pi} \approx 0.71$. Since this bound holds for all $\epsilon > 0$, we have established that

$$\lim_{n \to \infty}\, \sup_x \big|F_{Z_n}(x) - F_Z(x)\big| = 0$$

and the proof is complete.
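As a quick numerical illustration of the theorem just proved (not part of the proof), the following sketch estimates the sup-distance between the CDF of the scaled sum $Z_n = S_n/s_n$ and the standard normal CDF, and shows it shrinking as $n$ grows. The choice of skewed, zero-mean summands $X_k = \text{Exp}(1) - 1$ is an arbitrary example, not taken from the text.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def normal_cdf(z):
    # Standard normal CDF via Phi(z) = (1 + erf(z/sqrt(2)))/2
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def sup_distance(n, trials=20000):
    # Zero-mean, unit-variance, skewed summands: X_k = Exp(1) - 1, so s_n = sqrt(n)
    x = rng.exponential(1.0, size=(trials, n)) - 1.0
    z = np.sort(x.sum(axis=1) / sqrt(n))           # samples of Z_n = S_n / s_n
    ecdf = np.arange(1, trials + 1) / trials       # empirical CDF at sorted samples
    phi = np.vectorize(normal_cdf)(z)
    return float(np.max(np.abs(ecdf - phi)))

d_small, d_large = sup_distance(2), sup_distance(64)
print(d_small, d_large)
```

The distance for $n = 64$ comes out much smaller than for $n = 2$, consistent with the limit statement above.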

4.10 Exercises

Exercise 4.1: Consequences of the axioms of probability

(a) If $B \subseteq A$, show that $\Pr(A \setminus B) = \Pr(A) - \Pr(B)$.

(b) From the definition, the events $A$ and $B$ are independent if $\Pr(A \cap B) = \Pr(A)\Pr(B)$. If $A$ and $B$ are independent, show that $A$ and $\bar{B}$ are independent.

Exercise 4.2: Statistical independence

Show that two random variables $\xi$ and $\eta$ with densities are statistically independent if and only if

$$p_{\xi,\eta}(x, y) = p_{\xi}(x)\, p_{\eta}(y) \qquad \text{all } x, y \tag{4.79}$$
Exercise 4.3: Statistical independence of functions of random variables

Consider statistically independent random variables $\xi \in \mathbb{R}^m$ and $\eta \in \mathbb{R}^n$. Define random variables $\phi \in \mathbb{R}^p$ and $\psi \in \mathbb{R}^q$ as $\phi = f(\xi)$ and $\psi = g(\eta)$. Show that $\phi$ and $\psi$ are statistically independent for all functions $f(\cdot)$ and $g(\cdot)$. Summarizing, statistical independence of random variables $(\xi, \eta)$ implies statistical independence of random variables $(f(\xi), g(\eta))$ for all $f(\cdot)$ and $g(\cdot)$.

Note that $f(\cdot)$ and $g(\cdot)$ are not required to be invertible.

Exercise 4.4: Trace of a matrix function

Derive the following formula for differentiating the trace of a function of a square matrix

$$\frac{d\, \text{tr}(f(A))}{dA} = g(A^T) \qquad g(x) = \frac{df(x)}{dx} \tag{4.80}$$

in which $g$ is the usual scalar derivative of the scalar function $f$.

Exercise 4.5: Derivatives of determinants

For $A \in \mathbb{R}^{n \times n}$ nonsingular, derive the following formulas

$$\frac{d \det A}{dA} = (A^{-1})^T \det A \qquad \frac{d \ln \det A}{dA} = (A^{-1})^T$$
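A finite-difference check can build confidence in the second identity before attempting the derivation. This sketch (not part of the exercise) compares the numerical gradient of $\ln \det A$ with $(A^{-1})^T$ on an arbitrary well-conditioned matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + 5.0 * np.eye(n)   # keep A safely nonsingular

def logdet(M):
    # slogdet returns ln|det M|, which has the same gradient (A^-1)^T
    _, val = np.linalg.slogdet(M)
    return val

h = 1e-6
G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = h
        G[i, j] = (logdet(A + E) - logdet(A - E)) / (2 * h)   # central difference

err = float(np.max(np.abs(G - np.linalg.inv(A).T)))
print(err)
```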

Exercise 4.6: Transposing the maximum-likelihood problem statement

Consider again the estimation problem for the model given in (4.56), but this time express it in transposed form

$$y_i^T = x_i^T \Theta + e_i^T \tag{4.81}$$

(a) Derive the maximum-likelihood estimate for this case. Show all steps in the derivation. Arrange the data in matrices

$$Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_n^T \end{bmatrix} \qquad X = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix}$$

and show the maximum-likelihood estimate can be expressed as

$$\widehat{\Theta} = (X^T X)^{-1} X^T Y$$

Expressing the model this way gives an estimate formula that is analogous to what other problem?

(b) Find the resulting probability density for the estimate and give the analogous result corresponding to (4.59).

(c) Which form of the model do you prefer and why?


Exercise 4.7: Joint and marginal densities, discrete-valued random variables

Calculate the joint density, both marginal densities, the means, and the covariance of the two random variables $\xi$ and $\eta$.

(a) We throw two dice; $\xi$ and $\eta$ are the values on each die.

(b) We throw two dice; $\xi$ is the value on one die and $\eta$ is the sum of the two values.

Exercise 4.8: Probability density of the inverse function

Consider a scalar random variable $\xi \in \mathbb{R}$ and let the random variable $\eta$ be defined by the inverse function $\eta = 1/\xi$.

(a) If $\xi$ is distributed uniformly on $[a, 1]$, $0 < a < 1$, what is the density of $\eta$?

(b) Is $\eta$'s density well defined if we allow $a = 0$? Explain your answer.

Exercise 4.9: Expectation as a linear operator

(a) Consider the random variable $x$ to be defined as a linear combination of the random variables $a$ and $b$, $x = a + b$. Show

$$\mathcal{E}(x) = \mathcal{E}(a) + \mathcal{E}(b)$$

Do $a$ and $b$ need to be statistically independent for this statement to be true?

(b) Next consider the random variable $x$ to be defined as a scalar multiple of the random variable $a$, $x = \alpha a$, in which $\alpha$ is a scalar. Show

$$\mathcal{E}(x) = \alpha\, \mathcal{E}(a)$$

(c) What can you conclude about $\mathcal{E}(x)$ if $x$ is given by the linear combination

$$x = \sum_i \alpha_i v_i$$

in which $v_i$ are random variables and $\alpha_i$ are scalars?

Exercise 4.10: Calculating mean and variance from data

We are sampling a real-valued, scalar random variable $x(k) \in \mathbb{R}$ at time $k$. Assume the random variable comes from a distribution with mean $\bar{x}$ and variance $P$, and the samples at different times are statistically independent.

A colleague has suggested the following formulas for estimating the mean and variance from $N$ samples

$$\widehat{m}_N = \frac{1}{N} \sum_{k=1}^{N} x(k) \qquad \widehat{P}_N = \frac{1}{N} \sum_{k=1}^{N} \big(x(k) - \widehat{m}_N\big)^2$$

(a) Prove the estimate of the mean is unbiased for all $N$, i.e., show

$$\mathcal{E}(\widehat{m}_N) = \bar{x} \qquad \text{all } N$$

(b) Prove the estimate of the variance is not unbiased for any $N$, i.e., show

$$\mathcal{E}(\widehat{P}_N) \ne P \qquad \text{any } N$$

(c) Using the result above, provide an improved formula for the variance estimate that is unbiased for all $N$. How large does $N$ have to be before these two estimates of $P$ are within 1%?
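A Monte Carlo sketch can preview the bias in Exercise 4.10(b): averaging the $1/N$ variance estimate over many replicate data sets approaches $(N-1)P/N$, while dividing by $N-1$ removes the bias. The sampling distribution here (normal, mean 1, variance 4) is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, reps, P_true = 5, 200000, 4.0

x = rng.normal(1.0, np.sqrt(P_true), size=(reps, N))
m_hat = x.mean(axis=1)
resid2 = (x - m_hat[:, None]) ** 2
P_biased = resid2.sum(axis=1) / N          # the colleague's 1/N formula
P_unbiased = resid2.sum(axis=1) / (N - 1)  # Bessel-corrected formula

# Averages approach (N-1)/N * P = 3.2 and P = 4.0, respectively
print(P_biased.mean(), P_unbiased.mean())
```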

Exercise 4.11: The sum of throwing two dice

Using (4.23), what is the probability density for the sum of throwing two dice? On what number do you want to place your bet? How often do you expect to win if you bet on this outcome?

Make the standard assumptions: the probability density for each die is uniform over the integer values from one to six, and the outcome of each die is independent of the other die.

Exercise 4.12: The product of throwing two dice

Using (4.23), what is the probability density for the product of throwing two dice? On what number do you want to place your bet? How often do you expect to win if you bet on this outcome?

Make the standard assumptions: the probability density for each die is uniform over the integer values from one to six, and the outcome of each die is independent of the other die.
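Both densities in Exercises 4.11 and 4.12 can be checked by brute-force enumeration of the 36 equally likely outcomes, which is a useful sanity check on the answer obtained from (4.23).

```python
from collections import Counter
from fractions import Fraction

outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
p_sum = Counter(i + j for i, j in outcomes)    # counts out of 36
p_prod = Counter(i * j for i, j in outcomes)

best_sum = max(p_sum, key=p_sum.get)           # most probable sum: 7
print(best_sum, Fraction(p_sum[best_sum], 36))

# For the product there is a tie: 6 and 12 each occur 4 ways out of 36
print(p_prod[6], p_prod[12])
```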

Exercise 4.13: Expected sum of squares

Given that random variable $x$ has mean $m$ and covariance $P$, show that the expected sum of squares is given by the formula (Selby, 1973, p. 138)

$$\mathcal{E}(x^T Q x) = m^T Q m + \text{tr}(QP)$$

Recall that the trace of a square matrix $A$, written $\text{tr}(A)$, is defined to be the sum of the diagonal elements

$$\text{tr}(A) = \sum_i A_{ii}$$
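The trace formula above is easy to check by simulation before proving it. The sketch below uses arbitrary example values for $m$, $P$, and $Q$ and compares the sample mean of $x^T Q x$ against $m^T Q m + \text{tr}(QP)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
m = np.array([1.0, -2.0, 0.5])
L = rng.standard_normal((n, n))
P = L @ L.T + n * np.eye(n)        # an arbitrary positive definite covariance
Q = np.diag([2.0, 1.0, 3.0])       # any square Q works

x = rng.multivariate_normal(m, P, size=400000)
lhs = float(np.mean(np.einsum("ij,jk,ik->i", x, Q, x)))   # sample mean of x^T Q x
rhs = float(m @ Q @ m + np.trace(Q @ P))
print(lhs, rhs)
```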

Exercise 4.14: Normal distribution

Given a normal distribution with scalar parameters $m$ and $\sigma^2$

$$p_{\xi}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(x - m)^2}{\sigma^2}\right)$$

By direct calculation, show that

$$\mathcal{E}(\xi) = m \qquad \text{var}(\xi) = \sigma^2 \tag{4.82}$$

$$m = \arg\max_x\, p_{\xi}(x)$$

in which $\arg$ returns the solution to the optimization problem.

Exercise 4.15: The size of an ellipse's bounding box

Here we derive the size of the bounding box depicted in Figure 4.3. Consider a real, positive definite, symmetric matrix $A \in \mathbb{R}^{n \times n}$ and a real vector $x \in \mathbb{R}^n$. The set of $x$ for which the scalar $x^T A x$ is constant are $n$-dimensional ellipsoids. Find the length of the bounding box for the ellipsoid

$$x^T A x = b$$

Hint: consider the equivalent optimization problem to minimize the value of $x^T A x$ such that the $i$th component of $x$ is given by $x_i = c$. This problem defines the ellipsoid that is tangent to the plane $x_i = c$, and can be used to answer the original question.

Exercise 4.16: Conditional densities are positive definite

We showed in Example 4.19 that if $\xi$ and $\eta$ are jointly normally distributed as

$$\begin{bmatrix} \xi \\ \eta \end{bmatrix} \sim N\left( \begin{bmatrix} m_x \\ m_y \end{bmatrix}, \begin{bmatrix} P_x & P_{xy} \\ P_{yx} & P_y \end{bmatrix} \right)$$

then the conditional density of $\xi$ given $\eta$ is also normal

$$(\xi \mid \eta) \sim N(m_{x|y}, P_{x|y})$$

in which the conditional mean is

$$m_{x|y} = m_x + P_{xy} P_y^{-1} (y - m_y)$$

and the conditional covariance is

$$P_{x|y} = P_x - P_{xy} P_y^{-1} P_{yx}$$

Given that the joint density is well defined, prove the marginal densities and the conditional densities are also well defined, i.e., given $P > 0$, prove $P_x > 0$, $P_y > 0$, $P_{x|y} > 0$, and $P_{y|x} > 0$.

Exercise 4.17: Fourier transform of the multivariate normal density

Show the Fourier transform of the multivariate normal density given in (4.12) is

$$\varphi(u) = \exp\left(i u^T m - \frac{1}{2} u^T P u\right)$$

Exercise 4.18: The difference of two exponentially distributed random variables

The random variables $T_1$ and $T_2$ are statistically independent and identically distributed with the exponential density

$$p_T(t) = \lambda e^{-\lambda t} \qquad t \ge 0$$

Define the new random variable $y$ to be the difference $y = T_1 - T_2$. We wish to calculate $y$'s probability density $p_y$.

(a) First introduce a new random variable $z = T_2$ and define the transformation from $(T_1, T_2)$ to $(y, z)$. Find the inverse transformation from $(y, z)$ to $(T_1, T_2)$. What is the determinant of the Jacobian of the inverse transformation?

(b) What is the joint density $p_{T_1,T_2}(t_1, t_2)$? Sketch the region in $(y, z)$ that corresponds to the region of nonzero probability of $(T_1, T_2)$.

(c) Apply the formula given in (4.23) to obtain the transformed joint density $p_{y,z}$.

(d) Integrate over $z$ in this joint density to obtain $p_y$.

(e) Generate 1000 samples of $T_1$ and $T_2$, calculate $y$, and plot $y$'s histogram. Does your histogram of the $y$ samples agree with your result from (d)? Explain why or why not.
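For part (e), the histogram comparison can be automated as sketched below. Working through parts (a)-(d) with rate $\lambda = 1$ gives the Laplace density $p_y(y) = \frac{1}{2}e^{-|y|}$; that result is assumed here as the reference curve rather than derived.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100000                                   # more samples than the 1000 requested,
y = rng.exponential(1.0, n) - rng.exponential(1.0, n)   # to reduce histogram noise

edges = np.linspace(-4, 4, 41)               # 0.2-wide bins on [-4, 4]
hist, _ = np.histogram(y, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
laplace = 0.5 * np.exp(-np.abs(centers))     # assumed answer from part (d)

max_err = float(np.max(np.abs(hist - laplace)))
print(max_err)
```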

Exercise 4.19: Surface area and volume of a sphere in n dimensions

In three-dimensional space, $n = 3$, the surface area and volume of the sphere are given by

$$S_3(r) = 4\pi r^2 \qquad V_3(r) = \frac{4}{3}\pi r^3$$

You are also familiar with the formulas for $n = 2$, in which case "surface area" is the circumference of the circle and "volume" is the area of the circle

$$S_2(r) = 2\pi r \qquad V_2(r) = \pi r^2$$

If we define $s_n$ and $v_n$ as the constants such that

$$S_n(r) = s_n r^{n-1} \qquad V_n(r) = v_n r^n$$

we have

$$s_2 = 2\pi \qquad v_2 = \pi \qquad s_3 = 4\pi \qquad v_3 = \frac{4}{3}\pi$$

We seek the generalization of these results to the $n$-dimensional case. Compute formulas for $s_n$ and $v_n$ and show

$$v_n = \frac{\pi^{n/2}}{\Gamma(n/2 + 1)} \qquad s_n = n\, v_n$$

Exercise 4.20: Surface area and volume of an ellipsoid in n dimensions

The results for the surface area and volume of a sphere in $n$ dimensions can be extended to obtain the surface area and volume of an ellipse (ellipsoid, hyperellipsoid) in $n$ dimensions. Let $x$ be an $n$-vector. The surface of an ellipse is defined by the equation

$$x^T A x = R^2$$

in which $A \in \mathbb{R}^{n \times n}$ is a symmetric, positive definite matrix and $R^2$ is the square of the ellipse "radius." Let the interior of the ellipse be denoted by the set

$$\mathcal{E} = \{x \mid x^T A x \le R^2\}$$

We wish to compute the volume of the ellipse, which is defined by the following integral

$$V_n^e(R) = \int_{\mathcal{E}} dx$$

The surface area, $S_n^e(R)$, is defined to have the following relationship with the volume

$$S_n^e(r) = \frac{\partial V_n^e(r)}{\partial r} \qquad V_n^e(R) = \int_0^R S_n^e(r)\, dr$$

(a) Derive formulas for $s_n^e$ and $v_n^e$ such that

$$S_n^e(R) = s_n^e R^{n-1} \qquad V_n^e(R) = v_n^e R^n$$

and show $v_n^e = v_n (\det A)^{-1/2}$ for the ellipse.

(b) Show that your result subsumes the formula for the volume of the 3-dimensional ellipse given by

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1 \qquad V = \frac{4}{3}\pi abc$$

Exercise 4.21: Definite integrals of the multivariate normal and χ²

(a) Derive the following $n$-dimensional integral over an elliptical region

$$\int_{E_b} e^{-x^T A x}\, dx = \frac{\pi^{n/2}}{(\det A)^{1/2}} \frac{\gamma(n/2, b)}{\Gamma(n/2)} \qquad E_b = \{x \mid x^T A x \le b\}$$

(b) Let $\xi$ be distributed as a multivariate normal with mean $m$ and covariance $P$, and let $\alpha$ denote the total probability that $\xi$ takes on a value $x$ inside the ellipse $(x - m)^T P^{-1} (x - m) \le b$. Use the integral in the previous part to show

$$\alpha = \frac{\gamma(n/2, b/2)}{\Gamma(n/2)} \tag{4.84}$$

(c) The $\chi^2(n, \alpha)$ function is defined to invert this relationship and give the size of the ellipse that contains total probability $\alpha$

$$\chi^2(n, \alpha) = b \tag{4.85}$$

Plot $\gamma(n/2, x/2)/\Gamma(n/2)$ and $\chi^2(n, x)$ versus $x$ for various $n$ (try $n = 1, 4$), and display the inverse relationship given by (4.84) and (4.85).

Exercise 4.22: Normal distributions under linear transformations

Given the normal random variable $\xi \in \mathbb{R}^n$ with $\xi \sim N(m, P)$, consider the random variable $\eta \in \mathbb{R}^n$ obtained by the linear transformation

$$\eta = A\xi$$

in which $A$ is a nonsingular matrix. Using the result on transforming probability densities, show that $\eta \sim N(Am, APA^T)$. This result establishes that (nonsingular) linear transformations of normal random variables are normal.
Exercise 4.23: Normals with singular covariance

Consider the random variable $\xi \in \mathbb{R}^n$ with an arbitrary positive semidefinite covariance matrix $P_x$ with rank $r < n$. Start with the definition of a singular normal

$$p_{\xi}(x) = \frac{1}{(2\pi)^{r/2} (\det \Lambda_1)^{1/2}} \exp\left[-\frac{1}{2}(x - m_x)^T Q_1 \Lambda_1^{-1} Q_1^T (x - m_x)\right] \delta\big(Q_2^T (x - m_x)\big)$$

in which matrices $\Lambda_1 \in \mathbb{R}^{r \times r}$ and orthonormal $Q \in \mathbb{R}^{n \times n}$ are obtained from the eigenvalue decomposition of $P_x$

$$P_x = Q \Lambda Q^T = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} \Lambda_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix}$$

and $\Lambda_1 > 0 \in \mathbb{R}^{r \times r}$, $Q_1 \in \mathbb{R}^{n \times r}$, $Q_2 \in \mathbb{R}^{n \times (n-r)}$. On what set of $x$ is the density nonzero?

Exercise 4.24: Linear transformation and singular normals

Prove Theorem 4.12, which generalizes the result of Exercise 4.22 to establish that any linear transformation of a normal is normal. For this statement to hold, we must expand the meaning of normal to include the singular case.

Exercise 4.25: Useful identities in least-squares estimation

Establish the following two useful results using the matrix inversion formula

$$(A^{-1} + C^T B^{-1} C)^{-1} = A - A C^T (B + C A C^T)^{-1} C A$$

$$(A^{-1} + C^T B^{-1} C)^{-1} C^T B^{-1} = A C^T (B + C A C^T)^{-1} \tag{4.86}$$
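Before proving the identities in (4.86), a quick numerical check on random matrices of compatible sizes can catch transcription errors. The sketch below tests both identities with arbitrary positive definite $A$, $B$ and a random rectangular $C$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 4, 3
M1 = rng.standard_normal((n, n))
A = M1 @ M1.T + n * np.eye(n)          # positive definite, n x n
M2 = rng.standard_normal((p, p))
B = M2 @ M2.T + p * np.eye(p)          # positive definite, p x p
C = rng.standard_normal((p, n))

Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
lhs = np.linalg.inv(Ai + C.T @ Bi @ C)
rhs1 = A - A @ C.T @ np.linalg.inv(B + C @ A @ C.T) @ C @ A
rhs2 = A @ C.T @ np.linalg.inv(B + C @ A @ C.T)

err1 = float(np.max(np.abs(lhs - rhs1)))          # first identity
err2 = float(np.max(np.abs(lhs @ C.T @ Bi - rhs2)))  # second identity
print(err1, err2)
```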

Exercise 4.26: Least-squares parameter estimation and Bayesian estimation

Consider a model linear in the parameters

$$y = X\theta + e \tag{4.87}$$

in which $y \in \mathbb{R}^p$ is a vector of measurements, $\theta \in \mathbb{R}^m$ is a vector of parameters, $X \in \mathbb{R}^{p \times m}$ is a matrix of known constants, and $e \in \mathbb{R}^p$ is a random variable modeling the measurement error. The standard parameter estimation problem is to find the best estimate of $\theta$ given the measurements $y$ corrupted with measurement error $e$, which we assume is distributed as

$$e \sim N(0, R)$$

(a) Consider the case in which the measurement errors are independently and identically distributed with variance $\sigma^2$, $R = \sigma^2 I$. The least-squares problem and solution are

$$\min_{\theta}\, \|y - X\theta\|^2 \qquad \widehat{\theta} = (X^T X)^{-1} X^T y$$

Consider the measurements to be sampled from (4.87) with true parameter value $\theta_0$. Show that, using the least-squares formula, the parameter estimate is distributed as

$$\widehat{\theta} \sim N\big(\theta_0, \sigma^2 (X^T X)^{-1}\big)$$

(b) Now consider again the model of (4.87) and a Bayesian estimation problem. Assume a prior distribution for the random variable $\theta$. Compute the conditional density of $\theta$ given measurement $y$, show this density is normal, and find its mean and covariance

$$p_{\theta|y}(\theta \mid y) \sim N(m, P)$$

Show that Bayesian estimation and least-squares estimation give the same result in the limit of a noninformative prior. In other words, if the covariance of the prior is large compared to the covariance of the measurement error, show

$$m \to (X^T X)^{-1} X^T y$$

(c) What (weighted) least-squares minimization problem is solved for the general measurement error covariance

$$e \sim N(0, R)$$

Derive the least-squares estimate formula for this case.

(d) Again consider the measurements to be sampled from (4.87) with true parameter value $\theta_0$. Show that the weighted least-squares formula gives parameter estimates that are distributed as

$$\widehat{\theta} \sim N(\theta_0, P)$$

and find $P$ for this case.

(e) Show again that Bayesian estimation and least-squares estimation give the same result in the limit of a noninformative prior.

Exercise 4.27: Least-squares and minimum-variance estimation

Consider again the model linear in the parameters and the least-squares estimator from Exercise 4.26

$$\widehat{\theta} = (X^T R^{-1} X)^{-1} X^T R^{-1} y$$

Show that the covariance of the least-squares estimator is the smallest covariance of all linear, unbiased estimators.

Exercise 4.28: Two stages are not better than one

We often can decompose an estimation problem into two stages. Consider the case in which we wish to estimate $x$ from measurements of $z$, but we have the model between $x$ and an intermediate variable, $y$, and the model between $y$ and $z$

$$y = Ax + e_1 \qquad \text{cov}(e_1) = Q_1$$

$$z = By + e_2 \qquad \text{cov}(e_2) = Q_2$$

(a) Write down the optimal least-squares problem to solve for $\widehat{y}$ given the $z$ measurements and the second model. Given $\widehat{y}$, write down the optimal least-squares problem for $\widehat{x}$ in terms of $\widehat{y}$ and the first model. Combine these two results together and write the resulting estimate of $\widehat{x}$ given measurements of $z$. Call this the two-stage estimate of $x$.

(b) Combine the two models together into a single model and show the relationship

$$z = BAx + e_3 \qquad \text{cov}(e_3) = Q_3$$

Express $Q_3$ in terms of $Q_1$, $Q_2$ and the models $A$, $B$. What is the optimal least-squares estimate of $\widehat{x}$ given measurements of $z$ and the one-stage model? Call this the one-stage estimate of $x$.

(c) Are the one-stage and two-stage estimates of $x$ the same? If yes, prove it. If no, provide a counterexample. Do you have to make any assumptions about the models $A$, $B$?

Exercise 4.29: Let's make a deal!

Consider the following contest of the American television game show of the 1960s, Let's Make a Deal. In the show's grand finale, a contestant is presented with three doors. Behind one of the doors is a valuable prize such as an all-expenses-paid vacation to Hawaii or a new car. Behind the other two doors are goats and donkeys. The contestant selects a door, say door number one. The game show host, Monty Hall, then says, "Before I show you what is behind your door, let's reveal what is behind door number three!" Monty always chooses a door that has one of the booby prizes behind it. As the goat or donkey is revealed, the audience howls with laughter. Then Monty asks innocently, "Before I show you what is behind your door, I will allow you one chance to change your mind. Do you want to change doors?" While the contestant considers this option, the audience starts screaming out things like, "Stay with your door! No, switch, switch!" Finally the contestant chooses again, and then Monty shows them what is behind their chosen door.

Let's analyze this contest to see how to maximize the chance of winning. Define $p_{ijy}$ to be the probability that you chose door $i$, the prize is behind door $j$, and Monty showed you door $y$ (named after the data!) after your initial guess. Then you would want to

$$\max_j\; p_{j|iy} \tag{4.88}$$

for your optimal choice after Monty shows you a door.

(a) Calculate this conditional density and give the probability that the prize is behind door $i$, your original choice, and door $j \ne i$.

(b) You need to specify a model of Monty's behavior. Please state the one that is appropriate to Let's Make a Deal.

(c) For what other model of Monty's behavior is the answer that it does not matter if you switch doors? Why is this a poor model for the game show?
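A simulation is a useful companion to the conditional-density calculation in part (a). The sketch below assumes the standard model of Monty's behavior (he always opens a door that is neither the contestant's pick nor the prize); under that model, switching should win about 2/3 of the time.

```python
import random

random.seed(6)

def play(switch_doors, trials=100000):
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        pick = random.randrange(3)
        # Monty opens a door that is neither the pick nor the prize.
        # (His tie-break when pick == prize does not affect the overall win rate.)
        opened = next(d for d in (0, 1, 2) if d != pick and d != prize)
        if switch_doors:
            pick = next(d for d in (0, 1, 2) if d != pick and d != opened)
        wins += (pick == prize)
    return wins / trials

p_stay, p_switch = play(False), play(True)
print(p_stay, p_switch)   # roughly 1/3 and 2/3
```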

Exercise 4.30: A nonlinear transformation and conditional density

Consider the following relationship between the random variables $y$, $x$, and $w$

$$y = f(x) + w$$

The author of a famous textbook wants us to believe that

$$p_{y|x}(y \mid x) = p_w\big(y - f(x)\big)$$

Derive this result and state what additional assumptions on the random variables $x$ and $w$ are required for this result to be correct.

Exercise 4.31: Least squares and confidence intervals

A common model for the temperature dependence of the reaction rate is the Arrhenius model. In this model the reaction rate (rate constant, $k$) is given by

$$k = k_0 \exp(-E/T) \tag{4.89}$$

in which the parameter $k_0$ is the preexponential factor, $E$ is the activation energy, scaled by the gas constant, and $T$ is the temperature in Kelvin. We wish to estimate $k_0$ and $E$ from measurements of the reaction rate (rate constant), $k$, at different temperatures, $T$. In order to use linear least squares we first take the logarithm of (4.89) to obtain

$$\ln(k) = \ln(k_0) - E/T$$

Assume you have made measurements of the rate constant at 10 temperatures evenly distributed between 300 and 500 K. Model the measurement process as the true value plus measurement error $e$, which is distributed normally with zero mean and 0.001 variance

$$\ln(k) = \ln(k_0) - E/T + e \qquad e \sim N(0, 0.001)$$

Choose true values of the parameters to be

$$\ln(k_0) = 1 \qquad E = 100$$

(a) Generate a set of experimental data for this problem. Estimate the parameters from these data using least squares. Plot the data and the model fit using both $(T, k)$ and $(1/T, \ln k)$ as the $(x, y)$ axes.

(b) Calculate the 95% confidence intervals for your parameter estimates. What are the coordinates of the semimajor axes of the ellipse corresponding to the 95% confidence interval?

(c) What are the coordinates of the corners of the box corresponding to the 95% confidence interval?

(d) Plot your result by showing the parameter estimate, ellipse, and box. Are the parameter estimates highly correlated? Why or why not?
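The data generation and estimation steps in part (a) can be sketched as follows, using the exact values specified in the exercise. Note the wide scatter in the recovered $E$: the two regressors ($1$ and $-1/T$) are nearly collinear over 300 to 500 K, which is exactly the strong parameter correlation that part (d) asks about.

```python
import numpy as np

rng = np.random.default_rng(7)
lnk0_true, E_true = 1.0, 100.0
T = np.linspace(300.0, 500.0, 10)
y = lnk0_true - E_true / T + rng.normal(0.0, np.sqrt(0.001), T.size)

# Linear model: ln k = [1, -1/T] [ln k0, E]^T
X = np.column_stack([np.ones_like(T), -1.0 / T])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
lnk0_hat, E_hat = theta
print(lnk0_hat, E_hat)
```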

Exercise 4.32: A fourth moment of the normal distribution

You have established the following matrix integral result involving the second moment

$$\int_{-\infty}^{\infty} x x^T \exp\left(-\frac{1}{2} x^T P^{-1} x\right) dx = (2\pi)^{n/2} (\det P)^{1/2}\, P$$

Establish the following matrix result involving a fourth moment

$$\int_{-\infty}^{\infty} x x^T x x^T \exp\left(-\frac{1}{2} x^T P^{-1} x\right) dx = (2\pi)^{n/2} (\det P)^{1/2} \big[2PP + P\,\text{tr}(P)\big]$$

First you may want to establish the following result for scalar $x$

$$\int_{-\infty}^{\infty} x^p \exp\left(-\frac{1}{2}\frac{x^2}{\sigma^2}\right) dx = \begin{cases} 0 & p \text{ odd} \\ \sqrt{2\pi}\, \sigma^{p+1} (p-1)!! & p \text{ even} \end{cases}$$

Exercise 4.33: The χ² and χ densities

Let $X_i$, $i = 1, 2, \ldots, n$, be statistically independent, normally distributed random variables with zero mean and unit variance. Consider the random variable $Y$ to be the sum

$$Y = \sum_{i=1}^{n} X_i^2$$

(a) Find $Y$'s probability density. This density is known as the χ² density with $n$ degrees of freedom, and we say $Y \sim \chi_n^2$. Show that the mean of this density is $n$.

(b) Repeat for the random variable

$$Z = \left(\sum_{i=1}^{n} X_i^2\right)^{1/2}$$

This density is known as the χ density with $n$ degrees of freedom, and we say $Z \sim \chi_n$.

Exercise 4.34: The t-distribution

Assume that the random variables $X$ and $Y$ are statistically independent, and $X$ is distributed as a normal with zero mean and unit variance and $Y$ is distributed as χ² with $n$ degrees of freedom. Show that the density of the random variable $t$ defined as

$$t = \frac{X}{\sqrt{Y/n}}$$

is given by

$$p_t(z; n) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{z^2}{n}\right)^{-\frac{n+1}{2}} \tag{4.90}$$

This distribution is known as Student's t-distribution (density) after its discoverer, the chemist W.S. Gosset (Gosset, 1908), writing under the name Student.

Exercise 4.35: The F-distribution

Given random variables $X$ and $Y$ are independently distributed as χ² with $n$ and $m$ degrees of freedom, respectively, define the random variable $F$ as the ratio

$$F = \frac{X/n}{Y/m}$$

Show that $F$'s probability density is

$$p_F(z; n, m) = \frac{1}{z\, B(n/2, m/2)} \sqrt{\frac{(zn)^n m^m}{(zn + m)^{n+m}}} \qquad z \ge 0$$

in which $B$ is the complete Beta function (Abramowitz and Stegun, 1970, p. 258) defined by

$$B(n, m) = \frac{\Gamma(n)\,\Gamma(m)}{\Gamma(n + m)}$$

This density is known as the F-distribution (density).

Exercise 4.36: Relation between t- and F-distributions

Given the random variable $F$ is distributed as the $p_F(z; 1, m)$ distribution with parameters $n = 1$ and $m$, consider the transformation

$$T = \sqrt{F}$$

Show that the random variable $T$ is distributed as a t-distribution with parameter $m$

$$p_T(z) = p_t(z; m)$$

Exercise 4.37: Independence and conditional density

Consider two random variables $A$, $B$ with joint density $p_{A,B}(a, b)$, and well-defined marginals $p_A(a)$ and $p_B(b)$ and conditional $p_{A|B}(a \mid b)$. Show that $A$ and $B$ are statistically independent if and only if the conditional density of $A$ given $B$ does not depend on $b$

$$p_{A|B}(a \mid b) \ne f(b)$$

Exercise 4.38: Independent estimates of parameter and variance

(a) Show that $\widehat{\theta}$ and $\widehat{\sigma}^2$ given in (4.48) and (4.49) are statistically independent.

(b) Are the random variables $\widehat{\theta}$ and $y - X\widehat{\theta}$ statistically independent as well? Explain why or why not.

Exercise 4.39: Many samples of the vector least-squares problem

We showed for the model

$$y = X\theta + e \qquad e \sim N(0, R)$$

that the maximum-likelihood estimate is given by (4.64)

$$\widehat{\theta} = (X^T R^{-1} X)^{-1} X^T R^{-1} y$$

Use this result to solve the $n$-sample problem given by the following model

$$y_i = X\theta + e_i \qquad e_i \sim N(0, R) \qquad i = 1, \ldots, n$$

Stack the samples in an enlarged vector $Y$, and define the corresponding error vector $E$

$$Y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \qquad E = \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix}$$

(a) What is the corresponding covariance matrix $\bar{R}$ for the new measurement error vector $E$?

(b) What is the corresponding formula for $\widehat{\theta}$ in terms of $Y$ for this problem?

(c) What is the probability density for this $\widehat{\theta}$?

(d) Does this result agree with (4.65)? Discuss why or why not.

Exercise 4.40: Vector and matrix least-squares problems

A colleague has an old but good piece of software that solves the traditional vector least-squares problem with constraints on the parameters

$$y = A\theta + e \qquad e \sim N(0, R)$$

in which $y$, $\theta$, $e$ are vectors and $A$, $R$ are matrices. If the constraints are not active, the code produces the well-known solution

$$\widehat{\theta} = (A^T R^{-1} A)^{-1} A^T R^{-1} y \tag{4.91}$$

You would like to use this code to solve your matrix model problem

$$y_i = \Theta x_i + e_i$$

in which $y_i$, $x_i$, $e_i$ are vectors, $\Theta$ is a matrix, $i$ is the sample number, $i = 1, \ldots, n$, and you have $n$ statistically independent samples. Your colleague suggests you stack your problem into a vector and find the solution with the existing code. So you arrange your measurements as

$$Y = \begin{bmatrix} y_1 & \cdots & y_n \end{bmatrix} \qquad X = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} \qquad E = \begin{bmatrix} e_1 & \cdots & e_n \end{bmatrix}$$

and your model becomes the matrix equation

$$Y = \Theta X + E \tag{4.92}$$

Figure 4.16: Typical strain versus time data from a molecular dynamics simulation from data file rohit.dat on the website www.che.wisc.edu/~jbraw/principles.

You looked up the answer to your estimation problem when the constraints are not active and find the formula

$$\widehat{\Theta} = Y X^T (X X^T)^{-1} \tag{4.93}$$

You do not see how this answer can come from your colleague's code because the answer in (4.91) obviously depends on $R$ but your answer above clearly does not depend on $R$. Let's get to the bottom of this apparent contradiction, and see if we can use vector least-squares codes to solve matrix least-squares problems.

(a) What vector equation do you obtain if you apply the vec operator to both sides of the matrix model equation, (4.92)?

(b) What is the covariance of the vector $\text{vec}\,E$ appearing in your answer above?

(c) Apply (4.91) to your result in (a) and obtain the estimate $\text{vec}\,\widehat{\Theta}$.

(d) Apply the vec operator to the matrix solution, (4.93), and obtain another expression for $\text{vec}\,\widehat{\Theta}$.

(e) Compare your two results for $\text{vec}\,\widehat{\Theta}$. Are they identical or different? Explain any differences. Does the parameter estimate depend on $R$? Explain why or why not.
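The punchline of parts (a)-(e) can be previewed numerically: stacking columns with the vec operator, using the Kronecker identity $\text{vec}(\Theta X) = (X^T \otimes I)\,\text{vec}\,\Theta$ and error covariance $I \otimes R$, the weighted solution reproduces (4.93) exactly, with $R$ dropping out. The dimensions and $R$ below are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(8)
m, q, n = 2, 3, 12
Theta = rng.standard_normal((m, q))
X = rng.standard_normal((q, n))
M = rng.standard_normal((m, m))
R = M @ M.T + m * np.eye(m)
E = rng.multivariate_normal(np.zeros(m), R, size=n).T   # columns e_i ~ N(0, R)
Y = Theta @ X + E

vec = lambda A: A.flatten(order="F")       # column stacking
Avec = np.kron(X.T, np.eye(m))             # vec(Theta X) = (X^T kron I) vec(Theta)
Ri = np.linalg.inv(np.kron(np.eye(n), R))  # cov(vec E) = I kron R
theta_vec = np.linalg.solve(Avec.T @ Ri @ Avec, Avec.T @ Ri @ vec(Y))
Theta_mat = Y @ X.T @ np.linalg.inv(X @ X.T)   # the matrix formula (4.93)

err = float(np.max(np.abs(theta_vec - vec(Theta_mat))))
print(err)   # agreement to machine precision
```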

Exercise 4.41: Estimating a material's storage and loss moduli from molecular simulation

Consider the following strain response model⁵

$$\sigma_{xy}(\omega t) = G_1 \sin \omega t + G_2 \cos \omega t$$

Figure 4.17: Plot of y versus x from data file errvbls.dat on the website www.che.wisc.edu/~jbraw/principles.
in which $\sigma_{xy}$ is the strain, $G_1$ is the storage modulus, and $G_2$ is the loss modulus ($G_1$ and $G_2$ are positive scalars). We wish to estimate $G_1$ and $G_2$ from measurements of $\sigma_{xy}$. The strain "measurement" in this case actually comes from a molecular dynamics simulation. The simulation computes a noisy realization of the strain for the given material of interest. A representative simulation data set is provided in Figure 4.16. These data are given in file rohit.dat on the website www.che.wisc.edu/~jbraw/principles so you can download them.

(a) Without knowing any details of the molecular dynamics simulation, suggest a reasonable least-squares estimation procedure for $G_1$ and $G_2$.

Find the optimal estimates and 95% confidence intervals for your recommended estimation procedure.

Plot your best-fit model as a smooth time function along with the data.

Are the confidence intervals approximate or exact in this case? Why?

(b) Examining the data shown in Figure 4.16, suggest an improved estimation procedure. What traditional least-squares assumption is violated by these data? How would you implement your improved procedure if you had access to the molecular dynamics simulation so you could generate as many replicate "measurements" as you would like at almost no cost?
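Since the model is linear in $(G_1, G_2)$, one reasonable procedure for part (a) is ordinary least squares on a $[\sin \omega t, \cos \omega t]$ basis. The sketch below uses synthetic data in place of rohit.dat; the values of $G_1$, $G_2$, $\omega$, and the noise level are made-up stand-ins, not taken from the exercise.

```python
import numpy as np

rng = np.random.default_rng(9)
G1_true, G2_true, w = 3.0, 1.5, 2.0          # hypothetical values for illustration
t = np.linspace(0.0, 20.0, 400)
y = G1_true * np.sin(w * t) + G2_true * np.cos(w * t) + rng.normal(0.0, 0.5, t.size)

# Regression matrix: each row is [sin(wt_i), cos(wt_i)]
X = np.column_stack([np.sin(w * t), np.cos(w * t)])
(G1_hat, G2_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
print(G1_hat, G2_hat)
```

With real simulation data, the same two-column regression applies; only the data vector changes.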

Exercise 4.42: Who has the error?

You are fitting some $n$ laboratory measurements to a linear model

$$y_i = m x_i + b + e_{yi} \qquad i = 1, 2, \ldots, n$$

You have been told that the $x$ variable is known with high accuracy and the $y$ variable has measurement error $e_y$ distributed as

$$e_{yi} \sim N(0, \sigma^2)$$

The data are shown in Figure 4.17 and are given in file errvbls.dat on the website www.che.wisc.edu/~jbraw/principles.

(a) Given these assumptions, find the best estimate of the slope and intercept, and also the plus/minus bounds and the 95% confidence ellipse for the parameter estimates.

(b) Plot the data and the line of best fit to these data.

(c) Due to some confusion in the lab, you are told later that actually $y$ is known with high accuracy and the $x$ variable has measurement error $e_x$ distributed as

$$e_{xi} \sim N(0, \sigma^2)$$

Transform the model so that it is linear in a transformed parameter vector $\phi$

$$x_i = f(y_i, \phi_1, \phi_2) + e_{xi}$$

What are $f$ and $\phi$ for the transformed model?

(d) Given these assumptions, find the best estimate for this model. Add this line of best fit to the plot of the data and the line of best fit from the previous model. Clearly label which line corresponds to which model.

(e) Compute the 95% confidence ellipse and plus/minus bounds for $\phi$.

(f) Can you tell from the estimates and the fitted lines which of these two proposed models is more appropriate for these data? Discuss why or why not.

Exercise 4.43: Independence of transformed normals

Consider $n$ independent samples of a scalar, zero-mean normal random variable with variance $\sigma^2$ arranged in a vector $e = \begin{bmatrix} e_1 & \cdots & e_n \end{bmatrix}^T$ so that $e \sim N(0, \sigma^2 I_n)$. Consider random variables $x$ and $y$ to be linear transformations of $e$, $x = Ae$ and $y = Be$.

(a) Provide necessary and sufficient conditions for matrices $A$ and $B$ so that $x$ and $y$ are independent.

(b) Given that the conditions on $A$ and $B$ are satisfied, what can you conclude about $x$ and $y$ if $e$ has variance $\sigma^2 I_n$ but is not necessarily normally distributed?

Exercise 4.44: The multivariate t-distribution

Assume that the random variables $X \in \mathbb{R}^p$ and $Y$ are statistically independent, with $X \sim N(0, P)$ and $Y \sim \chi_n^2$.

(a) Show that the density of the random variable $t$ defined as

$$t = m + \frac{X}{\sqrt{Y/n}}$$

with $m \in \mathbb{R}^p$ a constant, is given by

$$p_t(z; n, P, m) = \frac{\Gamma\left(\frac{n+p}{2}\right)}{\Gamma\left(\frac{n}{2}\right)(n\pi)^{p/2}(\det P)^{1/2}} \left(1 + \frac{(z - m)^T P^{-1} (z - m)}{n}\right)^{-\frac{n+p}{2}}$$

Exercise 4.45: Integrals of the multivariate t-distribution and the F-statistic

Given the random variable $t$ is distributed as the multivariate t-distribution defined in Exercise 4.44, centered at $m \in \mathbb{R}^p$, show that the value of $b$ that gives probability $\alpha$ in the multivariate t-distribution

$$\Pr\big((t - m)^T P^{-1} (t - m) \le b\big) = \alpha$$

is

$$b = p\, F(n, p, \alpha)$$

in which $F(n, p, \alpha)$ is defined in (4.54).

Exercise 4.46: Confidence interval for unknown variance

Consider again $\widehat{\theta}$ and $\widehat{\sigma}^2$ from (4.48) and (4.49) and define the new random variable $Z$ as the ratio

$$Z = \frac{\widehat{\theta} - \theta_0}{\widehat{\sigma}}$$

in which $\widehat{\theta}$ and $\widehat{\sigma}^2$ are statistically independent as shown in Exercise 4.38.

(a) Show $Z$ is distributed as a multivariate t-distribution as defined in Exercise 4.44.

(b) Show that lines of constant probability of the multivariate t-distribution are ellipses in $\theta$ as in the normal distribution.

(c) Define an $\alpha$-level confidence interval using the multivariate t-distribution in place of the normal distribution and show that

$$(\theta_0 - \widehat{\theta})^T X^T X (\theta_0 - \widehat{\theta}) \le p\, \widehat{\sigma}^2\, F(n_p, p, \alpha)$$

in agreement with (4.55).

Exercise 4.47: Adding two uniformly distributed random variables

Given two independent, uniformly distributed random variables, X ~ U[0, 1] and Y ~ U[4, 5], find the density for Z = X + Y. Note that the transformation from (X, Y) to Z is not an invertible transformation.
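Before working the problem analytically, the answer is easy to anticipate with a quick Monte Carlo check; the sketch below (Python with NumPy; the sample size and seed are our own arbitrary choices) compares a histogram of Z = X + Y with the triangular density that the convolution of the two uniform densities produces on [4, 6].

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 1.0, n)   # X ~ U[0, 1]
y = rng.uniform(4.0, 5.0, n)   # Y ~ U[4, 5]
z = x + y                      # Z = X + Y, supported on [4, 6]

# Convolution of the two uniform densities: triangular density
# p_Z(z) = z - 4 on [4, 5] and 6 - z on [5, 6].
def p_z(zz):
    return np.where(zz < 5.0, zz - 4.0, 6.0 - zz)

# Compare a histogram estimate of the density with the analytical result.
hist, edges = np.histogram(z, bins=50, range=(4.0, 6.0), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
max_err = np.max(np.abs(hist - p_z(mid)))
print(f"mean = {z.mean():.3f}, max density error = {max_err:.3f}")
```

With 200,000 samples the histogram and the triangular density agree to a few percent in every bin.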

Exercise 4.48: Product of two unit variance normals

Let X and Y be independent scalar random variables distributed identically as N(0, 1). Find and plot the density for Z = XY. Is p_Z(z) well defined for all z? If not, explain why not.
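A Monte Carlo sketch (Python/NumPy; sample size and seed are arbitrary choices of ours) gives a useful hint for the last part of this exercise: the histogram estimate of the density near z = 0 keeps growing as the bin width shrinks, suggesting a singularity there.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
z = rng.standard_normal(n) * rng.standard_normal(n)  # Z = XY

# Histogram-based density estimates at z = 0 for shrinking bin widths.
# If p_Z(0) were finite, these estimates would level off; instead they
# keep increasing as the bin narrows.
widths = (0.2, 0.02, 0.002)
dens = [np.mean(np.abs(z) < w / 2) / w for w in widths]
for w, d in zip(widths, dens):
    print(f"bin width {w}: density estimate near 0 = {d:.2f}")
```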

Exercise 4.49: A useful integral in Fourier transforms of normals

Derive the definite integral used in taking the Fourier transform of the normal density

∫_{-∞}^{∞} e^{-a²x²} cos(bx) dx = (√π/a) e^{-b²/(4a²)},   a ≠ 0

Hint: first consider the exponential version of the integral on (-∞, ∞). We wish to show that

∫_{-∞}^{∞} e^{-a²x²} e^{ibx} dx = (√π/a) e^{-b²/(4a²)}    (4.95)

which gives the integral of interest as well as a second result

∫_{-∞}^{∞} e^{-a²x²} sin(bx) dx = 0,   a ≠ 0

To proceed, complete the square on the argument of the exponential and show that

-a²x² + ibx = -a²(x − ib/(2a²))² − b²/(4a²)

Then perform the integral by noticing that integrating the normal distribution gives

∫_{-∞}^{∞} e^{-a²(x − m′)²} dx = √π/a

even when m' = im is complex valued instead of real valued. This last statement can
be establishedby a simple contour integration in the complex plane and noting that
the exponentialfunction is an entire function, i.e., has no singularities in the complex
plane.
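The claimed value of (4.95) can be checked numerically before deriving it. The sketch below (Python/NumPy) uses a simple trapezoidal sum on a truncated interval; the truncation point and the (a, b) pairs are arbitrary choices.

```python
import numpy as np

# Compare a trapezoidal approximation of the integral of
# exp(-a^2 x^2) cos(bx) over the real line with the closed form
# (sqrt(pi)/a) exp(-b^2/(4 a^2)). Truncating at |x| = 30 is harmless
# because the integrand is negligible there for these values of a.
x = np.linspace(-30.0, 30.0, 400_001)
dx = x[1] - x[0]
errs = []
for a, b in [(1.0, 0.0), (1.0, 2.0), (0.5, 3.0)]:
    f = np.exp(-a**2 * x**2) * np.cos(b * x)
    val = dx * (f.sum() - 0.5 * (f[0] + f[-1]))   # trapezoidal rule
    exact = np.sqrt(np.pi) / a * np.exp(-b**2 / (4 * a**2))
    errs.append(abs(val - exact))
    print(a, b, val, exact)
```

Because the integrand is smooth and decays rapidly, the trapezoidal rule is extremely accurate here.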

Exercise 4.50: Orthogonal transformation of normal samples

Let vectors X₁, X₂, ..., Xₙ ∈ Rᵖ be n independent samples of a normally distributed random variable with possibly different means but identical variance, Xᵢ ~ N(mᵢ, R). Consider the transformation

Yᵢ = Σ_{j=1}^{n} cᵢⱼ Xⱼ

in which the matrix C = [cᵢⱼ] is orthogonal. Show that the Yᵢ are independently distributed as Yᵢ ~ N(m̄ᵢ, R), in which m̄ᵢ = Σ_{j=1}^{n} cᵢⱼ mⱼ, and deduce the relationship between the matrices X = [X₁ ⋯ Xₙ], Y = [Y₁ ⋯ Yₙ], and C.
Exercise 4.51: Estimated variance and the Wishart distribution

Let e₁, e₂, ..., eₙ ∈ Rᵖ be n independent samples of a normally distributed random variable with zero mean and identical variance, eᵢ ~ N(0, R). Define the matrix

S = Σ_{i=1}^{n} eᵢeᵢᵀ

The distribution for random matrix S is known as the Wishart distribution, in which p is the number of components (R is a p × p matrix) and integer n is known as the number of degrees of freedom. Sometimes the fact that S has p components is also indicated using the notation S ~ Wₚ(R, n). Consider the estimation problem of Section 4.7.4 written in the form

Y = ΘX + E

(a) Show that EEᵀ ~ Wₚ(R, n).

(b) Define Ê = Y − Θ̂X and show that nR̂ = ÊÊᵀ.

(c) Show that ÊÊᵀ ~ W(R, n − q), and therefore that nR̂ ~ W(R, n − q).

Hint: take the SVD of the q × n matrix X for q < n. Define Z = ÊV, which can be partitioned as [Z₁ Z₂] = Ê[V₁ V₂], and show that ÊÊᵀ = Z₂Z₂ᵀ. Work out the distribution of Z₂Z₂ᵀ from the definition of the Wishart distribution and the result of Exercise 4.50.

Exercise 4.52: Singular normal distribution as a delta sequence

Two generalized functions f(·) and g(·) are defined to be equal (in the sense of distributions) if they produce the same integral for all test functions φ

∫ f(x)φ(x) dx = ∫ g(x)φ(x) dx

The space of test functions is defined to be the set of all smooth (nongeneralized) functions that vanish outside of a compact set C = [−c, c] for some c > 0. Show that the zero-mean normal density

n(x, σ) = (1/(σ√(2π))) e^{-(1/2)(x/σ)²}

is equal to the delta function δ(x) in the limit σ → 0.
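The limit can be verified numerically with any smooth, compactly supported test function; the bump function and grid below are our own choices (Python/NumPy sketch).

```python
import numpy as np

def phi(x):
    """Smooth test function with compact support on [-1, 1] (a bump)."""
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

x = np.linspace(-2.0, 2.0, 400_001)
dx = x[1] - x[0]
phi0 = np.exp(-1.0)                     # phi(0)
errs = []
for sigma in (0.5, 0.1, 0.02):
    n_sig = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    val = np.sum(n_sig * phi(x)) * dx   # integral of n(x, sigma) phi(x)
    errs.append(abs(val - phi0))
    print(sigma, val)
```

As σ shrinks, the integral approaches φ(0), which is the defining action of δ(x).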


Exercise 4.53: Error bound for the Taylor series of the exponential used in establishing the central limit theorem

Derive the bound (4.31) used in establishing the central limit theorem for sums of identically distributed random variables

| e^{ix} − Σ_{m=0}^{n} (ix)^m/m! | ≤ |x|^{n+1}/(n+1)!

Hint: expand e^{ix} in a Taylor series with remainder term at x = 0 and take magnitudes. Note that for this particular function, the usual bound on the remainder term turns out to be an equality.

Exercise 4.54: Error bound for the remainder term in Taylor series

Derive the bound (4.73) for a second-order Taylor series of a bounded function f having three continuous, bounded derivatives

r(x, h) = f(x + h) − f(x) − f′(x)h − (1/2)f″(x)h²

sup_{x∈R} |r(x, h)| ≤ K_f min(h², |h|³)

Show that the following K_f is valid for any b > 0

K_f = max( M_f^{(2)}/b, M_f^{(3)}/6 )    (4.96)

with M_f^{(i)} = sup_{x∈R} |f^{(i)}(x)|.

Hints: first expand f(x + h) about f(x) to second order using the standard Taylor theorem with remainder. This gives the |h|³ bound. For the second-order bound, first take absolute values of the definition of r(x, h) and use the triangle inequality. Choose a constant b > 0 and consider two cases: |h| ≤ b and |h| > b. Develop second-order bounds for both cases and then combine them to obtain a second-order bound for all h. Finally, combine the second-order and third-order bounds by taking the smaller.

Exercise 4.55: Lindeberg conditions


Show that the following are special cases of the Lindeberg conditions given in Assumption 4.15.

(a) The de Moivre-Laplace central limit theorem assumption that the Xᵢ are independent and identically distributed with mean zero and variance σ².

(b) The Lyapunov central limit theorem assumption that there exists δ > 0 such that as n → ∞

(1/sₙ^{2+δ}) Σ_{k=1}^{n} E( |X_k|^{2+δ} ) → 0

Note that the Lyapunov assumption implies only part (b) of Assumption 4.15.

(c) The bounded random variable assumption, i.e., there exists B > 0 such that |Xᵢ| ≤ B.

Therefore, by proving Theorem 4.16, we have also proved the de Moivre-Laplace and the Lyapunov versions of the central limit theorem. We have also shown that the central limit theorem holds for bounded random variables, provided that sₙ → ∞.

Figure 4.18: Smooth approximation to a unit step function, H(z − 1).

Exercise 4.56: Normal random variables satisfy Lindeberg conditions

Let Xᵢ, i = 1, 2, ..., n be independent random variables with mean zero and variance σᵢ², and let Yᵢ, i = 1, 2, ..., n be independent normals with mean zero and the same variances σᵢ². Show that if the Xᵢ satisfy the Lindeberg conditions listed in Assumption 4.15, then so do the Yᵢ. Hint: using the Xᵢ variables, show that for n sufficiently large and any ε > 0, σᵢ ≤ ε sₙ for all i. This result shows that no single random variable can account for a significant fraction of the sum's variance as n becomes large. Next evaluate the Lindeberg condition for the Yᵢ variables, and use the fact that (maxᵢ σᵢ)/sₙ → 0.

Exercise 4.57: Smoothing a step (indicator) function

Here we construct a suitably smooth indicator function as shown in Figure 4.15. To simplify the presentation, first consider the setup in Figure 4.18. We seek a monotone function f(z) with three continuous derivatives that increases from zero at z = 0 to one at z = 2. We shall then rescale the z-axis to make this function as sharp as we please.

(a) Divide the interval in half and consider a fifth-order polynomial on z ∈ [0, 1]

p(z) = a₀ + a₁z + a₂z² + a₃z³ + a₄z⁴ + a₅z⁵

To have p(z) and its first three derivatives vanish at z = 0, we require a₀ = a₁ = a₂ = a₃ = 0. We will reflect this function about the y = 1/2 and z = 1 lines to provide the matching function q(z) on z ∈ [1, 2], or, in equations, q(z) = −p(2 − z) + 1. Note that the symmetry implies p^{(i)}(1) = q^{(i)}(1) for odd i, so that all odd derivatives are automatically continuous at z = 1, and the even derivatives are negatives of each other at z = 1. So we require that the even derivatives at z = 1 are zero. We therefore have two conditions, p(1) = 1/2 and p″(1) = 0, to find the remaining two coefficients

p(1) = a₄ + a₅ = 1/2
p″(1) = 12a₄ + 20a₅ = 0

Solve these equations and show that a₄ = 5/4 and a₅ = −3/4.

(b) The candidate function f(z) is therefore

f(z) = (5/4)z⁴ − (3/4)z⁵,   0 ≤ z ≤ 1
f(z) = −p(2 − z) + 1,   1 < z ≤ 2

Plot this function and its first three derivatives, and check that they are continuous at z = 1. The derivative bounds are given by

M_f^{(2)} = 20/9,   M_f^{(3)} = 15

Also check these values on your plots.

(c) Now rescale. Let w = (1 − z/2)L + x, so that f(z) = f(2(1 − (w − x)/L)) = f(w). The function f(w) now has the required properties of Figure 4.15. Show that the derivative bounds are scaled by

M_{f(w)}^{(i)} = (2/L)^i M_{f(z)}^{(i)}

(d) Show finally that because of this scaling with L, there exists L₀ > 0 such that the bound in (4.96) is given by

K_f = 20/L³    (4.97)

for every L satisfying 0 < L < L₀.

For an even smoother, seventh-order polynomial, with a smaller third derivative, see Thomasian (1969, p. 486).
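Parts (a) and (b) are easy to check numerically. The sketch below (Python/NumPy) solves the two linear conditions for a₄ and a₅ and evaluates the derivative bounds of p on [0, 1], which by the reflection symmetry also bound f on [1, 2].

```python
import numpy as np

# Conditions p(1) = a4 + a5 = 1/2 and p''(1) = 12 a4 + 20 a5 = 0 for the
# two remaining coefficients of p(z) = a4 z^4 + a5 z^5.
A = np.array([[1.0, 1.0], [12.0, 20.0]])
a4, a5 = np.linalg.solve(A, np.array([0.5, 0.0]))
print(a4, a5)          # expect 5/4 and -3/4

# Derivative bounds of p on [0, 1]:
# p''(z) = 12 a4 z^2 + 20 a5 z^3, p'''(z) = 24 a4 z + 60 a5 z^2.
z = np.linspace(0.0, 1.0, 300_001)
m2 = np.max(np.abs(12 * a4 * z**2 + 20 * a5 * z**3))
m3 = np.max(np.abs(24 * a4 * z + 60 * a5 * z**2))
print(m2, m3)          # expect 20/9 and 15
```

The maximum of |p″| occurs at z = 2/3, and the maximum of |p‴| occurs at the endpoint z = 1.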

Exercise 4.58: Properties of PLSR algorithm

Given the PLSR algorithm described in Section 4.8, show the following properties.

(a) T = XR
(b) TᵀT = I_q
(c) Q minimizes ‖Y − TQᵀ‖²_F for given Y and T.

Exercise 4.59: Using PCR and PLSR

Write your own PCR and PLSR algorithm and apply it to the data given in Example 4.23. The data are available in file pca-pls-data.dat on the website www.che.wisc.edu/~jbraw/principles.

(a) Reproduce the results given in Example 4.23.

(b) Estimate parameter B using both PCR and PLSR using the number of principal components/latent variables equal to 1, 2, 3, 4, 5. Compare your estimates to the value that was used to generate the data.


Bibliography

M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. National Bureau of Standards.

T. W. Anderson. An Introduction to Multivariate Statistical Analysis. John Wiley & Sons.

T. Bayes. An essay towards solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc., 53:370-418, 1763. Reprinted in Biometrika, 35:293-315.

G. E. P. Box and G. C. Tiao. Bayesian Inference in Statistical Analysis. Addison-Wesley.

E. A. Cornish. The multivariate t-distribution associated with a set of normal sample deviates. Aust. J. Phys., 7:531-542, 1954.

C. W. Dunnett and M. Sobel. A bivariate generalization of Student's t-distribution with tables for certain special cases. Biometrika, 41:153-169, 1954.

R. Durrett. Probability: Theory and Examples. Cambridge University Press.

W. Feller. Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung. Math. Z., 1935.

P. Geladi and B. R. Kowalski. Partial least-squares regression: A tutorial. Anal. Chim. Acta, 185:1-17, 1986.

W. S. Gosset. The probable error of a mean. Biometrika, 6:1-25, 1908.

M. H. Kaspar and W. H. Ray. Partial least squares modelling as successive singular value decompositions. Comput. Chem. Eng., 17(10):985-989, 1993.

A. N. Kolmogorov. Foundations of Probability. Chelsea Publishing Company, New York, 1950. Translation of "Grundbegriffe der Wahrscheinlichkeitrechnung, Ergebnisse der Mathematik," 1933.

L. Le Cam. The central limit theorem around 1935. Statist. Sci., 1(1):78-96, 1986.

P. Lévy. Propriétés asymptotiques des sommes de variables indépendantes ou enchaînées. J. Math. Pures Appl., pages 347-402, 1935.

J. W. Lindeberg. Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Math. Z., 15:211-225, 1922.

J. F. MacGregor, T. F. Marlin, J. Kresta, and B. Skagerberg. Multivariate statistical methods in process analysis and control. In Y. Arkun and W. H. Ray, editors, Chemical Process Control-CPC IV. CACHE, 1991.

H. Martens. Reliable and relevant modelling of real world data: a personal account of the development of PLS regression. Chemom. Intell. Lab. Syst., 2001.

B.-H. Mevik and R. Wehrens. The pls package: Principal component and partial least squares regression in R. J. Stat. Softw., 18:1-24, 2007.

A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, Inc., second edition, 1984.

D. Pollard. Comment on: The central limit theorem around 1935. Statist. Sci., 1(1), 1986.

G. Pólya. Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung und das Momentproblem. Math. Z., 8:171-180, 1920.

S. J. Qin. Recursive PLS algorithms for adaptive data modeling. Comput. Chem. Eng., 1998.

S. M. Selby. CRC Standard Mathematical Tables. CRC Press, twenty-first edition, 1973.

Student. The probable error of a mean. Biometrika, 6:1-25, 1908.

A. J. Thomasian. The Structure of Probability Theory with Applications. McGraw-Hill, 1969.

A. W. van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998.

J. Wishart. The generalised product moment distribution in samples from a normal multivariate population. Biometrika, 20A:32-52, 1928.

H. Wold. Estimation of principal components and related models by iterative least squares. In P. R. Krishnaiah, editor, Multivariate Analysis, pages 391-420. Academic Press, 1966.

S. Wold. Personal memories of the early PLS development. Chemom. Intell. Lab. Syst., 58:83-84, 2001.

5 Stochastic Models and Processes

5.1 Introduction
We are by now expert in using (deterministic) differential and partial differential equations as models of chemical and biological systems. These equations capture equations of motion, conservation of mass and energy, and many of the fundamental principles useful in analysis and design of chemically reacting systems. Chapters 2 and 3 were mainly devoted to developing this program. The motivation for stochastic processes and differential equations is to incorporate into the model the random effects of the internal system (discrete molecules) and the external environment on the system of interest. In some applications at fine length scales, the random effects are mainly due to the internal random behavior of the molecules. But even in applications at large scales, the random effects of the external environment are often quite important to understand and interpret the (noisy) measurements coming from a system.
In this chapter, we illustrate the usefulness of random variables and random processes in the modeling and analysis of systems of interest to chemical and biological engineers. We find the basic probability and statistics that we covered in Chapter 4 indispensable tools in carrying out this program. We study three main examples: (i) the Wiener process as a model of diffusion in transport phenomena, (ii) the Poisson process as a model of chemical reactions and kinetics at the small scale, and (iii) the Kalman filter for reducing the effects of noise in process measurements, a fundamental task in systems engineering. By covering representative examples from transport phenomena, chemical kinetics, and systems engineering, we hope to both introduce random models and processes, as well as demonstrate their wide range of applicability in modern chemical and biological engineering.
5.2 Stochastic Processes for Continuous Random Variables

5.2.1 Discrete Time Stochastic Processes

Our target in this part of the chapter is an understanding of the structure and dynamics of continuous time stochastic processes: the stochastic analogs of deterministic differential equations. In building up to these, it is instructive to start with the conceptually simpler stochastic difference equation. Consider the following example

x(k + 1) = Ax(k) + ε(k)    (5.1)

in which k ∈ I≥0 is the sample number in discrete time, ε is a random variable, assumed to have some fixed and known probability density, and ε(0), ε(1), ε(2), ... are independent, identically distributed samples of ε. If we define a sampling interval Δt, then t = kΔt. Because of the influence of the random variable ε, the variable x is also a random variable. In general it can take any value, so we call it a continuous random variable, in contrast to the integer-valued or discrete random variables we encounter in Section 5.3.

We wish to study the statistical properties of the process x(k) due to the random disturbance ε. Because the process is linear, an explicit solution is available

x(k) = A^k x(0) + Σ_{j=0}^{k-1} A^{k-1-j} ε(j)    (5.2)

There is no difficulty expressing the solution to the stochastic difference equation; in fact we cannot determine by looking at the form of the solution if ε(k) is a random variable or simply a deterministic function of time. This is the perfect place to start because everything is well defined regardless of whether or not ε is a random variable. We build some simple intuition with stochastic difference equations and then proceed to continuous time systems. We shall also see that difference equations arise whenever we wish to numerically approximate the solution to stochastic differential equations, so some facility with the difference equations is highly useful.

The INTEGRATED WHITE-NOISE process provides a starting point for understanding many important aspects of stochastic processes. Consider a system with scalar x, A = 1, and zero initial condition, in which the noise is a scaled unit normal, ε(k) = Gw(k), with the w(k) independent and w(k) ~ N(0, 1)

x(k + 1) = x(k) + Gw(k),   x(0) = 0    (5.3)

We wish to find the probability density of x(k) versus time for this process. We have x(1) = x(0) + Gw(0) = Gw(0), so x(1) ~ N(0, G²). Since w(1) is independent of x(1), we have for k = 2

x(2) = x(1) + Gw(1)

and using Theorem 4.12 on the linear transformation of a normal, we have that

x(2) ~ N(0, 2G²)

Continuing this process gives

x(k) ~ N(0, kG²)

and we have that the variance of x(k) increases linearly with time while the mean remains zero for the integrated white-noise process. If we choose G = √Δt, then x(k) ~ N(0, kΔt) and the system satisfies

x(t) ~ N(0, t)

or equivalently its probability density p(x, t) satisfies

p(x, t) = (1/√(2πt)) exp(-x²/(2t))

Similarly, if we let G = √(2DΔt), where D is a constant, then

x(t) ~ N(0, 2Dt)

or

p(x, t) = (1/√(4πDt)) exp(-x²/(4Dt))

This is precisely (3.70) from Chapter 3, which describes the transient spread by diffusion of a delta-function initial condition. Thus we see

already the first sign of what turns out to be a deep and important connection between stochastic processes and diffusion. For diffusion processes, if the random variable x is the position of a particle, then the mean square displacement is given by

⟨x²(t)⟩ = 2Dt

and the mean square displacement increases linearly with time.

The analysis above can be extended to the case where the random term has nonzero mean: w ~ N(m, 1), which we can write as w = m + w̃, with w̃ defined as w above. Now

x(k + 1) = x(k) + Gm + Gw̃(k)

Defining v = Gm/Δt, this becomes

x(k + 1) = x(k) + vΔt + Gw̃(k)

Again, if we interpret x as a particle position, then the particle travels or "drifts" a distance vΔt in one time interval as well as diffusing. Letting G = √(2DΔt), the particle drifts with a velocity v, so its mean position changes linearly with time, while also diffusing.
Finally, we return to the case where ε is drawn from an arbitrary distribution rather than a normal. With A = 1 and x(0) = 0, (5.2) becomes

x(k) = Σ_{j=0}^{k-1} ε(j)

That is, the solution becomes a sum of independent identically distributed (IID) random variables. In Section 4.5 we learned the remarkable fact that sums of IID random variables converge to a normal distribution. Thus as k → ∞, x(k) becomes normally distributed even if the noise that drives it is not. So, for example, if we can only observe the process x(t) at time intervals that are infrequent compared to Δt, it will be virtually impossible to know whether the underlying noise was Gaussian or not; the resulting process x(k) will be. This result is one reason why, in the absence of further information, taking the noise in a system to be normally distributed is often a good approximation.
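This convergence to normality is easy to see in a simulation. The sketch below (Python/NumPy; path count, step count, and seed are arbitrary choices of ours) drives the integrated process with uniform rather than normal noise and compares the standardized result with standard normal quantiles.

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, k = 20_000, 400

# epsilon(j) uniform on [-1/2, 1/2]: zero mean, variance 1/12,
# and decidedly non-Gaussian.
eps = rng.uniform(-0.5, 0.5, (n_paths, k))
x_k = eps.sum(axis=1)            # x(k) for each realization

z = x_k / np.sqrt(k / 12.0)      # standardize; should look like N(0, 1)
q = np.quantile(z, [0.159, 0.5, 0.841])
print(z.mean(), z.var(), q)      # quantiles near (-1, 0, 1) if normal
```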

5.2.2 Wiener Process and Brownian Motion
We now wish to define the continuous time version of the discrete time integrated white noise or Brownian motion just presented. This process, denoted W(t), is known as a Wiener process in honor of the mathematician Norbert Wiener. The property that we retain in taking the limit as Δt → 0 is that W(t) is normally distributed with zero mean and linearly increasing variance, or

W(t) ~ N(0, t)

By analogy with the results above, a diffusion process x(t) with diffusivity D and x(0) = 0 would simply be

x(t) = √(2D) W(t)    (5.4)

Note that the linear increase in variance with time should hold for any starting time s, giving

W(t) − W(s) ~ N(0, t − s),   0 ≤ s ≤ t    (5.5)
The increment of the Wiener process is denoted

ΔW(t − s) = W(t) − W(s)

Considering distinct time instants tᵢ, with tᵢ > tᵢ₋₁, we define Δtᵢ = tᵢ − tᵢ₋₁ and ΔW(tᵢ) = W(tᵢ) − W(tᵢ₋₁). Increments involving nonoverlapping time intervals are independent. The Wiener increments have a number of important properties that follow from their definitions

⟨ΔW(tᵢ)⟩ = 0    (5.6)
⟨ΔW(tᵢ) ΔW(tⱼ)⟩ = Δtᵢ δᵢⱼ    (5.7)
⟨ΔW(tᵢ)ⁿ⟩ = 0 for n odd    (5.8)
⟨ΔW(tᵢ)^{2m}⟩ ∝ Δtᵢ^m for integer m    (5.9)

In Theorem 4.12 we saw that the distribution of a sum of normally distributed random variables is also normally distributed. A number of important results for Wiener processes follow from this fact. A Wiener process can be written as a sum of N Wiener increments for any N

W(t) − W(t₀) = Σ_{i=1}^{N} ΔW(tᵢ)    (5.10)

where t_N = t and the only restriction on the tᵢ is that tᵢ > tᵢ₋₁. Accordingly, a diffusion (Brownian motion) process can be written as a sum of Wiener increments multiplied by √(2D)

x(t) − x(t₀) = Σ_{i=1}^{N} √(2D) ΔW(tᵢ)    (5.11)

Furthermore, for separate Wiener processes W₁, W₂, and W₃

√(2D₁) ΔW₁ + √(2D₂) ΔW₂ = √(2(D₁ + D₂)) ΔW₃    (5.12)

In other words, the sum of two diffusion processes is equivalent to a different diffusion process whose diffusivity is the sum of the first two.
To visualize a trajectory of a Brownian motion process x(t), we can use (5.11), generating points x(t) at constant time intervals Δt. Observing that now ΔW ~ N(0, Δt), this is equivalent to evaluating the discrete time process

x((k + 1)Δt) = x(kΔt) + √(2DΔt) w(k),   x(0) = 0

with w(k) ~ N(0, 1) defined as above. Figure 5.1 shows a trajectory of this process for sample time Δt = 10⁻⁶ and diffusivity D = 5 × 10⁵. Notice that the roughness is quite apparent in the top row of Figure 5.1. But by looking at finer time scales, we can see the effect of the finite step size in the discrete time approximation. The continuous time Wiener process defined in (5.5) maintains its roughness at all time scales; Figure 5.2 shows how the path should appear between the samples if we chose the step size properly for this magnification. Unlike more familiar functions, the Wiener process is very irregular. Thus it is important to address its continuity and smoothness properties.
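This discrete time process is a one-line simulation. The following sketch (Python/NumPy; the path count and seed are our own choices, while Δt and D match Figure 5.1) also checks that the sample variance at the final time matches 2Dt.

```python
import numpy as np

rng = np.random.default_rng(3)
D, dt = 5.0e5, 1.0e-6            # diffusivity and sample time of Figure 5.1
n_paths, n_steps = 2000, 1000

# Each row is one sampled Brownian path built from independent
# increments sqrt(2 D dt) w(k), with w(k) ~ N(0, 1).
dW = np.sqrt(2 * D * dt) * rng.standard_normal((n_paths, n_steps))
x = np.cumsum(dW, axis=1)        # x(k dt), k = 1, ..., n_steps

t_final = n_steps * dt
var_final = x[:, -1].var()
print(var_final, 2 * D * t_final)   # sample variance vs. 2 D t
```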
The Wiener process is continuous. A crude argument for this statement is that |ΔW| ~ √Δt, which approaches zero as Δt → 0. A more refined one is presented in Exercise 5.4. On the other hand, because of the Δt^{1/2} behavior of ΔW, we arrive at the perhaps surprising fact that

Figure 5.1: A simulation of the Wiener process with fixed sample time Δt = 10⁻⁶ and D = 5 × 10⁵. The boxed region in each figure is expanded in the next plot to display a decreasing time scale of interest. The true Wiener process is rough at all time scales and therefore dW(t)/dt does not exist. The top row shows an adequate sampling rate to display the roughness of the Wiener process. The middle row shows the time scale of interest starting to become too small for the given sample time. The bottom row shows a time scale of interest much too small for the given sample time; one can see the samples and the straight lines drawn between them.

Figure 5.2: Sampling faster on the last plot in Figure 5.1; the sample time is decreased to Δt = 10⁻⁹ and the roughness is restored on this time scale. Thought question: how did we generate a random walk that passes exactly through the solid sample points taken from Figure 5.1? Hint: certainly not by trial and error! Such a process is called a Brownian bridge (Bhattacharya and Waymire, 2009).

the Wiener process is not differentiable¹

⟨|ΔW/Δt|⟩ = (1/Δt) ⟨|ΔW|⟩
          = (1/Δt) (1/√(2πΔt)) ∫_{-∞}^{∞} |x| exp(-x²/(2Δt)) dx
          = (1/Δt) √(2Δt/π)
          = √(2/(πΔt))

This diverges as Δt^{-1/2} as Δt → 0.

¹The results of Exercise 5.8 were applied in this derivation.
Now let us return for the moment to the discrete time integrated white-noise process, (5.3). Considering a sampling interval Δt, we can rewrite this process as

Δx = B ΔW    (5.13)

Under other circumstances we could divide by Δt and let it shrink to zero

dx/dt = B dW/dt

We have just found, however, that dW/dt does not exist. Nevertheless, we can define a differential of the Wiener process

dW(t) = W(t + dt) − W(t)

as the Wiener increment W(t + Δt) − W(t) when Δt becomes the infinitesimal dt. This is also known as the white-noise process. It is not continuous. Now we can write (5.13) in differential form

dx = B dW    (5.14)

This is the most elementary STOCHASTIC DIFFERENTIAL EQUATION. With initial condition x(0) = 0, its solution is (5.4).
5.2.3 Stochastic Differential Equations

Basic ideas

To motivate and introduce stochastic differential equations, consider first the deterministic differential equation

dx/dt = A(x, t)    (5.15)

When we wish to augment this model to include some random effects, one might try

dx/dt = A(x, t) + η(t)

in which η(t) is a random variable, often a normally distributed, zero-mean random variable, as discussed in Chapter 4.

We have already run into problems with this formulation. Even to model a "well-behaved" (e.g., continuous) stochastic process like diffusion, we have seen that the random term would have to take on the form

η(t) = dW/dt

which does not exist. Extending what we did above for Brownian motion, we thus consider differentials instead of derivatives and write a general stochastic differential equation (SDE) in the form

dx = A(x, t) dt + B(x, t) dW    (5.16)

Formally, we can integrate this to yield

x(t) = x(0) + ∫₀ᵗ A(x(t′), t′) dt′ + ∫₀ᵗ B(x(t′), t′) dW(t′)    (5.17)

The first integral is classical. The second would be as well if dW/dt existed, in which case we would just write that

∫₀ᵗ B(x(t′), t′) dW(t′) = ∫₀ᵗ B(x(t′), t′) (dW/dt′) dt′

This integral is nontrivial, and to understand it we need to understand a little bit about the calculus of stochastic processes.
Elementary Stochastic Calculus

Stochastic integrals of the form

S = ∫_{t₀}^{t} G(t′) dW(t′)

are more complex than conventional integrals because both G and dW can vary stochastically (think of the case G(t) = W(t)). Nevertheless, as with conventional integrals, we can divide the interval [t₀, t] into n subintervals t₀ ≤ t₁ ≤ t₂ ≤ ⋯ ≤ t_{n−1} ≤ t, and choose intermediate time points τᵢ such that tᵢ₋₁ ≤ τᵢ ≤ tᵢ. Now the integral S is approximated by the sum

Sₙ = Σ_{i=1}^{n} G(τᵢ) (W(tᵢ) − W(tᵢ₋₁))

In normal calculus this sum converges to the same value independent of the choice of the τᵢ; in stochastic calculus this is not the case. We will choose τᵢ = tᵢ₋₁, yielding the ITÔ STOCHASTIC INTEGRAL². Thus (5.16) is an Itô stochastic differential equation.

²Other choices are used in various situations; for example, the STRATONOVICH stochastic integral takes τᵢ = (tᵢ₋₁ + tᵢ)/2. Stochastic calculus is complex and technical; Gardiner (1990) provides a detailed discussion that is accessible to the nonmathematician.

The Itô stochastic integral corresponds to a stochastic "rectangle rule" with the function value chosen at the left side of the subinterval. One practical reason for this choice is that it is the one most straightforwardly applied in numerical solutions of stochastic differential equations. The EULER-MARUYAMA scheme generalizes the explicit Euler method to the stochastic case, using this rectangle rule approximation

x(t + Δt) = x(t) + A(x(t), t) Δt + B(x(t), t) ΔW(t + Δt)

where ΔW(t + Δt) ~ N(0, Δt). This is the standard method for finding trajectories of SDEs by simulation; it is not highly accurate, but higher-order schemes for SDEs are very complex to implement (Kloeden and Platen, 1992).

A more fundamental reason for working with the Itô integral is that, when applied to (5.17), it corresponds to a noise term that does not change the mean of x(t), because its expected value is zero

⟨ ∫_{t₀}^{t} G(t′) dW(t′) ⟩ = 0    (5.19)

This is easily seen by taking the expected value of the discrete sum and using the fact that for the Itô integral, G(τᵢ) and (W(tᵢ) − W(tᵢ₋₁)) are independent, so that

⟨G(τᵢ)(W(tᵢ) − W(tᵢ₋₁))⟩ = ⟨G(τᵢ)⟩⟨W(tᵢ) − W(tᵢ₋₁)⟩ = 0

because ⟨W(tᵢ) − W(tᵢ₋₁)⟩ = 0. This makes it clear why the choice of τᵢ matters: if τᵢ were tᵢ, then G(τᵢ) and (W(tᵢ) − W(tᵢ₋₁)) would not be independent and the mean would not necessarily be zero.

By considering integrals of the form

∫_{t₀}^{t} G(t′) (dW(t′))²

and using the Itô expression for Sₙ, one can show

∫_{t₀}^{t} G(t′) (dW(t′))² = ∫_{t₀}^{t} G(t′) dt′
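Returning to the Euler-Maruyama scheme introduced above, it takes only a few lines to implement. The sketch below (Python/NumPy) applies it to a linear SDE of our own choosing, dx = −ax dt + b dW, whose long-time variance b²/(2a) provides a convenient check; the coefficients, step size, and seed are arbitrary.

```python
import numpy as np

def euler_maruyama(A, B, x0, dt, n_steps, n_paths, rng):
    """Integrate dx = A(x,t) dt + B(x,t) dW with the Euler-Maruyama scheme."""
    x = np.full(n_paths, x0, dtype=float)
    t = 0.0
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)  # dW ~ N(0, dt)
        x = x + A(x, t) * dt + B(x, t) * dW
        t += dt
    return x

# Linear test problem (our choice): dx = -a x dt + b dW.
a, b = 2.0, 1.0
rng = np.random.default_rng(4)
x = euler_maruyama(lambda x, t: -a * x,
                   lambda x, t: b * np.ones_like(x),
                   x0=0.0, dt=1e-3, n_steps=5000, n_paths=20_000, rng=rng)
print(x.var(), b**2 / (2 * a))   # long-time variance should match b^2/(2a)
```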
The identity above, ∫_{t₀}^{t} G(t′)(dW(t′))² = ∫_{t₀}^{t} G(t′) dt′, tells us how to treat higher differentials involving dW and dt in general

dW² = dt,   dW^{2+n} = 0,   dW dt = 0,   dt² = 0    (5.20)

and so on. If dWᵢ and dWⱼ are different white-noise processes, e.g., corresponding to different components of a vector of such processes

dWᵢ dWⱼ = δᵢⱼ dt

Unlike in regular calculus, in working with differentials of W one must keep terms up to dW². To understand why, simply recall that dW is of order √dt, so dW² is of the same order as dt.

We can use the above observations about stochastic differentials to derive the ITÔ STOCHASTIC CHAIN RULE. Let F be a function of t and W(t). Then

dF(t, W(t)) = ( ∂F/∂t + (1/2) ∂²F/∂W² ) dt + (∂F/∂W) dW(t)

For example, if we let F = x(t, W(t)) = x(t₀) + A(t − t₀) + B(W(t) − W(t₀)), where A and B are constants, then application of the chain rule gives us back the constant coefficient SDE dx = A dt + B dW.

Now consider a function f(x(t)), where x(t) evolves according to (5.16). The differential of f can be written

df(x(t)) = f(x(t) + dx(t)) − f(x(t))
         = f′(x(t)) dx(t) + (1/2) f″(x(t)) (dx(t))²
         = f′(x(t)) (A dt + B dW) + (1/2) f″(x(t)) (A dt + B dW)²

Noting that dt² = 0 and dW² = dt, we have ITÔ'S FORMULA

df = ( A f′ + (1/2) B² f″ ) dt + B f′ dW    (5.21)
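As a quick sanity check on (5.21), take constant A and B, for which x(t) = x(0) + At + BW(t) exactly. Itô's formula with f(x) = x² then predicts ⟨x²(t)⟩ = (x₀ + At)² + B²t, which a direct sample average reproduces (Python/NumPy sketch; the parameter values and seed are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(5)
A, B, x0, t = 1.5, 0.8, 0.3, 2.0
n_paths = 200_000

# Exact solution for constant coefficients: x(t) = x0 + A t + B W(t).
W = np.sqrt(t) * rng.standard_normal(n_paths)   # W(t) ~ N(0, t)
x = x0 + A * t + B * W

second_moment = np.mean(x**2)
prediction = (x0 + A * t)**2 + B**2 * t         # from Ito's formula
print(second_moment, prediction)
```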

Example 5.1: Diffusion on a plane in Cartesian and polar coordinate systems

We can write two-dimensional Brownian motion in Cartesian coordinates as

dx = B dWx    (5.22)
dy = B dWy    (5.23)

where Wx and Wy are independent Wiener processes. How would we write the same process in polar coordinates? As a brief prelude, observe that for a particle starting at the origin

⟨r²⟩ = ⟨x²⟩ + ⟨y²⟩ = B²(⟨Wx²⟩ + ⟨Wy²⟩) = 2B²t

and with B² = 2D we have ⟨r²⟩ = 4Dt. This result easily extends to Brownian motion in any number d of dimensions.

Returning to the specific question at hand, consider the radial coordinate first and keep in mind that we may need to keep terms up to second order

dr = (∂r/∂x) dx + (∂r/∂y) dy + (1/2)(∂²r/∂x²) dx² + (∂²r/∂x∂y) dx dy + (1/2)(∂²r/∂y²) dy²

Here all the partials can be evaluated from the formula

r = √(x² + y²)

Now using the SDEs and noting that dx² = dy² = B² dt and dx dy = 0, we have that

dr = cos θ B dWx + sin θ B dWy + (B²/(2r)) dt

Now, using (5.12) we see that cos θ B dWx + sin θ B dWy is a diffusion process with variance B² dt. We will denote this process as B dWr, so

dr = (B²/(2r)) dt + B dWr    (5.24)

Consider a particle that starts at r = 0. Applying Itô's formula with f = r² and taking the expected value we find that

d⟨r²⟩ = 2B² dt

Letting B² = 2D we find that ⟨r²⟩ = 4Dt in two dimensions, as we should.

Now we turn to the equation for θ

dθ = (∂θ/∂x) dx + (∂θ/∂y) dy + (1/2)(∂²θ/∂x²) dx² + (∂²θ/∂x∂y) dx dy + (1/2)(∂²θ/∂y²) dy²

Evaluating the derivatives we find that there cannot be any drift term; by symmetry, positive and negative changes in θ must be equally likely. We obtain

dθ = (B/r²)(−y dWx + x dWy)

Using (5.12) again, we can replace −y dWx + x dWy with r dWθ, giving

dθ = (B/r) dWθ    (5.25)
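The drift term B²/(2r) in (5.24) may look surprising, since the Cartesian equations contain no drift at all, but it is easy to confirm by brute force: start many particles at r = 1, take a single Cartesian step, and average the change in r (Python/NumPy sketch; step size, particle count, and seed are arbitrary choices of ours).

```python
import numpy as np

rng = np.random.default_rng(6)
B, dt, n = 1.0, 1.0e-3, 2_000_000

# One Cartesian step dx = B dWx, dy = B dWy from the point (x, y) = (1, 0).
x = 1.0 + B * np.sqrt(dt) * rng.standard_normal(n)
y = B * np.sqrt(dt) * rng.standard_normal(n)

mean_dr = np.mean(np.sqrt(x**2 + y**2) - 1.0)
mean_dtheta = np.mean(np.arctan2(y, x))
print(mean_dr, B**2 / 2.0 * dt)   # mean change in r vs. B^2/(2r) dt at r = 1
print(mean_dtheta)                # no drift in theta, by symmetry
```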
Example 5.2: Average properties from sampling

Often we are interested in an "average" property of the model rather than a single realization of the stochastic equation. Consider again the random walk model of the diffusion process on the plane, (5.22) and (5.23). Simulate the process and compute an estimate of the mean square displacement versus time.

Solution

We approximate this process with the discrete time process

X(k + 1) = X(k) + VΔt + √(2DΔt) P    (5.26)

in which X = (x, y)ᵀ, k is the sample number, Δt is the sample time, and time is t = kΔt. The velocity of the particles is V = (vx, vy)ᵀ and the random two-vector P is drawn from the two-dimensional normal distribution with zero mean and covariance equal to a 2 × 2 identity matrix

P ~ N(0, I₂)

This choice provides uncorrelated steps in the x and y directions. In the ensuing discussion we choose Δt = 1 so k = t. We also take vx = vy = 0 here so there is no drift, only diffusion. A representative simulation of (5.26) is given in Figure 5.3.

We can approximate average properties by simulating many trajectories, or equivalently many independent particles, and then taking the average. Let Xᵢ(k) be the position of the ith particle at sample time k, which follows the evolution

Xᵢ(k + 1) = Xᵢ(k) + √(2DΔt) Pᵢ(k)    (5.27)
Figure 5.3: A representative trajectory of the discretely sampled Brownian motion; D = 2, V = 0, n = 500.

Figure 5.4: The mean square displacement versus time; D = 2, V = 0, n = 500.
Stochastic Modelsand Processes


470

The squared displacement of the ith particle is given by

ri²(k) = xi²(k) + yi²(k)    (5.28)

and the mean square displacement ⟨r²⟩(k) is given by the average over many particles

⟨r²⟩(k) = (1/n) Σ_{i=1}^{n} ri²(k),    n large

Figure 5.4 shows the mean square displacement for the random walk simulation. We use n = 500 particles, no drift, and diffusion coefficient D = 2 for this simulation. Notice that the mean square displacement grows linearly with time. The simulation agrees with Einstein's analysis of diffusion (Einstein, 1905), see also Gardiner (1990, pp. 3-5), as well as our analyses above

⟨r²⟩(k) = 4Dk    (5.29)
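The simulation and averaging in this example are easy to reproduce. The sketch below (our own bookkeeping, using the text's values D = 2, Δt = 1, n = 500) steps (5.27) for all particles at once and fits the slope of ⟨r²⟩ versus k, which (5.29) predicts to be 4D = 8.

```python
import numpy as np

# Discrete random walk (5.26)-(5.27): each of n particles takes steps
# sqrt(2*D*dt)*P with P ~ N(0, I). No drift, as in the text's example.
rng = np.random.default_rng(0)
D, dt, n, nsteps = 2.0, 1.0, 500, 200

X = np.zeros((n, 2))                 # all particles start at the origin
msd = np.zeros(nsteps + 1)           # msd[k] estimates <r^2>(k)
for k in range(1, nsteps + 1):
    X += np.sqrt(2 * D * dt) * rng.standard_normal((n, 2))
    msd[k] = np.mean(np.sum(X**2, axis=1))

# Least-squares slope of <r^2> versus k; (5.29) predicts 4*D = 8.
slope = np.polyfit(np.arange(nsteps + 1), msd, 1)[0]
```

With 500 particles the fitted slope fluctuates a few percent around 8, mirroring the linear growth seen in Figure 5.4.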

5.2.4 Fokker-Planck Equation

There are two ways to think about solving an SDE. We can find particular trajectories; this is what the Euler-Maruyama scheme above will do. We can also consider the evolution of the probability density p(x, t). In considering the integrated white-noise and Wiener processes, we observed the connection between the evolution of p(x, t) and the diffusion equation. The Wiener process is the solution to dx = dW. Because its trajectories satisfy x(t) − x(0) ~ N(0, t), the density p(x, t) for a trajectory starting at x = x0 is a solution to the transient diffusion equation

∂p/∂t = ½ ∂²p/∂x²,    p(x, 0) = δ(x − x0)    (5.30)

with D = ½. To generalize this result, consider the time evolution of the expected value of an arbitrary function f(x(t)), where x(t) evolves according to the Itô SDE (5.16). Using Itô's formula and the result ⟨∫ B f′ dW⟩ = 0, which is the infinitesimal version of (5.19)

d⟨f⟩ = ⟨(A f′ + ½ B² f″) dt + B f′ dW⟩ = ⟨A f′ + ½ B² f″⟩ dt

This can be rewritten as

d/dt ∫ f(x) p(x, t) dx = ⟨A f′ + ½ B² f″⟩

Rearranging and integrating by parts yields

∫ f(x) ∂p(x, t)/∂t dx = ∫ f(x) [ −∂/∂x (A(x, t) p(x, t)) + ½ ∂²/∂x² (B²(x, t) p(x, t)) ] dx

Finally, since f is arbitrary, this result can only hold in general if

∂p(x, t)/∂t = −∂/∂x (A(x, t) p(x, t)) + ½ ∂²/∂x² (B²(x, t) p(x, t))    (5.31)

This is the evolution equation for p(x, t), often called the FOKKER-PLANCK EQUATION (FPE). For a trajectory starting at x = x0, the initial condition for this equation is again p(x, 0) = δ(x − x0). The equation can be put into conservation form

∂p(x, t)/∂t = −∂/∂x { A(x, t) p(x, t) − ½ ∂/∂x (B²(x, t) p(x, t)) }    (5.32)

The term inside the curly brackets is the flux of probability density, and this equation bears obvious similarities to equations we are familiar with from transport phenomena. It shows us that trajectories of an Itô SDE have a drift coefficient A(x, t) and a diffusion coefficient D(x, t) = ½ B²(x, t). This is sometimes called the "short-time" diffusivity, because one can show using Itô's formula (Exercise 5.5) that for a particle at position x′ at time t′

½ d⟨(x − x′)²⟩/dt |_(t=t′) = D(x′, t′)    (5.33)

Similarly, the instantaneous drift velocity of the trajectory is (as in the deterministic case)

d⟨x − x′⟩/dt |_(t=t′) = A(x′, t′)    (5.34)

The probability density must integrate to unity

∫ p(x, t) dx = 1    (5.35)


The expression inside the curly brackets is analogous to that for the convection-diffusion transport equation with A = v, but there is an important difference: if it varies with position, the D that appears in the FPE is not equivalent to the (gradient) diffusivity in the transport equation. Exercise 5.2 explores these differences in further detail.

We also can generalize the analysis to an n-vector random process X, with components Xi, i = 1, 2, ..., n. The SDE and FPE for this case are

dXi = Ai(x, t) dt + Σ_j Bij(x, t) dWj    (5.36)

∂p/∂t = −Σ_i ∂/∂xi (Ai(x, t) p) + Σ_i Σ_j ∂²/(∂xi ∂xj) (Dij(x, t) p)    (5.37)

Here p is a function of all components xi and time, p = p(x1, ..., xn, t), and the elements of the diffusion coefficient matrix are

Dij = ½ Σ_k Bik Bjk

The derivation of (5.37) from (5.36) makes use of the MULTIDIMENSIONAL ITÔ FORMULA

df(x) = Σ_i Ai (∂f/∂xi) dt + ½ Σ_i Σ_j Σ_k Bik Bjk (∂²f/∂xi ∂xj) dt + Σ_i Σ_j Bij (∂f/∂xi) dWj    (5.38)

As in the scalar case, probability is conserved

∫ p(x1, ..., xn) dx1 dx2 ... dxn = 1    (5.39)

In vector/matrix notation the equations are written

dx = A(x, t) dt + B(x, t) dW    (5.40)

∂p/∂t = −∇·(A p) + ∇∇ : (D p)    (5.41)

with

D = ½ B Bᵀ    (5.42)

This result indicates that D is symmetric positive semidefinite. For numerical integration of multidimensional SDEs, the Euler-Maruyama scheme extends straightforwardly.
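That extension can be sketched in a few lines. The function below (names and parameter choices are our own, not from the text) advances (5.40) by x ← x + A(x, t)Δt + B(x, t)ΔW with ΔW a vector of independent N(0, Δt) increments, and is exercised on the planar diffusion of Example 5.2.

```python
import numpy as np

# Euler-Maruyama for the vector SDE dx = A(x,t) dt + B(x,t) dW (5.40).
# A returns an n-vector, B an n-by-m matrix; dW has m independent
# N(0, dt) components.
def euler_maruyama(A, B, x0, dt, nsteps, rng):
    x = np.asarray(x0, dtype=float).copy()
    path = [x.copy()]
    for k in range(nsteps):
        t = k * dt
        Bk = B(x, t)
        dW = np.sqrt(dt) * rng.standard_normal(Bk.shape[1])
        x = x + A(x, t) * dt + Bk @ dW
        path.append(x.copy())
    return np.array(path)

# Pure diffusion on the plane: A = 0 and B = sqrt(2D) I, which
# reproduces the random walk of Example 5.2.
D = 2.0
rng = np.random.default_rng(1)
path = euler_maruyama(lambda x, t: np.zeros(2),
                      lambda x, t: np.sqrt(2 * D) * np.eye(2),
                      x0=[0.0, 0.0], dt=1.0, nsteps=500, rng=rng)
```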

Example 5.3: Transport of many particles

A large number of particles, each obeying the equation

dx = v dt + √(2D) dW

are suspended in a fluid. How do we describe the evolution of the concentration of particles in the fluid?

Solution

The probability density for an individual particle evolves as

∂p/∂t = −v ∂p/∂x + D ∂²p/∂x²

For the many-particle system, we define an n-particle joint density function

p(x1, ..., xn, t) dx1 dx2 ... dxn = probability that particles 1 through n are located at x1 through xn, respectively, at time t

The concentration of particles at x, c(x, t), is then

c(x, t) = ∫ Σ_{j=1}^{n} δ(x − xj) p(x1, ..., xn, t) Π_{i=1}^{n} dxi    (5.44)

The jth term in the sum represents the probability that the jth particle is located at x at time t, and the sum over all particles gives the total concentration. If the particle motions are independent,

p(x1, ..., xn, t) = Π_{j=1}^{n} pj(xj, t)

Performing the integral in (5.44) gives

c(x, t) = Σ_{j=1}^{n} pj(x, t)

which indicates that the linear superposition of each particle's probability of being at location x produces the total concentration at x. If the particles are identical, pj(x, t) = p(x, t), j = 1, ..., n, this reduces to

c(x, t) = n p(x, t)

The evolution equation for c is therefore

∂c/∂t = −v ∂c/∂x + D ∂²c/∂x²

The conclusion is that the concentration profile created by many noninteracting, identical particles obeys the same evolution equation as the probability density of a single particle. Averaging the behavior of many particles does not "average away" the diffusion term in the evolution equation of the total concentration c(x, t). See Deen (1998, pp. 59-63).
Example 5.4: Fokker-Planck equations for diffusion on a plane

Example 5.1 introduced the stochastic differential equations for diffusion on a plane in Cartesian and polar coordinate representations. For the Cartesian representation, (5.22) and (5.23) have probability density satisfying

∂p/∂t = ½ (∂²p/∂x² + ∂²p/∂y²)

with normalization (conservation of probability)

∫∫ p(x, y) dx dy = 1

If we rewrite this equation in polar coordinates we get

∂p/∂t = ½ [ (1/r) ∂/∂r (r ∂p/∂r) + (1/r²) ∂²p/∂θ² ]    (5.45)

and

∫₀^{2π} ∫₀^∞ p(r, θ) r dr dθ = 1

Do we get the same result if we start with the polar coordinate form of the stochastic differential equations, (5.24) and (5.25)? Why or why not?

Solution

Equations (5.24) and (5.25) can be written as the system

d(r, θ)ᵀ = (1/(2r), 0)ᵀ dt + [1  0; 0  1/r] (dWr, dWθ)ᵀ

so that, with regard to (5.36) and (5.37), x1 = r, x2 = θ, and

A = (1/(2r), 0)ᵀ    B = [1  0; 0  1/r]    D = ½ B Bᵀ = [½  0; 0  1/(2r²)]

Inserting these expressions into (5.37) and denoting the probability density pp(r, θ),

∂pp/∂t = −∂/∂r (pp/(2r)) + ½ ∂²pp/∂r² + (1/(2r²)) ∂²pp/∂θ²    (5.46)

This is not the transient diffusion equation in polar coordinates.
We begin to understand this difference by writing the normalization condition, (5.39)

∫₀^{2π} ∫₀^∞ pp(r, θ) dr dθ = 1

This differs by a factor of r in the integrand from the conventional area integral in polar coordinates. The reason is simple: in going from the SDE to the FPE, we did not tell Itô's formula about the geometry of area elements on the plane, but only to take an SDE written with variables x1 = r, x2 = θ and write the corresponding FPE. There is no paradox here, only a message to be careful about coordinate transformations.

Finally, we wish to understand the relationship between p and pp. Motivated by the factor of r difference in the normalization conditions, we might guess that pp(r, θ) = crp(r, θ) where c is a constant. Indeed, making this substitution into (5.46), we recover the transient diffusion equation in polar coordinates, (5.45). For a process starting at the origin at t = 0, the normalized solutions (Exercise 5.9) are

p(r, t) = (1/(4πDt)) e^(−r²/(4Dt))    and    pp(r, θ, t) = (r/(4πDt)) e^(−r²/(4Dt))

5.3 Stochastic Kinetics

5.3.1 Introduction, and Length and Time Scales

Our next application of interest is reaction networks and chemical kinetics taking place at small numbers of molecules. First we start with a continuum kinetics example to define some useful nomenclature. Consider the following two-step series reaction, with rate constants k1 and k2

A → B → C

We define the species vector of concentrations c = (cA, cB, cC)ᵀ, and denote the stoichiometry for the reaction network with the stoichiometric matrix

ν = [ −1   1   0 ]
    [  0  −1   1 ]

We let νi, i = 1, 2, ..., nr denote the rows of the stoichiometric matrix written as column vectors

ν1 = (−1, 1, 0)ᵀ    ν2 = (0, −1, 1)ᵀ

We assume the reaction takes place in a well-mixed reactor and assume some rate law for the reaction kinetics, such as

r1 = k1 cA    r2 = k2 cB

As taught in every undergraduate chemical engineering curriculum, the material balance for the three species is then given by

dc/dt = νᵀ r(c) = Σ_{i=1}^{nr} νi ri(c)    (5.47)

The solution of this model with a pure reactant A initial conditionis


shown in Figure 5.5.
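The deterministic balance (5.47) for this network is easy to check numerically. The sketch below (our own, using the parameter values quoted in Figure 5.5: cA0 = 1, k1 = 2, k2 = 1) integrates dc/dt = νᵀr(c) with SciPy.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Series reaction A -> B -> C with rate laws r1 = k1*cA, r2 = k2*cB.
k1, k2 = 2.0, 1.0
nu = np.array([[-1.0, 1.0, 0.0],
               [0.0, -1.0, 1.0]])       # stoichiometric matrix

def rhs(t, c):
    r = np.array([k1 * c[0], k2 * c[1]])  # reaction rates r(c)
    return nu.T @ r                       # dc/dt = nu^T r(c) (5.47)

sol = solve_ivp(rhs, (0.0, 5.0), [1.0, 0.0, 0.0], rtol=1e-8, atol=1e-10)
cA, cB, cC = sol.y[:, -1]                 # concentrations at t = 5
```

Since the kinetics are first order, cA(t) = e^(−k1 t) exactly, which the integration reproduces to the requested tolerance.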
Next we consider reactions taking place at small concentrations. Instead of the common case in which we have on the order of Avogadro's
number of reacting molecules, assume we have only tens or hundreds
of molecules moving randomly in a constant-volume, well-mixed,reactor. At such low concentrations, the deterministic concentration assumption makes no sense, and we have to consider the random behavior of the molecules. But we still have to choose an appropriate length
and time scale of interest. Indeed, if we move down to the length scale

of the atoms, we can model the electron bonds deforming continuously in time from reactants through transition states to products. We

choose instead a larger time and length scale so that each reaction that

Figure 5.5: Two first-order reactions in series in a batch reactor, cA0 = 1, cB0 = cC0 = 0, k1 = 2, k2 = 1.

takes place can be regarded as a single instantaneous


eventcausinga
discrete change in the number of reactants and
products. At this scale,
we track the integer-valued numbers of reactant and
products, and we
treat the reaction events as random jump processes.
This length and time scale makes the discrete Poisson process the natural choice of description for stochastic kinetics.

53.2 Poisson Process


Just as the Wiener process W(t) is the simplestmathematicalprocess
appropriate for modeling diffusion, the Poisson process Y(t) is the sim-

plest mathematical process appropriate for modelingstochasticchemical kinetics. The POISSONPROCESS


is an integer-valued counting process. Time is modeled as a continuous variable,but the value of the
Poisson process is discrete. The Poisson process is characterizedby a
rate parameter, λ > 0, and for small time interval Δt, the probability of an event taking place in this time interval is proportional to λΔt. To start off, we assume that parameter λ is constant. The probability that an event does not take place in the interval [0, Δt] is therefore approximately 1 − λΔt. Let random variable T be the time of the first event of

the Poisson process starting from t = 0. We then have for small Δt

Pr(T > Δt) ≈ 1 − λΔt

Like the Wiener process, the Poisson process has independent increments, which means that the numbers of events in disjoint time intervals are independent. The independent increment assumption, coupled with the fact that λ does not change, implies that the probability that an event does not take place in two consecutive time intervals [0, 2Δt] is Pr(T > 2Δt) ≈ (1 − λΔt)². Continuing this argument to n intervals gives for t = nΔt

Pr(T > t) ≈ (1 − λΔt)^(t/Δt)

Taking the limit as Δt → 0 gives

Pr(T > t) = e^(−λt)

From the probability axioms and the definition of T's probability distribution, we then have

Pr(T ≤ t) = F_T(t) = 1 − e^(−λt)

Differentiating to obtain the density gives the exponential density

p_T(t) = λ e^(−λt)    (5.48)

chemical and bioThe exponentialdistributionshould be familiar to


logicalengineersbecause of the residence-time distribution of a wellmixedtank. The residence-timedistribution of the CSTRwith volume
V and volumetric flowrate Q satisfies (5.48) with
rate or inverse mean residence time, = V/Q.

being the dilution

Figure 5.6 shows a simulation of the unit Poisson process, i.e., the Poisson process with λ = 1. If we count many events, the sample path looks like the top of Figure 5.7, which resembles a "bumpy" line with slope equal to λ, unity in this case. The frequency count of the times to next event, T, are shown in the bottom of Figure 5.7, and we can clearly see the exponential distribution with this many events. Note that to generate a sample of the exponential distribution for the purposes of simulation, one can simply take the negative of the logarithm of a uniformly distributed variable on [0, 1]. Most computational languages provide functions to give pseudorandom numbers following a uniform distribution, so it is easy to produce samples from the exponential distribution as well. See Exercise 5.14 for further discussion.
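The inverse-CDF recipe just described can be checked in a few lines (sample size and seed below are arbitrary choices of ours): if u is uniform on [0, 1], then T = −ln(u)/λ satisfies Pr(T > t) = Pr(u < e^(−λt)) = e^(−λt).

```python
import numpy as np

# Sample the exponential density (5.48) by inverting the CDF.
rng = np.random.default_rng(2)
lam = 1.0                        # unit Poisson process intensity
u = rng.uniform(size=100_000)    # uniform samples on [0, 1)
T = -np.log(u) / lam             # exponential samples

mean_T = T.mean()                # should approach 1/lam = 1
var_T = T.var()                  # should approach 1/lam**2 = 1
```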

Figure 5.6: A sample path of the unit Poisson process.

Figure 5.7: A unit Poisson process with more events; sample path (top) and frequency distribution of event times T (bottom).

The time of the first event also characterizes the probability Pr(Y(t) = 0) for t ≥ 0. The probability that Y is still zero at time t is the same as the probability that the first event has occurred at some time greater than t, or Pr(Y(t) = 0) = Pr(T > t) = 1 − Pr(T ≤ t). Therefore we have the relationships

Pr(Y(t) = 0) = 1 − F_T(t) = e^(−λt)
We next generalize the discussion to find the probability density for the time of the second and subsequent events. Let random variable T2 denote the time of the second event. We wish to compute the joint density p_{T2,T}(t2, t). Because of the independent increments property

p_{T2,T}(t2, t) = p_T(t2 − t) p_T(t),    0 ≤ t ≤ t2

Integrating the joint density gives the marginal

p_{T2}(t2) = ∫₀^{t2} p_T(t2 − t) p_T(t) dt

or p_{T2}(t2) = λ² t2 e^(−λt2). We can then use induction to obtain the density of the time for the nth event, n ≥ 2. Assuming that T_{n−1} has density p_{T_{n−1}}(t) = λ^{n−1} t^{n−2} e^(−λt)/(n − 2)!, we have for Tn

p_{Tn}(tn) = ∫₀^{tn} λ e^(−λ(tn − t)) λ^{n−1} t^{n−2} e^(−λt)/(n − 2)! dt = λⁿ tn^{n−1} e^(−λtn)/(n − 1)!    (5.49)

From here we can work out Pr(Y(t) = n) for any n. For Y(t) to be n at time t, we must have time Tn ≤ t and time T_{n+1} > t, i.e., n events have occurred by time t but n + 1 have not. In terms of the joint density, we have

Pr(Y(t) = n) = ∫₀^t ∫_t^∞ p_{T_{n+1},T_n}(t′, s) dt′ ds

The independent increments property allows us to express the joint density as p_{T_{n+1},T_n}(t′, s) = p_T(t′ − s) p_{T_n}(s). Inserting this and (5.49) into the previous equation gives

Pr(Y(t) = n) = ∫₀^t ∫_t^∞ λ e^(−λ(t′ − s)) λⁿ s^{n−1} e^(−λs)/(n − 1)! dt′ ds = (λt)ⁿ e^(−λt)/n!    (5.50)

See Exercise 5.13 for an alternative derivation. The discrete density appearing on the right-hand side of (5.50), i.e., p(n) = e^(−a) aⁿ/n! with a = λt, is known as the Poisson density. Its mean and variance are equal to a = λt (see Exercise 5.12), so we have that the mean and variance of Y(t) both grow as λt, consistent with Figure 5.7.

Because λ and t appear only as the product λt, the Poisson process of intensity λ, now denoted Y_λ(t), can be expressed in terms of the unit Poisson process, denoted Y(t), with the relation

Y_λ(t) = Y(λt)

The justification is as follows. We have just shown Pr(Y_λ(t) = n) = (λt)ⁿ e^(−λt)/n!, and, for the unit Poisson process, Pr(Y(t) = n) = tⁿ e^(−t)/n!, which is equivalent on the substitution of λt for t. Because the increments are independent, we also have the property for all n ≥ 0

Pr(Y(t) − Y(s) = n) = Pr(Y(t − s) = n),    t ≥ s ≥ 0

which is similar to (5.5) for the Wiener process.
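These properties are easy to verify by simulation (the parameter values and trial count below are arbitrary choices of ours): generating event times as cumulative sums of exponential interevent times and counting events up to time t should give a count whose mean and variance both equal λt.

```python
import numpy as np

# Monte Carlo check of (5.50): the count Y_lam(t) has mean and
# variance both equal to lam*t.
rng = np.random.default_rng(3)
lam, t, ntrials = 2.0, 5.0, 20_000   # lam*t = 10

counts = np.empty(ntrials, dtype=int)
for i in range(ntrials):
    total, n = 0.0, 0
    while True:
        total += rng.exponential(1.0 / lam)   # next interevent time
        if total > t:
            break
        n += 1
    counts[i] = n

mean_count = counts.mean()   # should approach lam*t = 10
var_count = counts.var()     # should approach lam*t = 10
```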
Nonhomogeneous Poisson process. Next we consider the nonhomogeneous Poisson process in which the intensity λ(t) is time varying. We define the Poisson process for this more general case so that the probability of an event during time interval [t, t + Δt] is proportional to λ(t)Δt for Δt small. We can express the nonhomogeneous process also in terms of a unit Poisson process with the relation

Y_λ(t) = Y(∫₀^t λ(s) ds)

To see that the right-hand side has the required property, we compute the probability that an event occurs in the interval [t, t + Δt]. Let z(t) = ∫₀^t λ(s) ds. We have

Pr(t ≤ T ≤ t + Δt) = Pr(Y(z(t + Δt)) − Y(z(t)) > 0)
  = 1 − Pr(Y(z(t + Δt)) − Y(z(t)) = 0)
  = 1 − Pr(Y(∫_t^{t+Δt} λ(s) ds) = 0)
  = 1 − exp(−∫_t^{t+Δt} λ(s) ds)

For Δt small, we can approximate the integral as λ(t)Δt, giving

Pr(t ≤ T ≤ t + Δt) ≈ 1 − (1 − λ(t)Δt) = λ(t)Δt

and we have the stipulated probability.

Random time change representation of stochastic kinetics. With these results, we can now express the stochastic kinetics problem in terms of the Poisson process. Assume nr reactions take place between ns chemical species with stoichiometric matrix ν ∈ ℝ^(nr×ns), and denote its row vectors, written as columns, by νi, i = 1, 2, ..., nr. Let X(t) be an integer-valued random variable vector of the chemical species numbers, and let ri(X), i = 1, 2, ..., nr, be the kinetic rate expressions for the nr reactions. We assign to each reaction an independent Poisson process Yi with intensity ri. Note that this assignment gives nr nonhomogeneous Poisson processes because the species numbers change with time, i.e., ri = ri(X(t)). The Poisson processes then count the number of times that each reaction fires as a function of time. Thus the Poisson process provides the extents of the reactions versus time. From these extents, it is a simple matter to compute the species numbers from the stoichiometry. We have that

X(t) = X(0) + Σ_{i=1}^{nr} νi Yi(∫₀^t ri(X(s)) ds)    (5.51)

This is the celebrated random time change representation of stochastic kinetics due to Kurtz (1972).
Notice that this representation of the species numbers has X(t) appearing on both sides of the equation. This integral equation representation of the solution leads to many useful solution properties and simulation algorithms. We can express the analogous integral equation for the deterministic continuum mass balance given in (5.47)

c(t) = c(0) + Σ_{i=1}^{nr} νi ∫₀^t ri(c(s)) ds

5.3.3 Stochastic Simulation

The random time change representation suggests a natural simulation or sampling strategy for the species numbers X(t). We start with a known initial condition, X(0). We then select, based on each reaction, exponentially distributed proposed times for the next reactions, τi, i = 1, 2, ..., nr. These exponential distributions have intensities equal to the different reaction rates, ri. As mentioned previously, we obtain a sample of an exponential, F_{τi}(t) = 1 − e^(−ri t), by drawing a sample of a uniformly distributed RV on [0,1], ui, and rescaling the logarithm

τi = −(1/ri) ln ui

We then select the reaction with the smallest event time as the reaction to fire, giving

t1 = min_{i ∈ {1, ..., nr}} τi    i1 = arg min_{i ∈ {1, ..., nr}} τi

We then update the species numbers at the chosen reaction time with the stoichiometric coefficients of the reaction that fires

X(t1) = X(0) + ν_{i1}

This process is then repeated to provide a simulation over the time interval of interest. This simulation strategy is known as the FIRST REACTION METHOD (Gillespie, 1977). We summarize the first reaction method with the following algorithm.

Algorithm 5.5 (First reaction method).

Require: Stoichiometric matrix and reaction-rate expressions, νi, ri(X), i = 1, 2, ..., nr; initial species numbers, X0; stopping time T.

1: Initialize time t = 0, time index k = 1, and species numbers X(t) = X0.


Figure 5.8: Randomly choosing a reaction with appropriate probability. The interval is partitioned according to the relative sizes of the reaction rates. A uniform random number u is generated to determine the reaction. In this case, since ρ2 ≤ u < ρ3, m = 3 and the third reaction is selected.

2: Evaluate rates ri = ri(X(t)). If ri = 0 for all i, exit (system is at steady state).
3: Choose nr independent samples of a uniformly distributed RV on [0, 1], ui. Compute random times for each reaction, τi = −(1/ri) ln ui.
4: Select smallest time and corresponding reaction, τk = min_i τi, ik = arg min_i τi.
5: Update time and species numbers: tk = t + τk, X(tk) = X(t) + ν_{ik}.
6: Set t = tk, replace k ← k + 1. If t < T, go to Step 2. Else exit.
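A minimal sketch of Algorithm 5.5 follows, applied to the series reaction A → B → C of the previous section (the rate constants and helper names are illustrative choices of ours; Figure 5.9 starts from 100 A molecules).

```python
import numpy as np

# First reaction method for A -> B -> C with r1 = k1*nA, r2 = k2*nB.
rng = np.random.default_rng(4)
nu = np.array([[-1, 1, 0], [0, -1, 1]])      # stoichiometric matrix

def rates(X, k1=2.0, k2=1.0):
    return np.array([k1 * X[0], k2 * X[1]])

def first_reaction(X0, T, rng):
    t, X = 0.0, np.array(X0)
    times, states = [t], [X.copy()]
    while True:
        r = rates(X)
        if not r.any():                       # all rates zero: steady state
            break
        with np.errstate(divide="ignore"):    # tau_i = inf when r_i = 0
            tau = -np.log(rng.uniform(size=r.size)) / r
        i = int(np.argmin(tau))               # smallest event time fires
        t += tau[i]
        if t > T:
            break
        X = X + nu[i]
        times.append(t)
        states.append(X.copy())
    return np.array(times), np.array(states)

times, states = first_reaction([100, 0, 0], T=10.0, rng=rng)
```

Each pass through the loop is one iteration of Steps 2-6; the total molecule count is conserved by the stoichiometry.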

Gibson and Bruck (2000) show how to conserve random numbers in this approach by saving the nr − 1 random numbers that were not selected at the current iteration, and reusing them at the next iteration. With this modification, the method is termed the NEXT REACTION METHOD.

An alternative, and probably the most popular, simulation method was proposed also by Gillespie (1977, p. 2345). In this method, the reaction rates are added together to determine a total reaction rate r = Σ_{i=1}^{nr} ri. The time to the next reaction is distributed as p_T(t) = r e^(−rt). So sampling this density provides the time of the next reaction, which we denote T. To determine which reaction fires, the following cumulative sum is computed

ρi = Σ_{j=1}^{i} rj / r,    i = 1, ..., nr

Note that 0 = ρ0 ≤ ρ1 ≤ ρ2 ≤ ... ≤ ρ_nr = 1, so the set of ρi are a partition of [0, 1] as shown in Figure 5.8 for nr = 3 reactions. The length of each interval indicates the relative rate of each of the nr reactions. So to determine which reaction m fires, let u be a sample from the uniform distribution on [0, 1], and determine the interval m in which u falls

ρ_{m−1} ≤ u < ρ_m

Given the reaction that fires, m, and the time of the reaction T, we then update the species numbers in the standard way. This method is known as Gillespie's DIRECT METHOD or simply the STOCHASTIC SIMULATION ALGORITHM (SSA). We summarize this method in the following algorithm.

Algorithm 5.6 (Gillespie's direct method or SSA).

Require: Stoichiometric matrix and reaction-rate expressions, νi, ri(X), i = 1, 2, ..., nr; initial species numbers, X0; stopping time T.

1: Initialize time t = 0, time index k = 1, and species numbers X(t) = X0.
2: Evaluate rates ri = ri(X(t)) and total rate r = Σi ri. If r = 0, exit (system is at steady state).
3: Choose two independent samples, u1, u2, of a uniformly distributed RV on [0, 1]. Compute time of next reaction τ = −(1/r) ln u1.
4: Select which reaction fires, ik, as follows. Compute the cumulative sum, ρi = Σ_{j=1}^{i} rj/r for i ∈ [0, nr]. Note ρ0 = 0. Find index ik such that ρ_{ik−1} ≤ u2 < ρ_{ik}.
5: Update time and species numbers: tk = t + τ, X(tk) = X(t) + ν_{ik}.
6: Set t = tk, replace k ← k + 1. If t < T, go to Step 2. Else exit.
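Algorithm 5.6 can be sketched as follows for the same series reaction A → B → C (rate constants are illustrative choices of ours; the initial condition matches Figure 5.9).

```python
import numpy as np

# Gillespie's direct method (SSA) for A -> B -> C.
rng = np.random.default_rng(5)
nu = np.array([[-1, 1, 0], [0, -1, 1]])   # stoichiometric matrix

def gillespie_ssa(X0, T, k1, k2, rng):
    t, X = 0.0, np.array(X0)
    times, states = [t], [X.copy()]
    while True:
        r = np.array([k1 * X[0], k2 * X[1]])
        rtot = r.sum()
        if rtot == 0.0:                        # steady state: exit
            break
        t += -np.log(rng.uniform()) / rtot     # time of next reaction
        if t > T:
            break
        # pick reaction m with rho_{m-1} <= u < rho_m (Figure 5.8);
        # the min() guards against roundoff in the last cumulative sum
        rho = np.cumsum(r) / rtot
        m = min(int(np.searchsorted(rho, rng.uniform(), side="right")),
                len(r) - 1)
        X = X + nu[m]
        times.append(t)
        states.append(X.copy())
    return np.array(times), np.array(states)

times, states = gillespie_ssa([100, 0, 0], T=5.0, k1=2.0, k2=1.0, rng=rng)
```

Repeating the run with a different seed gives a different trajectory, which is exactly the behavior discussed around Figure 5.9.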
Figure 5.9 shows the results when starting with nA = 100 molecules. Notice the random aspect of the simulation gives a rough appearance to the number of molecules versus time, which is quite unlike the deterministic simulation presented in Figure 5.5. Because the number of molecules is an integer, the simulation is discontinuous with jumps at the reaction event times. But in spite of the roughness, we can already make out the classic behavior of the series reaction: loss of starting material A, appearance and then disappearance of the intermediate species B, slow increase in final product C. Note also that Figure 5.9 is only one simulation or sample of the random process. Unlike the deterministic models, if we repeat this simulation, we obtain a different sequence of random numbers and a different simulation. To compute accurate expected or average behavior of the system, we perform many of these random simulations and then compute the sample averages of quantities we wish to report.

Figure 5.9: Stochastic simulation of the first-order series reaction starting with 100 A molecules.

5.3.4 Master Equation of Chemical Kinetics


The simulations in the previous section allowus to envision many posSible simulation trajectories depending on the particular sequenceof
random numbers we have chosen. Some reflection leads us to consider instead modeling the evolution of the probability density of the
state. We shall see that we can either solve this evolution equation directly, or average over many randomly chosen simulation trajectories
to construct the probability density by brute force. Both approaches
have merit, but here we focus on expressing and solving the evolution
equation for the probability density.
Consider the reversible reaction

A + B ⇌ C    (5.52)

with forward rate constant k1 and reverse rate constant k−1,

taking place in a constant-volume, well-stirred reactor. Let p (a, b, c, t)

denote the probability density for the system to have a moleculesof

species A, b molecules of species B,and c molecules of species C at time

t. We seek an evolution equation governing p(a, b, c, t). The probability density evolvesdue to the chemicalreactions given in (5.52).
Consider the system state (a, b, c, t); if the forward event takes place,

the system moves from state (a, b, c, t) to (a − 1, b − 1, c + 1, t + dt). If the reverse reaction event takes place, the system moves from state (a, b, c, t) to (a + 1, b + 1, c − 1, t + dt). We have expressions for the rates of these two events required for the trajectory simulations of the previous sections

r1 = k1 ab    r−1 = k−1 c

But now these reaction events occurring at these rates change the probability density that the system is in state (a, b, c, t). This evolution equation for the probability density is known as the master equation of chemical kinetics. The master equation for this chemical system is

∂p(a, b, c, t)/∂t = k1(a + 1)(b + 1) p(a + 1, b + 1, c − 1, t) + k−1(c + 1) p(a − 1, b − 1, c + 1, t) − (k1 ab + k−1 c) p(a, b, c, t)    (5.53)

We see that the reaction rate for each event is multiplied by the probability density that the system is in that state.

Because we have a single reaction, we can simplify matters by defining ε to be the extent of the reaction. The numbers of molecules of each species are calculated from the initial numbers and reaction extent given the reaction stoichiometry

a = a0 − ε    b = b0 − ε    c = c0 + ε

We see that ε = 0 corresponds to the initial state of the system. Using the reaction extent, we define p(ε, t) to be the probability density that the system has reaction extent ε at time t. Converting (5.53) we obtain

∂p(ε, t)/∂t = k1(a0 − ε + 1)(b0 − ε + 1) p(ε − 1, t) + k−1(c0 + ε + 1) p(ε + 1, t) − [k1(a0 − ε)(b0 − ε) + k−1(c0 + ε)] p(ε, t)    (5.54)

The four terms in the master equation are depicted in Figure 5.10.

Given a0, b0, c0 we can calculate the range of possible extents. For simplicity, assume we start with only reactants A and B so c0 = 0. Then the minimum extent is ε = 0 because we cannot fire the reverse
reaction from this starting condition. If we fire the forward reaction until the limiting reactant is exhausted, we reach the maximum extent n = min(a0, b0), so the range of possible extents is 0 ≤ ε ≤ n. We now can write the set of equations stemming from the master equation and place them in matrix form. Written for each extent j, (5.54) gives

dpj/dt = αj pj−1 − βj pj + γj pj+1,    j = 0, 1, ..., n    (5.55)

in which pj(t) is shorthand for p(j, t), p−1 = pn+1 = 0, and αj, βj, γj are the following rate expressions evaluated at different extents of the reaction

αj = k1(a0 − j + 1)(b0 − j + 1)
βj = k1(a0 − j)(b0 − j) + k−1(c0 + j)
γj = k−1(c0 + j + 1)

We can also write this model as

dP/dt = AP    (5.56)

in which P is the column vector of probabilities for the different reaction extents and the tridiagonal A matrix contains all the model parameters.
The essential connection between the stochastic and deterministic approaches to the well-mixed chemical kinetics problem is that the stochastic model's probability density becomes arbitrarily sharp at the solution to the deterministic problem as the number of molecules increases. Figure 5.11 displays the solution to (5.55) starting with 20 A molecules, 100 B molecules and 0 C molecules. The extent of reaction is scaled by the initial number of A molecules. Notice that the probability density spreads out rapidly as time increases and there is significant uncertainty in the equilibrium state.

If we increase the starting number of molecules by a factor of 10, we obtain the results depicted in Figure 5.12. Notice the sharpening

Figure 5.10: Master equation for chemical reaction A + B ⇌ C. The probability density at state ε changes due to forward and reverse reaction events. The rate of change is proportional to the reaction rate times the probability density of being in that state.

of the probability density. We can see that the extent versus time traced out by the peak in probability density is approaching the mass action kinetics limit. You can imagine the sharpness in the density if we started out with on the order of Avogadro's number of molecules. As stressed earlier, however, if we are not operating near that limit, the random fluctuation may be an important physical behavior to include in the model. To describe this behavior, the stochastic approach is essential and the deterministic approach cannot be substituted.

The master equation, (5.56), is a simple linear, constant-coefficient differential equation, and the solution is

P(t) = e^(At) P0
master equation directly is its high dimenThechallengein solving the


different species values
sion.Thedimensionof P is the number of
have a single reaction, the
thatthe system can reach by reaction. If we
extentcanrange from zero, its initial value, to a value that exhausts
somelimitingspecies. Denote this limiting species's initial number by
no,thedimensionof the state vector P is then no. But if we have mul-

tiplereactions,we multiply nr by the limiting species corresponding


toallthecombinationsof reactions. The scalingis on the order of the
product
nonr. If we have 1000 initial molecules with 10 reactions, the
dimension
of the master equation P vector is already on the order of

Figure 5.11: Solution to master equation for A + B ⇌ C starting with 20 A molecules, 100 B molecules and 0 C molecules. Congratulations, you now understand what is displayed on the cover of the text.

10⁴. The A matrix already contains 10⁸ elements, although it would be quite sparse.

Thus solving the master equation becomes computationally intractable for problems of even modest size. The best we can hope for with these larger models is to sample the master equation with simulations. Even simulating enough trajectories to obtain reliable sample averages can be quite time consuming, which motivates research efforts to develop efficient simulation algorithms and sampling strategies.

Given this basic understanding, we now express the general master equation for nr reactions with the random variable X (species numbers) as the state of the system rather than the reaction extents. Given a system in state x, reaction i with stoichiometric vector νi can reach state x from only state x − νi, and can leave this state to reach state x + νi. We then have for the evolution of the probability density

dp_X(x, t)/dt = Σ_{i=1}^{nr} [ ri(x − νi) p_X(x − νi, t) − ri(x) p_X(x, t) ]    (5.57)

Figure 5.12: Solution to master equation for A + B ⇌ C starting with 200 A molecules, 1000 B molecules and 0 C molecules; k1 = 1/200.

with initial condition p_X(x, 0) = m(x). Equation (5.57) is the CHEMICAL MASTER EQUATION for a general reaction network. It is also known as the forward Kolmogorov equation in the mathematics literature.

Applying (5.57) to the previous example, we have x = (nA, nB, nC)ᵀ, ν1 = (−1, −1, 1)ᵀ, and rates

r1(x) = k1 nA nB    r−1(x) = k−1 nC

and master equation

dp_X(x, t)/dt = r1(x − ν1) p_X(x − ν1, t) + r−1(x + ν1) p_X(x + ν1, t) − (r1(x) + r−1(x)) p_X(x, t)

or, written to show the species numbers

dp_X(nA, nB, nC, t)/dt = k1(nA + 1)(nB + 1) p_X(nA + 1, nB + 1, nC − 1, t) + k−1(nC + 1) p_X(nA − 1, nB − 1, nC + 1, t) − (k1 nA nB + k−1 nC) p_X(nA, nB, nC, t)
nc
5.3.5 Microscopic, Mesoscopic, and Macroscopic Kinetic Models

Next we would like to explore how the discrete stochastic kinetic model of a microscopic system transforms into the deterministic kinetic model of a macroscopic system that is familiar to undergraduate chemical and biological engineers. Along the way, we derive a model for the regime bridging the microscopic and macroscopic levels, which is sometimes called the mesoscopic regime. Our goal is to start with the microscopic chemical master equation and take the limit as the system size becomes large. We use the system volume Ω for the size parameter. The procedure we follow is given by van Kampen (1992, pp. 244-263) and is known as the OMEGA EXPANSION. The essentials of the approach are perhaps best explained by taking a concrete (and nonlinear) example. Consider the bimolecular reaction

2A →(k) products
In the deterministic macroscopic description, we have a reaction-rate expression r = kc², in which c is the molar concentration of A, an intensive variable, and the rate constant k has units of l³/(mol t), so the rate has units of mol/(t l³), a rate of reaction per volume, which is also intensive. The mole balance for species A in a well-mixed system is the familiar

dc/dt = −2kc²    c(0) = c0    (5.58)

For these same kinetics, at the small scale, we have the microscopic chemical master equation

dp(n, t)/dt = (k/Ω)(n + 2)(n + 1) p(n + 2, t) − (k/Ω) n(n − 1) p(n, t)    (5.59)

in which n is the number of A molecules in the well-mixed system of volume Ω. Here n is a discrete (nonnegative, integer-valued) random

Figure 5.13: The equilibrium reaction extent's probability density for Reactions 5.52 at system volume Ω = 20 (top, var(ε) = 0.0114) and Ω = 200 (bottom, var(ε) = 0.00176). Notice the decrease in variance in the reaction extent as system volume increases.

variable. As Ω becomes large, we expect the concentration c = n/Ω to
be well described by the ODE (5.58). It is initially far from clear how we
take this limit to make this transition from a discrete-valued random
variable n to a continuous-valued deterministic variable c.
To motivate the appropriate analysis, we first look at solutions to
the master equation for increasing values of Ω. Figure 5.13 shows the
final equilibrium distributions of the scaled reaction extent, ξ, from
Figures 5.11 and 5.12. We have increased the system size from Ω = 20
in the top figure to Ω = 200 in the bottom figure. We also show the
variance in random variable ξ in the two simulations. Notice that for a
tenfold increase in Ω, the variance has decreased by almost this same
tenfold amount. From these solutions to the master equation we have
some idea what to expect. For a large system, the integer increments in
the number of molecules n become so fine that we can approximately
replace them by a continuous variable c. But we also see randomness

in the concentration, and although the (relative) magnitude of the con-
centration fluctuations decreases as the system size increases, it is
not zero. In fact, we see that the familiar normal distribution appears
to describe the probability distribution of the fluctuations, and the
variance of that distribution decreases with increasing system size.

Therefore we are led to hypothesize that we can approximate n as
a combination of the deterministic concentration c and a continuous
random variable ξ to capture the fluctuations. Based on our numerical
evidence, we postulate

n = cΩ + ξΩ^(1/2)        (5.60)

so that the variance in n/Ω scales with Ω⁻¹, i.e., var(n/Ω) = Ω⁻¹ var(ξ).
We are neglecting terms of order Ω⁰ and lower in the expansion of n
in (5.60). Thus we are expressing n/Ω as a perturbation solution in
increasing powers of small parameter Ω^(-1/2). The additional complica-
tion in this case compared to our previous perturbation examples in
Chapters 2 and 3 is that we are also changing from a discrete variable
n to continuous variables c and ξ.
The master equation describes the density of random variable n,
P(n, t), and we wish to deduce an evolution equation for the density
of random variable ξ, which we denote Π(ξ, t). We also expect the
analysis to show that the familiar differential equation (5.58) describes
the deterministic variable c. As a transformation of random variables,
we are considering the two densities to be related by

Π(ξ, t) = P(cΩ + ξΩ^(1/2), t)

in which we suppress the dependence of n on c. Consider c to be some
known function of time when expressing the transformation between
the two random variables n and ξ.

Given this transformation, the partial derivatives are related by
P_t = Π_t + Π_ξ ξ_t, and ξ_t is found by differentiating (5.60) holding n
constant, which yields

ξ_t = -ċΩ^(1/2)

in which ċ represents the time derivative of c(t). Substituting this into
the relation for the partial derivatives gives

P_t = Π_t - ċΩ^(1/2) Π_ξ

This is the first step. We have the left-hand side of the master equation
evaluated in terms of the new density Π. Next we work on the right-
hand side.


P(n, t) is simply the transformed Π(ξ, t), but we also require the
term P(n + 2, t). First we solve (5.60) for ξ so that we know what value
of ξ corresponds to n

ξ(n) = nΩ^(-1/2) - cΩ^(1/2)        (5.61)

and ξ(n + 2) = ξ(n) + 2Ω^(-1/2). Next we use a Taylor series to express
P(n + 2, t) in terms of Π(ξ, t), denoted simply as Π, and its ξ derivatives

P(n + 2, t) = Π + 2Ω^(-1/2) Π_ξ + (4Ω⁻¹/2!) Π_ξξ + (8Ω^(-3/2)/3!) Π_ξξξ + ···        (5.62)

The number of terms retained in the Taylor series determines the order
of the approximation for the density Π. We now can easily transform
the remaining terms in n using (5.60)

n(n - 1)/Ω = c²Ω + 2cξΩ^(1/2) + (ξ² - c) - ξΩ^(-1/2)

(n + 2)(n + 1)/Ω = c²Ω + 2cξΩ^(1/2) + (3c + ξ²) + 3ξΩ^(-1/2) + 2Ω⁻¹

Now we combine all of these ingredients by substituting them into the
master equation (5.59) giving

Π_t - ċΩ^(1/2)Π_ξ = (k̃/2)[4c + 4ξΩ^(-1/2) + 2Ω⁻¹]Π
        + k̃[c²Ω^(1/2) + 2cξ + (3c + ξ²)Ω^(-1/2) + 3ξΩ⁻¹ + 2Ω^(-3/2)]Π_ξ
        + k̃[c² + 2cξΩ^(-1/2) + (3c + ξ²)Ω⁻¹ + 3ξΩ^(-3/2) + 2Ω⁻²]Π_ξξ

in which we have kept up to the second-order term in (5.62).
The third and final step is to extract from this large equation the infor-
mation provided at the different orders of the expansion parameter Ω.

Order Ω^(1/2). Collecting the terms of order Ω^(1/2) gives (ċ + k̃c²)Π_ξ = 0
and, since Π_ξ ≠ 0, we deduce

dc/dt = -k̃c²        (5.63)

which is the macroscopic equation (5.58) after noting that the usual
macroscopic convention absorbs a factor of one-half into the definition
of the rate constant, i.e., k = k̃/2.
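For later reference, note that (5.63) is separable and can be integrated in closed form; a short derivation, with c₀ denoting the initial concentration, is

```latex
\frac{dc}{c^2} = -\tilde{k}\,dt
\quad\Longrightarrow\quad
\frac{1}{c(t)} - \frac{1}{c_0} = \tilde{k}\,t
\quad\Longrightarrow\quad
c(t) = \frac{c_0}{1 + \tilde{k}\,c_0\,t}
```

In particular, with k̃ = 1 and c₀ = 1 (an assumption on our part, consistent with the simulations discussed later in which c = 1/2 at t = 1), the concentration decays to half its initial value at t = 1.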


Order Ω⁰. Collecting the terms of order Ω⁰ gives

Π_t = 2k̃cΠ + 2k̃cξΠ_ξ + k̃c²Π_ξξ

which can be rearranged into

Π_t = ∂(2k̃cξΠ)/∂ξ + ∂²(k̃c²Π)/∂ξ²

This is the familiar Fokker-Planck equation, (5.41), which we can write
as an equivalent SDE

dξ = -2k̃cξ dt + (2k̃c²)^(1/2) dW

Because this is a linear Fokker-Planck equation (the drift term is linear
in ξ and the diffusivity is independent of ξ), this equation is sometimes
referred to as the linear noise approximation.
To simulate the model at this level of approximation, we first solve
(5.63) for c(t), and then perform a random walk simulation for the fluc-
tuation term ξ(t), which depends on c(t). We combine these two parts
for n(t) using (5.60). This description, in which c(t) is deterministic
and ξ(t) is a continuous random walk, is the mesoscopic description.

We see the results in Figure 5.14. The top figure shows the discrete
simulation using KMC for volume Ω = 500 and initial condition of 500
A molecules, n₀ = 500. Note that the plot has a log scale on the time

axis to more clearly show the evolution at early times. These two simulations display quite similar character. To compare them more quantitatively, we could compute several low-order moments of the densities
by computing sample averages over many simulations.
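To make the two-level comparison concrete, the following minimal sketch (in Python, our language choice rather than the Octave/MATLAB used elsewhere in the text; the values k̃ = 1, the sample sizes, and the end time are our illustrative assumptions) runs a discrete KMC simulation of (5.59) alongside the linear noise approximation and compares the mean number of A molecules at a fixed time:

```python
import numpy as np

rng = np.random.default_rng(0)
ktilde, omega, n0, tfinal = 1.0, 500, 500, 2.0

def kmc_2a_to_b(n, t, tstop):
    # Gillespie (KMC) simulation of 2A -> B with propensity
    # a(n) = ktilde * n * (n - 1) / (2 * omega), as in (5.59)
    while True:
        a = ktilde * n * (n - 1) / (2.0 * omega)
        if a == 0.0:
            return n
        t += rng.exponential(1.0 / a)   # exponential waiting time to next event
        if t > tstop:
            return n
        n -= 2                          # each event consumes two A molecules

def linear_noise(tstop, dt=1e-3):
    # mesoscopic model: c(t) solves dc/dt = -ktilde c^2 (closed form used),
    # and xi(t) follows d(xi) = -2 ktilde c xi dt + sqrt(2 ktilde) c dW
    c0 = n0 / omega
    xi = 0.0
    nsteps = int(tstop / dt)
    for k in range(nsteps):
        c = c0 / (1.0 + ktilde * c0 * k * dt)
        xi += (-2.0 * ktilde * c * xi * dt
               + np.sqrt(2.0 * ktilde * dt) * c * rng.standard_normal())
    c = c0 / (1.0 + ktilde * c0 * tstop)
    return c * omega + np.sqrt(omega) * xi   # n = c*Omega + sqrt(Omega)*xi, (5.60)

nsamp = 200
kmc = np.array([kmc_2a_to_b(n0, 0.0, tfinal) for _ in range(nsamp)])
lna = np.array([linear_noise(tfinal) for _ in range(nsamp)])
print(kmc.mean(), lna.mean())
```

Both sample means should cluster near the deterministic value Ωc(t) with relative fluctuations of order Ω^(-1/2), which is the behavior seen in Figure 5.14.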
As a more comprehensive alternative, we compute the correspond-
ing cumulative probability distributions at the selected time t = 1,
shown as the dashed line in Figure 5.14. We obtain the cumulative
distribution for the discrete model by solving the master equation and
summing

F(n, t) = Σ_{m=0}^{n} P(m, t)        0 ≤ n ≤ n₀

We can obtain the density for the omega expansion by solving the PDE
for Π(ξ, t), shifting the mean by the deterministic c(t), and integrating
for the cumulative distribution. Or we can instead derive a correspond-
ing evolution equation for ξ's cumulative density Π̃(ξ, t)

Figure 5.14: Simulation of 2A → B for n₀ = 500, Ω = 500. Top:
discrete simulation; bottom: SDE simulation.

and shift its mean by c(t), F_c(x, t) = Π̃(Ω^(1/2)(x - c(t)), t). Exercise 5.22
discusses this approach in more detail. The results are shown in Fig-
ure 5.15. The staircase function is the solution to the discrete master
equation at time t = 1, at which time the deterministic concentration
is one-half, i.e., c(t) = 1/2 at t = 1. The steps in x = n/Ω are caused
by the zero probability at all the odd integer values of n in the discrete
model. The smooth function is the omega expansion, which we can see
is in reasonably close agreement with the discrete model for Ω = 500.
Finally, in the limit as Ω → ∞, the fluctuation ξ becomes negligible
compared to c, and we have the familiar deterministic macroscopic
description, (5.63) or (5.58). In Figure 5.15, this limit would be observed

by the two functions converging to a unit step function at the value of
x = c(t). See also Exercise 5.22.

Figure 5.15: Cumulative distribution for 2A → B at t = 1 with n₀ =
500, Ω = 500. Discrete master equation (steps) versus
omega expansion (smooth).
There is now an extensive and rapidly growing literature on stochas-
tic kinetics. The book chapter by Anderson and Kurtz (2011) is highly
recommended for those interested in a current and comprehensive
overview of most of the topics covered here, as well as more advanced
topics on relevant central limit theorems for Poisson processes, mar-
tingales, and scaling and model reduction.

5.4 Optimal Linear State Estimation


5.4.1 Introduction
Sensors are how we learn about the world. Our five natural senses
provide us with our first exposure to sensors, i.e., the type built in
by nature. Since humans are very curious about the world, people
have been hard at work for a long time augmenting the natural senses
by constructing artificial or man-made sensors. Some of mankind's
biggest advances in science and engineering were precipitated by a
breakthrough in sensor technology, e.g., the telescope, the microscope,
detectors for electromagnetic radiation outside the visible range, etc.

One of the important things that we know about sensors is that
they are limited and imperfect indicators of the world around us. They
are noisy, which makes it challenging for us to interpret what the sensor is
telling us. Finally, all sensors, as well as the systems that we are trying
to sense, are subject to random and uncontrolled effects.

One of the fundamental problems in systems engineering is to de-
sign methods for taking these imperfect measurements of imperfectly
known systems and extracting the best possible information from them.
We may decide in some situation that a change in a sensor's
signal indicates that the system has changed. But we may de-
cide in a different situation that the same change indicates a
random effect or disturbance to the sensor itself, and the system is
completely unchanged. Optimally combining these two sources of in-
formation: what the sensor tells us and the other knowledge that we
have about the system's behavior, is the task of state estimation.

To make these concepts precise, we consider a linear system. Let
x be an n-vector containing all the relevant information about a
system of interest

x⁺ = Ax + Bu

The u variables are the input variables that also affect the evolution
of the system. If we control the inputs, they are called actuators, i.e.,
the valves in a chemical plant. If the inputs are not controlled by us,
they are regarded as disturbances, and often given another letter to in-
dicate this difference. We use w ∈ Rⁿ to represent the disturbances.
Because of the central limit theorem, these will be considered normally
distributed random variables with zero mean and variance Q. The dy-
namic model is then

x⁺ = Ax + Bu + w

The initial state of the system x(0) is also generally unknown and will be
considered a normally distributed random variable with mean x̄₀ and
variance P(0). Now we consider the sensors. Let y ∈ Rᵖ be the p-vector
of available measurements. Normally p < n, indicating that we are not
measuring every relevant property of the system. Because sensors are
expensive, often p ≪ n, indicating that we have a complex system with
many states, but are information poor with few measurements. The
sensor is also affected by random disturbances, which we denote by v.
Because the input u is considered known, we can remove it from the


model for simplicity without changing any important features of the
state estimation problem. The linear model of interest is then

x⁺ = Ax + w        y = Cx + v

and the disturbances and unknown initial condition satisfy

w ~ N(0, Q)        v ~ N(0, R)        x(0) ~ N(x̄₀, P(0))
If the measurement process is quite noisy, then R is large. If the
measurements are highly accurate, then R is small. Similar considera-
tions apply for the process noise, Q. If the state is subjected to large
disturbances, then Q is large, and if the disturbances are small, Q is
small. Again we choose zero mean for w because the nonzero mean
disturbances should have been accounted for in the system model. The
variance P(0) reflects our confidence in the initial state. If we know how
the system starts off, P(0) is small. If we have little knowledge, we take
P(0) large. Recall the noninformative prior is a uniform distribution,
which we can approximate by taking P(0) very large. In industrial ap-
plications, the initial condition may be known with high accuracy for
batch processes. But the initial condition is usually considered largely
unknown when analyzing a dataset taken from a continuous process.
We require three main results concerning normals, conditional nor-
mals, and linear transformations. These follow directly from the prop-
erties of the normal established in Chapter 4, but see Exercise 5.24 for
some hints if you have difficulty deriving any of these. Recall also the
normal function notation (4.13)

n(x, m, P) = (2π)^(-n/2) (det P)^(-1/2) exp[ -(1/2)(x - m)ᵀ P⁻¹ (x - m) ]

which was introduced in Chapter 4, and will be used frequently in the
following discussion.

Joint independent normals. If p_{x|z}(x|z) is normal, and y is statis-
tically independent of x and z and normally distributed

p_{x|z}(x|z) = n(x, m_x, P_x)        y ~ N(m_y, P_y)        y independent of x and z

then the conditional joint density of (x, y) given z is

p_{x,y|z}(x, y|z) = n(x, m_x, P_x) n(y, m_y, P_y)

p_{x,y|z}(x, y|z) = n( [x; y], [m_x; m_y], [P_x, 0; 0, P_y] )        (5.64)

Linear transformation of a normal. If x and z are jointly normally dis-
tributed with conditional density p_{x|z}(x|z) having mean m and
variance P, and y = Ax

p_{x|z}(x|z) = n(x, m, P)        y = Ax

then the conditional density of y given z is normal
with mean Am and variance APAᵀ

p_{y|z}(y|z) = n(y, Am, APAᵀ)        (5.65)

Conditional of a joint normal. If the joint conditional density of (x, y)
given z is normal

p_{x,y|z}(x, y|z) = n( [x; y], [m_x; m_y], [P_x, P_xy; P_yx, P_y] )

then the conditional density of x given (y, z) is also normal

p_{x|y,z}(x|y, z) = n(x, m, P)

in which

m = m_x + P_xy P_y⁻¹ (y - m_y)        (5.66)
P = P_x - P_xy P_y⁻¹ P_yx

Note that the conditional mean m is itself a random variable because
it depends on the random variable y.
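The conditional-normal result is easy to check by simulation; the following sketch (in Python, with invented numerical values for the means and covariances) compares the formulas in (5.66) against empirical statistics obtained by conditioning on a thin slab of y values:

```python
import numpy as np

rng = np.random.default_rng(1)

# joint normal for (x, y): means and partitioned covariance (illustrative values)
mx, my = 1.0, -1.0
Px, Pxy, Py = 2.0, 0.8, 1.0
mean = np.array([mx, my])
cov = np.array([[Px, Pxy], [Pxy, Py]])

samples = rng.multivariate_normal(mean, cov, size=200_000)
x, y = samples[:, 0], samples[:, 1]

# condition on y near a chosen value by keeping samples in a thin slab
ystar = 0.5
keep = np.abs(y - ystar) < 0.05
empirical_mean = x[keep].mean()
empirical_var = x[keep].var()

# formula (5.66): m = mx + Pxy Py^{-1} (y - my), P = Px - Pxy Py^{-1} Pyx
m = mx + Pxy / Py * (ystar - my)
P = Px - Pxy**2 / Py

print(m, empirical_mean, P, empirical_var)
```

The empirical conditional mean and variance agree with (5.66) to within sampling error, and note that the conditional mean moves with the conditioning value ystar while the conditional variance does not.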
5.4.2 Optimal Dynamic Estimator

We have specified the random process of interest

x⁺ = Ax + w        (5.67)
y = Cx + v        (5.68)

with known densities

w ~ N(0, Q)        v ~ N(0, R)        x(0) ~ N(x̄₀, Q(0))


We next derive the optimal estimator for this process. As part of
this derivation, we will derive the probability densities of the state as a
function of time. This is the same pattern that we followed in the first
two sections on Brownian motion and stochastic kinetics. We started
with the random process (Wiener and Poisson processes), and then we
derived their probability density equations (Fokker-Planck and master
equations). Because we have assumed a prior, the density of x(0), we
are using Bayesian estimation. The overall game plan is as follows. The
initial state x(0) is assumed normal. Our optimal estimate before mea-
surement is denoted x̂⁻(0). The minus sign indicates estimate before mea-
surement. We obtain from the sensor measurement y(0). We then
compute the conditional density of x(0)|y(0). We show that it is also
normal. The maximum of that conditional density is our optimal es-
timate after measurement, denoted x̂(0). We are combining the mea-
surement with the prior to calculate the posterior. Then we use the
random process (5.67) to forecast the state forward one time step to
obtain x(1). We show that the density of x(1) (conditioned on y(0))
is also normal,3 and the maximum of that density is our estimate at
k = 1 before measurement, x̂⁻(1). Then we add measurement y(1)
and compute the conditional density of x(1)|y(0), y(1); its maximum
gives x̂(1), and we continue the iteration. So now we fill in the details.

Combining the measurement. We start off at k = 0 with estimate
x̂⁻(0) = x̄₀ and consider the effect of adding the first measurement.
We obtain noisy measurement y(0) satisfying

y(0) = Cx(0) + v(0)

in which v(0) ~ N(0, R) is the measurement noise. Given the measure-
ment y(0), we next obtain the conditional density p_{x(0)|y(0)}(x(0)|y(0)).
This conditional density describes the change in our knowledge about
x(0) after we obtain measurement y(0). This step is the essence of
state estimation. To derive this conditional density, first consider the
pair of variables (x(0), y(0)) given as

[x(0); y(0)] = [I, 0; C, I] [x(0); v(0)]

We assume that the noise v(0) is statistically independent of x(0),
and use the independent joint normal result (5.64) to express the joint
3Because we have linear transformations of normals at each step of the procedure,
every density in sight will be normal,


density of (x(0), v(0))

[x(0); v(0)] ~ N( [x̄₀; 0], [Q(0), 0; 0, R] )

As shown in the previous equation, the pair (x(0), y(0)) is a linear transfor-
mation of (x(0), v(0)). Therefore, using the linear transfor-
mation result (5.65), the density of (x(0), y(0)) is given by

[x(0); y(0)] ~ N( [x̄₀; Cx̄₀], [Q(0), Q(0)Cᵀ; CQ(0), CQ(0)Cᵀ + R] )

Given this joint density, we then use the conditional of a joint normal
result (5.66) to obtain

p(x(0)|y(0)) = n(x(0), m, P)

in which

m = x̄₀ + L(0)(y(0) - Cx̄₀)
L(0) = Q(0)Cᵀ(CQ(0)Cᵀ + R)⁻¹
P = Q(0) - Q(0)Cᵀ(CQ(0)Cᵀ + R)⁻¹CQ(0)

We see that the conditional density is normal. The optimal
state estimate is the value of x(0) that maximizes this conditional density.
For a normal, that is the mean, and we choose x̂(0) = m. We
also denote the variance in this conditional after measurement y(0)
by P(0) = P with P given in the previous equation. The change in
variance after measurement (Q(0) to P(0)) quantifies the information
increase by obtaining measurement y(0). The variance after measure-
ment, P(0), is always less than or equal to Q(0), which implies that we
can only gain information by measurement; but the information gain
may be small if the measurement device is poor and the measurement
noise variance R is large.
Forecasting the state evolution. Next we consider the state evolution
from k = 0 to k = 1, which satisfies

x(1) = [A, I] [x(0); w(0)]

in which w(0) ~ N(0, Q) is the process noise. We next calculate the
conditional density p(x(1)|y(0)). Now we require the conditional version
of the joint density of (x(0), w(0)). We assume that the process noise
w(0) is statistically independent of both x(0) and v(0), hence it
is also independent of y(0), which is a linear combination of x(0) and
v(0). Therefore we use (5.64) to obtain

[x(0); w(0)] | y(0) ~ N( [x̂(0); 0], [P(0), 0; 0, Q] )

We then use the conditional version of the linear transformation of a
normal (5.65) to obtain

p(x(1)|y(0)) = n(x(1), x̂⁻(1), P⁻(1))

in which the mean and variance are

x̂⁻(1) = Ax̂(0)        P⁻(1) = AP(0)Aᵀ + Q

We see that forecasting forward one time step may increase or decrease
the conditional variance of the state. The term AP(0)Aᵀ may be smaller
or larger than P(0), but the process noise Q always makes a positive
contribution.

Given that p(x(1)|y(0)) is also a normal, we are situated to add mea-
surement y(1) and continue the process of adding measurements fol-
lowed by forecasting forward one time step until we have processed
all the available data. Because this process is recursive, the storage re-
quirements are small. We need to store only the current state estimate
and variance, and can discard the measurements as they are processed.
The required online calculation is minor. These features make the op-
timal linear estimator an ideal candidate for rapid online application.
We next summarize the state estimation recursion.
General time step k. Denote the measurement trajectory by

y(k) = {y(0), y(1), ..., y(k)}

At time k the conditional density with data y(k - 1) is normal

p(x(k)|y(k - 1)) = n(x(k), x̂⁻(k), P⁻(k))

and we denote the mean and variance with a superscript minus to in-
dicate these are the statistics before measurement y(k). At k = 0,
the recursion starts with x̂⁻(0) = x̄₀ and P⁻(0) = Q(0) as discussed
previously. We obtain measurement y(k), which satisfies

[x(k); y(k)] = [I, 0; C, I] [x(k); v(k)]


The conditional density of (x(k), v(k)) follows from (5.64) since
the noise v(k) is independent of x(k) and measurement y(k - 1)

[x(k); v(k)] | y(k - 1) ~ N( [x̂⁻(k); 0], [P⁻(k), 0; 0, R] )

The linear transformation result (5.65) then gives the joint density

[x(k); y(k)] | y(k - 1) ~ N( [x̂⁻(k); Cx̂⁻(k)], [P⁻(k), P⁻(k)Cᵀ; CP⁻(k), CP⁻(k)Cᵀ + R] )

Noting that {y(k - 1), y(k)} = y(k), and using the conditional density
result (5.66) gives

p(x(k)|y(k)) = n(x(k), x̂(k), P(k))

in which

x̂(k) = x̂⁻(k) + L(k)(y(k) - Cx̂⁻(k))
L(k) = P⁻(k)Cᵀ(CP⁻(k)Cᵀ + R)⁻¹
P(k) = P⁻(k) - P⁻(k)Cᵀ(CP⁻(k)Cᵀ + R)⁻¹CP⁻(k)

We forecast from k to k + 1 using the model

x(k + 1) = [A, I] [x(k); w(k)]

Because w(k) is independent of x(k) and y(k), the joint density of
(x(k), w(k)) follows from a second use of (5.64)

[x(k); w(k)] | y(k) ~ N( [x̂(k); 0], [P(k), 0; 0, Q] )

and a second use of the linear transformation result (5.65) gives

p(x(k + 1)|y(k)) = n(x(k + 1), x̂⁻(k + 1), P⁻(k + 1))

in which

x̂⁻(k + 1) = Ax̂(k)        P⁻(k + 1) = AP(k)Aᵀ + Q

and the recursion is complete.


Summary. We place all the required formulas for implementing the
optimal estimator in one place for easy reference. The initial
conditions for k = 0 are

x̂⁻(0) = x̄₀        P⁻(0) = Q(0)

The update equations for time k ≥ 0 are

x̂(k) = x̂⁻(k) + L(k)(y(k) - Cx̂⁻(k))        (5.69)
L(k) = P⁻(k)Cᵀ(CP⁻(k)Cᵀ + R)⁻¹        (5.70)
P(k) = P⁻(k) - P⁻(k)Cᵀ(CP⁻(k)Cᵀ + R)⁻¹CP⁻(k)        (5.71)
x̂⁻(k + 1) = Ax̂(k)        (5.72)
P⁻(k + 1) = AP(k)Aᵀ + Q        (5.73)

The full densities of the state before and after measurement are

p(x(k)|y(k - 1)) = n(x(k), x̂⁻(k), P⁻(k))
p(x(k)|y(k)) = n(x(k), x̂(k), P(k))

These formulas provide the celebrated Kalman filter (Kalman, 1960).
One of Kalman's key contributions was to use the state-space model
to describe the system dynamics. As we see here, after that step, the
solution of the optimal filtering problem reduces to a few well-known
results about normals, linear transformations, and conditional densities.
One of the main practical advantages of the Kalman filter is the ex-
tremely efficient implementation. One can update and store the con-
ditional mean and variance with only a few matrix multiplications and
finding one matrix inverse. This efficient recursion makes the Kalman
filter ideal for online state estimation where one would like to find the
optimal estimate in real time as the sensor measurements become avail-
able.
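The recursion (5.69)-(5.73) translates directly into code; here is a minimal sketch in Python (the system matrices, noise variances, and simulation length are invented for illustration and are not from the text):

```python
import numpy as np

def kalman_filter(A, C, Q, R, xhat0, Q0, ys):
    """Run the optimal linear estimator (5.69)-(5.73) over measurements ys.

    Returns the after-measurement estimates xhat(k) and variances P(k)."""
    xminus, Pminus = xhat0, Q0
    xhats, Ps = [], []
    for y in ys:
        S = C @ Pminus @ C.T + R
        L = Pminus @ C.T @ np.linalg.inv(S)          # gain, (5.70)
        xhat = xminus + L @ (y - C @ xminus)         # measurement update, (5.69)
        P = Pminus - L @ C @ Pminus                  # variance update, (5.71)
        xhats.append(xhat); Ps.append(P)
        xminus = A @ xhat                            # forecast mean, (5.72)
        Pminus = A @ P @ A.T + Q                     # forecast variance, (5.73)
    return np.array(xhats), np.array(Ps)

# small example: two states, one noisy measurement
rng = np.random.default_rng(2)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])
x = np.array([1.0, -1.0])
ys, xs = [], []
for _ in range(200):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
    ys.append(C @ x + rng.multivariate_normal(np.zeros(1), R))
    xs.append(x)
xhats, Ps = kalman_filter(A, C, Q, R, np.zeros(2), 10.0 * np.eye(2), ys)
err = np.array(xs) - xhats
print(np.mean(err**2, axis=0))
```

Only the current estimate and variance are carried between iterations, which is the small-storage, fast-recursion property discussed above.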

5.4.3 Optimal Steady-State Estimator

Notice from (5.70) that the optimal estimator has a time-varying gain,
L(k), coming from the time-varying recursions for P(k) and P⁻(k), given
by (5.71) and (5.73). If we are willing to give up a small amount of
performance during small initial times, we can obtain an even simpler
filter. Assume for the moment that these recursions converge to a
steady state. The steady state then satisfies

P_s = P_s⁻ - P_s⁻Cᵀ(CP_s⁻Cᵀ + R)⁻¹CP_s⁻
P_s⁻ = AP_sAᵀ + Q
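One simple way to solve these coupled steady-state equations is to iterate the variance recursions (5.71) and (5.73) until they converge; a sketch in Python (the example matrices are our own illustrative choices, not from the text):

```python
import numpy as np

def steady_state_gain(A, C, Q, R, tol=1e-10, maxiter=10_000):
    """Iterate the variance recursions (5.71) and (5.73) to convergence;
    return the steady-state prior variance Ps- and the filter gain Ls."""
    n = A.shape[0]
    Pminus = np.eye(n)
    for _ in range(maxiter):
        S = C @ Pminus @ C.T + R
        P = Pminus - Pminus @ C.T @ np.linalg.inv(S) @ C @ Pminus   # (5.71)
        Pnew = A @ P @ A.T + Q                                      # (5.73)
        if np.max(np.abs(Pnew - Pminus)) < tol:
            Pminus = Pnew
            break
        Pminus = Pnew
    Ls = Pminus @ C.T @ np.linalg.inv(C @ Pminus @ C.T + R)         # (5.70)
    return Pminus, Ls

# illustrative observable system
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])
Ps, Ls = steady_state_gain(A, C, Q, R)
# estimator stability: A - A Ls C should have eigenvalues inside the unit circle
eigs = np.linalg.eigvals(A - A @ Ls @ C)
print(Ps, Ls, np.abs(eigs))
```

The eigenvalue check previews the stability result established at the end of this section; for an observable pair (A, C) with Q, R > 0, the converged gain yields a stable estimator.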

Substituting P_s from the first equation into the second equation and
eliminating P_s gives a single equation for P_s⁻. The steady-state filter
gain then follows from (5.70)

L_s = P_s⁻Cᵀ(CP_s⁻Cᵀ + R)⁻¹

and combining (5.69) and (5.72) gives the steady-state filter

x̂⁻(k + 1) = Ax̂⁻(k) + AL_s(y(k) - Cx̂⁻(k))

with the constant steady-state filter gain, L_s. Online one has to store only
the current estimate x̂⁻(k), and implement a few matrix-vector mul-
tiplications and vector additions after y(k) is measured to obtain
the next estimate, x̂⁻(k + 1). We have an ideal algorithm that combines
extremely small storage requirements and extremely fast computation,
making the steady-state Kalman filter ideal for many applications in
many engineering disciplines.

In any design problem, including state estimator design, we usu-
ally have many, sometimes conflicting, design objectives. Optimality is
certainly one desirable objective. But we would also like some perfor-
mance guarantees on the estimator. For example, if the disturbances
to the system are small, does the estimate error become small as we
collect more measurements? We formulate this objective as a stability
question in the final section. To motivate that discussion, consider the
following case: A = I, C = 0, i.e., the system is an integrator and we are
not making any measurements. Even without disturbances, the system
evolution is x⁺ = x, and therefore x(k) = x(0) for all k ≥ 0. But (5.70)
gives that L_s = 0, so the estimator equation is x̂⁺ = x̂ and therefore
x̂⁻(k) = x̄₀ for all k ≥ 0. Since the RV x(0) is not necessarily at its
mean, x(0) ≠ x̄₀, and we see that the state estimate does not converge
to the system state no matter how many "measurements" we make.
This system needs to be redesigned before we can obtain a state esti-
mator that converges to the system state. It is clear what is wrong with
this system since C = 0 provides no information from the sensor, but
to detect all such badly designed systems, we introduce the concept of
observability.


5.4.4 Observability of a Linear System


The basic idea of observability is that any two distinct states can be
distinguished by the measurements they produce. First of all, the
input is irrelevant and we can set it to zero. Consider the
linear time-invariant system (A, C) with zero input and disturb-
ances

x⁺ = Ax
y = Cx

with initial condition x(0) = x₀. The solution for the state
is x(k) = Aᵏx₀, and the output is therefore

y(k) = CAᵏx₀        (5.74)

The system is observable if there exists a finite N, such that for every
x₀, N measurements {y(0), y(1), ..., y(N - 1)} distinguish uniquely
the initial state x₀. As discussed in Exercise 5.26, if we cannot determine
the initial state using n measurements, we cannot determine it using
more than n measurements. So we can develop a convenient test for
observability as follows. For n measurements, the system model gives

[y(0); y(1); ...; y(n - 1)] = [C; CA; ...; CA^(n-1)] x₀        (5.75)

The question of observability is therefore a question of uniqueness of
solutions to these linear equations. The matrix appearing
in this equation is known as the observability matrix O

O = [C; CA; ...; CA^(n-1)]        (5.76)

From Section 1.3.6 of Chapter 1, we know that the solution to (5.75) is
unique if and only if the columns of the np × n observability matrix

are linearly independent, i.e., rank(O) = n. Therefore, the system (A,
C) is observable if and only if rank(O) = n. We show in the
next section that observability is a sufficient condition for
estimator stability.

To illustrate this observability analysis in a chemical engineering
context, we present the following example (Ray, 1981, p. 58).
Example 5.7: Observability of a chemical reactor

Consider an isothermal, continuous well-stirred tank reactor (CSTR)
with first-order liquid-phase reactions

A → B → C

with rate constants k₁ and k₂, respectively. The volumetric
flowrate Q_f and tank volume V_R are constant. The
concentration of A in the feed c_Af is the manipulated variable, and
c_Bf = 0. Let x = (c_A, c_B).

(a) Write down the mass balances for species A and B and show that

dx/dt = A_c x + B_c u

What are matrices A_c and B_c for this problem?

(b) Consider measuring only species A reactor concentration with
sample time Δt > 0. What is matrix C_c in this case? Is the system
with this sampled measurement observable?

(c) Consider measuring only species B reactor concentration. What is
matrix C_c in this case? Is the system with this sampled measure-
ment observable? Provide a physical explanation if this answer
differs from the answer to the previous part.

Solution

(a) Assuming constant density, the mass balances for A and B are

d/dt [c_A; c_B] = [-(F/V + k₁), 0; k₁, -(F/V + k₂)] [c_A; c_B] + [F/V; 0] c_Af

in which F = Q_f and V = V_R. We can convert this continuous time
system into a discrete time system by approximating the time
derivative with a finite difference4

dx/dt ≈ (x(k + 1) - x(k))/Δt

giving

x⁺ = Ax + Bu        y = C_c x

A = [1 - Δt(F/V + k₁), 0; Δt k₁, 1 - Δt(F/V + k₂)]

(b) For measuring only species A we have C_c = [1  0]. We then check
the observability matrix for the DT system, giving

O = [1, 0; 1 - Δt(F/V + k₁), 0]

which has rank one. Since rank(O) < n, the system is not observ-
able.

(c) For measuring only species B we have C_c = [0  1]. This gives the
observability matrix

O = [0, 1; Δt k₁, 1 - Δt(F/V + k₂)]

which has rank two for all sample times Δt > 0. Since rank(O) =
n, the system is observable.
The answers are different because measuring A tells us how much
total B we have produced, but we have no information about how
much B was present initially nor how much was consumed to pro-
duce C. Therefore we cannot reconstruct the B concentration from
the model and the A concentration. Measuring species B, how-
ever, provides information about how much A is in the reactor,
because the A concentration affects the production rate of B. The
B measurement information plus the mass balances enable us to
reconstruct the A concentration. The value of the rank condition
of the observability matrix is that it makes rigorous this kind of
physical intuition and reasoning.
4Improving the numerical approximation does not change the observability analysis
that follows.
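The rank test in parts (b) and (c) is easy to automate; here is a sketch in Python, with illustrative parameter values F/V = 1, k₁ = 2, k₂ = 3, Δt = 0.1 that are our choices rather than the text's:

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) as in (5.76)."""
    n = A.shape[0]
    blocks = [C @ np.linalg.matrix_power(A, k) for k in range(n)]
    return np.vstack(blocks)

# discrete-time A matrix from Example 5.7 with assumed numbers
FV, k1, k2, dt = 1.0, 2.0, 3.0, 0.1
A = np.array([[1 - dt * (FV + k1), 0.0],
              [dt * k1, 1 - dt * (FV + k2)]])

O_A = observability_matrix(A, np.array([[1.0, 0.0]]))  # measure c_A only
O_B = observability_matrix(A, np.array([[0.0, 1.0]]))  # measure c_B only
print(np.linalg.matrix_rank(O_A), np.linalg.matrix_rank(O_B))
```

The rank-deficient case (measuring A only) and the full-rank case (measuring B only) reproduce the conclusions of the example.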

5.4.5 Stability of an Optimal Estimator

Optimality is one desirable filter characteristic, but systems engineers
care about other characteristics as well, such as stability. To analyze
estimator stability, define the estimate error

x̃(k) = x(k) - x̂⁻(k)

The evolution of the estimate error can be given by substituting the
system evolution

x(k + 1) = Ax(k) + w(k)

and the estimator equation x̂⁻(k + 1) = Ax̂⁻(k) + AL_s(y(k) - Cx̂⁻(k))
into the definition of x̃(k + 1). Substituting the system measurement
y(k) = Cx(k) + v(k) and combining terms gives

x̃(k + 1) = (A - AL_sC)x̃(k) + w(k) - AL_s v(k)

Estimator stability is the question of whether (A - AL_sC) is a stable
matrix, i.e., has all its eigenvalues inside the unit circle.

We have the following theorem covering the stability of the steady-
state estimator.

Theorem 5.8 (Riccati iteration and estimator stability). Given (A, C)
observable, Q > 0, R > 0, P⁻(0) ≥ 0, and the discrete Riccati equation

P⁻(k + 1) = Q + AP⁻(k)Aᵀ - AP⁻(k)Cᵀ(CP⁻(k)Cᵀ + R)⁻¹CP⁻(k)Aᵀ

Then

(a) There exists P_s⁻ ≥ 0 such that for every P⁻(0) ≥ 0

lim_{k→∞} P⁻(k) = P_s⁻

and P_s⁻ is the unique solution of the steady-state Riccati equation

P_s⁻ = Q + AP_s⁻Aᵀ - AP_s⁻Cᵀ(CP_s⁻Cᵀ + R)⁻¹CP_s⁻Aᵀ

among the class of positive semidefinite matrices.

Figure 5.16: The change in 95% confidence intervals for x̂(k|k)
versus time for a stable, optimal estimator. We start at
k = 0 with a noninformative prior, which has an infinite
confidence interval.

(b) The matrix A - AL_sC in which

L_s = P_s⁻Cᵀ(CP_s⁻Cᵀ + R)⁻¹

is a stable matrix.
Bertsekas (1987, pp. 59-64) provides a proof of the "dual" of this
theorem, which can be readily translated to this case.

So what is the payoff for knowing how to design a stable, optimal
estimator? Assume we have developed a linear empirical model for a
chemical process describing its normal operation around some nominal
steady state. After some significant unmeasured process disturbance,
we have little knowledge of the state. So we take initial variance P⁻(0)
to be large (the noninformative prior). Figure 5.16 shows the evolu-
tion of our 95% confidence intervals for the state as time increases and
we obtain more measurements. We see that the optimal estimator's
confidence interval returns to its steady-state value after only a short
transient.

Recall that the conditional variances of the state given the measure-
ments, P(k) and P⁻(k), do not depend on the data. Only the optimal es-
timates depend on the data. So we can assess the information in the
sensor system before we even examine the data. But the system and
sensor parameters Q and R almost always need to be estimated from
data before we can perform this analysis. And if we plan to use
feedback control to move the disturbed system back to its optimal operating
point, this analysis also tells us how quickly we can expect to restore
better control and therefore better process performance.

State estimation is a fundamental topic appearing in many branches
of science and engineering, and has a large literature. A nice and brief
annotated bibliography describing the early contributions to optimal
state estimation, as well as treatments of the optimal estimation
problem for linear and nonlinear systems along with the optimal
control problem, can be found in Bryson and Ho (1975) and Stengel
(1994). The moving horizon estimation approach, which can address
system nonlinearity and constraints, is presented by Rawlings and
Mayne (2009, ch. 4).

5.5 Exercises
Exercise 5.1: Random walk with the uniform distribution

Consider again a discrete-time random walk simulation

x(k + 1) = x(k) + w(k)        (5.77)

in which x, w ∈ R², k is the sample number, and Δt is the sample time with t = kΔt. Instead
of using normally distributed steps as in Figure 5.3, let w = 2√3(u - 1/2) in which each
component of u is uniformly distributed

p_u(u) = 1 for 0 ≤ u ≤ 1, and 0 otherwise

We then have that w ~ U(-√3, √3) with zero mean and unit variance. The Octave or
MATLAB function rand generates samples of u, from which we can generate samples of
w with the given transformation.

(a) Calculate a trajectory for this random walk in the plane and compare to Figure 5.3
for the normal distribution.

(b) Calculate the mean square displacement for 500 trajectories and compare to
Figure 5.4 for the normal distribution.

(c) Derive the evolution equation for the probability density p(x, t)
in the limit as Δt goes to zero. How is this model different from the model with
normally distributed steps?

Exercise 5.2: The different diffusion coefficients D and 𝒟

In the chapter we compared two models for the evolution of concentration
undergoing convection and diffusion processes

∂c/∂t = -∂(v(x,t)c)/∂x + ∂/∂x ( D(x,t) ∂c/∂x )

and

∂c/∂t = -∂(v(x,t)c)/∂x + ∂²(𝒟(x,t)c)/∂x²

in which we consider x, v, and D scalars. The first is derived from conservation of mass
with a flux law defined by N = -D ∂c/∂x. The second is the Fokker-Planck
equation for the SDE

dx = v(x,t) dt + √(2𝒟(x,t)) dw

(a) Show that when the diffusivity 𝒟(x, t) does not depend on x, these two models
are equivalent and D(t) = 𝒟(t).

(b) Show that the Fokker-Planck equation can always be written in the following
convection-diffusion form with a modified drift term

∂c/∂t = -∂(ṽ(x,t)c)/∂x + ∂/∂x ( 𝒟(x,t) ∂c/∂x )

and find the expression for ṽ(x, t).

Exercise 5.3: The diffusion coefficient matrices D and 𝒟

Repeat Exercise 5.2 but for the case in which x and v are n-vectors and D and 𝒟 are
n × n diffusion coefficient matrices

∂c/∂t = -∇·(vc) + ∇·(D∇c)

and

∂c/∂t = -∇·(vc) + ∇∇ : (𝒟c)

Exercise 5.4: Continuity of random processes

We know that the Wiener process is too rough to be differentiated, but is it even con-
tinuous? To answer this question, we first have to extend the definition of continuity
to cover random processes such as W(t). We use the following definition. A random
process X(t) is continuous at time t if, for every ε > 0, there exists δ > 0 such that
E(|X(t) - X(s)|) ≤ ε for all t, s satisfying |t - s| ≤ δ. Show that W(t) is continuous
(to establish that integrating the discontinuous white-noise process smooths it and
creates a continuous one).
Exercise 5.5: Multidimensional Itô formula and moments of multidimensional SDEs

(a) Use Itô's formula to derive (5.33) and (5.34).

(b) Derive the multidimensional form of Itô's formula, (5.38), for an SDE in the form

    dXᵢ = Aᵢ(X,t)dt + Bᵢⱼ(X,t)dWⱼ

Recall (5.20).

(c) Use this form to derive the multidimensional versions of (5.33) and (5.34):

    d⟨Xᵢ⟩/dt |ₜ₌ₜ′ = Aᵢ(x′, t′)

    d⟨(Xᵢ − x′ᵢ)(Xⱼ − x′ⱼ)⟩/dt |ₜ₌ₜ′ = 2Dᵢⱼ(x′, t′)

Exercise 5.6: Diffusion equation in one dimension with Laplace transform

We wish to consider the diffusion equation on the line

    ∂c/∂t = D∇²c,    0 < t,  −∞ < x < ∞

and calculate the response c(x,t) to an impulse source term at t = 0, c(x,0) = δ(x).

(a) In Chapter 3, we already solved this problem using the Fourier transform. Here
we try the Laplace transform. Take the Laplace transform of the one-dimensional
diffusion equation with this initial condition and show

    d²c̄(x,s)/dx² − (s/D)c̄(x,s) = −(1/D)δ(x)                              (5.78)

(b) What are the two linearly independent solutions to the homogeneous equation?
Break the problem into two parts and solve the differential equation for x > 0
and x < 0. You have four unknown constants at this point.

(c) Which of the two linearly independent solutions is bounded for x → ∞? Which
of these two solutions is bounded for x → −∞? Use this reasoning to find two
of the unknown constants.

(d) Integrate (5.78) across a small interval containing zero to obtain the jump
condition

    dc̄(x = 0⁺,s)/dx − dc̄(x = 0⁻,s)/dx = −1/D

(e) Use this jump condition to find the last constant and obtain the full transform
c̄(x,s) valid for all x.

(f) Invert this transform and show

    c(x,t) = (1/(2√(πDt))) e^(−x²/(4Dt)),    0 < t,  −∞ < x < ∞           (5.79)

State which inversion formula you are using.

(g) Compute the mean square displacement for this concentration profile

    ⟨x²⟩ = ∫ p(x,t) x² dx,    p(x,t) = c(x,t)
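Part (g) of Exercise 5.6 can be checked numerically. The sketch below (Python, an assumption; the quadrature grid and the values D = 1.5, t = 2 are illustrative) evaluates the normalization and second-moment integrals of the Gaussian profile (5.79), for which the mean square displacement is 2Dt.

```python
import numpy as np

# Numerical check: with c(x,t) = exp(-x^2/(4*D*t))/sqrt(4*pi*D*t), the
# normalization integral is 1 and the second moment is 2*D*t.
D, t = 1.5, 2.0
x = np.linspace(-60.0, 60.0, 200001)     # wide enough that tails are negligible
dx = x[1] - x[0]
c = np.exp(-x**2 / (4 * D * t)) / np.sqrt(4 * np.pi * D * t)
norm = c.sum() * dx                      # should be 1
msd = (x**2 * c).sum() * dx              # should be 2*D*t
print(norm, msd)
```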


Exercise 5.7: Random walk in one dimension

Prepare a simulation of a random walk in one dimension for D = 2. Start the particles
at x = 0 at t = 0 and simulate until t = 1000.

(a) Show the trajectories of the random walks for five particles on the same plot.

(b) Plot the mean square displacement versus time for 1000 particles. Compare this
result to the analytical solution given in Exercise 5.6(g). Describe any differences.

(c) Plot the histogram of particle locations at t = 1000 for 1000 particles. On the
same plot, compare this histogram to the analytical result given in (5.79). Describe
any differences.
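A sketch of the simulation for Exercise 5.7 follows, in Python rather than the Octave/MATLAB used for the book's figures. The time-step discretization x_{k+1} = x_k + √(2DΔt) n_k with n_k ~ N(0,1) is an assumption (any Δt works for this walk), and plotting is omitted.

```python
import numpy as np

# Assumed discretization: x_{k+1} = x_k + sqrt(2*D*dt)*n_k, n_k ~ N(0,1).
rng = np.random.default_rng(1)
D, dt, ntraj = 2.0, 1.0, 1000
nsteps = int(1000.0 / dt)

steps = np.sqrt(2.0 * D * dt) * rng.standard_normal((ntraj, nsteps))
x = np.cumsum(steps, axis=1)
t = dt * np.arange(1, nsteps + 1)
msd = (x**2).mean(axis=0)            # part (b): compare to 2*D*t from Exercise 5.6(g)
print(msd[-1] / (2 * D * t[-1]))     # ratio should be close to 1
```

The ratio fluctuates about one because only 1000 particles are averaged; those fluctuations are exactly the differences part (b) asks you to describe.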

Exercise 5.8: More useful integrals

Use the definition of the complete gamma function and establish the following integral
relationship

    ∫₀^∞ xᵖ e^(−axⁿ) dx = Γ((p+1)/n) / (n a^((p+1)/n))

For the case n = 2, this relation reduces to

    ∫₀^∞ xᵖ e^(−ax²) dx = Γ((p+1)/2) / (2 a^((p+1)/2))                    (5.80)

which proves useful in the next exercises.

Exercise 5.9: Diffusion equation in cylindrical coordinates with Laplace transform

Consider the diffusion equation in cylindrical coordinates with symmetry in the θ
coordinate

    ∂c/∂t = (1/r) ∂(rD ∂c/∂r)/∂r,    0 < t,  0 < r < ∞

We wish to calculate the response c(r,t) to an impulse source term at t = 0,
c(r,0) = δ(r)/(2πr).

(a) Take the Laplace transform of the diffusion equation with this initial condition
and show

    (1/r) d(r dc̄(r,s)/dr)/dr − (s/D)c̄(r,s) = −δ(r)/(2πrD)                (5.81)

(b) What are the two linearly independent solutions to the homogeneous equation?

(c) Which of the two linearly independent solutions is bounded for r → ∞? Use this
reasoning to determine one of the unknown constants.

(d) Integrate (5.81) across a small interval containing zero to obtain a condition on

    lim_(r→0) r dc̄(r,s)/dr

(e) Use this jump condition to find the second constant and obtain the transform

    c̄(r,s) = (1/(2πD)) K₀(r√(s/D))

(f) Invert this transform and show

    c(r,t) = (1/(4πDt)) e^(−r²/(4Dt)),    0 < t,  0 < r < ∞

State which inversion formula you are using.

(g) Compute the mean square displacement for this concentration profile

    ⟨r²⟩ = ∫₀^(2π) ∫₀^∞ r² c(r,t) r dr dθ

Exercise 5.10: Diffusion equation in spherical coordinates with Laplace transform

Consider the diffusion equation in spherical coordinates with symmetry in the θ and φ
coordinates

    ∂c/∂t = (1/r²) ∂(r²D ∂c/∂r)/∂r,    0 < t,  0 < r < ∞

We wish to calculate the response c(r,t) to an impulse source term at t = 0,
c(r,0) = δ(r)/(4πr²).

(a) Take the Laplace transform of the diffusion equation with this initial condition
and show

    (1/r²) d(r² dc̄(r,s)/dr)/dr − (s/D)c̄(r,s) = −δ(r)/(4πr²D)             (5.82)

(b) What are the two linearly independent solutions to the homogeneous equation?

(c) Which of the two linearly independent solutions is bounded for r → ∞? Use this
reasoning to find one of the unknown constants.

(d) Integrate (5.82) across a small interval containing zero to obtain a condition on
the change in the first derivative

    lim_(r→0) r² dc̄(r,s)/dr

(e) Use this jump condition to find the second constant and obtain the full transform
valid for all r.

(f) Invert this transform and show

    c(r,t) = (1/(4πDt))^(3/2) e^(−r²/(4Dt)),    0 < t,  0 < r < ∞

State which inversion formula you are using.

(g) Compute the mean square displacement for this concentration profile

    ⟨r²⟩ = 4π ∫₀^∞ r² c(r,t) r² dr

Exercise 5.11: Probability distributions for diffusion on the plane

This exercise provides another view of the issues raised in Example 5.4. Consider
again the diffusion equation subject to a unit impulse at the origin at t = 0. We
consider solving this equation in the plane using both rectangular coordinates (x,y)
and polar coordinates (r,θ).

(a) Using rectangular coordinates, let p(x,y,t) satisfy (5.30)

    ∂p/∂t = D(∂²p/∂x² + ∂²p/∂y²)

with initial condition

    p(x,y,t) = δ(x)δ(y),    t = 0

Solve this equation and show

    p(x,y,t) = (1/(4πDt)) e^(−(x²+y²)/(4Dt))                              (5.83)

Notice this p(x,y,t) is a valid probability density (positive, normalized).

(b) If we consider the two components (x,y) as time-varying random variables with
the probability density given by (5.83), then we say the pair is distributed as

    [x, y]ᵀ ~ N(0, (2Dt)I)

in which I is a 2 × 2 identity matrix. The position random variable in rectangular
coordinates is normally distributed with zero mean and covariance (2Dt)I.

(c) Next we define a new random variable, the polar-coordinate position η = (r,θ),
via the transformation

    r = √(x² + y²)    θ = tan⁻¹(y/x)
    x = r cos θ       y = r sin θ

for which

    df⁻¹(η)/dη = [cos θ  −r sin θ; sin θ  r cos θ]

Use the rule for finding the probability density of a transformed random variable
and show

    p_η(r,θ,t) = (r/(4πDt)) e^(−r²/(4Dt))

This quantity is the one denoted p_p in Example 5.4. Calculate the marginal
probability density by integration and show

    p_r(r,t) = (r/(2Dt)) e^(−r²/(4Dt))

Note that these are both well-defined probability densities (positive, normalized).
The first is the probability density of the pair of random variables (r,θ), and the
second is the marginal density of the random variable r for particles undergoing
Brownian motion.
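The transformed densities in Exercise 5.11(c) can be checked by sampling. The sketch below (Python, an assumption; D, t, and the sample size are illustrative) draws (x,y) ~ N(0, (2Dt)I), forms r, and compares sample moments against the Rayleigh-type marginal p_r(r,t) = (r/(2Dt)) e^(−r²/(4Dt)), for which ⟨r⟩ = √(πDt) and ⟨r²⟩ = 4Dt.

```python
import numpy as np

# Draw planar Brownian positions at time t and transform to the radius r.
rng = np.random.default_rng(2)
D, t, n = 1.0, 0.5, 200000
xy = rng.normal(0.0, np.sqrt(2 * D * t), size=(n, 2))
r = np.hypot(xy[:, 0], xy[:, 1])
print(np.mean(r**2), 4 * D * t)     # second moment of the marginal density
print(r.mean(), np.sqrt(np.pi * D * t))
```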

Exercise 5.12: Mean and variance of the Poisson distribution

Given that discrete random variable Y has the Poisson density

    p_Y(n) = e^(−a) aⁿ/n!,    n = 0, 1, ...

with parameter a ∈ ℝ, a ≥ 0, show that

    E(Y) = a        var(Y) = a
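The result of Exercise 5.12 is easy to confirm numerically. In the Python sketch below (the parameter a = 3.7 and the truncation point are illustrative assumptions), the Poisson density is built recursively via p(k) = p(k−1)·a/k to avoid large factorials, and the moment sums are evaluated directly.

```python
import numpy as np

a = 3.7
N = 120                        # truncation; the tail beyond N is negligible for this a
p = np.empty(N + 1)
p[0] = np.exp(-a)
for k in range(1, N + 1):
    p[k] = p[k - 1] * a / k    # p_Y(k) = e^{-a} a^k / k!
n = np.arange(N + 1)
mean = (n * p).sum()
var = ((n - mean)**2 * p).sum()
print(mean, var)               # both should equal a
```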

Exercise 5.13: Alternate derivation of Poisson process density

Consider the Poisson process probability Pr(Y(t) = n) for n ≥ 0. Show that¹

    Pr(Y(t) = n) = Pr(Y(t) ≥ n) − Pr(Y(t) ≥ n + 1)

You may want to review Exercise 4.1(a). Using the definition of the event time Tₙ,
show that

    Pr(Y(t) = n) = ∫₀ᵗ p_Tₙ(t')dt' − ∫₀ᵗ p_Tₙ₊₁(t')dt'

Substitute (5.49) and use integration by parts to show (5.50)

    Pr(Y(t) = n) = e^(−λt)(λt)ⁿ/n!

¹See (4.23).


Exercise 5.14: Generating samples from an exponential distribution

Let random variable u be distributed uniformly on [0,1]. Define random variable T by
the transformation

    T = −(1/λ) ln u

Show that T has density (see Section 4.3.2)

    p_T(t) = λe^(−λt)

Thus uniformly distributed random samples can easily be transformed into
exponentially distributed random samples as required for simulating Poisson
processes.
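The inverse transformation in Exercise 5.14 is one line of code. The Python sketch below (an assumption; λ and the sample size are illustrative) checks both the mean and a tail probability against the exponential density.

```python
import numpy as np

# Inverse-transform sampling: T = -(1/lam)*ln(u) with u uniform on (0,1].
rng = np.random.default_rng(3)
lam, n = 2.5, 500000
u = rng.random(n)
T = -np.log(1.0 - u) / lam     # 1-u lies in (0,1], so the log is always finite
print(T.mean(), 1 / lam)                   # exponential mean is 1/lam
print(np.mean(T > 1.0), np.exp(-lam))      # tail probability Pr(T > 1) = e^{-lam}
```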

Exercise 5.15: State-space form for master equation

Write the linear state-space model for the master equation in the extent of reaction
describing the single reaction

    A + B ⇌ 2C                                                            (5.84)

Assume we are not measuring anything.

(a) What are x, A, B, C, D for this model?

(b) What is the dimension of the state vector in terms of the initial numbers of
molecules?

Exercise 5.16: Properties of the kinetic matrix

(a) Show that for a valid master equation the row sum is zero for each column of
the A matrix in (5.56).

(b) Show that this result holds for the A given in (5.55) for the reaction A + B ⇌ C.

(c) What is the row sum for each column of the A₀ matrix in the sensitivity equation?
Show this result.

Exercise 5.17: Reaction probabilities in stochastic kinetics

Consider a stochastic simulation of the following reaction

    aA + bB ⇌ cC + dD    (k₁ forward, k₋₁ reverse)

(a) Write out the two reaction probabilities hᵢ(nⱼ), i = 1, −1, considering the
forward and reverse reactions as separate events.

(b) Compare these to the deterministic rate laws rᵢ(cⱼ), i = 1, −1 for the forward
and reverse reactions considered as elementary reactions. Why are these expressions
different? When do they become close to being the same?

Exercise 5.18: The mean of the master equation

Consider the simple irreversible reaction A → B with rate

    r = k n_A

in which n_A is the number of A molecules and k is a rate constant.

(a) Define p(n_A, t), the probability that the reactor volume contains n_A molecules
of A at time t. For what set of n_A is p(n_A, t) defined? Call this set N. Write the
evolution equation for p(n_A, t).

(b) Define the mean of A's probability density by

    ⟨n_A(t)⟩ = Σ_(n_A ∈ N) n_A p(n_A, t)

Using this definition and the evolution of the probability density, write an
evolution equation for ⟨n_A(t)⟩. The probability density itself should not appear in
the evolution equation for the mean.

(c) Compare the evolution equation for the mean to the usual mass action kinetics.

Exercise 5.20: Stochastic simulation for nonlinear kinetics⁶

Consider the reversible, second-order reaction

    A + B ⇌ C        r = k₁c_A c_B − k₋₁c_C

(a) Solve the deterministic material balance for a constant-volume batch reactor with

    k₁ = 1 L/mol·min        k₋₁ = 1 min⁻¹
    c_A(0) = 1 mol/L        c_B(0) = 0.9 mol/L        c_C(0) = 0 mol/L

Plot the A, B, and C concentrations out to t = 5 min.

(b) Compare the result to a stochastic simulation using an initial condition of 400 A,
360 B and zero C molecules. Notice from the units of the rate constants that k₁
should be divided by 400 to compare simulations. Figure 5.17 is a representative
comparison for one sequence of pseudorandom numbers.

(c) Repeat the stochastic simulation for an initial condition of 4000 A, 3600 B, zero C
molecules. Remember to scale k₁ appropriately. Are the fluctuations noticeable
with this many starting molecules?

⁶See also Exercise 4.17 in (Rawlings and Ekerdt, 2012).
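A minimal Gillespie-type simulation for part (b) of Exercise 5.20 can be sketched as below. This is a Python sketch, not the code used to produce Figure 5.17; the random seed is arbitrary, and the forward constant is scaled by the initial A count as the exercise directs.

```python
import numpy as np

# Stochastic simulation of A + B <=> C starting from 400 A, 360 B, 0 C.
rng = np.random.default_rng(4)
nA, nB, nC = 400, 360, 0
k1, km1 = 1.0 / 400, 1.0            # k1 divided by 400 per the exercise
t, t_end = 0.0, 5.0
while t < t_end:
    h = (k1 * nA * nB, km1 * nC)    # propensities of forward and reverse events
    htot = h[0] + h[1]
    if htot == 0.0:
        break
    t += -np.log(1.0 - rng.random()) / htot   # exponential waiting time
    if rng.random() * htot < h[0]:            # select which reaction fires
        nA, nB, nC = nA - 1, nB - 1, nC + 1
    else:
        nA, nB, nC = nA + 1, nB + 1, nC - 1
print(nA, nB, nC)
```

Note the conservation relations the update rule enforces: nA − nB and nA + nC are invariant, which is a useful sanity check on any implementation.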

Figure 5.17: Deterministic simulation of reaction A + B ⇌ C compared to stochastic
simulation starting with 400 A molecules.

Exercise 5.21: What happened to my rate?

Consider a well-mixed continuum setting in which we have positive, real-valued
concentrations of reacting molecules of two types, A and B, as depicted in
Figure 5.18. Let the concentrations of A and B in the volume of interest be denoted
c_A0, c_B0. Consider the three possible irreversible reactions between these species
using the elementary rate expressions

    A + A → products        r₁ = k₁c_A²
    A + B → products        r₂ = k₂c_A c_B                                 (5.85)
    B + B → products        r₃ = k₃c_B²

Consider also the total rate of reaction

    r = r₁ + r₂ + r₃

Figure 5.18: Species A and B in a well-mixed volume element. Continuum and
molecular settings.

(a) If the A and B species are chemically similar so the different reactions' rate
constants are all similar, k₁ = k₂ = k₃ = k, and the concentrations of A and B are
initially equal, the total rate is given by

    r = k₁c_A0² + k₂c_A0 c_B0 + k₃c_B0² = 3k c_A0²

But if we erase the distinctions between A and B completely and relabel the B
molecules in Figure 5.18 as A molecules, we obtain the new concentrations of A and B
as

    c_A = 2c_A0        c_B = 0

and the total rate is then

    r = k₁c_A² = 4k c_A0²

Why are these two total rates different and which one is correct?

(b) Repeat your analysis of the reaction rates if we reduce the length scale and
consider the molecular kinetic setting in which we have integer-valued numbers
n_A0, n_B0 of molecules of A and B in the volume of interest.

(c) Perform a stochastic simulation of the molecular setting using the following
parameters

    n_A0 = 50        n_B0 = 60        n_C0 = 0
    k₁ = k₂ = k₃ = k = 10 sec⁻¹

Make a plot of all species versus time. Print the plot and the simulation code.

Exercise 5.22: Cumulative distribution for the omega expansion

Given the governing equation for the fluctuation density in the omega expansion

    ∂Π/∂t = 2kc ∂(ξΠ)/∂ξ + kc² ∂²Π/∂ξ²

Define the cumulative distribution

    F(ξ,t) = ∫_(−∞)^ξ Π(ξ',t) dξ'

(a) Derive the PDE governing F's evolution. What are the corresponding boundary
conditions and initial condition?

(b) Solve this PDE numerically and compare to Figure 5.15 in the text. Increase Ω
holding c₀ = n₀/Ω fixed and describe the effect on F.

Figure 5.19: Molecular system of volume V containing molecules of A.

Exercise 5.23: Properties of the Maxwell-Boltzmann distribution

Consider the simple molecular system depicted in Figure 5.19 with a large number of
ideal gas molecules of species A with molecular weight m_A. The system volume is V.
The molecules are labeled i = 1, 2, ..., n. A velocity vector is denoted
v = [v_x  v_y  v_z]ᵀ with corresponding x, y, z components. These velocities are
considered samples of a random variable with fixed and known distribution.

The Maxwell-Boltzmann distribution for the zero-mean fluctuation velocity u in an
ideal gas is

    p_u(u_x,u_y,u_z) = (m/(2πk_B T))^(3/2) exp(−m(u_x² + u_y² + u_z²)/(2k_B T))

in which m is the molecule mass, T is absolute temperature, and k_B is the Boltzmann
constant, k_B = R/N_Av. This distribution is a multivariate normal with zero mean
and variance matrix (k_B T/m)I, which we write as

    u ~ N(0, (k_B T/m)I)

Denote the A species mean velocity (drift term) as v̄_A. The A molecule velocities
are then distributed as

    v_Ai ~ N(v̄_A, (k_B T/m_A)I),    all i                                 (5.86)

Starting from the distribution (5.86), derive the following expectations in terms of
the mean species velocity v̄_A and k_B, T, m_A.

1. ⟨v_Ai⟩

2. ⟨v_Ai · v_Ai⟩

3. ⟨v′_Ai⟩, in which v′_Ai = v_Ai − v̄_A

4. ⟨v′_Ai · v′_Ai⟩

Exercise 5.24: The normal's properties used in deriving the optimal linear estimator

(a) For (5.64), use the independence of y to establish that

    p_(x,y,z)(x,y,z) = p_(x,z)(x,z) p_y(y)

and divide both sides by p_z(z).

(b) For (5.65), we are given that (x,z) is jointly distributed as

    [x; z] ~ N([m_x; m_z], [P_x  P_xz; P_zx  P_z])

Consider the linear transformation

    [y; z] = [A  0; 0  I] [x; z]

and show that

    [y; z] ~ N([Am_x; m_z], [AP_xAᵀ  AP_xz; P_zxAᵀ  P_z])

Now use the conditional density formula to obtain p_(y|z).

(c) For (5.66), note that this property is derived in Example 4.20.

Exercise 5.25: Observability, controllability, and duality

Review the concept of controllability presented in Exercise 1.26. Show that (A,C) is
observable if and only if (Aᵀ,Cᵀ) is controllable. This result marks the beginning of
the interesting story of the duality between regulation and estimation.

Exercise 5.26: Observability with N measurements

Consider the linear system

    x⁺ = Ax        y = Cx

Prove the statement made in the text that if x(0) cannot be uniquely determined by n
measurements {y(0), ..., y(n−1)}, then it cannot be determined by N measurements for
any N.

Bibliography

K. J. Åström. Introduction to Stochastic Control Theory. Academic Press, Inc., New
York, 1970.

D. P. Bertsekas. Dynamic Programming. Prentice-Hall, Englewood Cliffs, New Jersey,
1987.

R. N. Bhattacharya and E. C. Waymire. Stochastic Processes with Applications.
Society for Industrial and Applied Mathematics, Philadelphia, 2009.

A. E. Bryson and Y. Ho. Applied Optimal Control. Hemisphere Publishing, New York,
1975.

W. M. Deen. Analysis of Transport Phenomena. Topics in Chemical Engineering. Oxford
University Press, New York, 1998.

A. Einstein. Über die von der molekularkinetischen Theorie der Wärme geforderte
Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Ann. Phys.,
17:549-560, 1905.

C. W. Gardiner. Handbook of Stochastic Methods for Physics, Chemistry, and the
Natural Sciences. Springer-Verlag, Berlin, Germany, second edition, 1990.

M. A. Gibson and J. Bruck. Efficient exact stochastic simulation of chemical systems
with many species and many channels. J. Phys. Chem. A, 104:1876-1889, 2000.

D. T. Gillespie. Exact stochastic simulation of coupled chemical reactions. J. Phys.
Chem., 81:2340-2361, 1977.

A. H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, New
York, 1970.

T. Kailath. A view of three decades of linear filtering theory. IEEE Trans. Inform.
Theory, March 1974.

R. E. Kalman. A new approach to linear filtering and prediction problems. Trans.
ASME, J. Basic Engineering, pages 35-45, March 1960.

P. E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential
Equations. Springer-Verlag, Berlin, 1992.

A. N. Kolmogorov. Interpolation and extrapolation of stationary random sequences.
Bull. Moscow Univ., USSR, Ser. Math. 5, 1941.

T. G. Kurtz. The relationship between stochastic and deterministic models for
chemical reactions. J. Chem. Phys., 1972.

J. B. Rawlings and J. G. Ekerdt. Chemical Reactor Analysis and Design Fundamentals.
Nob Hill Publishing, Madison, WI, second edition, 2012.

J. B. Rawlings and D. Q. Mayne. Model Predictive Control: Theory and Design. Nob
Hill Publishing, Madison, WI, 2009.

W. H. Ray. Advanced Process Control. McGraw-Hill, New York, 1981.

E. D. Sontag. Mathematical Control Theory. Springer-Verlag, New York, second
edition, 1998.

R. F. Stengel. Optimal Control and Estimation. Dover Publications, Inc., 1994.

N. G. van Kampen. Stochastic Processes in Physics and Chemistry. Elsevier Science
Publishers, Amsterdam, The Netherlands, second edition, 1992.

N. Wiener. The Extrapolation, Interpolation, and Smoothing of Stationary Time Series
with Engineering Applications. Wiley, New York, 1949. Originally issued as a
classified MIT Rad. Lab. Report in February 1942.

Mathematical Tables

A.1 Laplace Transform Table

The Laplace transform pairs used in the text are collected in Table A.1 with a
reference to the page where they are first derived or stated.

     f(t)                                     f̄(s)                            Page

 1   αf(t) + βg(t)                            αf̄(s) + βḡ(s)                   105
 2   df(t)/dt                                 sf̄(s) − f(0)                    105
 3   d²f(t)/dt²                               s²f̄(s) − sf(0) − ḟ(0)           105
 4   dⁿf(t)/dtⁿ                               sⁿf̄(s) − Σᵢ₌₁ⁿ sⁿ⁻ⁱf⁽ⁱ⁻¹⁾(0)     105
 5   tf(t)                                    −df̄(s)/ds                       105
 6   tⁿf(t)                                   (−1)ⁿ dⁿf̄(s)/dsⁿ                105
 7   f(t−a)θ(t−a)                             e^(−as)f̄(s)                     105, 225
 8   e^(at)f(t)                               f̄(s−a)                          106
 9   ∫₀ᵗ f(t−t')g(t')dt'                       f̄(s)ḡ(s)                        106, 223
10   lim_(t→0) f(t) = lim_(s→∞) sf̄(s)         initial value theorem           106, 224
11   lim_(t→∞) f(t) = lim_(s→0) sf̄(s)†        final value theorem             106, 224
12   δ(t)                                     1                               107, 113
13   1                                        1/s                             107
14   tⁿ                                       n!/s^(n+1)                      107
17   e^(At)                                   (sI − A)⁻¹                      109
18   e^(at)                                   1/(s − a)                       107
19   te^(at)                                  1/(s − a)²                      107
20   sin ωt                                   ω/(s² + ω²)                     107
21   cos ωt                                   s/(s² + ω²)                     107
22   sinh ωt                                  ω/(s² − ω²)                     107
23   cosh ωt                                  s/(s² − ω²)                     107
24   e^(at) sin ωt                            ω/((s − a)² + ω²)               107
25   e^(at) cos ωt                            (s − a)/((s − a)² + ω²)         107
26   Σₙ₌₁ᵐ (p(sₙ)/q′(sₙ)) e^(sₙt)              p(s)/q(s),                      308
                                              q(sₙ) simple zero
27   Σₙ₌₁ᵐ Σᵢ₌₁^(rₙ) aₙᵢ t^(i−1) e^(sₙt) ‡     p(s)/q(s),                      308
                                              q(sₙ) zero of order rₙ
28   (k/(2√(πt³))) e^(−k²/(4t))               e^(−k√s),  k > 0                330
29   (1/√(πt)) e^(−k²/(4t))                   e^(−k√s)/√s                     331
30   erfc(k/(2√t))                            e^(−k√s)/s                      331
31   e^(ka) e^(a²t) erfc(a√t + k/(2√t))       e^(−k√s)/(√s(√s + a))           331
32   −e^(ka) e^(a²t) erfc(a√t + k/(2√t))      a e^(−k√s)/(s(√s + a))          331
       + erfc(k/(2√t))
33   1 − 2 Σₙ₌₀^∞ ((−1)ⁿ/((n+1/2)π))           cosh(x√s)/(s cosh √s)           333
       e^(−(n+1/2)²π²t) cos((n+1/2)πx)
34   sinh(x√k)/sinh(√k)                        sinh(x√(s+k))/(s sinh √(s+k))   333
       + 2π Σₙ₌₁^∞ ((−1)ⁿ n/(n²π² + k))
       e^(−(n²π²+k)t) sin(nπx)
35   x + (2/π) Σₙ₌₁^∞ ((−1)ⁿ/n)                sinh(x√s)/(s sinh √s)           335
       e^(−n²π²t) sin(nπx)
36   (1/(2t)) e^(−k²/(4t))                    K₀(k√s)                         331
37   1 − 2 Σₙ₌₁^∞ e^(−aₙ²t) J₀(aₙx)/(aₙJ₁(aₙ)), I₀(x√s)/(s I₀(√s))             335
       J₀(aₙ) = 0
38   2 Σₙ₌₁^∞ (−1)^(n+1) sin(nπa) sin(nπb)     sinh(as) sinh(bs)/sinh(s)       314
       cos(nπt)
39   ab + 2 Σₙ₌₁^∞ ((−1)^(n+1)/(nπ))           sinh(as) sinh(bs)/(s sinh s)    341
       sin(nπa) sin(nπb) sin(nπt)

Table A.1: Larger table of Laplace transforms.

† Final value exists if and only if sf̄(s) is bounded for Re(s) ≥ 0.
‡ The coefficients aₙᵢ are obtained from the Laurent expansion of f̄(s) about
s = sₙ; see Exercise A.2.

A.2 Statistical Distributions

The different probability distributions that have been discussed in the text are
summarized in Table A.2 with a reference to the page in the text where they are
first mentioned.

Distribution       Density                                                   Page

uniform            p(x) = 1/(b − a),  x ∈ [a,b]                              382
normal             p(x) = (1/√(2πσ²)) exp(−(x − m)²/(2σ²))                   352
multivariate       p(x) = (1/((2π)^(n/2)|P|^(1/2)))
  normal                  exp(−(1/2)(x − m)ᵀP⁻¹(x − m))                      358
exponential        p(x) = λe^(−λx),  x ≥ 0, λ > 0                            478
Poisson            p(n) = e^(−a) aⁿ/n!                                       481
chi                p(x) = (2^(1−n/2)/Γ(n/2)) x^(n−1) e^(−x²/2)               441
chi-squared        p(x) = (1/(2^(n/2)Γ(n/2))) x^(n/2−1) e^(−x/2)             441
F                  p(x) = √((xn)ⁿmᵐ/(xn + m)^(n+m)) / (x B(n/2, m/2)),
                          x ≥ 0,  n, m ≥ 1                                   442
Student's t        p(x) = (Γ((n+1)/2)/(√(nπ)Γ(n/2))) (1 + x²/n)^(−(n+1)/2)   441
multivariate t     p(x) = (Γ((n+p)/2)/((nπ)^(p/2)Γ(n/2)|Σ|^(1/2)))
                          (1 + (x − m)ᵀΣ⁻¹(x − m)/n)^(−(n+p)/2)              441
Wishart            p(X) = (|X|^((n−p−1)/2)/(2^(np/2)|R|^(n/2)Γ_p(n/2)))
                          e^(−(1/2)tr(R⁻¹X)),  X > 0                         412
Maxwell            p(x) = √(2/π) x² e^(−x²/2)                                447
Maxwell-           p_u(u_x,u_y,u_z) = (m/(2πk_B T))^(3/2)
  Boltzmann               exp(−m(u_x² + u_y² + u_z²)/(2k_B T))               524

Table A.2: Statistical distributions defined and used in the text and exercises.

A.3 Vector and Matrix Derivatives


Definition. First consider s(t), a real-valued, scalar function of a real-valued
scalar, s : ℝ → ℝ. Assume the derivative ds/dt exists.¹ We wish to extend the
definition of the derivative to vector- and matrix-valued functions of vector- and
matrix-valued arguments. The derivatives of functions with respect to scalars,
vectors, and matrices can be conveniently expressed using the rules of vector/matrix
operations. Other derivatives, such as those of matrix-valued functions with respect
to vectors and matrices, produce tensors having more than two indices and cannot be
expressed with the formulas of matrix/vector calculus. To state how the derivatives
are arranged into vectors and matrices, we require a more precise notation than we
used in the text. Moreover, several different and conflicting notations are in use
in different fields; these are briefly described in Section A.3.1. So we state here
the main results in a descriptive notation, and expect the reader can translate
these results into the conventions of other fields.

We require a few preliminaries. Now let s(x) be a scalar-valued function of vector
x, s : ℝⁿ → ℝ. Assume that all partial derivatives ∂s/∂xᵢ, i = 1, 2, ..., n exist.
The derivative ds/dx is then defined as the column vector

    ds/dx = [∂s/∂x₁, ∂s/∂x₂, ..., ∂s/∂xₙ]ᵀ          scalar-vector derivative

The derivative ds/dxᵀ is defined as the corresponding row vector

    ds/dxᵀ = [∂s/∂x₁, ∂s/∂x₂, ..., ∂s/∂xₙ]

and note that (ds/dx)ᵀ = ds/dxᵀ. Next let s(A) be a scalar-valued function of matrix
A, s : ℝᵐˣⁿ → ℝ. Again, assuming all partial derivatives ∂s/∂Aᵢⱼ exist, the
derivative ds/dA is then defined as

    ds/dA = [∂s/∂A₁₁  ···  ∂s/∂A₁ₙ;                  scalar-matrix derivative
             ∂s/∂A₂₁  ···  ∂s/∂A₂ₙ;
                ⋮              ⋮
             ∂s/∂Aₘ₁  ···  ∂s/∂Aₘₙ]

As in the vector case, we define ds/dAᵀ as the transpose of this result,
ds/dAᵀ = (ds/dA)ᵀ. These more general matrix derivatives do specialize to the two
vector derivatives previously defined.

Next up is the vector-valued function of a vector, f(x). Let f : ℝⁿ → ℝᵐ. The
quantity of most interest is usually the Jacobian matrix, which we denote by df/dxᵀ,
defined by

    df/dxᵀ = [∂f₁/∂x₁  ∂f₁/∂x₂  ···  ∂f₁/∂xₙ;        vector-vector derivative
              ∂f₂/∂x₁  ∂f₂/∂x₂  ···  ∂f₂/∂xₙ;        (Jacobian matrix)
                 ⋮                       ⋮
              ∂fₘ/∂x₁  ∂fₘ/∂x₂  ···  ∂fₘ/∂xₙ]

The notation df/dxᵀ serves as a convenient reminder that the column vector f is
distributed down the column and the row vector xᵀ is distributed across the row in
the entries of the Jacobian matrix. The transpose of the Jacobian is simply
dfᵀ/dx = (df/dxᵀ)ᵀ, which is easy to remember. Note that df/dx is a long column
vector with mn entries coming from stacking the columns of the Jacobian matrix.
This is the vec operator, so we have

    df/dx = vec(df/dxᵀ)

The transpose, denoted dfᵀ/dxᵀ, is a long row vector. These vector arrangements of
the derivatives are not usually of much interest compared to the Jacobian matrix, as
we shall see when we discuss the chain rule.

¹All of the formulas in this section are readily extended to complex-valued
functions of a complex variable.

Inner product. The inner product of two vectors was defined in Chapter 1

    (a, b) = aᵀb = Σᵢ₌₁ⁿ aᵢbᵢ        a, b ∈ ℝⁿ

We can extend this definition to linear spaces of matrices as follows

    (A, B) = tr(AᵀB)

Because tr(C) = tr(Cᵀ) for any square matrix C, the matrix inner product can also be
expressed as (A, B) = tr(BᵀA), which is valid also in the vector special case.

Chain rules. One of the most important uses of these derivative formulas is a
convenient expression of the chain rule. For scalar-valued s we have

    ds = (ds/dx, dx) = (ds/dxᵀ) dx                   scalar-vector
    ds = (ds/dA, dA) = tr((ds/dAᵀ) dA)               scalar-matrix

Notice that when written with inner products, these two formulas are identical. The
vector chain rule can be considered a special case of the matrix chain rule, but
since the vector case arises frequently in applications and doesn't require the
trace, we state it separately. For vector-valued functions we have one additional
form of the chain rule

    df = (df/dxᵀ) dx                                 vector-vector

which is a matrix-vector multiplication of the Jacobian matrix of f with respect to
x with the column vector dx. Because df is a vector, this chain rule is not
expressible by an inner product as in the scalar case. But notice the similarity of
the vector chain rule with the second equalities of the two scalar chain rules.
Because of this similarity, all three important versions of the chain rule are easy
to remember using this notation. There is no chain rule for matrix-valued functions
that does not involve tensors.

Finally, we collect here the different matrix and vector differentiation formulas
that have been used in the text and exercises. These are summarized in Table A.3,
with a reference to the page in the text where they are first mentioned or derived.

     Derivative                     Formula                                  Page

 1   ds (chain rule 1)              ds = (ds/dxᵀ) dx
 2   ds (chain rule 2)              ds = tr((ds/dAᵀ) dA)
 3   df (chain rule 3)              df = (df/dxᵀ) dx
 4   (product rule)                 d(fᵀg)/dx = (dfᵀ/dx) g + (dgᵀ/dx) f
 5   d(xᵀb)/dx                      = b
 6   d(bᵀx)/dxᵀ                     = bᵀ
 7   d(xᵀAx)/dx                     = Ax + Aᵀx
10   d tr(p(A))/dA                  = q(A)ᵀ,  q(λ) = dp(λ)/dλ
11   d det A/dt                     = det(A) tr(A⁻¹ dA/dt),  det A ≠ 0       328
12   d tr(p(A))/dt                  = tr(q(A) dA/dt),  q(λ) = dp(λ)/dλ       328
13   d det A/dA                     = (A⁻¹)ᵀ det A,  det A ≠ 0               431
14   d ln(det A)/dA                 = (A⁻¹)ᵀ,  det A > 0                     431
15   d tr(AB)/dA = d tr(BA)/dA      = Bᵀ                                     431
16   d tr(AᵀB)/dA = d tr(BAᵀ)/dA    = B
17   d tr(ABAᵀ)/dA                  = A(Bᵀ + B)
18   d tr(AᵀBA)/dA                  = (B + Bᵀ)A

Table A.3: Summary of vector and matrix derivatives used in the text and exercises;
s, t ∈ ℝ, x, b ∈ ℝⁿ, f(·) and g(·) are any differentiable functions, and p(·) is any
matrix function defined by a power series.
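Entries like the quadratic-form derivative are easy to verify numerically. The sketch below (Python, an assumption; the dimension, seed, and step size are illustrative) checks d(xᵀAx)/dx = Ax + Aᵀx against a central finite difference, which is exact for a quadratic up to roundoff.

```python
import numpy as np

# Finite-difference check of d(x^T A x)/dx = A x + A^T x.
rng = np.random.default_rng(5)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

analytic = A @ x + A.T @ x
eps = 1e-6
numeric = np.empty(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    # central difference of the scalar x^T A x in the i-th coordinate direction
    numeric[i] = ((x + e) @ A @ (x + e) - (x - e) @ A @ (x - e)) / (2 * eps)
print(np.max(np.abs(numeric - analytic)))   # should be near machine precision
```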

A.3.1 Derivatives: Other Conventions

Given the many scientific fields requiring vector/matrix derivatives, chain rules,
and so on, a correspondingly large number of different and conflicting notations
have also arisen. We point out here some of the other popular conventions and show
how to translate them into the notation used in this section.

Optimization. The dominant convention in the optimization field is to define the
scalar-vector derivative ds/dx as a row vector instead of a column vector. The nabla
notation for gradient, ∇s, is then used to denote the corresponding column vector.
The Jacobian matrix is then denoted df/dx. So the vector chain rule reads in the
optimization literature

    df = (df/dx) dx                                  optimization convention

Given that ds/dx is a row vector in the optimization notation, the first scalar
chain rule reads

    ds = (∇s)ᵀ dx = (ds/dx) dx                       optimization convention

The biggest problem with adopting the optimization field's conventions appears when
considering the scalar-matrix derivative. The derivative ds/dA has the same meaning
in the optimization literature as that used in the text. So the scalar-matrix chain
rule reads

    ds = tr((ds/dA)ᵀ dA)                             optimization convention

Notice the inconsistency in the chain rules: the scalar-matrix version contains a
transpose and the scalar-vector and vector-vector versions do not. The burden rests
on the reader to recall these different forms of the chain rule and remember which
ones require the transpose. The advantage of the notation used in this section is
that all chain rules appear with a transpose, which is what one might anticipate due
to the chain rule's required summation over an index. Also, in the notation used in
this section, the ∇ operator is identical to d/dx, and neither implies a transpose
should be taken. Finally, in the optimization convention there is no hint from the
notation which quantities should be a column vector and which a row vector. The
notation used in this section, ds/dx and ds/dxᵀ, makes that distinction clear.
Field theories of physics (transport phenomena, electromagnetism). As noted in
Chapter 3, the literature in these areas primarily uses Gibbs vector-tensor notation
and index notation. For example, the derivative of a scalar function s with respect
to a vector argument x is written

    ∇s or ∂s/∂x,    where in Cartesian coordinates (∇s)ᵢ = ∂s/∂xᵢ

The derivative of a scalar s with respect to a tensor argument is similar. The
derivative of a vector function f with respect to a vector x is

    ∇f,    where (∇f)ᵢⱼ = ∂fⱼ/∂xᵢ

so ∇f is the transpose of the Jacobian. Therefore, the chain rule becomes

    df = (∇f)ᵀ dx

Consistent with this notation, one can write the Taylor-series expansion of a vector
field f around the origin as

    f(x) = f(0) + x·∇f + (1/2) xx : ∇∇f + ···

where derivatives are evaluated at the origin and

    (∇∇f)ⱼₖᵢ = ∂²fᵢ/∂xⱼ∂xₖ

One must beware, however, that this ordering of indices is not used universally,
primarily because some authors write the Taylor expansion as

    f(x) = f(0) + J·x + (1/2) K : xx + ···

where

    Jᵢⱼ = ∂fᵢ/∂xⱼ        Kᵢⱼₖ = ∂²fᵢ/∂xⱼ∂xₖ

A.4 Exercises
Exercise A.1: Simple and repeated zeros

Assume all the zeros of q(s) are first-order zeros, rₙ = 1, n = 1, 2, ..., m, in
entry 27 of Table A.1, and show that it reduces to entry 26.

Exercise A.2: Deriving the Heaviside expansion theorem for repeated roots

Establish the Heaviside expansion theorem for repeated roots, entry 27 in Table A.1.
Hints: Close the contour of the inverse transform Bromwich integral in (2.7) to the
left side of the complex plane. Show that the integral along the closed contour
except for the Bromwich line goes to zero, leaving only the residues at the
singularities, i.e., the poles s = sₙ, n = 1, 2, ..., m. Since p(s) has no
singularities, expand it in a Taylor series about the root s = sₙ. Find the Laurent
series for f̄(s) and show that the residues are the coefficients aₙᵢ given in the
expansion formula. Note that this procedure remains valid if there are an infinite
number of poles, such as the case with a transcendental function for q(s).

Exercise A.3: Laplace transform relations

Take the limit k → 0 in entry 34 of Table A.1 and show that it produces entry 35.

Exercise A.4: Some invalid derivative formulas

Transposing the scalar numerators in entries 5 and 6 of Table A.3, respectively,
gives

    d(bᵀx)/dx = b        d(xᵀb)/dxᵀ = bᵀ

but you do not find companion forms for these listed in Table A.3 with a general
matrix B replacing b. Show that simply replacing b with general matrix B above does
not generate correct formulas

    d(Bᵀx)/dx ≠ B        d(xᵀB)/dxᵀ ≠ Bᵀ

You may want to use the vec operator to express the correct formulas. Note that the
correct matrix versions of these derivatives do reduce to the above formulas for
B = b, a column vector.

Exercise A.5: Companion trace derivatives

(a) Use the fact that tr(AB) = tr(BA) to establish that Formulas 15 and 16 in Table
A.3 are equivalent formulas, i.e., assuming one of them allows you to establish the
other one.

(b) On the other hand, show that Formulas 17 and 18 are equivalent by taking
transposes of one of them to produce the other one.
