
Modeling and Analysis Principles
for
Chemical and Biological Engineers

Michael D. Graham
James B. Rawlings

Nob Hill Publishing

Modeling and Analysis Principles
for
Chemical and Biological Engineers

Michael D. Graham and James B. Rawlings
Department of Chemical and Biological Engineering
University of Wisconsin-Madison
Madison, Wisconsin

Nob Hill Publishing
Madison, Wisconsin

This book was set in Lucida using LATEX by Michael D. Graham and James B. Rawlings, and printed and bound by Nob Hill Publishing, LLC. Cover design by Cheryl M. Rawlings.


Copyright © 2013 by Nob Hill Publishing, LLC

All rights reserved.

Nob Hill Publishing, LLC
Cheryl M. Rawlings, publisher
Madison, WI 53705
orders@nobhillpublishing.com
http://www.nobhillpublishing.com

No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Library of Congress Control Number: 2012956351


Graham, Michael D.
Modeling and Analysis Principles for Chemical and Biological Engineers /
by Michael D. Graham and James B. Rawlings.
p. cm.
Includes bibliographical references (p.) and index.
ISBN 978-0-9759377-1-6 (cloth)
1. Chemical engineering. 2. Mathematical modeling. I. Rawlings, James B. II. Title.

Printed in the United States of America.
First Printing May 2013


To my father and the memory of my mother.
MDG

To my graduate students, who have been some of my best teachers.
JBR

Preface

Research undertaken by modern chemical and biological engineers incorporates a wide range of mathematical principles and methods. This book came about as the authors struggled to incorporate modern topics into a one- or two-semester course sequence for new graduate students, while not losing the essential aspects of traditional mathematical modeling syllabi. Topics that we decided are particularly important but not represented in traditional texts include: matrix factorizations such as the singular value decomposition, basic qualitative dynamics of nonlinear differential equations, integral representations of partial differential equations, probability and stochastic processes, and state estimation. The reader will find these topics and many more in the book. These topics are generally absent in many texts, which often have a bias toward the mathematics of 19th- through early 20th-century physics. We also believe that the book will be of substantial interest to active researchers, as it is in many respects a survey of the applied mathematics commonly encountered by chemical and biological engineering practitioners, and contains many topics that were almost certainly absent in their chemical engineering graduate coursework.
Due to the wide range of topics that we have incorporated, the level of discussion in the book ranges from very detailed to broadly descriptive, allowing us to focus on important core topics while also introducing the reader to more advanced or specialized ones. Some important but technical subjects such as convergence of power series have been treated only briefly, with references to more detailed sources. We encourage instructors and students to browse the exercises. Many of these illustrate applications of the chapter material, for example, the numerical stability of the Verlet algorithm used in molecular dynamics simulation. Others deepen, complement, and extend the discussion in the text.

During their undergraduate education in chemical and biological engineering, students become very accomplished at numerical examples and problem solving. This is not a book with lots of numerical examples. Engineering graduate students need to make the shift from applying mathematical tools to developing and understanding them. As such, substantial emphasis in this book is on derivations and some short proofs. We believe the text contains a healthy mix of fundamental mathematics, analytical solution techniques, and numerical methods. Researchers need principles, structures, and tools, because these guide analysis and understanding, and they also must be able to produce quantitative answers. We hope this text will enable them to do both.

MDG
Madison, Wisconsin

JBR
Madison, Wisconsin


Acknowledgments

This book grew out of the lecture notes for graduate-level analysis courses taught by the authors in the Department of Chemical and Biological Engineering at the University of Wisconsin-Madison. We have benefited from the feedback of many graduate students taking these classes, and appreciate the enthusiasm with which they received some early and incomplete drafts of the notes. Especially Andres Merchan, Kushal Sinha, and Megan Zagrobelny provided helpful discussion and assistance.

We have had numerous helpful discussions with colleagues on many topics covered in the text. JBR would like to acknowledge especially Dave Anderson, David Mayne, Gabriele Pannocchia, and Joe Qin for their interest and helpful suggestions.

Several colleagues gave us helpful reviews of book chapters. We would like to thank Prodromos Daoutidis, Tunde Ogunnaike, Patrick Underhill, Venkat Ganesan, Dave Anderson, and Jean-Luc Thiffeault for their valuable feedback. We are also grateful to colleagues who responded to a survey that we conducted to gather information on mathematical modeling courses for chemical and biological engineering graduate students. Their valuable feedback had significant impact on the content of this book.
Several members of our research groups also reviewed chapters, and helped us typeset solutions to some of the exercises. Anubhav, Cuyler Bates, Ankur Gupta, Rafael Henriquez, Amit Kumar, Jae Sung Park, and Sung-Ning Wang deserve special mention.

John Eaton generously provided his usual invaluable computing and typesetting expertise. MDG is grateful to his family for their forbearance during the preparation of this book. Our special relationship with the staff at Nob Hill Publishing again made the book production process …

Contents

1 Linear Algebra
  1.1 Vectors and Linear Spaces
    1.1.1 Subspaces
    1.1.2 Length, Distance, and Alignment
    1.1.3 Linear Independence and Bases
  1.2 Linear Operators and Matrices
    1.2.1 Addition and Multiplication of Matrices
    1.2.2 Transpose and Adjoint
    1.2.3 Einstein Summation Convention
    1.2.4 Gram-Schmidt Orthogonalization and the QR Decomposition
    1.2.5 The Outer Product, Dyads, and Projection Operators
    1.2.6 Partitioned Matrices and Matrix Operations
  1.3 Systems of Linear Algebraic Equations
    1.3.1 Introduction to Existence and Uniqueness
    1.3.2 Solving Ax = b: LU Decomposition
    1.3.3 The Determinant
    1.3.4 Rank of a Matrix
    1.3.5 Range Space and Null Space of a Matrix
    1.3.6 Existence and Uniqueness in Terms of Rank and Null Space
    1.3.7 Least-Squares Solution
    1.3.8 Minimum Norm Solution
    1.3.9 Rank, Nullity, and the Buckingham Pi Theorem
    1.3.10 Nonlinear Algebraic Equations: the Newton-Raphson Method
    1.3.11 Linear Coordinate Transformations
  1.4 The Algebraic Eigenvalue Problem
    1.4.1 Introduction
    1.4.2 Self-Adjoint Matrices
    1.4.3 General (Square) Matrices
    1.4.4 Positive Definite Matrices
    1.4.5 Eigenvalues, Eigenvectors, and Coordinate Transformations
    1.4.6 Schur Decomposition
    1.4.7 Singular Value Decomposition
  1.5 Functions of Matrices
    1.5.1 Polynomial and Exponential
    1.5.2 Optimizing Quadratic Functions
    1.5.3 Vec Operator and Kronecker Product of Matrices
  1.6 Exercises

2 Ordinary Differential Equations
  2.1 Introduction
  2.2 First-Order Linear Systems
    2.2.1 Superposition Principle for Linear Differential Equations
    2.2.2 Homogeneous Linear Systems with Constant Coefficients
    2.2.3 Qualitative Dynamics of Planar Systems
    2.2.4 Laplace Transform Methods for Solving the Inhomogeneous Constant-Coefficient Problem
    2.2.5 Delta Function
  2.3 Linear Equations with Variable Coefficients
    2.3.1 Introduction
    2.3.2 The Cauchy-Euler Equation
    2.3.3 Series Solutions and the Method of Frobenius
  2.4 Function Spaces and Differential Operators
    2.4.1 Functions as Vectors
    2.4.2 Self-Adjoint Differential Operators and Sturm-Liouville Equations
    2.4.3 Existence and Uniqueness of Solutions
  2.5 Lyapunov Functions and Stability
    2.5.1 Types of Stability
    2.5.2 Lyapunov Functions
    2.5.3 Application to Linear Systems
    2.5.4 Discrete Time Systems
  2.6 Asymptotic Analysis and Perturbation Methods
    2.6.1 Introduction
    2.6.2 Series Approximations: Convergence, Asymptoticness, Uniformity
    2.6.3 Scaling, and Regular and Singular Perturbations
    2.6.4 Regular Perturbation Analysis of an ODE
    2.6.5 Matched Asymptotic Expansions

    2.6.6 Method of Multiple Scales
  2.7 Qualitative Dynamics of Nonlinear Initial-Value Problems
    2.7.1 Introduction
    2.7.2 Invariant Subspaces and Manifolds
    2.7.3 Some Special Nonlinear Systems
    2.7.4 Long-Time Behavior and Attractors
    2.7.5 The Fundamental Local Bifurcations of Steady States
  2.8 Numerical Solutions of Initial-Value Problems
    2.8.1 Euler Methods: Accuracy and Stability
    2.8.2 Stability, Accuracy, and Stiff Systems
    2.8.3 Higher-Order Methods
  2.9 Numerical Solutions of Boundary-Value Problems
    2.9.1 The Method of Weighted Residuals
  2.10 Exercises

3 Vector Calculus and Partial Differential Equations
  3.1 Vector and Tensor Algebra
    3.1.1 Introduction
    3.1.2 Vectors in Three Physical Dimensions
  3.2 Differential Operators and Integral Theorems
    3.2.1 Divergence, Gradient, and Curl
    3.2.2 The Gradient Operator in Non-Cartesian Coordinates
    3.2.3 The Divergence Theorem
    3.2.4 Further Integral Relations and Adjoints of Multidimensional Differential Operators
  3.3 Linear Partial Differential Equations: Properties and Solution Techniques
    3.3.1 Classification and Canonical Forms for Second-Order Partial Differential Equations
    3.3.2 Separation of Variables and Eigenfunction Expansion with Equations involving ∇²
    3.3.3 Laplace's Equation, Spherical Harmonics, and the Hydrogen Atom
    3.3.4 Applications of the Fourier Transform to PDEs
    3.3.5 Green's Functions and Boundary-Value Problems
    3.3.6 Characteristics and D'Alembert's Solution to the Wave Equation
    3.3.7 Laplace Transform Methods
  3.4 Numerical Solution of Initial-Boundary-Value Problems
    3.4.1 Numerical Stability Analysis for the Diffusion Equation
    3.4.2 Numerical Stability Analysis for the Convection Equation
    3.4.3 Operator Splitting for Convection-Diffusion Problems
  3.5 Exercises

4 Probability, Random Variables, and Estimation
  4.1 Introduction and the Axioms of Probability
  4.2 Random Variables and the Probability Density Function
  4.3 Multivariate Density Functions
    4.3.1 Multivariate normal density
    4.3.2 Functions of random variables
    4.3.3 Statistical Independence and Correlation
  4.4 Sampling
    4.4.1 Linear Transformation
    4.4.2 Sample Mean, Sample Variance, and Standard Error
  4.5 Central Limit Theorems
    4.5.1 Identically distributed random variables
    4.5.2 Random variables with different distributions
    4.5.3 Multidimensional central limit theorems
  4.6 Conditional Density Function and Bayes's Theorem
  4.7 Maximum-Likelihood Estimation
    4.7.1 Scalar Measurement y, Known Measurement Variance
    4.7.2 Scalar Measurement y, Unknown Measurement Variance
    4.7.3 Vector of Measurements y, Different Parameters Corresponding to Different Measurements, Known Measurement Covariance R
    4.7.4 Vector of Measurements y, Different Parameters Corresponding to Different Measurements, Unknown Measurement Covariance R
    4.7.5 Vector of Measurements y, Same Parameters for all Measurements, Known Measurement Covariance R

    4.7.6 Vector of Measurements y, Same Parameters for all Measurements, Unknown Measurement Covariance R
  4.8 PCA and PLS regression
  4.9 Appendix: Proof of the Central Limit Theorem
  4.10 Exercises

5 Stochastic Models and Processes
  5.1 Introduction
  5.2 Stochastic Processes for Continuous Random Variables
    5.2.1 Discrete Time Stochastic Processes
    5.2.2 Wiener Process and Brownian Motion
    5.2.3 Stochastic Differential Equations
    5.2.4 Fokker-Planck Equation
  5.3 Stochastic Kinetics
    5.3.1 Introduction, and Length and Time Scales
    5.3.2 Poisson Process
    5.3.3 Stochastic Simulation
    5.3.4 Master Equation of Chemical Kinetics
    5.3.5 Microscopic, Mesoscopic, and Macroscopic Kinetic Models
  5.4 Optimal Linear State Estimation
    5.4.1 Introduction
    5.4.2 Optimal Dynamic Estimator
    5.4.3 Optimal Steady-State Estimator
    5.4.4 Observability of a Linear System
    5.4.5 Stability of an Optimal Estimator
  5.5 Exercises

A Mathematical Tables
  A.1 Laplace Transform Table
  A.2 Statistical Distributions
  A.3 Vector and Matrix Derivatives
    A.3.1 Derivatives: Other Conventions
  A.4 Exercises

Author Index
Citation Index
Subject Index

List of Figures

1.1 The four fundamental subspaces of matrix A.
1.2 Least-squares solution of Ax = b; projection of b into R(A) and residual r = Ax₀ − b in N(Aᵀ).
1.3 An iteration of the Newton-Raphson method for solving …
1.4 The four fundamental subspaces of matrix A = USVᵀ.
1.5 Convex function. The straight line connecting two points on the function curve lies above the function; αf(x) + (1 − α)f(y) ≥ f(αx + (1 − α)y) for all x, y.
1.6 Contours of constant f(x) = xᵀAx.
1.7 Two vectors in R² and the angle between them.
1.8 Experimental measurements of variable y versus x.
1.9 Measured rate constant at several temperatures.
1.10 Plot of Ax as x moves around a unit circle.
1.11 Manipulated input u and disturbance d combine to affect output y.
2.1 … regimes for the planar system dx/dt = Ax, A ∈ R²ˣ².
2.2 Dynamical behavior on the region boundaries for the planar system dx/dt = Ax, A ∈ R²ˣ².
2.3 Particle of mass m at position y experiences spring force −Ky and applied force F(t).
2.4 Function f(x) = exp(−8x²) and truncated trigonometric Fourier series approximations with K = 2, 5, 10. The approximations with K = 5 and K = 10 are visually indistinguishable from the exact function.
2.5 Truncated trigonometric Fourier series approximation to f(x) = x, using K = 5, 10, 50. The wiggles get finer as K increases.
2.6 Function f(x) = exp(−8x²) and truncated Legendre-Fourier series approximations with n = 2, 5, 10.
2.7 Function f(x) = H(x) and truncated Legendre-Fourier series approximations with n = 10, 50, 100.

2.8 Solution to the initial-value problem with nonhomogeneous boundary conditions.
2.9 Solution behavior; stability (left) and asymptotic stability (right).
2.10 A simple mechanical system with total energy E, internal energy U, kinetic energy T = (1/2)mv², and potential energy K = mgh.
2.11 The origin and sets D, Bᵣ, V (shaded), and B.
2.12 Leading-order inner U₀, outer u₀, and composite solutions u₀c, for Example 2.30 with ε = 0.2, K = 1, and … = 1.
2.13 Examples of invariant subspaces for linear systems.
2.14 Invariant subspaces of the linearized system (a) and invariant manifolds of the nonlinear system (b).
2.15 Contours of an energy function V(x₁, x₂) or H(x₁, x₂).
2.16 Energy landscape for a pendulum; H = p²/2 − K cos q; K = 2.
2.17 Landscape for H = p²/2 + q⁴/4.
2.18 A limit cycle (thick dashed curve) and a trajectory (thin solid curve) approaching it.
2.19 Periodic (left) and quasiperiodic (right) orbits on the surface of a torus. The orbit on the right eventually passes through every point in the domain.
2.20 A limit cycle for the Rössler system, a = b = 0.2, c = 1.
2.21 A strange attractor for the Rössler system, a = b = 0.2, c = 5.7.
2.22 Bifurcation diagram for the saddle-node bifurcation.
2.23 Bifurcation diagram for the transcritical bifurcation.
2.24 Bifurcation diagrams for the pitchfork bifurcation.
2.25 Approximate solutions to dx/dt = −x using explicit and implicit Euler methods with Δt = 2.1, along with the exact solution x(t) = e⁻ᵗ.
2.26 Stability regions for Adams-Bashforth methods; dx/dt = λx.
2.27 Stability regions for Adams predictor-corrector methods; dx/dt = λx.
2.28 Stability regions for Runge-Kutta methods; dx/dt = λx.
2.29 Hat functions for N = 2.
2.30 Approximate solutions to (2.91) using the finite element method with hat functions for N = 6 and N = 12. The exact solution also is shown.

2.31 … for the Legendre-Galerkin approximation …
2.32 … uses nth-order predictor and nth-order corrector …
3.1 … step size … around a point x₀ … eᵣ and e₀ …
3.2 … unit vectors …
3.3 …
3.4 A … divergence …
3.5 … Laplace's equation in a square domain …
3.6 … (a) Original domain …
3.7 From left to right, real parts of the surface spherical harmonics Y40, Y41, Y42, Y43, Y44.
3.8 A … source and sink in the physical domain … right-traveling wave … in the domain x < 0 …
3.9 An "image" … with opposite sign … left-traveling …
3.10 Concentration versus position … penetration distance … of a membrane for different reaction rate constants.
3.11 Transient heating of slab, cylinder, and sphere.
3.12 Wavy-walled domain.
4.1 Normal distribution, with probability density p(x) = (1/√(2π)) exp(−(x − m)²/2).
4.2 Multivariate normal for n = 2. The contour lines show ellipses containing 95, 75, and 50 percent probability.
4.3 The geometry of the quadratic form xᵀAx = b.
4.4 The region X(c) for y ≤ c.
4.5 A joint density function for the two uncorrelated random variables in Example 4.8.
4.6 A nearly singular normal density in two dimensions.
4.7 The singular normal resulting from y = Ax with rank-deficient A.

4.8 Histogram of 10,000 samples of uniformly distributed x.
4.9 Histogram of 10,000 samples of y = …
4.10 The multivariate normal, marginals, marginal box, and bounding box.
4.11 The sum of squares fitting error (top) and validation error (bottom) for PCR versus the number of principal components; cross validation indicates that four principal components are best.
4.12 The sum of squares validation error for PCR and PLSR versus the number of principal components/latent variables; note that only two latent variables are required versus four principal components.
4.13 Predicted versus measured outputs for the validation dataset. Top: PCR using four principal components. Bottom: PLSR using two latent variables. Left: first output. Right: second output.
4.14 Effect of undermodeling. Top: PCR using three principal components. Bottom: PLSR using one latent variable.
4.15 The indicator (step) function f₁(w; x) and its smooth approximation, f(w; x).
4.16 Typical strain versus time data from a molecular dynamics simulation, from data file rohit.dat on the website www.che.wisc.edu/~jbraw/principles.
4.17 Plot of y versus x from data file errvbls.dat on the website www.che.wisc.edu/~jbraw/principles.
4.18 Smooth approximation to a unit step function, H(z − 1).
5.1 A simulation of the Wiener process with fixed sample time Δt = 10⁻⁶ and D = 5 × 10⁵.

5.2 Sampling faster on the last plot in Figure 5.1; the sample time is decreased to Δt = 10⁻⁹ and the roughness is restored on this time scale.
5.3 A representative trajectory of the discretely sampled Brownian motion; D = 2, v = 0, n = 500.
5.4 The mean square displacement versus time; D = 2, v = 0, n = 500.
5.5 Two first-order reactions in series in a batch reactor, c_A0 = 1, c_B0 = c_C0 = 0, k₁ = 2, …
5.6 A sample path of the unit Poisson process.
5.7 A unit Poisson process with more events; sample path (top) and frequency distribution of event times …
5.8 Randomly choosing a reaction with appropriate probability. The interval is partitioned according to the relative …
5.9 Stochastic simulation of first-order series reaction … starting with 100 A molecules.
5.10 Master equation for chemical reaction A + B ⇌ C. The probability density at state ε changes due to forward …
5.11 Solution to master equation for A + B ⇌ C starting with 20 A molecules, 100 B molecules and 0 C molecules, k₁ = 1/20, k₋₁ = …
5.12 Solution to master equation for A + B ⇌ C starting with 200 A molecules, 1000 B molecules and 0 C molecules.
5.13 The equilibrium reaction extent's probability density for Reactions 5.52 at system volume Ω = 20 (top) and Ω = 200 (bottom). Notice the decrease in variance in the reaction extent as system volume increases.
5.14 Simulation of 2A ⇌ B for n₀ = 500, Ω = 500. Top: discrete simulation; bottom: SDE simulation.
5.15 Cumulative distribution for 2A ⇌ B at t = 1 with n₀ = 500, Ω = 500. Discrete master equation (steps) versus omega expansion (smooth).
5.16 The change in 95% confidence intervals for R(k|k) versus time for a stable, optimal estimator. We start at k = 0 with a noninformative prior, which has an infinite confidence interval.
5.17 Deterministic simulation of reaction A + … compared to stochastic simulation.
5.18 Species A and B in a well-mixed volume element. Continuum and molecular settings.
5.19 Molecular system of volume V containing molecules of mass m_A with velocity v_Ai.

List of Tables

1.1 Quadratic function of scalar and vector argument.
2.1 A small table of Laplace transform pairs. A more extensive table is found in Appendix A.
2.2 Laplace transform pairs involving δ and its derivatives.
2.3 The linear differential equations arising from the radial part of ∇²y + λy = 0 in rectangular, cylindrical, and spherical coordinates.
3.1 Gradient and Laplacian operators in Cartesian, cylindrical, and spherical coordinates.
A.1 Larger table of Laplace transforms.
A.2 Statistical distributions defined and used in the text and exercises.
A.3 Summary of vector and matrix derivatives defined and used in the text and exercises.

List of Examples and Statements

1.1 Definition: Linear space
1.2 Definition: Subspace
1.3 Definition: Norm
1.4 Example: Common transformations do not commute
1.5 Example: Matrix identities derived with index notation
1.7 Theorem: Existence and uniqueness of solutions for square systems
1.8 Example: Linearly independent columns, rows of a matrix
1.10 Example: The geometry of least squares
1.11 Theorem: Self-adjoint matrix decomposition
1.12 Example: A nonsymmetric matrix
1.13 Example: A defective matrix
1.14 Example: Vibrational modes of a molecule
1.15 Theorem: Schur decomposition
1.16 Theorem: Symmetric Schur decomposition
1.17 Theorem: Real Schur decomposition
1.18 Definition: Convex function
1.19 Proposition: Full rank of AᵀA
2.1 Example: Particle motion
2.2 Example: A forced first-order differential equation
2.3 Example: Sets of coupled first-order differential equations
2.4 Example: Power series solution for a constant-coefficient equation
2.5 Example: Frobenius solution for Bessel's equation of order zero
2.6 Example: Fourier series of a nonperiodic function
2.7 Example: Generating trigonometric basis functions
2.8 Example: Bessel's equation revisited
2.9 Example: Legendre's differential equation and Legendre polynomials
2.10 Theorem: Alternative theorem
2.11 Example: Steady-state temperature profile with fixed end …
2.12 Example: Steady-state temperature profile with insulated …
2.13 Example: Steady-state temperature profile with fixed flux
2.14 Example: Fixed flux revisited
2.15 Example: Nonhomogeneous boundary-value problem and the Green's function
2.16 Definition: (Lyapunov) Stability
2.17 Definition: Attractivity
2.18 Definition: Asymptotic stability
2.19 Definition: Exponential stability
2.20 Definition: Lyapunov function
2.21 Theorem: Lyapunov stability
2.22 Theorem: Asymptotic stability
2.23 Theorem: Exponential stability
2.24 Theorem: Lyapunov function for linear systems
2.25 Definition: Exponential stability (discrete time)
2.26 Definition: Lyapunov function (discrete time)
2.27 Theorem: Lyapunov stability (discrete time)
2.28 Theorem: Asymptotic stability (discrete time)
2.29 Theorem: Exponential stability (discrete time)
2.30 Example: Matched asymptotic expansion analysis of the reaction equilibrium assumption
2.31 Example: Oscillatory dynamics of a nonlinear system
2.32 Theorem: Poincaré-Bendixson
3.1 Example: Gradient (del) and Laplacian operators in polar (cylindrical) coordinates
3.2 Example: The divergence theorem and conservation laws
3.3 Example: Steady-state temperature distribution in a circular cylinder
3.4 Example: Transient diffusion in a slab
3.5 Example: Steady-state diffusion in a square domain
3.6 Example: Eigenfunction expansion for an inhomogeneous problem
3.7 Example: Steady diffusion in a cylinder: eigenfunction expansion and multiple solution approaches
3.8 Example: Transient diffusion from a sphere
3.9 Example: Temperature field around a sphere in a linear …
3.10 Example: Domain perturbation: heat conduction around …
3.11 Example: Derivation of a Fourier transform formula
3.12 Example: Transient diffusion in an unbounded domain, one and multiple dimensions
3.13 Example: Steady diffusion from a wall with an imposed concentration profile
3.14 Example: Reaction and diffusion in a membrane
3.15 Example: … the wave equation
4.1 Example: Characteristic function of the normal density
4.2 Example: The mean and covariance of the multivariate normal
4.3 Example: Characteristic function of the multivariate normal
4.4 Example: Marginal normal density
4.5 Example: Nonlinear transformation
4.6 Example: Maximum of two random variables
4.7 Example: Independent implies uncorrelated
4.8 Example: Does uncorrelated imply independent?
4.9 Example: Independent and uncorrelated are equivalent for normals
4.10 Definition: Density of a singular normal
4.11 Example: Computing a singular density
4.12 Theorem: Normal distributions under linear transformation
4.13 Example: Sum of 10 uniformly distributed random variables
4.14 Theorem: De Moivre-Laplace central limit theorem
4.15 Assumption: Lindeberg conditions
4.16 Theorem: Lindeberg-Feller central limit theorem
4.17 Theorem: Multivariate CLT IID
4.18 Theorem: Multivariate CLT Lindeberg-Feller
4.19 Example: Conditional normal density
4.20 Example: More normal conditional densities
4.21 Example: The confidence region, bounding box, and marginal box
4.22 Theorem: Mean and variance of samples from a normal
4.23 Example: Comparing PCR and PLSR
4.24 Theorem: Taylor's theorem with bound on remainder
5.1 Example: Diffusion on a plane in Cartesian and polar coordinate systems
5.2 Example: Average properties from sampling
5.3 Example: Transport of many particles suspended in a fluid
5.4 Example: Fokker-Planck equations for diffusion on a plane
5.5 Algorithm: First reaction method
5.6 Algorithm: Gillespie's direct method or SSA
5.7 Example: Observability of a chemical reactor
5.8 Theorem: Riccati iteration and estimator stability
5.9 Definition: Continuity (with probability one)

1 Linear Algebra

1.1 Vectors and Linear Spaces


A vector is defined in introductory physics courses as a quantity having magnitude and direction. For example, the position vector of an
objectin three dimensions is the triple of Cartesian coordinates that
determine the position of the object relative to a chosen origin. Another
wayof thinking of the position vector is as a point in three-dimensional
space, generally denoted R3. This view leads us to the more general and

abstract definition of a vector: A VECTOR IS AN ELEMENT OF A LINEAR SPACE:

Definition 1.1 (Linear space). A linear space is a set V whose elements (vectors) satisfy the following properties: For all x, y, and z in V and for all scalars α and β:

x + y ∈ V    closure under addition
αx ∈ V    closure under multiplication
x + 0 = x    definition of the origin
x − y = x + (−1)y    definition of subtraction
α(βx) = (αβ)x
(α + β)x = αx + βx
α(x + y) = αx + αy
1x = x, 0x = 0

Naturally, these properties apply to vectors in normal 3-D space; but they also apply to vectors in any finite number of dimensions as

well as to sets whose elements are, for example, 3 by 3 matrices


or
trigonometric functions. This latter case is an example of a function
space; we will encounter these in Chapter 2. Not every set of vectors
forms a linear space, however. For example, consider vectors pointing

from the origin to a point on the unit sphere. The sum of two such
vectors will no longer lie on the unit sphere; vectors defining points
on the sphere do not form a linear space. Regarding notation, many

readers will be familiar with vectors expressed in boldface type, x, v,


etc. This notation is especially common in physics-based problems
where these are vectors in three-dimensional physical space. In the
applied mathematics literature, where a vector takes on a more general
definition, one more commonly finds vectors written in italic type as
we have done above and will do for most of the book.
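The unit-sphere example is easy to check numerically. The following brief sketch (in Python with NumPy, which the text itself does not use; the particular vectors are our own illustrative choice) shows that the sum of two unit vectors need not lie on the unit sphere:

```python
import numpy as np

# Two points on the unit sphere in R^3.
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])

# Each vector has unit length ...
assert np.isclose(np.linalg.norm(x), 1.0)
assert np.isclose(np.linalg.norm(y), 1.0)

# ... but their sum has length sqrt(2), so it leaves the sphere:
# the set is not closed under addition and hence is not a linear space.
s = x + y
print(np.linalg.norm(s))  # 1.4142...
```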

1.1.1 Subspaces

Definition 1.2 (Subspace). A subspace S is a subset of a linear space V whose elements satisfy the following properties: For every x, y ∈ S and for all scalars α:

x + y ∈ S    closure under addition
αx ∈ S    closure under multiplication    (1.1)

For example, if V is the plane (R2), then any line through the origin
on that plane is a subspace.

1.1.2 Length, Distance, and Alignment


The idea of a norm generalizes the concept of length.

Definition 1.3 (Norm). A norm of a vector x, denoted ‖x‖, is a real number that satisfies

‖αx‖ = |α| ‖x‖
‖x‖ > 0 ∀x ≠ 0, and ‖x‖ = 0 if x = 0
‖x + y‖ ≤ ‖x‖ + ‖y‖    triangle inequality

The Euclidean norm in Rⁿ is our usual concept of length

‖x‖₂ = ( Σ_{i=1}^n |x_i|² )^{1/2}


in which x_i is the ith component of the vector. Unless otherwise noted,
this is the norm that will be used throughout this book, and will generally be denoted simply as ||x|| rather than ||x||_2. It should be noted,
however, that this is not the only definition of a norm, nor is it always
the most useful. For example, the so-called l_p norms for vectors in R^n
are defined by the equation

    ||x||_p = ( Σ_{i=1}^n |x_i|^p )^(1/p)

Particularly useful are the cases p = 1, sometimes called the "taxicab
norm" (why?), and p = ∞: ||x||_∞ = max_i |x_i|.
The INNER PRODUCT generalizes the dot product of elementary algebra and measures the alignment of a pair of vectors: an inner product
of two vectors, denoted (x, y), is a scalar that satisfies

    (αx, y) = ᾱ(x, y)
    (x, x) > 0, if x ≠ 0

The overbar denotes complex conjugate. Notice that the square root of
the inner product, (x, x)^(1/2), satisfies all the properties of a norm, so it
is a measure of the length of x. The usual inner product in R^n is

    (x, y) = Σ_{i=1}^n x_i y_i

in which case (x, x) = ||x||². This is a straightforward generalization
of the formula for the dot product x · y in R² or R³ and has the same
geometric meaning

    (x, y) = ||x|| ||y|| cos θ

where θ is the angle between the vectors. See Exercise 1.1 for a derivation. If we are considering a space of complex numbers rather than real
numbers, the usual inner product becomes

    (x, y) = Σ_{i=1}^n x̄_i y_i

If (x, y) = 0, then x and y are said to be ORTHOGONAL.
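The geometric meaning of the inner product can be checked numerically; in this NumPy sketch (our own, with arbitrarily chosen vectors) the angle between x and y comes out as expected, and a vector along the third axis is orthogonal to both:

```python
import numpy as np

x = np.array([1.0, 0.0, 0.0])
y = np.array([1.0, 1.0, 0.0])

ip = np.dot(x, y)                        # (x, y) = sum_i x_i y_i
cos_theta = ip / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_theta)             # angle between x and y: pi/4 here

z = np.array([0.0, 0.0, 2.0])
assert np.dot(x, z) == 0.0               # (x, z) = 0: x and z are orthogonal
```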

Finally, we can represent a vector x in R^n as a single column of
elements, a COLUMN VECTOR, and define its TRANSPOSE x^T as a ROW
VECTOR.

Now the inner product (x, y) can be written x^T y if x and y are
real, and x̄^T y if they are complex.
1.1.3 Linear Independence and Bases

If we have a set of vectors, say {x_1, x_2, x_3}, in a space V, this set is said
to be LINEARLY INDEPENDENT (LI) if the only solution to the equation

    α_1 x_1 + α_2 x_2 + α_3 x_3 = 0

is α_i = 0 for all i. Otherwise the set is LINEARLY DEPENDENT. A
space V is n-DIMENSIONAL if it contains a set of n linearly independent
vectors, but no set of n + 1 linearly independent vectors. If n LI vectors
can be found for any n, no matter how large, then the space is INFINITE-DIMENSIONAL.

Everything said above holds independent of our choice of coordinate
system for a space. To actually compute anything, however, we need
a convenient way to represent vectors in a space. We define a BASIS
{e_1, e_2, e_3, ...} as a set of LI vectors that SPAN the space of interest, i.e.,
every vector x in the space can be represented

    x = α_1 e_1 + α_2 e_2 + α_3 e_3 + ···

If a space is n-dimensional, then a basis for it has exactly n vectors
and vice versa. For example, in R³ the unit vectors in the x, y, and z
directions form a basis. But more generally, any three LI vectors form
a basis for R³.

Although any set of LI vectors that span a space form a basis, some
bases are more convenient than others. The elements of an ORTHONORMAL (ON) basis satisfy these properties

    (e_i, e_i) = 1             each basis vector has unit length
    (e_i, e_j) = 0, i ≠ j      the vectors are mutually orthogonal

These properties may be displayed more succinctly

    (e_i, e_j) = δ_ij

The symbol δ_ij is called the KRONECKER DELTA. In an orthonormal
basis, any vector can be expressed

    x = Σ_i (e_i, x) e_i
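In an orthonormal basis, the expansion coefficients of a vector are simply its inner products with the basis vectors. This is easy to verify numerically; the basis below (a NumPy illustration of our own, not from the text) is the standard basis of R² rotated by 45 degrees:

```python
import numpy as np

# An orthonormal basis for R^2
e1 = np.array([1.0, 1.0]) / np.sqrt(2)
e2 = np.array([-1.0, 1.0]) / np.sqrt(2)

# Orthonormality: (e_i, e_j) = delta_ij
assert np.isclose(np.dot(e1, e1), 1.0) and np.isclose(np.dot(e1, e2), 0.0)

# Expansion coefficients are inner products with the basis vectors
x = np.array([3.0, 5.0])
alpha1, alpha2 = np.dot(e1, x), np.dot(e2, x)
x_reconstructed = alpha1 * e1 + alpha2 * e2
assert np.allclose(x_reconstructed, x)
```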

1.2 Linear Operators and Matrices

An OPERATOR transforms one vector into another. Operators appear
everywhere in applied mathematics. For example, the operator d/dx
transforms a function f(x) into its derivative. More abstractly, an operator A is a mapping that takes elements of one set (the DOMAIN of A)
and converts them into elements of another (the RANGE of A). LINEAR
operators satisfy the following properties for all vectors u and v in
their domain and all scalars α:

    A(u + v) = Au + Av
    A(αu) = α(Au)        (1.2)

We focus here on operators on finite-dimensional vector spaces R^n;
operators on spaces of complex numbers are similar. (In Chapter 2 we
will look at an important class of operators in function spaces.) In these
spaces, and having chosen a coordinate system in which to represent
vectors, any linear operator can be expressed as multiplication by a
MATRIX. A matrix is an array of numbers

    A = [ A_11  A_12  ...  A_1n
          A_21  A_22  ...  A_2n
          ...
          A_m1  A_m2  ...  A_mn ]

The first subscript of each element denotes its row, while the second
denotes its column. The transformation of a vector x = (x_1, ..., x_n)^T into another
vector y then occurs through matrix-vector multiplication. That is, y = Ax,
which means

    y_i = Σ_{j=1}^n A_ij x_j,    i = 1, ..., m

In this example, the matrix A is m by n (rows by columns); it is an
element of the linear space R^(m×n), and multiplication by A maps vectors
in R^n into vectors in R^m. That is, for the function defined by matrix
multiplication, f(x) = Ax, f: R^n → R^m. Some readers will be familiar
with matrices written in bold and the matrix-vector product between
matrix A and vector x written as either Ax or A · x.

One can also think of each row of A as a vector. In this case the ith
component of y can be thought of as the dot product between the ith
row of A and the vector x. This is probably the best way to remember
the actual algebra of the matrix-vector multiplication formula. A more
intuitive and general geometric interpretation (which will be
used extensively as we proceed through the chapter) is allowed by considering
each column of A as a vector, and thinking of the vector y as a linear
combination of these vectors. That is, y is in the space spanned by the
columns of A. If we let the ith column of A be the vector c_i, then

    y = x_1 c_1 + x_2 c_2 + x_3 c_3 + ··· + x_n c_n

(Note that in this equation x_j is a scalar component of the vector x,
while c_j is a vector.) This equation implies that the number of columns
of A must equal the length of x. That is, matrix-vector multiplication
only makes sense if the vector x is in the domain of the operator A.
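The column-combination view of y = Ax is easy to confirm numerically; in this NumPy sketch (our illustration, with an arbitrary 3 × 2 matrix) the product agrees with the explicit linear combination of columns:

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])   # m = 3 rows, n = 2 columns
x = np.array([10.0, -1.0])

y = A @ x                    # matrix-vector product

# The same y as a linear combination of the columns c_i of A
y_cols = x[0] * A[:, 0] + x[1] * A[:, 1]
assert np.allclose(y, y_cols)
```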

1.2.1 Addition and Multiplication of Matrices

The following terminology is used to describe important classes of matrices.

1. A is SQUARE if m = n.

2. A is DIAGONAL if A is square and A_ij = 0 for i ≠ j. This is
sometimes written as A = diag(a_1, a_2, ..., a_n) in which a_i is the
element A_ii for i = 1, 2, ..., n.

3. A is UPPER (LOWER) TRIANGULAR if A is square and A_ij = 0 for
i > j (i < j).

4. A is UPPER (LOWER) HESSENBERG if A is square and A_ij = 0 for
i > j + 1 (i < j − 1).

5. A is TRIDIAGONAL if it is both upper and lower Hessenberg.

6. A is SYMMETRIC if A is square and A_ij = A_ji, i, j = 1, 2, ..., n.

ADDITION of two matrices is defined if A and B both have the
same domain and range; then

    (A + B)_ij = A_ij + B_ij

Otherwise, the matrices cannot be added. SCALAR MULTIPLICATION is simple

    (αA)_ij = α A_ij

MATRIX-MATRIX MULTIPLICATION is not as simple. The product AB
is the matrix of dot products of the rows of A with the columns of B. If
A ∈ R^(m×n) and B ∈ R^(p×q), then AB only exists if n = p. Otherwise, the
lengths of the rows of A are incompatible with the columns of B. If u_i
represents the ith row of A, and v_j the jth column of B, then

    (AB)_ij = u_i · v_j,    i = 1, ..., m,  j = 1, ..., q

Equivalently,

    (AB)_ij = Σ_{k=1}^n A_ik B_kj,    i = 1, ..., m,  j = 1, ..., q

So AB is an m by q matrix. Note that the existence of AB does not
imply the existence of BA. Both exist if and only if n = p and m = q.
Even when both products exist, AB is not generally equal to BA. In
other words, the final result of a sequence of operations on a vector
generally depends on the order of the operations, i.e., A(Bx) ≠ B(Ax).
One important exception to this rule is when one of the matrices is the
IDENTITY MATRIX I. The elements of I are given by I_ij = δ_ij, so for
example, in R^(3×3)

    I = [ 1 0 0
          0 1 0
          0 0 1 ]

For any vector x and matrix A, Ix = x and AI = IA = A.


Example 1.4: Common transformations do not commute

Let A be a matrix that rotates a vector, and let B be a matrix that
stretches a vector in the "2" direction. Show that the operations of
stretching and rotating a vector do not commute.

Solution

Forming the two products, one finds that AB and BA are not equal.
Since these are not equal, we conclude that the two vector operations
do not commute.
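A quick numerical check makes the point concrete. The matrices below are illustrative stand-ins of our own choosing (a 90-degree rotation for A and a stretch by a factor of 2 in the second direction for B), not necessarily those printed in Example 1.4; any rotation/stretch pair leads to the same conclusion:

```python
import numpy as np

# Stand-in transformations (our assumption, for illustration):
# A rotates a vector by 90 degrees; B stretches by 2 in the "2" direction.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])

AB = A @ B    # stretch first, then rotate
BA = B @ A    # rotate first, then stretch

assert not np.allclose(AB, BA)   # the operations do not commute
```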

1.2.2 Transpose and Adjoint

For every matrix A there exists another matrix, called the TRANSPOSE
of A and denoted A^T, such that (A^T)_ij = A_ji. The rows of A become
the columns of A^T and vice versa. (We already saw this notion in the
context of vectors: viewing x as a matrix with one column, then x^T
is a matrix with one row.) A matrix that equals its transpose satisfies
A_ji = A_ij and is said to be SYMMETRIC; this can occur only for square
matrices. Some properties of the transpose of a matrix are

    (AB)^T = B^T A^T        (ABC)^T = C^T B^T A^T

Properties involving matrix-vector products follow from the treatment
of a vector x as a matrix with only one column. For example

    (Ax)^T = x^T A^T

If A, x, and y are real, then the inner product between the vector Ax
and the vector y is given by

    (Ax)^T y = x^T A^T y        (1.3)

One can generalize the idea of a transpose to more general operators. The ADJOINT of an operator L (not necessarily a matrix) is denoted
L* and is defined by this equation

    (Lx, y) = (x, L*y)        (1.4)

If L is a real matrix A, then (Lx, y) becomes (Ax)^T y, and comparison
of (1.3) and (1.4) shows that

    L* = A^T

Similarly, if L is a complex matrix A, then we show in the following
section that

    L* = Ā^T

the conjugate transpose of A. By analogy with this expression for matrices, we will use the notation
x* = x̄^T for vectors as well. Some general properties of the adjoint of
an operator are

    (L*)* = L        (LM)* = M* L*

If L = L*, then L is said to be SELF-ADJOINT or HERMITIAN. Self-adjoint

operators have special properties, as we shall see shortly, and show up


in many applications.

1.2.3 Einstein Summation Convention

Notice that when performing matrix-matrix or matrix-vector multiplications, the index over which the sum is taken appears twice in the
formula, while the unsummed indices appear only once. For example,
in the formula

    (ABC)_ij = Σ_k Σ_l A_ik B_kl C_lj

the indices k and l appear twice in the summations, while the indices i
and j only appear once. This observation suggests a simplified notation
for products, in which the presence of the repeated indices implies
summation, so that the explicit summation symbols do not need to
be written. Using this EINSTEIN SUMMATION CONVENTION, the inner
product x^T y is simply x_i y_i and the matrix-vector product y = Ax is
y_i = A_ij x_j. This convention allows us to concisely derive many key
results.
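NumPy's einsum function implements exactly this convention: repeated indices in its subscript string are summed. A short sketch (our illustration, with arbitrary arrays) checks two of the formulas above:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
C = np.arange(8.0).reshape(4, 2)
x = np.array([1.0, -1.0, 2.0])

# Repeated indices are summed, exactly as in the convention:
y = np.einsum("ij,j->i", A, x)           # y_i = A_ij x_j
D = np.einsum("ik,kl,lj->ij", A, B, C)   # (ABC)_ij = A_ik B_kl C_lj

assert np.allclose(y, A @ x)
assert np.allclose(D, A @ B @ C)
```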

Example 1.5: Matrix identities derived with index notation

Establish the following matrix identities using index notation

(a) (Ax, y) = (x, A^T y)    (b) (AB)^T = B^T A^T    (c) AA^T = (AA^T)^T

(d) A + A^T = (A + A^T)^T    (e) A^T A = (A^T A)^T

Solution

(a) (Ax, y) = (x, A^T y):

    (Ax, y) = (Ax)_i y_i = A_ij x_j y_i = x_j A_ij y_i = x_j (A^T)_ji y_i = (x, A^T y)

(b) (AB)^T = B^T A^T:

    ((AB)^T)_ij = (AB)_ji = A_jk B_ki = B_ki A_jk = (B^T)_ik (A^T)_kj = (B^T A^T)_ij

(c) AA^T = (AA^T)^T:

    (AA^T)_ij = A_ik (A^T)_kj = A_ik A_jk = A_jk A_ik = (AA^T)_ji = ((AA^T)^T)_ij

(d) A + A^T = (A + A^T)^T:

    (A + A^T)_ij = A_ij + A_ji = A_ji + A_ij = (A + A^T)_ji = ((A + A^T)^T)_ij

(e) A^T A = (A^T A)^T:

    (A^T A)_ij = (A^T)_ik A_kj = A_ki A_kj = A_kj A_ki = (A^T A)_ji = ((A^T A)^T)_ij

1.2.4 Gram-Schmidt Orthogonalization and the QR Decomposition

We will encounter a number of situations where a linearly independent
set of vectors is available and it will be useful to construct from them a
set of orthogonal vectors. The classical approach to doing this is called
GRAM-SCHMIDT orthogonalization. As a simple example, consider LI
vectors v_1 and v_2, from which we wish to find an orthogonal pair u_1
and u_2. Without loss of generality we can set

    u_1 = v_1

It is straightforward to find the component of v_2 that is orthogonal to
u_1: we just subtract from v_2 the component that is parallel to u_1 (the
PROJECTION of v_2 onto u_1)

    u_2 = v_2 − ((v_2, u_1) / ||u_1||²) u_1

In higher dimensions, where we have v_3, v_4, etc., we continue the process, subtracting off the components parallel to the previously determined orthogonal vectors

    u_3 = v_3 − ((v_3, u_1) / ||u_1||²) u_1 − ((v_3, u_2) / ||u_2||²) u_2

and so on.

We can apply Gram-Schmidt orthogonalization to the columns of
any m × n matrix A whose columns are linearly independent (which
implies that m ≥ n). Specifically, we can write

    A = QR

where Q is an m × n matrix of orthonormal vectors formed from the
columns of A and R is an n × n upper triangular matrix. This result is
known as the QR DECOMPOSITION. We have the following theorem.

Theorem 1.6 (QR decomposition). If A ∈ R^(m×n) has linearly independent columns, then there exists Q ∈ R^(m×n) with orthonormal columns,
and upper triangular R ∈ R^(n×n), such that

    A = QR

See Exercise 1.38 for the proof. Because the columns of Q are orthonormal, Q^T Q = I.
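The decomposition is available directly in NumPy; the sketch below (our illustration, with an arbitrary full-column-rank matrix) verifies the three properties stated in Theorem 1.6:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])    # linearly independent columns, m > n

Q, R = np.linalg.qr(A)        # "reduced" QR: Q is m x n, R is n x n

assert np.allclose(Q @ R, A)             # A = QR
assert np.allclose(Q.T @ Q, np.eye(2))   # columns of Q are orthonormal
assert np.allclose(R, np.triu(R))        # R is upper triangular
```

In practice np.linalg.qr uses Householder reflections rather than classical Gram-Schmidt, which is more robust to rounding error, but the factorization it produces is the same object.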

1.2.5 The Outer Product, Dyads, and Projection Operators

Given two LI vectors v_1 and v_2 in R^n, Gram-Schmidt uses projection to
construct an orthogonal pair

    u_1 = v_1
    u_2 = v_2 − (u_1^T v_2 / u_1^T u_1) u_1

where we have used the inner product definition (u, v) = u^T v. Observe that the
right-hand side of the second equation is linear in v_2, so we should be
able to put this equation in the form u_2 = Av_2, where A is a matrix.
The form of A illustrates some important concepts so we explicitly
construct it here. We can write A = I − P, where

    P v_2 = (u_1^T v_2 / u_1^T u_1) u_1

Noting that a^T b = b^T a for vectors a and b, this rearranges to

    P v_2 = u_1 (u_1^T v_2) / (u_1^T u_1)

which has the form we seek if we move the parentheses to have

    P v_2 = (u_1 u_1^T / u_1^T u_1) v_2

That is, P is given by what we will call the OUTER PRODUCT between
u_1 and itself, u_1 u_1^T, divided by the scalar u_1^T u_1. More generally, the
outer product u v^T between vectors u and v is a matrix, called a DYAD,
that satisfies the following properties

    (u v^T)_ij = u_i v_j
    (u v^T) w = u (v^T w)
    w^T (u v^T) = (w^T u) v^T

where w is any vector. The outer product is sometimes denoted u ⊗ v.
When the notation u · v is used to represent the inner product, u ⊗ v
or uv is used to represent the outer.

Finally, returning to the specific case P = u_1 u_1^T / (u_1^T u_1), we can observe that
Pw = u_1 (u_1^T w) / (u_1^T u_1); the operation of P on w results in a vector that is the
projection of w in the u_1 direction: P is a PROJECTION OPERATOR. The
operator I − P is also a projection: it takes a vector and produces the
projection of that vector in the direction(s) orthogonal to u_1. We can
check that both u_1 u_1^T / (u_1^T u_1) and I − u_1 u_1^T / (u_1^T u_1) satisfy the general definition of a
projection operator

    P P = P
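These properties are easy to verify numerically. The NumPy sketch below (our illustration, with an arbitrary direction u) builds the projection operator from the outer product and checks that both P and I − P are idempotent:

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])

# Projection onto the u direction, built from the outer product u u^T
P = np.outer(u, u) / np.dot(u, u)
Q = np.eye(3) - P                # projection orthogonal to u

assert np.allclose(P @ P, P)     # P is a projection: PP = P
assert np.allclose(Q @ Q, Q)     # so is I - P

w = np.array([3.0, 0.0, 0.0])
w_par = P @ w                    # component of w along u
w_perp = w - w_par               # component orthogonal to u
assert np.isclose(np.dot(w_perp, u), 0.0)
```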
1.2.6 Partitioned Matrices and Matrix Operations

It is often convenient to consider a large matrix to be composed of other
matrices, rather than its scalar elements. We say the matrix is partitioned into other smaller dimensional matrices. To make this explicit,
first we define a submatrix as follows. Let matrix A ∈ R^(m×n), and define
indices 1 ≤ i_1 < i_2 < ··· < i_k ≤ m, and 1 ≤ j_1 < j_2 < ··· < j_l ≤ n;
then the k × l matrix S, whose (a, b) element is

    S_ab = A_{i_a, j_b}

is called a submatrix of A.
A matrix A ∈ R^(m×n) is partitioned when it is written as

    A = [ A_11  A_12  ...  A_1l
          A_21  A_22  ...  A_2l
          ...
          A_k1  A_k2  ...  A_kl ]

where each A_ij is an m_i × n_j submatrix of A. Note that Σ_{i=1}^k m_i = m
and Σ_{j=1}^l n_j = n. Two of the more useful matrix partitions are column partitioning and row partitioning. If we let the m-vectors a_i, i =
1, 2, ..., n denote the n column vectors of A, then the column partitioning of A is

    A = [ a_1  a_2  ...  a_n ]

If we let the row vectors (1 × n matrices) ā_j, j = 1, 2, ..., m denote the
m row vectors of A, then the row partitioning of A is

    A = [ ā_1
          ā_2
          ...
          ā_m ]

The operations of matrix transpose, addition, and multiplication
become even more useful when we apply them to partitioned matrices.
Consider the two partitioned matrices

    A = [ A_11  ...  A_1l        B = [ B_11  ...  B_1n
          ...                          ...
          A_k1  ...  A_kl ]            B_m1  ...  B_mn ]

in which A_ij has dimension p_i × q_j and B_ij has dimension r_i × s_j. We
then have the following formulas for scalar multiplication, transpose,
matrix addition, and matrix multiplication of partitioned matrices.

1. Scalar multiplication.

    αA = [ αA_11  ...  αA_1l
           ...
           αA_k1  ...  αA_kl ]

2. Transpose.

    A^T = [ A_11^T  A_21^T  ...  A_k1^T
            A_12^T  A_22^T  ...  A_k2^T
            ...
            A_1l^T  A_2l^T  ...  A_kl^T ]

3. Matrix addition. If p_i = r_i and q_j = s_j for i = 1, ..., k and j =
1, ..., l, and k = m and l = n, then the partitioned matrices can
be added

    A + B = [ C_11  ...  C_1l
              ...
              C_k1  ...  C_kl ]        C_ij = A_ij + B_ij

4. Matrix multiplication. If q_i = r_i for i = 1, ..., l, then we say the
partitioned matrices conform, and the matrices can be multiplied

    AB = [ C_11  ...  C_1n
           ...
           C_k1  ...  C_kn ]        C_ij = Σ_{s=1}^l A_is B_sj

These formulas are all easily verified by reducing all the partitioned
matrices back to their scalar elements. Notice that we do not have to
remember any new formulas. These are the same formulas that we
learned for matrix operations when the submatrices A_ij and B_ij were
scalar elements (except we normally do not write the transpose for
scalars in the transpose formula). The conclusion is that all the usual
rules apply provided that the matrices are partitioned so that all the
implied operations are defined.
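The blockwise multiplication formula can be checked directly. In this NumPy sketch (our illustration, with random conforming blocks), the product assembled block by block agrees with the ordinary product of the assembled matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A11, A12 = rng.normal(size=(2, 2)), rng.normal(size=(2, 3))
A21, A22 = rng.normal(size=(3, 2)), rng.normal(size=(3, 3))
B11, B12 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))
B21, B22 = rng.normal(size=(3, 4)), rng.normal(size=(3, 1))

A = np.block([[A11, A12], [A21, A22]])   # 5 x 5
B = np.block([[B11, B12], [B21, B22]])   # 5 x 5

# Blockwise product: the same formula as for scalar elements
C11 = A11 @ B11 + A12 @ B21
C12 = A11 @ B12 + A12 @ B22
C21 = A21 @ B11 + A22 @ B21
C22 = A21 @ B12 + A22 @ B22

assert np.allclose(np.block([[C11, C12], [C21, C22]]), A @ B)
```

The column partitions of A must conform with the row partitions of B (2 + 3 on each side here), exactly as formula 4 requires.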

1.3 Systems of Linear Algebraic Equations


1.3.1 Introduction to Existence and Uniqueness

Any set of m linear algebraic equations for n unknowns can be written

in the form

Ax = b

where A ∈ R^(m×n), b ∈ R^m, and x (∈ R^n) is the vector of unknowns.
Consider the vectors c_i that form the columns of A. The solution x (if
it exists) is the linear combination of these columns that equals b

    b = x_1 c_1 + x_2 c_2 + x_3 c_3 + ··· + x_n c_n

This view of Ax = b leads naturally to the following result. The system
of equations

    Ax = b,    A ∈ R^(m×n),  x ∈ R^n,  b ∈ R^m

has at least one solution x if and only if the columns of A are not
linearly independent from b.

For example, if m = n = 3 and the columns of A form an LIset, then


theyspan R3. Therefore, no vector b e R3 can be linearly independent

fromthe columns of A and therefore Ax = b has a solution for all

b e R3. Conversely, if the column vectors of A are not LI, then they do

not span R3 so there will be some vectors b for which no solution x


exists.

Consider the case where there are the same number of equations
as unknowns: n = m. Here the above result leads to this general
theorem.

Theorem 1.7 (Existence and uniqueness of solutions for square systems). If A ∈ R^(n×n), then

(a) If the columns of A are LI, then the matrix is INVERTIBLE. The problem Ax = b has the following properties:

    (1) Ax = 0 (the homogeneous problem) has only the trivial solution
        x = 0.

    (2) Ax = b (the inhomogeneous problem) has a unique nonzero
        solution for all b ≠ 0.

(b) If the columns of A are NOT LI, then the matrix is SINGULAR or
NONINVERTIBLE. In this case:

    (1) Ax = 0 has an infinite number of nonzero solutions. These
        solutions comprise the NULL SPACE of A.

    (2) For b ≠ 0, Ax = b has either:

        (i) No solution, if b is LI of the columns of A. That is, b is not
            in the RANGE of A, or

        (ii) An infinite number of solutions, each the sum of a particular solution to Ax = b and any combination of the solutions of Ax = 0, i.e., x = x_H + x_P, where Ax_P = b and
            Ax_H = 0.

1.3.2 LU Decomposition: Solving Ax = b

We now turn our attention to the issue of explicitly constructing solutions. For the present, we restrict to n = m and to case (a) of the above
theorem. In this case, we can define the INVERSE of A, denoted A^(-1).
This is a matrix operator that satisfies

    1. A^(-1) A = I        (definition of A^(-1))

    2. A A^(-1) = I

    3. (AB)^(-1) = B^(-1) A^(-1)

The first property implies that A^(-1) A x = A^(-1) b reduces to x = A^(-1) b,
so Ax = b can be solved by finding A^(-1). Finding A^(-1) is not necessary,
however, to solve Ax = b, nor is it particularly efficient. We describe a
widely used approach called LU decomposition.

LU decomposition is essentially a modification of Gaussian elimination, with which everyone should be familiar. It is based on the fact
that triangular systems of equations are easy to solve. For example,
this matrix is upper triangular

    [ 1 2 3
      0 4 8
      0 0 7 ]

All the elements below the diagonal are zero. Since the third row has
only one nonzero element, it corresponds to a single equation with a
single unknown. Once this equation is solved, the equation above it
has only a single unknown and is therefore easy to solve, and so on.
LU decomposition depends on the fact that a square matrix A can be
written A = LU, where L is lower triangular and U is upper triangular.
Using this fact, solving Ax = b consists of three steps, the first of which
takes the most computation:

1. Find L and U from A: LU factorization.

2. Solve Lc = b for c: forward substitution.

3. Solve Ux = c for x: back substitution.

The latter two steps are simple operations, because L and U are triangular. Note that L and U are independent of b; to solve Ax = b
for many different values of b, once A is factored only the inexpensive steps 2 and 3 of the above process need be repeated. The LU
decomposition procedure (first step above) is illustrated on the matrix

    A = [ 3 5 2
          0 8 2
          6 2 8 ]

Step a. Replace row 2 with a linear combination of row 1 and row
2 that makes the first element zero. That is, r_2 is replaced
by r_2 − L_21 r_1, where L_21 = A_21/A_11. For this example, A_21 is
already zero, so L_21 = 0 and r_2 is unchanged.

Step b. Replace row 3 with a linear combination of row 1 and row
3 that makes the first element zero. That is, r_3 is replaced
by r_3 − L_31 r_1, where L_31 = A_31/A_11. So L_31 = 6/3 = 2 and A
is modified to

    [ 3  5  2
      0  8  2
      0 −8  4 ]

Step c. Now the first column of the matrix is zero below the diagonal. We move to the second column. Replace row 3 with a
linear combination of row 2 and row 3 that makes the second element zero. That is, r_3 is replaced by r_3 − L_32 r_2,
where L_32 = A_32/A_22. So L_32 = −8/8 = −1 and A is modified to

    [ 3 5 2
      0 8 2
      0 0 6 ]  = U

This matrix is now the upper triangular matrix U. For a matrix in higher dimensions, the procedure would be continued
until all of the elements below the diagonal were zero. The
matrix L is simply composed of the multipliers L_ij that were
computed at each step

    L = [ 1    0    0        [ 1  0  0
          L_21 1    0    =     0  1  0
          L_31 L_32 1 ]        2 −1  1 ]

Note that all the diagonal elements of L are 1 and all above-diagonal elements are zero. The elements on the diagonal
of U are called the PIVOTS.

Lc = b and then Ux
Now,for any vector b the simple systems

as written, the method willfail


can be solved to yield x. Notice that
if
Modern
procedure.
computational
the
of
step
any
at
routines
Aii = 0
actually compute a slightly different factorization PA = LU wherep
is a permutation matrix that exchanges rows to avoid the case Aii= 0

(see Exercise 1.9). With this modification, known as PARTIALPIVOTING,

even singular or nonsquare (m > n) matrices canbe factored. However,


the substitution steps will fail except for values of b in the range ofA.
To see this, try to perform the back substitution step with a matrixU
that has a zero pivot.
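The three-step procedure can be sketched in NumPy (our illustration; the text itself uses no code). The small factorization routine below reproduces the no-pivoting elimination of the worked example, then solves the two triangular systems:

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU factorization without pivoting (fails if a pivot is zero)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for j in range(n - 1):
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]    # the multiplier L_ij
            U[i, :] -= L[i, j] * U[j, :]   # eliminate below the pivot
    return L, U

A = np.array([[3.0, 5.0, 2.0],
              [0.0, 8.0, 2.0],
              [6.0, 2.0, 8.0]])
L, U = lu_nopivot(A)           # step 1: LU factorization

b = np.array([1.0, 2.0, 3.0])  # an arbitrary right-hand side
c = np.linalg.solve(L, b)      # step 2: solve Lc = b (forward substitution)
x = np.linalg.solve(U, c)      # step 3: solve Ux = c (back substitution)

assert np.allclose(L @ U, A)
assert np.allclose(A @ x, b)
```

For the example matrix this reproduces U with pivots 3, 8, 6 and L with multipliers L_31 = 2 and L_32 = −1. Library routines (e.g., LAPACK, used internally by NumPy) instead compute PA = LU with partial pivoting, as discussed above.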

1.3.3 The Determinant

In elementary discussions of the solution to Ax = b that are based
on CRAMER'S RULE, the DETERMINANT of the matrix A, denoted det A,
arises. One often finds a complicated definition based on submatrices, but having the LU decomposition in hand a much simpler formula
emerges (Strang, 1980). For a square matrix A that can be decomposed
into LU, the determinant is the product of the pivots

    det A = Π_{i=1}^n U_ii

If m permutations of rows must be performed to complete the decomposition, then the decomposition has the form PA = LU, and

    det A = (−1)^m Π_{i=1}^n U_ii

The matrix A^(-1) exists if and only if det A ≠ 0, in which case det A^(-1) =
(det A)^(-1). Another key property of the determinant is that

    det AB = det A det B

The most important use of the determinant that we will encounter in
this book is its use in the ALGEBRAIC EIGENVALUE PROBLEM that appears
in Section 1.4.
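For the matrix factored in the worked example above, the pivots are 3, 8, and 6, so the product-of-pivots formula gives det A = 144. A NumPy check (our illustration):

```python
import numpy as np

# Pivots from the worked LU example: diag(U) = (3, 8, 6)
det_from_pivots = 3.0 * 8.0 * 6.0   # 144

A = np.array([[3.0, 5.0, 2.0],
              [0.0, 8.0, 2.0],
              [6.0, 2.0, 8.0]])
assert np.isclose(np.linalg.det(A), det_from_pivots)

# det AB = det A det B, checked with B = A
assert np.isclose(np.linalg.det(A @ A), np.linalg.det(A) ** 2)
```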

1.3.4 Rank of a Matrix

Before we define the rank of a matrix, it is useful to establish the following property of matrices: the number of linearly independent columns
of a matrix is equal to the number of linearly independent rows.

Example 1.8: Linearly independent columns, rows of a matrix

Given A ∈ R^(m×n). Assume A has c linearly independent columns and r
linearly independent rows. Show c = r.

Solution

Let {v_1, ..., v_c} be the set of A's linearly independent column vectors. Let
the a_i be all of A's column vectors and ā_i be all of A's row vectors, so
the A matrix can be partitioned by its columns or rows as

    A = [ a_1  a_2  ...  a_n ] = [ ā_1
                                   ā_2
                                   ...
                                   ā_m ]

Each column of the A matrix can be expressed as a linear combination
of the c linearly independent v_i vectors. We denote this statement as
follows

    a_j = V λ_j,  j = 1, ..., n,    or    A = V Λ
    (A: m×n,   V: m×c,   Λ: c×n)

in which the column vector λ_j ∈ R^c contains the coefficients of the
linear combination of the v_i representing the jth column vector of matrix A. If we place all the λ_j, j = 1, ..., n, next to each other, we have
matrix Λ. Next comes the key step. Repartition the relationship above
as follows

    ā_i = v̄_i Λ,  i = 1, ..., m

in which v̄_i is the ith row of V,

and we see that the rows of A can be expressed as linear combinations
of the rows of Λ. The multipliers of the ith row of A are the
elements of the ith row of V, written as the row vector v̄_i. Since Λ has
only c rows, every row of A is expressible as a linear combination of c
row vectors that span the rows of A, but we do not know if the rows of Λ
are linearly independent; hence r ≤ c. Applying the same argument to
A^T (the linearly independent row (column) vectors of A are also the
linearly independent column (row) vectors of A^T) gives c ≤ r. Combining c ≤ r
with r ≤ c, we conclude

    c = r

and the result is established. The number of linearly independent
columns of a matrix is equal to the number of linearly independent
rows, and this number is called the rank of the matrix.

Definition 1.9 (Rank of a matrix). The rank of a matrix is the number
of linearly independent rows, equivalently, columns, of the matrix.

We also see clearly why partitioned matrices are so useful. The proof
that the number of linearly independent rows of a matrix is equal to the
number of linearly independent columns consisted of little more than
partitioning a matrix by its columns and then repartitioning the same
matrix by its rows. For another example of why partitioned matrices are
useful, see Exercise 1.17 on deriving the partitioned matrix inversion
formula, which often arises in applications.
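Numerically, the rank is usually computed from a singular value decomposition rather than by counting independent rows by hand; NumPy provides this directly (our illustration, with a matrix whose second row is a multiple of the first):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # = 2 x (row 1): linearly dependent
              [0.0, 1.0, 1.0]])

# Rank = number of LI rows = number of LI columns,
# so A and A^T have the same rank
assert np.linalg.matrix_rank(A) == 2
assert np.linalg.matrix_rank(A.T) == 2
```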

1.3.5 Range Space and Null Space of a Matrix

Given A ∈ R^(m×n), we define the range of A as

    R(A) = {y ∈ R^m | y = Ax, x ∈ R^n}

The range of a matrix is the set of all vectors that can be generated with
the product Ax for all x ∈ R^n. Equivalently, if the v_i ∈ R^m are the linearly
independent columns of A, then the range of A is the span of the
v_i; the v_i are a basis for the range of A. Given A ∈ R^(m×n), we define the
null space of A as

    N(A) = {x ∈ R^n | Ax = 0}

Similarly the range and null spaces of A^T are defined to be

    R(A^T) = {x ∈ R^n | x = A^T y,  y ∈ R^m}

    N(A^T) = {y ∈ R^m | A^T y = 0}

A basis for R(A^T) is the set of linearly independent rows of A, transposed to make column vectors. We can show that these four sets also
satisfy the two properties of a subspace, so they are also subspaces
(see Exercise 1.14).
Let r be the rank of matrix A. We know from the previous example that r is equal to the number of linearly independent rows of A
and is also equal to the number of linearly independent columns of A.
Equivalently, the dimension of R(A) and R(A^T) is also r

    dim(R(A)) = dim(R(A^T)) = r = rank(A)

We also can demonstrate the following pair of orthogonality relations
among these four fundamental subspaces

    R(A) ⊥ N(A^T)        R(A^T) ⊥ N(A)

Consider the first orthogonality relationship. Let y be any element of
N(A^T). We know N(A^T) = {y ∈ R^m | A^T y = 0}. Transposing this
relation and using column partitioning for A gives

    y^T A = 0
    y^T [ a_1  a_2  ...  a_n ] = [ y^T a_1   y^T a_2   ...   y^T a_n ] = 0

The last equation gives y^T a_i = 0, i = 1, ..., n, or y is orthogonal to
every column of A. Since every element of the range of A is a linear
combination of the columns of A, y is orthogonal to every element of
R(A), which gives N(A^T) ⊥ R(A). The second orthogonality relationship follows by switching the roles of A and A^T in the preceding argument (see Exercise 1.15). Note that the range of a matrix is sometimes
called the image, and the null space is sometimes called the kernel.
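The four subspaces and their orthogonality relations can be explored numerically via the singular value decomposition, which supplies orthonormal bases for all four (a NumPy sketch of our own, using a rank-1 matrix so the null spaces are nontrivial):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])    # rank 1: the columns are parallel

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))    # rank = number of nonzero singular values
null_A = Vt[r:].T             # basis for N(A), a subspace of R^2
null_AT = U[:, r:]            # basis for N(A^T), a subspace of R^3

# Orthogonality relations: R(A) ⊥ N(A^T) and R(A^T) ⊥ N(A)
assert np.allclose(A.T @ null_AT, 0)   # every column of A ⊥ N(A^T)
assert np.allclose(A @ null_A, 0)      # every row of A ⊥ N(A)

# dim R(A^T) + dim N(A) = n
assert r + null_A.shape[1] == A.shape[1]
```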

1.3.6 Existence and Uniqueness in Terms of Rank and Null Space

We return now to the general case where A ∈ R^(m×n). The FUNDAMENTAL THEOREM OF LINEAR ALGEBRA gives a complete characterization
of the existence and uniqueness of solutions to Ax = b (Strang, 1980):
every matrix A decomposes the spaces R^n and R^m into the four fundamental subspaces depicted in Figure 1.1. The answer to the question of
existence and uniqueness of solutions to Ax = b can be summarized
as follows.

1. Existence. Solutions to Ax = b exist for all b if and only if the
rows of A are linearly independent (m = r).

2. Uniqueness. A solution to Ax = b is unique if and only if the
columns of A are linearly independent (n = r).

We can also state this result in terms of the null spaces. A solution
to Ax = b exists for all b if and only if N(A^T) = {0}, and a solution
to Ax = b is unique if and only if N(A) = {0}. More generally, a
solution to Ax = b exists for a particular b if and only if b ∈ R(A), by
the definition of the range of A. From the fundamental theorem, that
means y^T b = 0 for all y ∈ N(A^T). And if N(A^T) = {0} we recover
the existence condition 1 stated above. These statements provide a
succinct generalization of the results described in Section 1.3.1.
13.7 Least-Squares Solution for Overdetermined Systems
Nowconsider the OVERDETERMINED
problem, Ax = b where A e

with m > n. In general, this problem has no exact solution,because


the n columns of A cannot span (Rtn, the space where b exists. This
problem arises naturally in fitting models to data. In general,the best
we can hope for is an approximate solution x that minimizes the resid-

ual (or error) r = Ax b. In particular, the "least squares"method


attempts to minimize the square of the Euclidean norm of the residual,
llr112= rTr. Replacingr by Ax b, this quantity (dividedby 2)reduces

to the function

P(x) = x TA TAx x TA T b + b T b

P is a scalar function of x and the value of the vector x that minimizes


P is the solution we seek. That is, we now want to solve P/x1 = 0, l,-

Figure 1.1: The four fundamental subspaces of matrix A (after
Strang (1980), p. 88). The dimension of the range of A
and A^T is r, the rank of matrix A. The null space of A
and range of A^T are orthogonal, as are the null space of
A^T and range of A. Solutions to Ax = b exist for all b
if and only if m = r (rows independent). A solution to
Ax = b is unique if and only if n = r (columns independent).

1, ..., n, or, in different notation, ∇P(x) = 0. Performing the gradient
operation yields

    ∂P/∂x_l = A_jl A_jk x_k − A_jl b_j

or in matrix form

    dP/dx = A^T A x − A^T b

Therefore, the condition that P be minimized is equivalent to solving

    A^T A x = A^T b

These are called the NORMAL EQUATIONS. Notice that this square system is just as
easy to solve as LUx = A^T b. In Exercise 1.41 you are asked to show
that the solution is unique if and only if the columns of A are linearly
independent.^1

If A^T A has full rank, the inverse is uniquely defined and we can
write the least-squares solution to the normal equations as

    x_ls = (A^T A)^(-1) A^T b

The matrix on the right-hand side is ubiquitous in least-squares problems; it is known as the pseudoinverse of A (or the Moore-Penrose pseudoinverse in honor of mathematician E. H. Moore and
physicist Roger Penrose) and given the symbol A†. Thus

    x_ls = A† b,        A† = (A^T A)^(-1) A^T
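The normal equations, the pseudoinverse, and library least-squares routines all give the same answer when the columns of A are LI. A NumPy sketch (our illustration; the data are arbitrary, roughly on the line b ≈ 2t):

```python
import numpy as np

# Overdetermined fit: more equations (rows) than unknowns (columns)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.1, 1.9, 4.1, 5.9])

# Solve the normal equations A^T A x = A^T b directly
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Equivalent: the pseudoinverse, and NumPy's least-squares routine
x_pinv = np.linalg.pinv(A) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_normal, x_pinv) and np.allclose(x_normal, x_lstsq)

# The residual is orthogonal (normal) to the range of A: A^T r = 0
r = A @ x_normal - b
assert np.allclose(A.T @ r, 0)
```

The final assertion is the geometric content of the normal equations discussed next: the residual lies in N(A^T).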

The normal equations have a compelling geometric interpretation
that illustrates the origin of their name. Substituting r into the normal
equations gives the condition A^T r = 0. That is, the residual r = Ax − b
is an element of the null space of A^T, N(A^T), which means r is orthogonal, i.e., normal, to the range of A, R(A) (right side of Figure 1.1). This
is just a generalization of the fact that the shortest path (minimum
||r||) connecting a point b not on a plane to that plane is perpendicular
to the plane. Note that this geometric insight is our second use of the
fundamental theorem of linear algebra. This geometric interpretation
is perhaps best reinforced by a simple example.

Example 1.10: The geometry of least squares


Weare interested in solving Ax = b for the following A and b.
1

A = 21
I

1
1

^1 Putting proof aside for a moment, the condition is at least easy to remember. The
matrix A in the overdetermined system for which we apply least squares has more rows
than columns, so the rank of A is at most the number of columns. The least-squares
solution is unique if and only if the rank is equal to this largest value, i.e., the
number of columns.

(a) What is the rank of A? Justify your answer.

(b) Draw a sketch of the subspace R(A).

(c) Draw a sketch of the subspace R(A^T).

(d) Draw a sketch of the subspace N(A).

(e) Draw a sketch of the subspace N(A^T).

(f) Is there a solution to Ax = b for all b? Justify your answer.

(g) Is there a solution for the particular b given above? Justify your
answer.

(h) Assume we give up on solving Ax = b and decide to solve instead
the least-squares problem

    min_x (Ax − b)^T (Ax − b)

What is the solution to this problem, x_0?

(i) Is this solution unique? Justify your answer.

(j) Sketch the location of the b_0 for which this x_0 does solve Ax = b.
In particular, sketch the relationship between this b_0 and one of
the subspaces you sketched previously. Also on this same drawing, sketch the residual r = Ax_0 − b.
Solution

(a) The rank of A is 2. The two columns are linearly independent.


(b) R(A) is the xy plane in R^3.

(c) R(A^T) is R^2. Notice these are not the same subspaces, even though
they have the same dimension 2.

(d) N(A) is the zero element in R^2.

(e) N(A^T) is the z axis in R^3.

(f) No. The rows are not independent.

Figure 1.2: Least-squares solution of Ax = b; projection of b into
R(A) and residual r = Ax₀ − b in N(A^T).

(g) No. The range of A does not have a nonzero third element and
this b does.
(h) The solution is x₀ = (A^T A)⁻¹A^T b = [−2; 3].
(i) Yes, the least-squares solution is unique because the columns of
A are linearly independent.

(j) The vector b is decomposed into b₀ ∈ R(A) and r = Ax₀ − b ∈
N(A^T). We want Ax₀ = b₀, so b₀ = Ax₀ = A(A^T A)⁻¹A^T b = Pb,
and the projection operator is P = A(A^T A)⁻¹A^T. The residual is
r = Ax₀ − b = (P − I)b, and we have for this problem

P = [1 0 0; 0 1 0; 0 0 0]        P − I = [0 0 0; 0 0 0; 0 0 −1]


Substituting in the value for b gives

b₀ = [1; −1; 0]        r = [0; 0; −1]
The spaces R(A) and N(A^T) are orthogonal, and therefore so are
b₀ and r. The method of least squares projects b into the range
of A, giving b₀, and then solves exactly Ax = b₀ to obtain x₀.
These relationships are shown in Figure 1.2.
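This geometry is easy to check numerically. The following is a minimal sketch, assuming NumPy; the 3×2 system used here is an illustrative choice of the same kind as the example above (a full-column-rank A whose columns span the xy plane of R^3).

```python
import numpy as np

# Illustrative overdetermined system: columns of A span the xy plane
# of R^3, while b has a nonzero third component.
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [0.0, 0.0]])
b = np.array([1.0, -1.0, 1.0])

# Least-squares solution from the normal equations A^T A x = A^T b.
x0 = np.linalg.solve(A.T @ A, A.T @ b)

# Projection of b into R(A), and the residual r = A x0 - b.
P = A @ np.linalg.solve(A.T @ A, A.T)   # P = A (A^T A)^{-1} A^T
b0 = P @ b
r = A @ x0 - b

print(x0)          # the least-squares solution
print(b0)          # b projected into R(A)
print(A.T @ r)     # ~ [0 0]: the residual is normal to R(A)
```

The last line is exactly the normal-equations condition A^T r = 0: the residual lies in N(A^T).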

The above analysis is only the beginning of the story for parameter
estimation. We have not dealt with important issues such as errors in
the measurements, quantifying the uncertainty in parameters, choice
of model form, etc. Many of these issues will be studied in Chapter 4
as part of maximum-likelihood estimation.

1.3.8 Minimum-Norm Solution of the Underdetermined Problem


Consider the case of solving Ax = b with fewer equations than unknowns, the so-called UNDERDETERMINED problem. Assume that the
rows of A are linearly independent, so a solution exists for all b. But
we also know immediately that N(A) ≠ {0}, and there are infinitely
many solutions. One natural way to choose a specific solution from
the infinite number of possibilities is to seek the MINIMUM-NORM solution. That is, we minimize ‖x‖² subject to the constraint that Ax = b.
By analogy with the approach taken above in constructing the least-squares solution, we define an objective function

Φ(x) = x^T x − z^T (Ax − b) = x_i x_i − z_i (A_ij x_j − b_i)

where now z is a vector of Lagrange multipliers. The minimization
condition ∂Φ/∂x_k = 0 is thus

2x_k = z_j A_jk

or, absorbing the factor of 2 into the arbitrary multiplier z, x = A^T z.
Inserting this into the equation Ax = b yields

AA^T z = b

Since the rows of A are linearly independent, AA^T is full rank.² We can
solve this equation for z and insert into the equation x = A^T z that we
found above to deduce the minimum-norm solution

x = A^T (AA^T)⁻¹ b        (1.6)

²Transpose the result of Exercise 1.41.

Note the similarity in the solution structure of the underdetermined,
minimum-norm problem to the overdetermined, least-squares problem
given in (1.5). The singular value decomposition, which we introduce
in Section 1.4.7, allows for a unified and general treatment of both the
underdetermined and overdetermined problems.
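Formula (1.6) can be sketched numerically as follows (assuming NumPy; the 2×3 system below is an illustrative choice, not one from the text).

```python
import numpy as np

# Illustrative underdetermined system: 2 equations, 3 unknowns,
# with linearly independent rows.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([3.0, 2.0])

# Minimum-norm solution (1.6): x = A^T (A A^T)^{-1} b
x = A.T @ np.linalg.solve(A @ A.T, b)

print(np.allclose(A @ x, b))                   # True: x satisfies Ax = b
print(np.allclose(x, np.linalg.pinv(A) @ b))   # True: matches the pseudoinverse
```

For full-row-rank A, the Moore-Penrose pseudoinverse reduces to A^T(AA^T)⁻¹, so `pinv` returns the same minimum-norm solution.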
1.3.9 Rank, Nullity, and the Buckingham Pi Theorem

As engineers, we often encounter situations where we have a number
of measurements or other quantities d_i and we expect there to be a
functional relationship between them

f(d₁, d₂, ..., d_n) = 0

In general, we would like to have a dimensionless representation of this
relation, one that does not depend on the units of measurement, i.e.,

F(Π₁, Π₂, ..., Π_l) = 0

where each Π has the form

Π = d₁^a₁ d₂^a₂ ··· d_n^a_n

and the exponents a_i are chosen so that each Π_i is dimensionless. If the
set of n quantities d_i depend on m units (kilograms, meters, seconds,
amperes, ...), the key question is: what is the relationship between n,
m, and the number l of dimensionless variables Π_i that is required to
characterize the relationship between the variables?
We address this issue with a specific example. Consider fluid
flow through a tube. The fluid has density ρ and viscosity η, and flows
with average velocity U through a tube with radius R and length L,
driven by a pressure drop ΔP. Defining [=] to mean "has dimensions
of," we seek dimensionless quantities of the form

Π = ρ^a₁ U^a₂ ΔP^a₃ η^a₄ R^a₅ L^a₆

Π [=] (kg/m³)^a₁ (m/s)^a₂ (kg/(m s²))^a₃ (kg/(m s))^a₄ (m)^a₅ (m)^a₆


All the units must cancel, so we require that

kg:    a₁ + a₃ + a₄ = 0
m:    −3a₁ + a₂ − a₃ − a₄ + a₅ + a₆ = 0
s:    −a₂ − 2a₃ − a₄ = 0

This is a system of three equations with six unknowns and has the form
Ax = 0, where A ∈ R^{3×6}, m = 3, n = 6, and x = [a₁ a₂ a₃ a₄ a₅ a₆]^T.

We know that A has at most three LI columns, so in six dimensions there
must be at least three dimensions that cannot be spanned by these
three columns. In this case it is easy to show that A does have three LI
columns, which means that there are 6 − 3 = 3 families of solutions
a_i that will yield proper dimensionless quantities. By inspection, we
can find the solutions x = (1, 1, 0, −1, 1, 0)^T, (−1, −2, 1, 0, 0, 0)^T, and
(0, 0, 0, 0, 1, −1)^T, yielding the three dimensionless groups

Π₁ = ρUR/η        Π₂ = ΔP/(ρU²)        Π₃ = R/L

Readers with a background in fluid mechanics will recognize Π₁ as the
REYNOLDS NUMBER (Bird, Stewart, and Lightfoot, 2002).
Because the solution to Ax = 0 is not unique, this choice of dimensionless groups is not unique: each Π_i can be replaced by any nonzero
power of it, and the Π_i s can be multiplied by one another and by any
constant to yield other equally valid dimensionless groups. For example, Π₂ can be replaced in this set by Π₂Π₃ = ΔP R/(ρU²L); fluid
mechanicians recognize this quantity as the FRICTION FACTOR.
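The null-space computation behind these groups can be sketched numerically (assuming NumPy; the matrix written out below encodes the kg, m, s exponents of (ρ, U, ΔP, η, R, L) as derived above).

```python
import numpy as np

# Dimensional matrix: rows are kg, m, s exponents;
# columns are rho, U, DeltaP, eta, R, L.
A = np.array([[ 1,  0,  1,  1, 0, 0],   # kg
              [-3,  1, -1, -1, 1, 1],   # m
              [ 0, -1, -2, -1, 0, 0]])  # s

# rank(A) = m = 3, so the nullity is n - m = 6 - 3 = 3.
print(np.linalg.matrix_rank(A))   # 3

# The exponent vectors for Pi1 = rho*U*R/eta, Pi2 = DeltaP/(rho*U^2),
# and Pi3 = R/L all lie in N(A).
for x in ([1, 1, 0, -1, 1, 0],
          [-1, -2, 1, 0, 0, 0],
          [0, 0, 0, 0, 1, -1]):
    print(A @ np.array(x))        # [0 0 0] each time
```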
Now we return to the general case where we have n quantities and m
units. Because A has m LI rows (and thus m LI columns; see Example
1.8), it has a null space of n − m dimensions, and therefore there is an
n − m dimensional subspace of vectors x that will solve Ax = 0. This
result gives us the BUCKINGHAM PI THEOREM: given a problem with n
dimensional parameters containing m units, the problem can be recast
in terms of l = n − m dimensionless groups (Lin and Segel, 1974). This
theorem holds under the condition that rank(A) = m; in principle it is
possible for the rank of A to be less than m. One somewhat artificial
example where this issue arises is the following: if all units of length are
represented as hectares per meter, then the equations corresponding
to those two units would differ only by a sign. They would thus be
redundant and the rank of A would be one less than the number of
units. If m were replaced by rank(A), then the Pi theorem would still
hold.


A less trivial example in which the Buckingham Pi theorem can cause
confusion is the case of problems involving mixtures. One might expect
that moles (or masses) of chemical species A and moles of chemical
species B (or mole or mass fractions of these species) would be independent units, but they are not. Unlike kilograms and meters, which
cannot be added to one another, moles of A and moles of B can be added
to one another so they do not yield separate equations for exponents
the way that kilograms and meters do.
1.3.10 Nonlinear Algebraic Equations: the Newton-Raphson Method

Many if not most of the mathematical problems encountered by engineers are nonlinear: second-order reactions, fluid dynamics at finite
Reynolds number, and phase equilibrium are a few examples. We will
write a general nonlinear system of n equations and n unknowns as

f(x) = 0        (1.7)

where x ∈ R^n and f ∈ R^n. In contrast to the case with linear equations,
where LU decomposition will lead to an exact and unique solution (if
the problem is not singular), there is no general theory of existence and
uniqueness for nonlinear equations. In general, many solutions can
exist and there is no way of knowing a priori where they are or how
many there are. To find solutions to nonlinear equations, one almost
always needs to make an initial guess and use an iterative method to
find a solution. A powerful and general method for doing this is called
NEWTON-RAPHSON iteration.

Consider an initial guess x and assume for the moment that the
exact solution x_e is given by x + d, where d is as yet unknown, but is
assumed to be small, i.e., the initial guess is good. In this case

f(x_e) = f(x + d) = 0

We next expand the right-hand side in a Taylor series around x.
It is now convenient to switch to component notation to express the
second-order Taylor series approximation for vector f

f_i(x + d) = f_i(x) + (∂f_i/∂x_j)|_x d_j + (1/2)(∂²f_i/∂x_j∂x_l)|_x d_j d_l + O(‖d‖³)

where the notation O(P) denotes terms that are "of order P," which
means that they decay to zero at least as fast as P in the limit d → 0.


Figure 1.3: An iteration of the Newton-Raphson method for solving
f(x) = 0 in the scalar case.

An approximate solution to this equation can be found if the terms
that are quadratic and higher degree in d are neglected, yielding the
linearized problem

f_i(x + d) ≈ f_i(x) + (∂f_i/∂x_j)|_x d_j

Setting f(x + d) = 0 and defining the JACOBIAN matrix J_ij(x) = ∂f_i/∂x_j,
this can be rearranged into the linear system

J(x) d = −f(x)

This equation can be solved for d (e.g., by LU decomposition) to yield a
new guess for the solution x⁺ = x + d, in which we use the superscript
+ to denote the variable x at the next iterate. Denoting the solution
by d = −J⁻¹(x)f(x), the process can be summarized as

x⁺ = x − J⁻¹(x)f(x)        (1.8)


This equation is iterated until ‖x⁺ − x‖ or ‖f(x⁺)‖ reaches a prescribed
error tolerance. One iteration of (1.8) is depicted for a scalar function
in Figure 1.3.

An important question for any iterative method is how rapidly it
converges. To address this issue for the Newton-Raphson method, let
ε = x − x_e be the difference between the approximate solution and the
exact solution. Similarly, ε⁺ = x⁺ − x_e, and therefore ε⁺ − ε = x⁺ − x.
Using this result and (1.8), the evolution equation for the error is

ε⁺ = ε − J⁻¹(x)f(x)

Taylor expanding this equation around x_e yields, again in index notation
due to the Taylor series,

ε_i⁺ = ε_i − [J⁻¹_ij + (∂J⁻¹_ij/∂x_l) ε_l + O(‖ε‖²)][J_jk ε_k + (1/2)(∂J_jk/∂x_l) ε_k ε_l + O(‖ε‖³)]

    = (δ_ik − J⁻¹_ij J_jk) ε_k − [(∂J⁻¹_ij/∂x_l) J_jk + (1/2) J⁻¹_ij (∂J_jk/∂x_l)] ε_l ε_k + O(‖ε‖³)

where the Jacobian and its derivatives are evaluated at x_e, and we have
used f(x_e) = 0 to write f_j(x_e + ε) = J_jk ε_k + (1/2)(∂J_jk/∂x_l) ε_k ε_l + O(‖ε‖³).
Since J⁻¹_ij J_jk = δ_ik, the first term vanishes. Differentiating this identity
with respect to x_l gives (∂J⁻¹_ij/∂x_l) J_jk = −J⁻¹_ij (∂J_jk/∂x_l), so the
bracketed term reduces to −(1/2) J⁻¹_ij (∂J_jk/∂x_l). Therefore

ε_i⁺ = (1/2) J⁻¹_ij (∂J_jk/∂x_l)|_{x_e} ε_l ε_k + O(‖ε‖³)

This result, which we can summarize as ‖ε⁺‖ = O(‖ε‖²), illustrates
that given a sufficiently good guess, the Newton-Raphson iteration
converges rapidly, specifically quadratically, to the exact solution.
For example, if the error in iteration (1.8) after step k is 10⁻², the
error after step k + 1 is ~10⁻⁴ and after step k + 2 is ~10⁻⁸. Indeed,


a good check of whether a code for implementing Newton-Raphson is
correct is to verify this quadratic convergence. Quadratic convergence
only holds if a sufficiently good guess is given. If the initial guess is
poor, the iteration may not converge, or alternately may converge to a
solution far from the initial guess.
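As a concrete sketch of iteration (1.8), assuming NumPy; the two-equation system solved here is an illustrative choice, not one of the text's examples.

```python
import numpy as np

def newton_raphson(f, jac, x0, tol=1e-12, maxiter=50):
    """Iterate x+ = x - J^{-1}(x) f(x), eq. (1.8), until ||f(x)|| < tol."""
    x = np.array(x0, dtype=float)
    for _ in range(maxiter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        d = np.linalg.solve(jac(x), -fx)   # solve J(x) d = -f(x)
        x = x + d
    return x

# Illustrative system: x1^2 + x2^2 - 4 = 0, x1*x2 - 1 = 0
f = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0]*x[1] - 1.0])
jac = lambda x: np.array([[2.0*x[0], 2.0*x[1]],
                          [x[1],     x[0]]])

x = newton_raphson(f, jac, [2.0, 0.5])
print(x, f(x))    # f(x) ~ 0 at the converged root
```

Printing ‖x⁺ − x‖ at each pass shows the error dropping quadratically once the iterate is close, which is exactly the convergence check described above.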
1.3.11 Linear Coordinate Transformations

As noted above, the components of a matrix operator depend on the
coordinate system in which it is expressed. Here we illustrate how the
components of a matrix operator change upon a change in coordinate
system. Consider two vectors x and y and a matrix operator A, where
y = Ax. For example, we can take x and y to be two-dimensional, in
which case

[y₁; y₂] = [A₁₁ A₁₂; A₂₁ A₂₂][x₁; x₂]

Now consider new variables x′₁ and x′₂, where

x′₁ = T₁₁x₁ + T₁₂x₂
x′₂ = T₂₁x₁ + T₂₂x₂

This can be written x′ = Tx. Here x and x′ are the same vector, but
represented in the original (unprimed) and new (primed) coordinate
systems, and T is the operator that generates the new coordinate values from the original ones. It must be invertible; otherwise there is
not a unique mapping between the coordinate systems. Therefore, we
can write x = T⁻¹x′ and y = AT⁻¹x′; the matrix AT⁻¹ yields the
mapping between x′ and y. If we also consider a coordinate transformation of the vector y of the form y′ = Wy, then y′ = WAT⁻¹x′.
The matrix WAT⁻¹ provides the mapping from x′ to y′. Some important coordinate transformations that take advantage of the properties
of the operator A are described in Section 1.4.
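A quick numerical sketch of these mappings (assuming NumPy; random matrices are used only for illustration, and are invertible with probability one):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))   # the operator: y = A x
T = rng.standard_normal((2, 2))   # coordinate change for x
W = rng.standard_normal((2, 2))   # coordinate change for y

x = rng.standard_normal(2)
y = A @ x

xp = T @ x   # x' = T x
yp = W @ y   # y' = W y

# W A T^{-1} maps the primed x to the primed y.
print(np.allclose(W @ A @ np.linalg.inv(T) @ xp, yp))   # True
```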

1.4 The Algebraic Eigenvalue Problem


1.4.1 Introduction

Eigenvalue problems arise in a variety of contexts. One of the most
important is in the solution of systems of linear ordinary differential
equations. Consider the system of two ordinary differential equations

dz/dt = Az        (1.9)

Here z ∈ R^2 and A ∈ R^{2×2}. If we guess, based on what we know about
scalar linear differential equations, a solution of the form z(t) = xe^{λt},
then we have that

Ax = λx        (1.10)

If we can find a solution to this equation, then we have a solution to
(1.9). (To obtain the general solution to (1.9) we must find two solutions to this problem.) This is the algebraic version of the EIGENVALUE
PROBLEM.

The eigenvalue problem can be rewritten as the homogeneous system of equations

(A − λI)x = 0
As with any homogeneous system, this generally has only the trivial
solution x = 0. For special values of λ, known as the EIGENVALUES
of A, the equation has a nontrivial solution, however. The solutions
corresponding to these eigenvalues, which can be real or complex, are
the EIGENVECTORS of A. Geometrically, the eigenvectors of A are those
vectors that change only by a scalar multiple when operated on by A. This
property is of great importance because for the eigenvectors, matrix
multiplication reduces to simple scalar multiplication; Ax can be replaced by λx. Because of this property, the eigenvectors of a matrix
provide a natural coordinate system for working with that matrix. This
fact is used extensively in applied mathematics.

From the existence and uniqueness results for linear systems of
equations that we saw in Section 1.3.1, we know that the above homogeneous problem has a nontrivial solution if and only if A − λI is
noninvertible: that is, when

det(A − λI) = 0

This equation is called the CHARACTERISTIC EQUATION for A, and
det(A − λI) is the CHARACTERISTIC POLYNOMIAL. For an n × n matrix,
this polynomial is always nth degree in λ; this can be seen by performing LU
decomposition on A − λI. Therefore, the characteristic polynomial has n
roots (not necessarily all real or distinct). Each root is an eigenvalue, so
an n × n matrix has exactly n eigenvalues. Each distinct eigenvalue has
a distinct (i.e., linearly independent) eigenvector. Each set of repeated
roots will have at least one distinct eigenvector, but may have fewer
than the multiplicity of the root. So a matrix may have fewer than


n linearly independent eigenvectors. The nature of the eigenvectors
depends on the structure of the matrix.

In principle, the eigenvalues of a matrix may be found by finding
the roots of its characteristic polynomial. Since polynomials of degree
greater than four cannot be factored analytically, approximate numerical methods must be used for virtually all matrix eigenvalue problems.
There are numerical methods for finding the roots of a polynomial, but
in practice, this procedure is difficult and inefficient. An extremely
robust iterative method, based on the QR factorization of a matrix (Exercise 1.38), is the most commonly used technique for general matrices.
In some cases, only the "dominant" eigenvalue (the eigenvalue with the
largest magnitude) needs to be found. The POWER METHOD (Exercise
1.57) is a rapid iterative technique for this problem. Generalizations of
the idea behind the power method form the basis of powerful KRYLOV
SUBSPACE methods for iterative solutions of many computational linear
algebra problems (Trefethen and Bau III, 1997).
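A minimal sketch of the power method (assuming NumPy; the symmetric matrix used is an illustrative choice):

```python
import numpy as np

def power_method(A, x0, niter=100):
    """Estimate the dominant eigenpair of A by repeated multiplication."""
    x = np.array(x0, dtype=float)
    for _ in range(niter):
        x = A @ x
        x = x / np.linalg.norm(x)   # renormalize to prevent overflow
    lam = x @ (A @ x)               # Rayleigh quotient for the eigenvalue
    return lam, x

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, v = power_method(A, [1.0, 0.0])
print(lam)   # converges to the eigenvalue of largest magnitude
```

Each multiplication by A amplifies the component of x along the dominant eigenvector relative to the others, which is why the iteration converges to that eigenvector.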
1.4.2 Self-Adjoint Matrices

Consider the real symmetric (thus self-adjoint) matrix

A = [2 1; 1 2]

The characteristic equation for A is λ² − 4λ + 3 = 0 and its solutions
are λ₁ = 1, λ₂ = 3. The corresponding eigenvectors x = v₁ and x = v₂
are solutions to

(A − λ₁I)v₁ = 0        (A − λ₂I)v₂ = 0

These solutions are (to within an arbitrary multiplicative constant)

v₁ = [1; −1]        v₂ = [1; 1]

Note that these vectors, when normalized to have unit length, form an
ON basis for R^2. Now let

Q = (1/√2)[1 1; −1 1]

A vector x in R^2 can now be represented in two coordinate systems,
either the original basis or the eigenvector basis. A representation in
the eigenvector basis will be indicated by a ′, so x′ = [x′₁ x′₂]^T is the
vector containing the coordinates of x expressed in the eigenvector
basis. It can be shown that the coordinate transformation between these
bases is defined by Q, so that x = Qx′ and x′ = Q⁻¹x. Remember
that A is defined in the original basis so Ax makes sense, but Ax′ does
not. However, we can write

Ax = A(x′₁v₁ + x′₂v₂)
   = x′₁Av₁ + x′₂Av₂
   = x′₁λ₁v₁ + x′₂λ₂v₂
   = QΛx′

where

Λ = [1 0; 0 3]

Therefore, Ax = QΛx′. Using the transformation x′ = Q⁻¹x gives
that Ax = QΛQ⁻¹x, or A = QΛQ⁻¹. This expression can be reduced
further by noting that since the columns of Q form an orthonormal
basis, Q_ki Q_kj = δ_ij, or Q^TQ = I. Since Q⁻¹Q = I by definition, it
follows that Q⁻¹ = Q^T. Matrices for which this property holds are
called ORTHOGONAL. In the complex case, the property becomes Q⁻¹ =
Q* and Q is said to be UNITARY. Returning to the example, the property
means that A can be expressed

A = QΛQ^T
As an example of the usefulness of this result, consider the system
of equations

dx/dt = Ax = QΛQ^T x

By multiplying both sides of the equation by Q^T and using the facts
that Q^TQ = I and x′ = Q^Tx, the equation can be rewritten

dx′/dt = Λx′

or dx′₁/dt = x′₁, dx′₂/dt = 3x′₂. In the eigenvector basis, the differential equations are decoupled. They can be solved separately.
The above representation of A can be found for any matrix A that
satisfies the self-adjointness condition A = A*. We have the following
theorem.

Theorem 1.11 (Self-adjoint matrix decomposition). If A ∈ C^{n×n} is self-adjoint, then there exists a unitary Q ∈ C^{n×n} and real, diagonal Λ ∈
R^{n×n} such that

A = QΛQ*

The diagonal elements of Λ, Λ_ii, are the eigenvalues of A. The
eigenvalues are all real, even if A is not. The columns of the matrix Q
are the (normalized) eigenvectors v_i corresponding to the eigenvalues.
The eigenvectors are orthonormal and form a basis for C^n.

This result shows that for every self-adjoint matrix operator, there
is a natural orthogonal basis, in which the matrix becomes diagonal.
That is, the transformation DIAGONALIZES the matrix. Since the eigenvalues are all real, matrix multiplication reduces to simple contraction
or stretching along the (eigenvector) coordinate axes. In this basis, any
linear systems of algebraic or differential equations containing Ax reduce to n decoupled equations.

That the eigenvalues are real can be established as follows. We have
Av = λv and, by taking adjoints, v*A = λ̄v*, after noting that A* = A.
Multiply the first on the left by v* and the second on the right by v
and subtract to obtain 0 = (λ − λ̄)v*v. We have that v*v is not zero
since v ≠ 0 is an eigenvector, and therefore λ = λ̄ and λ is real.

If A has distinct eigenvalues, the eigenvectors are orthogonal, which
is also readily established. Given an eigenvalue λ_i and corresponding
eigenvector v_i, we have that Av_i = λ_i v_i. Let (λ_j, v_j) be another eigenpair so that Av_j = λ_j v_j. Multiplying Av_i = λ_i v_i on the left by v_j*,
and Av_j = λ_j v_j on the left by v_i*, and subtracting gives
(λ_i − λ_j)(v_i* v_j) = 0. If the eigenvalues are distinct, λ_i ≠ λ_j, this
equation can hold only if v_i* v_j = 0, and therefore v_i and v_j are
orthogonal.

For the case of repeated eigenvalues, since orthogonality holds for
eigenvalues that are arbitrarily close together but unequal, we might
expect intuitively that it continues to hold when the eigenvalues become equal. This turns out to be true, and we delay the proof until we
have introduced the Schur decomposition in Section 1.4.6.
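Theorem 1.11 is easy to verify numerically for a small symmetric matrix (a sketch assuming NumPy, whose `eigh` routine is specialized for self-adjoint matrices; the matrix is an illustrative choice):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])     # real symmetric, hence self-adjoint

# eigh returns real eigenvalues (ascending) and orthonormal
# eigenvectors as the columns of Q.
lam, Q = np.linalg.eigh(A)

print(lam)                                      # [1. 3.]
print(np.allclose(Q.T @ Q, np.eye(2)))          # True: Q^T Q = I
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))   # True: A = Q Lambda Q^T
```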

1.4.3 General (Square) Matrices

Although many matrices arising in applications are self-adjoint, many
others are not, so it is important to include the results for these cases.
Now the eigenvectors do not necessarily form an ON basis, nor can the
matrix always be diagonalized. But it is possible to come fairly close.
There are three cases:

1. If A is not self-adjoint, but has distinct eigenvalues (λ_i ≠ λ_j, i ≠ j),
then A can be diagonalized

A = SΛS⁻¹        (1.11)

As before, Λ = S⁻¹AS is diagonal, and contains the eigenvalues
(not necessarily real) of A. The columns of S contain the corresponding eigenvectors. The eigenvectors are LI, so they form a
basis, but are not orthogonal.
2. If A is not self-adjoint and has repeated eigenvalues, it may still be
the case that the repeated eigenvalues have distinct eigenvectors,
e.g., a root with multiplicity two that has two linearly independent
eigenvectors. Here A can be diagonalized as above.
3. If A is not self-adjoint and has repeated eigenvalues that do not
yield distinct eigenvectors, it cannot be completely diagonalized;
a matrix of this type is called DEFECTIVE. Nevertheless, it can
always be put into Jordan form J

A = MJM⁻¹        (1.12)

where J = M⁻¹AM is organized as follows: each distinct eigenvalue appears on the diagonal with the nondiagonal elements of
the corresponding row and column being zero, just as above.
However, repeated eigenvalues appear in JORDAN BLOCKS with
this structure (shown here for an eigenvalue λ of multiplicity three)

[λ 1 0; 0 λ 1; 0 0 λ]
In the case of repeated eigenvalues, we can distinguish between
ALGEBRAIC multiplicity and GEOMETRIC multiplicity. Algebraic
multiplicity of an eigenvalue is simply its multiplicity as a root
of the characteristic equation. Geometric multiplicity is the number of distinct eigenvectors that correspond to the repeated eigenvalue. In case 2 above, the geometric multiplicity of each repeated
eigenvalue is equal to its algebraic multiplicity. In case 3, the algebraic multiplicity exceeds the geometric multiplicity.
For a non-self-adjoint 5 by 5 matrix with repeated eigenvalues, J contains a Jordan block of this type for each repeated eigenvalue.



The eigenvectors corresponding to the distinct eigenvalues are
the corresponding columns of M. A distinct eigenvector does not
exist for each of the repeated eigenvalues, but a GENERALIZED
EIGENVECTOR can be found for each occurrence of the eigenvalue.
These vectors, along with the eigenvectors, form a basis for R^n.
Example 1.12: A nonsymmetric matrix

Find the eigenvalues and eigenvectors of the nonsymmetric matrix

A = [1 2; 0 3]

and show that it can be put in the form of (1.11).

Solution

This matrix has characteristic equation (1 − λ)(3 − λ) = 0 and thus has
eigenvalues λ = 1, λ = 3. For λ = 1, the eigenvector solves

[0 2; 0 2][x₁; x₂] = [0; 0]

and it is straightforward to see that this is satisfied by [x₁, x₂]^T = v₁ =
[1, 0]^T. For λ = 3 we have

[−2 2; 0 0][x₁; x₂] = [0; 0]

which has solution v₂ = [1, 1]^T. Here v₁ and v₂ are not orthogonal,


but they are LI, so they still form a basis. Letting

S = [1 1; 0 1]

one can determine that

S⁻¹ = [1 −1; 0 1]

Since the columns of S are not orthogonal, they cannot be normalized
to form a matrix that satisfies S⁻¹ = S^T. Nevertheless, A can be diagonalized

S⁻¹AS = Λ = [1 0; 0 3]
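The same computation can be sketched with NumPy's general eigensolver (illustrative; note that `eig`, unlike `eigh`, makes no symmetry assumption):

```python
import numpy as np

# An illustrative nonsymmetric matrix with distinct eigenvalues 1 and 3.
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

lam, S = np.linalg.eig(A)          # columns of S are eigenvectors

Lam = np.linalg.solve(S, A @ S)    # S^{-1} A S
print(np.allclose(Lam, np.diag(lam)))    # True: A is diagonalized
print(np.allclose(S.T @ S, np.eye(2)))   # False: S is not orthogonal
```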

Example 1.13: A defective matrix

Find the eigenvalues and eigenvectors of the nonsymmetric matrix

A = [3 2; 0 3]

and show that it cannot be put in the form of (1.11), but can be put in
the form of (1.12).

Solution

The characteristic equation for A is (3 − λ)² = 0, so A has the repeated
eigenvalue λ = 3. The eigenvector is determined from

[0 2; 0 0][x₁; x₂] = [0; 0]

which has solution x = v₁ = [1, 0]^T. There is not another nontrivial
solution to this equation so the repeated eigenvalue λ = 3 has only one
eigenvector. We cannot diagonalize this A.
Nevertheless, we will seek to nearly diagonalize it, by finding a generalized eigenvector v₂ that allows us to construct a matrix M = [v₁ v₂]
satisfying

M⁻¹AM = J = [3 1; 0 3]

Multiplying both sides of this equation by M yields that

AM = MJ

which can be rearranged to

(A − 3I)[v₁ v₂] = [0 v₁]

This equation can be rewritten as the pair of equations

(A − 3I)v₁ = 0
(A − 3I)v₂ = v₁

The first of these is simply the equation determining the true eigenvector v₁, while the second will give us the generalized eigenvector v₂. For
the present problem this equation is

[0 2; 0 0][x₁; x₂] = [1; 0]

A solution to this equation is v₂ = [0, 1/2]^T. (Any solution v₂ must be
LI from v₁. Why?) Constructing the matrix

M = [1 0; 0 1/2]

one can show that

M⁻¹ = [1 0; 0 2]

and that

J = M⁻¹AM = [3 1; 0 3]

Note that we can replace v₂ by v₂ + αv₁ for any α and still obtain this
result.
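These relations are easy to confirm numerically (a sketch assuming NumPy; the 2×2 defective matrix and vectors below are an illustrative choice consistent with the discussion above):

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [0.0, 3.0]])     # repeated eigenvalue 3, one true eigenvector
v1 = np.array([1.0, 0.0])      # eigenvector
v2 = np.array([0.0, 0.5])      # generalized eigenvector

B = A - 3.0 * np.eye(2)
print(np.allclose(B @ v1, 0.0))  # True: (A - 3I) v1 = 0
print(np.allclose(B @ v2, v1))   # True: (A - 3I) v2 = v1

M = np.column_stack([v1, v2])
J = np.linalg.solve(M, A @ M)    # J = M^{-1} A M
print(J)                         # the 2x2 Jordan block with eigenvalue 3
```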

1.4.4 Positive Definite Matrices

Positive definite and positive semidefinite matrices show up often in
applications. Here are some basic facts about them. In the following, A
is real and symmetric and B is real. The matrix A is POSITIVE DEFINITE
(denoted A > 0), if

x^T Ax > 0,    ∀ nonzero x ∈ R^n

The matrix A is POSITIVE SEMIDEFINITE (denoted A ≥ 0), if

x^T Ax ≥ 0,    ∀ x ∈ R^n

You should be able to prove the following facts.

1. A > 0 ⇔ λ > 0, ∀ λ ∈ eig(A)

2. A ≥ 0 ⇔ λ ≥ 0, ∀ λ ∈ eig(A)

3. A ≥ 0 ⇒ B^T AB ≥ 0, ∀ B

4. A > 0 and B nonsingular ⇒ B^T AB > 0

5. A > 0 and B full column rank ⇒ B^T AB > 0

6. A₁ > 0, A₂ ≥ 0 ⇒ A = A₁ + A₂ > 0

7. A > 0 ⇒ z^T Az > 0, ∀ nonzero z ∈ R^n

8. For A ≥ 0, x^T Ax = 0 ⇒ Ax = 0

If symmetric matrix A is not positive semidefinite nor negative semidefinite, then it is termed indefinite. In this case A has both positive and
negative eigenvalues.
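Facts 1 and 2 suggest a simple numerical classification by eigenvalue signs (a sketch assuming NumPy; the tolerance and the test matrices are illustrative choices):

```python
import numpy as np

def definiteness(A, tol=1e-12):
    """Classify a real symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix, ascending
    if lam[0] > tol:
        return "positive definite"
    if lam[0] > -tol:
        return "positive semidefinite"
    if lam[-1] < -tol:
        return "negative definite"
    if lam[-1] < tol:
        return "negative semidefinite"
    return "indefinite"

print(definiteness(np.array([[2.0, 1.0], [1.0, 2.0]])))   # positive definite
print(definiteness(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```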

1.4.5 Eigenvalues, Eigenvectors, and Coordinate Transformations


Under the general linear transformation

y = Ax        (1.13)

all the components of the vector y are coupled to all the components
of the vector x via the elements of A, all of which are generally nonzero.
We can always rewrite this transformation using the eigenvalue decomposition as

y = MJM⁻¹x

Now consider the coordinate transformation x′ = M⁻¹x and y′ =
M⁻¹y. In this new coordinate system, the linear transformation (1.13)
becomes

y′ = Jx′

In the "worst case scenario," J has eigenvalues on the diagonal, some
values of 1 just above the diagonal, and is otherwise zero. In the more
usual scenario J = Λ, and each component of y′ is coupled only to one
component of x′: the coordinate transformation associated with the
eigenvectors of A provides a coordinate system in which the different
components are decoupled. This result is powerful and is used in a
wide variety of applications.

Further considering the idea of coordinate transformations leads
naturally to the question of the dependence of the eigenvalue problem
on the coordinate system that is used to set up the problem. Given that

Ax = λx

let us take x′ = Tx, where T is invertible but otherwise arbitrary; this
expression represents a coordinate transformation between unprimed
and primed coordinates, as we have already described in Section 1.3.11.

Now x = T⁻¹x′, and thus

AT⁻¹x′ = λT⁻¹x′

Multiplying both sides by T to eliminate T⁻¹ on the right-hand side
yields

TAT⁻¹x′ = λx′

are the
Recallthat we have done nothing to the eigenvaluesthey eigenthe
samein the last equation of this sequence as the first. Thus

if two
valuesof TAT-I are the same as the eigenvalues of A. Therefore,
a
matricesare related by a transformation B = TAT-I, which is called
their eigenvalues are the same. In other
TRANSFORMATION,
SIMILARITY
under similarity transwords,eigenvaluesof a matrix are INVARIANT
formations.
In many situations, invariants other than the eigenvalues are used.
These can be expressed in terms of the eigenvalues. The two most
common are the TRACE of a matrix A

tr A = Σ_i A_ii = Σ_i λ_i

and the determinant

det A = (−1)^m Π_{i=1}^{n} U_ii = Π_{i=1}^{n} λ_i

where U is the upper triangular factor in the LU decomposition of A
and m is the number of row exchanges used in computing it.
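A numerical sketch of these invariants (assuming NumPy; the matrices below are illustrative choices, with A triangular so its eigenvalues are visible on the diagonal):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, -2.0]])   # triangular: eigenvalues 1, 3, -2
T = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])    # invertible (det = 2)

B = T @ A @ np.linalg.inv(T)       # similarity transformation

lamA = np.sort(np.linalg.eigvals(A).real)
lamB = np.sort(np.linalg.eigvals(B).real)
print(np.allclose(lamA, lamB))                    # True: eigenvalues invariant

print(np.isclose(np.trace(A), lamA.sum()))        # True: tr A = sum of eigenvalues
print(np.isclose(np.linalg.det(A), lamA.prod()))  # True: det A = product of eigenvalues
```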

Example 1.14: Vibrational modes of a molecule

The individual atoms that make up a molecule vibrate around their
equilibrium positions and orientations. These vibrations can be used
to characterize the molecule by spectroscopy and are important in determining many of its properties, such as heat capacity and reactivity.
We examine here a simple model of a molecule to illustrate the origin
and nature of these vibrations.

Let the ath atom of a molecule be at position x_a = [x_a, y_a, z_a]^T and
have mass m_a. The bond energy of the molecule is U(x₁, x₂, x₃, ..., x_N),
where N is the number of atoms in the molecule. Newton's second law
for each atom is

m_a d²x_a/dt² = −∂U(x₁, ..., x_N)/∂x_a

Let X = [x₁^T, x₂^T, ..., x_N^T]^T and M be a 3N × 3N diagonal matrix
with the masses of each atom on the diagonals. That is, M₁₁ = M₂₂ =
M₃₃ = m₁, M₄₄ = M₅₅ = M₆₆ = m₂, ..., M_{3N−2,3N−2} = M_{3N−1,3N−1} =
M_{3N,3N} = m_N. Now the equations of motion for the coordinates of
the atoms become

M_ij d²X_j/dt² = −∂U(X)/∂X_i

An equilibrium shape X_eq of the molecule is a minimum of the bond
energy U, and can be found by Newton-Raphson iteration on the problem ∂U/∂X = 0. Assume X_eq is known and characterize small-amplitude
vibrations around that shape.
Solution

Let R = X − X_eq be a small perturbation away from the equilibrium
shape. Because X_eq is constant, this perturbation satisfies the equation

M_ij d²R_j/dt² = −∂U(X_eq + R)/∂X_i

Taylor expanding the right-hand side of this equation, using the fact
that ∂U/∂X_i|_{X_eq} = 0, and neglecting terms of O(‖R‖²) yields

∂U(X_eq + R)/∂X_i ≈ H_ik R_k

where

H_ik = ∂²U/∂X_i∂X_k|_{X_eq}

is called the HESSIAN matrix for the function U. Thus the governing equation for the vibrations is given by

M_ij d²R_j/dt² = −H_ik R_k

By definition, H is symmetric. Furthermore, rigidly translating the entire molecule does not change its bond energy, so H has three zero eigenvalues, with eigenvectors

[1, 0, 0, 1, 0, 0, …, 1, 0, 0]ᵀ   [0, 1, 0, 0, 1, 0, …, 0, 1, 0]ᵀ   [0, 0, 1, 0, 0, 1, …, 0, 0, 1]ᵀ

1.4 The Algebraic Eigenvalue Problem


These correspond to moving the whole molecule in the x, y, and z directions, respectively. Furthermore, because X_eq is a minimum of the bond energy, H is also positive semidefinite.

We expect the molecule to vibrate, so we will seek oscillatory solutions. A convenient way to do so is to let

R(t) = Z e^{iωt} + Z̄ e^{−iωt}

recalling that for real ω, e^{iωt} = cos ωt + i sin ωt. Substituting into the governing equation yields

−ω² M_ij (Z_j e^{iωt} + Z̄_j e^{−iωt}) = −H_ik (Z_k e^{iωt} + Z̄_k e^{−iωt})

Gathering terms proportional to e^{iωt} and e^{−iωt}, we can see that this equation will be satisfied at all times if and only if

ω² M_ij Z_j = H_ik Z_k     (1.14)

This looks similar to the linear eigenvalue problem, (1.10), and reduces exactly to one if all atoms have the same mass m (in which case M = mI).

We can learn more about this problem by considering the properties of M and H. Since M is diagonal and the atomic masses are positive, M is clearly positive definite. Also recall that H is symmetric positive semidefinite. Writing M = L², where L is diagonal and its diagonal entries are the square roots of the masses, we can write ω²L²Z = HZ. Multiplying by L⁻¹ on the left yields ω²LZ = L⁻¹HZ, and letting Ẑ = LZ results in ω²Ẑ = L⁻¹HL⁻¹Ẑ. This has the form of an eigenvalue problem ĤẐ = ω²Ẑ, where Ĥ = L⁻¹HL⁻¹. Solving this eigenvalue problem gives the frequencies ω at which the molecule vibrates. The corresponding eigenvectors Ẑ, when transformed back into the original coordinates via Z = L⁻¹Ẑ, give the so-called "normal modes." Each frequency is associated with a mode of vibration that in general involves different atoms of the molecule in different ways. Because Ĥ is symmetric, these modes form an orthogonal basis in which to describe the motions of the molecule. A further result can be obtained by multiplying (1.14) on the left by Zᵀ, yielding

ω² ZᵀMZ = ZᵀHZ

Because ZᵀMZ > 0 and ZᵀHZ ≥ 0, we can conclude that ω² ≥ 0 with equality only when Z is a zero eigenvector of H. This result shows


that the frequencies ω are real and thus that the dynamics are purely oscillatory.

Observe that the quantity ZᵀMZ arises naturally in this problem: via the transformation Ẑ = LZ it is equivalent to the inner product ẐᵀẐ. It is straightforward to show that for any symmetric positive definite W, the quantity xᵀWy satisfies all the conditions of an inner product between real vectors x and y; it is called a WEIGHTED inner product. In the current case, the eigenvectors Ẑ are orthogonal under the usual "unweighted" inner product, in which case the vectors Z = L⁻¹Ẑ are orthogonal under the weighted inner product with W = M.
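The procedure in this example is easy to carry out numerically. The following is a minimal sketch for a hypothetical 1D chain of three atoms joined by two identical unit springs (this specific molecule model is an assumption, not from the text); it forms Ĥ = L⁻¹HL⁻¹ and solves the symmetric eigenvalue problem as described above.

```python
import numpy as np

# Hypothetical example (not from the text): three atoms on a line joined by
# two identical springs, U = (k/2)[(x2 - x1)^2 + (x3 - x2)^2].
k = 1.0
H = k * np.array([[ 1.0, -1.0,  0.0],    # Hessian of U at equilibrium
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  1.0]])
M = np.diag([1.0, 2.0, 1.0])             # atomic masses on the diagonal

# Write M = L^2 and transform to the standard symmetric eigenvalue problem
# Hhat Zhat = w^2 Zhat with Hhat = L^-1 H L^-1.
Linv = np.diag(1.0 / np.sqrt(np.diag(M)))
Hhat = Linv @ H @ Linv
w2, Zhat = np.linalg.eigh(Hhat)          # squared vibrational frequencies
Z = Linv @ Zhat                          # normal modes in original coordinates

print("squared frequencies:", w2)        # one zero mode: rigid translation
```

The zero eigenvalue corresponds to translating the whole chain (its mode Z is proportional to [1, 1, 1]ᵀ), and since ZᵀMZ = ẐᵀẐ = I, the computed modes are orthonormal in the M-weighted inner product.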
1.4.6 Schur Decomposition
A major problem with using the Jordan form when doing calculations on matrices that have repeated eigenvalues is that the Jordan form is numerically unstable. For matrices with repeated eigenvalues, if diagonalization is not possible, it is usually better computationally to use the Schur form instead of the Jordan form. The Schur form only triangularizes the matrix. Triangularizing a matrix, even one with repeated eigenvalues, is numerically well conditioned. Golub and Van Loan (1996, p. 313) provide the following theorem.
Theorem 1.15 (Schur decomposition). If A ∈ ℂ^{n×n}, then there exists a unitary Q ∈ ℂ^{n×n} such that

Q*AQ = T

in which T is upper triangular.

The proof of this theorem is discussed in Exercise 1.43. Note that even though T is upper triangular instead of diagonal, its diagonal elements are still its eigenvalues. The eigenvalues of T are also equal to the eigenvalues of A because T is the result of a similarity transformation of A. Even if A is a real matrix, T can be complex because the eigenvalues of a real matrix may come in complex conjugate pairs. Recall a matrix Q is unitary if Q*Q = I. You should also be able to prove the following facts (Horn and Johnson, 1985).

1. If A ∈ ℂ^{n×n} and BA = I for some B ∈ ℂ^{n×n}, then

(a) A is nonsingular
(b) B is unique


2. The matrix Q ∈ ℂ^{n×n} is unitary if and only if

(a) Q is nonsingular and Q* = Q⁻¹
(b) QQ* = I
(c) Q* is unitary
(d) The rows of Q form an orthonormal set
(e) The columns of Q form an orthonormal set
If A is self-adjoint, then by taking adjoints of both sides of the Schur decomposition equality, we have that T is real and diagonal, and the columns of Q are the eigenvectors of A, which is one way to show that the eigenvectors of a self-adjoint matrix are orthogonal, regardless of whether the eigenvalues are distinct. Recall that we delayed the proof of this assertion in Section 1.4.2 until we had introduced the Schur decomposition.

If A is real and symmetric, then not only is T real and diagonal, but Q can be chosen real and orthogonal. This fact can be established by noting that if complex-valued q = a + bi is an eigenvector of A, then so are both real-valued vectors a and b. And if complex eigenvector q_j is orthogonal to q_k, then real eigenvectors a_j and b_j are orthogonal to real eigenvectors a_k and b_k, respectively. The theorem summarizing this case is the following (Golub and Van Loan, 1996, p. 393), where, again, it does not matter if the eigenvalues of A are repeated.
Theorem 1.16 (Symmetric Schur decomposition). If A ∈ ℝ^{n×n} is symmetric, then there exists a real, orthogonal Q and a real, diagonal Λ such that

QᵀAQ = Λ = diag(λ_1, …, λ_n)

where diag(a, b, c, …) denotes a diagonal matrix with elements a, b, c, … on the diagonal.

Note that the {λ_i} are the eigenvalues of A and the columns of Q, {q_i}, are the corresponding eigenvectors.

For real but not necessarily symmetric A, you can restrict yourself to real matrices by using the real Schur decomposition (Golub and Van Loan, 1996, p. 341). But the price you pay is that you can achieve only block upper triangular T, rather than strictly upper triangular T.


Theorem 1.17 (Real Schur decomposition). If A ∈ ℝ^{n×n}, then there exists a real, orthogonal Q such that

QᵀAQ = [ R_11  R_12  ⋯  R_1m ]
       [       R_22  ⋯  R_2m ]
       [              ⋱   ⋮  ]
       [             R_mm    ]

in which each R_ii is either a real scalar or a 2×2 real matrix having complex conjugate eigenvalues; the eigenvalues of the R_ii are the eigenvalues of A.
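Both Schur forms are available in standard numerical libraries. The sketch below (an illustration, not from the text) uses scipy.linalg.schur on a matrix with a complex conjugate eigenvalue pair, so its real Schur form contains one 2×2 block.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, -2.0, 1.0],
              [1.0,  0.0, 3.0],
              [0.0,  0.0, 0.5]])   # eigenvalues: +/- i*sqrt(2) and 0.5

# Real Schur form: Q orthogonal, T quasi-triangular (one 2x2 block).
T, Q = schur(A, output='real')
assert np.allclose(Q @ T @ Q.T, A)

# Complex Schur form: Q unitary, T truly upper triangular,
# with the eigenvalues of A on its diagonal.
Tc, Qc = schur(A, output='complex')
assert np.allclose(Qc @ Tc @ Qc.conj().T, A)
print(np.diag(Tc))
```

Note that even though A is real, the complex Schur factor Tc is complex, exactly as the discussion above anticipates for conjugate eigenvalue pairs.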

1.4.7 Singular Value Decomposition


Another highly useful matrix decomposition, which can be applied to nonsquare in addition to square matrices, is the singular value decomposition (SVD). Any matrix A ∈ ℂ^{m×n} has an SVD

A = USV*

in which U ∈ ℂ^{m×m} and V ∈ ℂ^{n×n} are square and unitary

U*U = UU* = I_m     V*V = VV* = I_n

and S ∈ ℝ^{m×n} is partitioned as

S = [ Σ            0_{r×(n−r)}      ]
    [ 0_{(m−r)×r}  0_{(m−r)×(n−r)} ]

in which r is the rank of the A matrix. The matrix Σ is diagonal and real

Σ = diag(σ_1, σ_2, …, σ_r)     σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0

in which the diagonal elements σ_i are known as the singular values of matrix A. The singular values are real and positive and can be ordered from largest to smallest as indicated above.
Connection of SVD and eigenvalue decomposition. Given A ∈ ℂ^{m×n} with rank r, consider the Hermitian matrix AA* ∈ ℂ^{m×m}, also of rank r. We can deduce that the eigenvalues of AA* are real and nonnegative as follows. Given that (λ, v) is an eigenpair of AA*, we have AA*v = λv, v ≠ 0. Taking inner products of both sides with respect to v and solving for λ gives λ = v*AA*v / v*v. We know v*v is a real, positive scalar since v ≠ 0. Let y = A*v and we have that λ = y*y / v*v, and we know that y*y is a real scalar and y*y ≥ 0. Therefore λ is real and λ ≥ 0. And we can connect the eigenvalues and eigenvectors of AA* to the singular values and vectors of A. The r nonzero eigenvalues of AA* (λ_i) are the squares of the singular values (σ_i), and the eigenvectors of AA* (q_i) are the columns of U (u_i)

λ_i(AA*) = σ_i²(A)     i = 1, …, r

Next consider the Hermitian matrix A*A ∈ ℂ^{n×n}, also of rank r. The r nonzero eigenvalues of A*A (λ_i) are also the squares of the singular values (σ_i), and the eigenvectors of A*A (q_i) are the columns of V (v_i)

λ_i(A*A) = σ_i²(A)     i = 1, …, r

These results follow from substituting the SVD into both products and comparing with the eigenvalue decomposition

AA* = (USV*)(VSᵀU*) = US²U* = QΛQ⁻¹
A*A = (VSᵀU*)(USV*) = VS²V* = QΛQ⁻¹
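This connection is easy to confirm numerically; a small illustrative sketch with a random real matrix (so A* = Aᵀ):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # real, wide matrix

U, s, Vt = np.linalg.svd(A)
lam = np.linalg.eigvalsh(A @ A.T)        # eigenvalues of the Hermitian AA^T

# The nonzero eigenvalues of AA^T are the squared singular values of A.
print(np.sort(lam)[::-1])                # equals s**2
```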

Real matrix with full row rank. Consider a real matrix A with more columns than rows (wide matrix, m < n) and full row rank, r = m. In this case both U and V are real and orthogonal, and the SVD takes the form

A = U [ Σ  0 ] [ V_1  V_2 ]ᵀ

in which V_1 contains the first m columns of V, and V_2 contains the remaining n − m columns. Multiplying the partitioned matrices gives

A = U Σ V_1ᵀ

and notice that we do not need to store the V_2 matrix if we wish to represent A. This fact is handy if A has many more columns than rows, n ≫ m, because V_2 ∈ ℝ^{n×(n−m)} requires a large amount of storage compared to A.


Figure 1.4: The four fundamental subspaces of matrix A = USVᵀ. The range of A is spanned by the first r columns of U, {u_1, …, u_r}. The range of Aᵀ is spanned by the first r columns of V, {v_1, …, v_r}. The null space of A is spanned by {v_{r+1}, …, v_n}, and the null space of Aᵀ is spanned by {u_{r+1}, …, u_m}.

Real matrix with full column rank. Next consider the case in which real matrix A has more rows than columns (tall matrix, m > n) and full column rank. In this case the SVD takes the form

A = [ U_1  U_2 ] [ Σ ] Vᵀ
                 [ 0 ]

in which U_1 contains the first n columns of U, and U_2 contains the remaining m − n columns. Multiplying the partitioned matrices gives

A = U_1 Σ Vᵀ

and notice that we do not need to store the U_2 matrix if we wish to represent A.


SVD and fundamental theorem of linear algebra. The SVD provides an orthogonal decomposition of all four of the fundamental subspaces of matrix A. Consider first the partitioned SVD for real-valued A

A = [ U_1  U_2 ] [ Σ  0 ] [ V_1  V_2 ]ᵀ
                 [ 0  0 ]

Now consider Av_k in which k ≥ r + 1. Because v_k is orthogonal to V_1 = {v_1, …, v_r}, we have Av_k = 0, and these n − r orthogonal v_k span the null space of A. Because the columns of V_1 are orthogonal to this set, they span the range of Aᵀ. Transposing the previous equation gives

Aᵀ = V_1 Σ U_1ᵀ

and we have that {u_{r+1}, …, u_m} span the null space of Aᵀ. Because the columns of U_1 are orthogonal to this set, they span the range of A. These results are summarized in Figure 1.4.


SVD and least-squares problems. We already have shown that if A has independent columns, the unique least-squares solution to the overdetermined problem

min_x ‖Ax − b‖₂

is given by

x_ls = (AᵀA)⁻¹Aᵀb     x_ls = A†b

The SVD also provides a means to compute x_ls. For real A, the SVD satisfies

A = U_1 Σ Vᵀ
Aᵀ = V Σ U_1ᵀ
AᵀA = V Σ U_1ᵀ U_1 Σ Vᵀ = V Σ² Vᵀ

The pseudoinverse is therefore given by

A† = (AᵀA)⁻¹Aᵀ = V Σ⁻² Vᵀ V Σ U_1ᵀ
A† = V Σ⁻¹ U_1ᵀ

and the least-squares solution is

x_ls = V Σ⁻¹ U_1ᵀ b


SVD and underdetermined problems. We already have shown that if A has independent rows, the unique minimum-norm solution to the problem

min_x ‖x‖₂  subject to  Ax = b

is given by

x_mn = Aᵀ(AAᵀ)⁻¹b

The SVD also provides a means to compute x_mn. In this case we have A = U Σ V_1ᵀ, and substituting this into the minimum-norm solution gives

x_mn = V_1 Σ⁻¹ Uᵀ b

Note the similarity to the least-squares solution above.
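Both SVD formulas above can be sketched with numpy (illustrative only; the matrices and variable names are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)

# Overdetermined: tall A with independent columns; x_ls = V S^-1 U1^T b.
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
U1, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: A = U1 S V^T
x_ls = Vt.T @ ((U1.T @ b) / s)
assert np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0])

# Underdetermined: wide A with independent rows; x_mn = V1 S^-1 U^T b.
A2 = rng.standard_normal((3, 6))
b2 = rng.standard_normal(3)
U, s2, V1t = np.linalg.svd(A2, full_matrices=False)  # thin SVD: A2 = U S V1^T
x_mn = V1t.T @ ((U.T @ b2) / s2)
assert np.allclose(A2 @ x_mn, b2)                    # satisfies the constraint
assert np.allclose(x_mn, np.linalg.pinv(A2) @ b2)    # and has minimum norm
```

The full_matrices=False option returns exactly the "thin" factors U_1 and V_1 discussed above, so the unused blocks U_2 and V_2 are never formed or stored.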

1.5 Functions of Matrices


1.5.1 Polynomial and Exponential
We have already defined some functions of square matrices using matrix multiplication and addition. These operations create the class of polynomial functions

p(A) = a_n Aⁿ + a_{n−1} Aⁿ⁻¹ + ⋯ + a_1 A + a_0 I

with A ∈ ℂ^{n×n} and a_i ∈ ℂ, i = 0, 1, …, n. We wish to expand this set of functions so that we have convenient ways to express solutions to coupled sets of differential equations, for example. Probably the most important function for use in applications is the matrix exponential. The standard exponential of a scalar can be defined in terms of its Taylor series

e^a = 1 + a + a²/2! + a³/3! + ⋯

This series converges for all a ∈ ℂ. Notice that this expression is an infinite-order series and therefore not a polynomial function. We can proceed to define the matrix exponential analogously

e^A = I + A + A²/2! + A³/3! + ⋯

and this series converges for all A ∈ ℂ^{n×n}. Let's see why the matrix exponential is so useful. Consider first the scalar first-order linear differential equation

dx/dt = ax     x(0) = x_0     x ∈ ℝ, a ∈ ℝ


which arises in the simplest chemical kinetics models. The solution is given by

x(t) = x_0 e^{at}

and this is probably the first and most important differential equation that is discussed in the introductory differential equations course. By defining the matrix exponential we have the solution to all coupled sets of linear first-order differential equations. Consider the coupled set of linear first-order differential equations

d/dt [ x_1 ]   [ a_11  a_12  ⋯  a_1n ] [ x_1 ]
     [ x_2 ] = [ a_21  a_22  ⋯  a_2n ] [ x_2 ]
     [  ⋮  ]   [  ⋮     ⋮          ⋮  ] [  ⋮  ]
     [ x_n ]   [ a_n1  a_n2  ⋯  a_nn ] [ x_n ]

with initial condition

[ x_1(0) ]   [ x_10 ]
[ x_2(0) ] = [ x_20 ]
[   ⋮    ]   [  ⋮   ]
[ x_n(0) ]   [ x_n0 ]

which we express compactly as

dx/dt = Ax     x(0) = x_0     x ∈ ℝⁿ, A ∈ ℝ^{n×n}     (1.15)

The payoff for knowing the solution to the scalar version is that we also know the solution to the matrix version. We propose as the solution

x(t) = e^{At} x_0     (1.16)

Notice that we must put the x_0 after the e^{At} so that the matrix multiplication on the right-hand side is defined and gives the required n × 1 column vector for x(t). Let's establish that this proposed solution is indeed the solution to (1.15). Substituting t = 0 to check the initial condition gives

x(0) = e^{A·0} x_0 = e^0 x_0 = I x_0 = x_0

and the initial condition is satisfied. Next, differentiating the matrix exponential with respect to scalar time gives

d/dt e^{At} = d/dt ( I + tA + t²/2! A² + t³/3! A³ + ⋯ )
            = A + tA² + t²/2! A³ + ⋯
            = A ( I + tA + t²/2! A² + ⋯ )
            = A e^{At}

We have shown that the scalar derivative formula d/dt(e^{at}) = a e^{at} also holds for the matrix case, d/dt(e^{At}) = A e^{At}. We also could have factored the A to the right instead of the left side in the derivation above to obtain d/dt(e^{At}) = e^{At} A. Note that although matrix multiplication does not commute in general, it does commute for certain matrices, such as e^{At} and powers of A. Finally, substituting the derivative result into (1.15) gives

dx/dt = d/dt ( e^{At} x_0 ) = ( A e^{At} ) x_0 = A ( e^{At} x_0 ) = Ax

and we see that the differential equation also is satisfied.
Another insight into functions of matrices is obtained when we consider their eigenvalue decomposition. Let A = SΛS⁻¹, in which we assume for simplicity that the eigenvalues of A are not repeated so that Λ is diagonal. First we see that powers of A can be written as follows for p ≥ 1

Aᵖ = (SΛS⁻¹)(SΛS⁻¹) ⋯ (SΛS⁻¹)     (p times)
   = SΛᵖS⁻¹

Substituting the eigenvalue decomposition into the definition of the


matrix exponential gives

e^{At} = I + tA + t²/2! A² + t³/3! A³ + ⋯
       = SS⁻¹ + tSΛS⁻¹ + t²/2! SΛ²S⁻¹ + ⋯
       = S ( I + tΛ + t²/2! Λ² + ⋯ ) S⁻¹
e^{At} = S e^{Λt} S⁻¹

Therefore, we can determine the time behavior of e^{At} by examining the behavior of e^{Λt}

e^{Λt} = diag( e^{λ_1 t}, e^{λ_2 t}, …, e^{λ_n t} )
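The diagonalization route can be checked against a library matrix exponential; a minimal sketch (the particular matrix is an arbitrary stable example, not from the text):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0,  2.0],
              [ 0.0, -3.0]])          # distinct eigenvalues -1 and -3
t = 0.7

lam, S = np.linalg.eig(A)             # A = S diag(lam) S^-1
eAt_eig = (S * np.exp(lam * t)) @ np.linalg.inv(S)   # S e^{Lambda t} S^-1
eAt = expm(A * t)                     # series-based matrix exponential

print(np.max(np.abs(eAt - eAt_eig)))  # agreement to machine precision
```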

and we deduce that e^{At} asymptotically approaches zero as t → ∞ if and only if Re(λ_i) < 0, i = 1, …, n. We also know that e^{At} is oscillatory if any eigenvalue has a nonzero imaginary part, and so on.

The matrix exponential is just one example of expanding scalar functions to matrix functions. Any of the transcendental functions (trigonometric functions, hyperbolic trigonometric functions, logarithm, square root, etc.) can be extended to matrix arguments as was shown here for the matrix exponential (Higham, 2008). For example, a square root of a matrix A is any matrix B that satisfies B² = A. If A = SΛS⁻¹, then one solution is B = SΛ^{1/2}S⁻¹ where Λ^{1/2} = diag(λ_1^{1/2}, λ_2^{1/2}, …). More generally, Λ^{1/2} can be replaced by Q*Λ^{1/2}Q for any unitary matrix Q. Moreover, for any linear scalar differential equation having solutions consisting of these scalar functions, coupled sets of the corresponding linear differential equations are solved by the matrix version of the function.

Bound on e^{At}. When analyzing solutions to dynamic models, we often wish to bound the asymptotic behavior as time increases to infinity. For linear differential equations, this means we wish to bound the asymptotic behavior of e^{At} as t → ∞. We build up to a convenient bound in a few steps. First, for scalar z ∈ ℂ, we know that

|e^z| = |e^{Re(z) + Im(z)i}| = |e^{Re(z)}| |e^{Im(z)i}| = e^{Re(z)}


Similarly, if we have a diagonal matrix D ∈ ℂ^{n×n}, D = diag(d_1, d_2, …, d_n), then the matrix norm of e^D is

‖e^D‖ = max_x ‖e^D x‖ / ‖x‖ = e^α

in which α = max_i (Re(d_i)). In fact, if this max over the real parts of the eigenvalues occurs for index i*, then x = e_{i*} achieves the maximum in ‖e^D x‖ / ‖x‖. Given a real, nonnegative time argument t ≥ 0, we also have that

‖e^{Dt}‖ = e^{αt}     t ≥ 0

Next, if the matrix A is diagonalizable, we can use A = SΛS⁻¹ to obtain

e^{At} = S e^{Λt} S⁻¹

and we can obtain a bound by taking norms of both sides

‖e^{At}‖ = ‖S e^{Λt} S⁻¹‖ ≤ ‖S‖ ‖e^{Λt}‖ ‖S⁻¹‖

For any nonsingular S, the product ‖S‖ ‖S⁻¹‖ is defined as the condition number of S, denoted κ(S). A bound on the norm of e^{At} is therefore

‖e^{At}‖ ≤ κ(S) e^{αt}     A diagonalizable

in which α = max_i Re(λ_i) = max(Re(eig(A))). So this leaves only the


case in which A is not diagonalizable. In the general case we use the Schur form A = QTQ*, with T upper triangular. Van Loan (1977) shows that³

‖e^{At}‖ ≤ e^{αt} Σ_{k=0}^{n−1} ‖Nt‖ᵏ / k!     t ≥ 0     (1.17)

in which N = T − Λ, where Λ is the diagonal matrix of eigenvalues and N is strictly upper triangular, i.e., has zeros on as well as below the diagonal. Note that this bound holds for any A ∈ ℂ^{n×n}. Van Loan (1977) also shows that this is a fairly tight bound compared to some popular alternatives. If we increase the value of α by an arbitrarily small amount, we can obtain a looser bound, but one that is more convenient for analysis. For any α' satisfying

α' > max(Re(eig(A)))


³Note that there is a typo in (2.11) in Van Loan (1977), which is corrected here.


there is a constant c > 0 such that

‖e^{At}‖ ≤ c e^{α't}     t ≥ 0     (1.18)

This result holds also for any A ∈ ℂ^{n×n}. Note that the constant c depends on the matrix A. Establishing this result is discussed in Exercise 1.71. To demonstrate one useful consequence of this bound, consider the case in which all eigenvalues of A have strictly negative real parts. Then there exists α' such that Re(eig(A)) < α' < 0, and (1.18) tells us that e^{At} → 0 exponentially fast as t → ∞ for the entire class of "stable" matrices A. We do not need to assume that A has distinct eigenvalues or is diagonalizable, for example, to reach this conclusion.
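The diagonalizable-case bound ‖e^{At}‖ ≤ κ(S)e^{αt} is easy to probe numerically; a sketch with an arbitrary stable example matrix (chosen for illustration only):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 5.0],
              [ 0.0, -2.0]])               # eigenvalues -1 and -2
lam, S = np.linalg.eig(A)
alpha = np.max(lam.real)
kappa = np.linalg.cond(S)                  # ||S|| ||S^-1|| in the 2-norm

# Check ||e^{At}|| <= kappa(S) e^{alpha t} at several times t >= 0.
for t in np.linspace(0.0, 5.0, 11):
    assert np.linalg.norm(expm(A * t), 2) <= kappa * np.exp(alpha * t) + 1e-12
print("bound verified on [0, 5]")
```

The off-diagonal entry 5.0 makes S poorly conditioned, so the prefactor κ(S) is noticeably larger than 1; this is the transient growth the bound must accommodate before the e^{αt} decay takes over.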
1.5.2 Optimizing Quadratic Functions

Optimization is a large topic of fundamental importance in many engineering activities such as process design, process control, and process operations. Here we would like to introduce some of the important concepts of optimization in the simple setting of quadratic functions. You now have the required linear algebra tools to make this discussion accessible.

Scalar argument. The reader is undoubtedly familiar with finding the maximum and minimum of scalar functions by taking the first derivative and setting it to zero. For conciseness, we restrict attention to (unconstrained) minimization, and we are interested in the problem⁴

min_x f(x)

What do we expect of a solution to this problem? A point x⁰ is termed a minimizer if f(x⁰) ≤ f(p) for all p. A minimizer x⁰ is unique if no other point has this property. In other words, the minimizer x⁰ is unique provided f(x⁰) < f(p) for all p ≠ x⁰. We call x⁰ the minimizer and f⁰ = f(x⁰) the (optimal) value function. Note that to avoid confusion, f⁰ = min_x f(x) is called the "solution" to the problem, even though x⁰ is usually the item of most interest, and x⁰ = arg min_x f(x) is called the "argument of the solution."

We wish to know when the minimizer exists and is unique, and how to compute it. We consider first the real, scalar-valued quadratic function of the real, scalar argument x, f(x) = (1/2)ax² + bx + c, with

⁴We do not lose generality with this choice; if the problem of interest is instead maximization, use the following identity to translate: max_x f(x) = −min_x (−f(x)).


a, b, c, x ∈ ℝ. Putting the factor of 1/2 in front of the quadratic term is a convention to simplify various formulas to be derived next. If we take the derivative and set it to zero we obtain

f(x) = (1/2)ax² + bx + c
df/dx = ax + b = 0
x⁰ = −b/a

This last result for x⁰ is at least well defined provided a ≠ 0. But if we are interested in minimization, we require more: a > 0 is required for a unique solution to the problem min_x f(x). Indeed, taking a second derivative of f(x) gives d²f/dx² = a. The condition a > 0 is usually stated in beginning calculus courses as: the function is concave upward. This idea is generalized to the condition that the function is strictly convex, which we define next. Evaluating f(x) at the proposed minimizer gives f⁰ = f(x⁰) = −(1/2)b²/a + c.
Convex functions. Generalizing the simple notion of a function having positive curvature (or being concave upward) to obtain existence and uniqueness of the minimizer leads to the concept of a convex function, which is defined as follows (Rockafellar and Wets, 1998, p. 38).

Definition 1.18 (Convex function). Let function f(·) be defined on all reals. Consider two points x, y and a scalar α that satisfy 0 ≤ α ≤ 1. The function f is convex if the following inequality holds for all x, y and α ∈ [0, 1]

f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y)

Figure 1.5 shows a convex function. Notice that if you draw a straight line connecting any two points on the function curve, if the function is convex the straight line lies above the function. We say the function f(·) is STRICTLY CONVEX if the inequality is strict for all x ≠ y and α ∈ (0, 1)

f(αx + (1 − α)y) < αf(x) + (1 − α)f(y)

Notice that x and y are restricted to be nonequal and α is restricted to lie in the open interval (0, 1) in the definition of strict convexity (or no function would be strictly convex).

That strict convexity is sufficient for uniqueness of the minimizer is established readily as follows. Assume that one has found a (possibly nonunique) minimizer of f(·), denoted x⁰, and consider another point


Figure 1.5: Convex function. The straight line connecting two points on the function curve lies above the function; αf(x) + (1 − α)f(y) ≥ f(αx + (1 − α)y) for all x, y.

p ≠ x⁰. We know that f(p) cannot be less than f(x⁰) or we contradict optimality of x⁰. We wish to rule out f(p) = f(x⁰), also, because equality implies that the minimizer is not unique. If f(·) is strictly convex and f(p) = f(x⁰), we have that

f(αx⁰ + (1 − α)p) < αf(x⁰) + (1 − α)f(p) = f(x⁰)

So for all z = αx⁰ + (1 − α)p with α ∈ (0, 1), z ≠ x⁰ and f(z) < f(x⁰), which also contradicts optimality of x⁰. Therefore f(x⁰) < f(p) for all p ≠ x⁰, and the minimizer is unique. Notice that the definition of convexity does not require f(·) to have even a first derivative, let alone a second derivative as required when using a curvature condition for uniqueness. But if f(·) is quadratic, then strict convexity is equivalent to positive curvature as discussed in Exercise 1.72.
Vector argument. We next take real-valued vector x ∈ ℝⁿ, and the general real, scalar-valued quadratic function is f(x) = (1/2)xᵀAx + bᵀx + c, with A ∈ ℝ^{n×n}, b ∈ ℝⁿ, and c ∈ ℝ. Without loss of generality, we can assume A is symmetric.⁵ We know that the eigenvalues of a

⁵If A is not symmetric, show that replacing A with the symmetric Ā = (1/2)(A + Aᵀ) does not change the function f(·).


Figure 1.6: Contours of constant f(x) = xᵀAx; (a) A > 0 (or A < 0), ellipses; (b) A ≥ 0 (or A ≤ 0), straight lines; (c) A indefinite, hyperbolas. The coordinate axes are aligned with the contours if and only if A is diagonal.

symmetric matrix are real (see Theorem 1.16); the following cases are of interest and cover all possibilities for symmetric A: (a) A > 0 (or A < 0), (b) A ≥ 0 (or A ≤ 0), and (c) A indefinite. Figure 1.6 shows contours of the quadratic functions that these A generate for x ∈ ℝ². Since A is the parameter of interest here, we set b = 0, c = 0.⁶ The positive cost contours are concentric ellipses for the case A > 0. The contour for f = 0 is the origin, and minimizing f for A > 0 has the origin as a unique minimizer. This problem corresponds to finding the bottom of a bowl. If A < 0, contours remain ellipses, but the sign of the contour value changes. The case A < 0 has the origin as a unique maximizer. This problem corresponds to finding the top of a mountain.

For the case A ≥ 0, the positive contours are straight lines. The line through the origin corresponds to f = 0. All points on this line

⁶Note that c merely shifts the function f(·) up and down, and b merely shifts the origin, so they are not important to the shape of the contours of the quadratic function.


are minimizers for f(·) in this case, and the minimizer is nonunique. The quadratic function is convex but not strictly convex. The function corresponds to a long valley. As before, if A ≤ 0, contours remain straight lines, but the sign of the contour value changes. For A ≤ 0, the maximizer exists but is not unique. The function is now a ridge. And some specialized techniques for numerically finding optima with badly conditioned functions approaching this case are known as "ridge regression."

For indefinite A, Figure 1.6 shows that the contours are hyperbolas. The origin is termed a saddle point in this case, because the function resembles a horse's saddle, or a mountain pass if one prefers to maintain the topography metaphor. Note that f(·) increases without bound in the northeast and southwest directions, but decreases without bound in the southeast and northwest directions. So neither a minimizer nor a maximizer exists for the indefinite case. But there is an important class of problems for which the origin is the solution. These are the minmax or maxmin problems. Consider the two problems

max_{x_1} min_{x_2} f(x)     min_{x_2} max_{x_1} f(x)     (1.19)

These kinds of problems are called noncooperative games, and players one and two are optimizing over decision variables x_1 and x_2, respectively. In this type of noncooperative problem, player one strives to maximize function f while player two strives to minimize it. Noncooperative games arise in many fields, especially as models of economic behavior. In fact, von Neumann and Morgenstern (1944) originally developed game theory for understanding economic behavior in addition to other features of classical games such as bluffing in poker. These kinds of problems also are useful in worst-case analysis and design. For example, the outer problem can represent a standard design optimization while the inner problem finds the worst-case scenario over some set of uncertain model parameters.

Another important engineering application of noncooperative games arises in the introduction of Lagrange multipliers and the Lagrangian function when solving constrained optimization problems. For the quadratic function shown in Figure 1.6 (c), Exercise 1.74 asks you to establish that the origin is the unique solution to both problems in (1.19). The solution to a noncooperative game is known as a Nash equilibrium or Nash point in honor of the mathematician John Nash, who established some of the early fundamental results of game theory (Nash, 1951).


              Scalar                      Vector

f(x)          (1/2)ax² + bx + c           (1/2)xᵀAx + bᵀx + c
df/dx         ax + b                      Ax + b
x⁰            −b/a                        −A⁻¹b
f⁰            −(1/2)b²/a + c              −(1/2)bᵀA⁻¹b + c
f(x)          (1/2)a(x − x⁰)² + f⁰        (1/2)(x − x⁰)ᵀA(x − x⁰) + f⁰

Table 1.1: Quadratic function of scalar and vector argument; a > 0, A positive definite.

Finally, to complete the vector minimization problem, we restrict attention to the case A > 0. Taking two derivatives in this case produces

f(x) = (1/2)xᵀAx + bᵀx + c
df/dx = (1/2)(Ax + Aᵀx) + b = Ax + b
d²f/dx² = A

Setting df/dx = 0 and solving for x⁰, and then evaluating f(x⁰) gives

x⁰ = −A⁻¹b     f⁰ = −(1/2)bᵀA⁻¹b + c
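These formulas can be sketched numerically (the particular A, b, c are an arbitrary positive definite example, not from the text):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # symmetric, positive definite
b = np.array([1.0, -1.0])
c = 0.5

x0 = np.linalg.solve(A, -b)                    # x0 = -A^{-1} b
f0 = -0.5 * b @ np.linalg.solve(A, b) + c      # f0 = -(1/2) b^T A^{-1} b + c

f = lambda x: 0.5 * x @ A @ x + b @ x + c
assert np.isclose(f(x0), f0)                   # formula matches the function
assert f(x0 + np.array([0.1, -0.2])) > f0      # x0 is indeed the minimizer
print(x0, f0)
```

Note the use of np.linalg.solve rather than forming A⁻¹ explicitly; this is the standard numerically preferred way to evaluate −A⁻¹b.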


These results for the scalar and vector cases are summarized in Table 1.1. Notice also in the last line of the table that one can reparameterize the function f(·) in terms of x⁰ and f⁰, in place of b and c, which is often useful.

Revisiting linear least squares. Consider again the linear least-squares problem of Section 1.3.7

min_x (1/2)‖Āx − b̄‖²

where we have changed Ax − b to Āx − b̄ to not conflict with the notation of this section. We see that least squares is the special case of a quadratic function with the parameters

A = ĀᵀĀ     b = −Āᵀb̄     c = (1/2)b̄ᵀb̄

Obviously A is symmetric in the least-squares problem. We have already derived the fact that ĀᵀĀ > 0 if the columns of Ā are independent in the discussion of the SVD in Section 1.4.7. So independent columns of Ā correspond to case (a) in Figure 1.6. If the columns of Ā are not independent, then A ≥ 0, and we are in case (b) and lose uniqueness of the least-squares solution. See Exercise 1.64 for a discussion of this case. It is not possible for a least-squares problem to be in case (c), which is good, because we are posing a minimization problem in least squares. So the solution to a least-squares problem always exists.
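The correspondence between the least-squares objective and the quadratic-form parameters can be verified directly; a sketch in which Ā and b̄ are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
Abar = rng.standard_normal((5, 3))    # independent columns (generically)
bbar = rng.standard_normal(5)

# Quadratic-form parameters from expanding (1/2)||Abar x - bbar||^2
A = Abar.T @ Abar
b = -Abar.T @ bbar
c = 0.5 * bbar @ bbar

x = rng.standard_normal(3)            # arbitrary test point
lhs = 0.5 * np.sum((Abar @ x - bbar) ** 2)
rhs = 0.5 * x @ A @ x + b @ x + c
assert np.isclose(lhs, rhs)

# The quadratic minimizer -A^{-1} b is the least-squares solution.
assert np.allclose(np.linalg.solve(A, -b),
                   np.linalg.lstsq(Abar, bbar, rcond=None)[0])
```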

1.5.3 Vec Operator and Kronecker Product of Matrices

We introduce two final matrix operations that prove highly useful in applications, but often are neglected in an introductory linear algebra course. These are the vec operator and the Kronecker product of two matrices.

The vec operator. For A ∈ ℝ^{m×n}, the vec operator is defined as the restacking of the matrix by its columns into a single large column vector

vec A = vec [ A_11  A_12  ⋯  A_1n ] = [ A_11  A_21  ⋯  A_m1  A_12  A_22  ⋯  A_m2  ⋯  A_1n  A_2n  ⋯  A_mn ]ᵀ
            [ A_21  A_22  ⋯  A_2n ]
            [  ⋮     ⋮          ⋮ ]
            [ A_m1  A_m2  ⋯  A_mn ]

Note that vec A is a column vector in ℝ^{mn}. If we denote the n column vectors of A as a_i, we can express the vec operator more compactly using column partitioning as

vec A = vec [ a_1  a_2  ⋯  a_n ] = [ a_1ᵀ  a_2ᵀ  ⋯  a_nᵀ ]ᵀ
Matrix Kronecker product. For A ∈ ℝ^{m×n} and B ∈ ℝ^{p×q}, the Kronecker product of A and B, denoted A ⊗ B, is defined as

A ⊗ B = [ A_11 B  A_12 B  ⋯  A_1n B ]
        [ A_21 B  A_22 B  ⋯  A_2n B ]     (1.20)
        [   ⋮       ⋮           ⋮   ]
        [ A_m1 B  A_m2 B  ⋯  A_mn B ]

Note that the Kronecker product is defined for all matrices A and B, and the matrices do not have to conform as in normal matrix multiplication. By counting the number of rows and columns in the definition above, we see that matrix A ⊗ B ∈ ℝ^{mp×nq}. Notice also that the vector outer product, defined in Section 1.2.5, is a special case of this more general matrix Kronecker product.
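In numpy these two operations look as follows; note that numpy's default reshape is row-major, so the column-stacking vec requires order='F' (Fortran order):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])            # 2 x 3
B = np.array([[0, 1],
              [1, 0]])               # 2 x 2

vecA = A.reshape(-1, order='F')      # stack columns: [1 4 2 5 3 6]
K = np.kron(A, B)                    # (2*2) x (3*2) block matrix of A_ij * B

print(vecA)
print(K.shape)                       # (4, 6)
```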


Some useful identities. We next establish four useful identities involving the vec operator and Kronecker product.

vec(ABC) = (Cᵀ ⊗ A) vec B                                     (1.21)
(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)    A, C conform; B, D conform    (1.22)
(A ⊗ B)ᵀ = Aᵀ ⊗ Bᵀ                                            (1.23)
(A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹           A and B invertible            (1.24)
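All four identities can be spot-checked numerically before working through the proofs; a sketch with random conforming matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
vec = lambda M: M.reshape(-1, order='F')   # column-stacking vec operator

A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# (1.21): vec(ABC) = (C^T kron A) vec B
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# (1.22): (A kron B)(C2 kron D) = (A C2) kron (B D), with conforming shapes
C2 = rng.standard_normal((3, 2))
D = rng.standard_normal((4, 5))
assert np.allclose(np.kron(A, B) @ np.kron(C2, D), np.kron(A @ C2, B @ D))

# (1.23): (A kron B)^T = A^T kron B^T
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))

# (1.24): (A kron B)^-1 = A^-1 kron B^-1 for invertible A, B
Ai = rng.standard_normal((3, 3))
Bi = rng.standard_normal((2, 2))
assert np.allclose(np.linalg.inv(np.kron(Ai, Bi)),
                   np.kron(np.linalg.inv(Ai), np.linalg.inv(Bi)))
print("identities (1.21)-(1.24) verified")
```

(Random square Gaussian matrices are invertible with probability one, so the inverse check in (1.24) is safe for generic draws.)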

Establishing (1.21). Let A ∈ ℝ^{m×n}, B ∈ ℝ^{n×p}, and C ∈ ℝ^{p×q}. Let the column partitions of matrices B and C be given by

B = [ b_1  b_2  ⋯  b_p ]     C = [ c_1  c_2  ⋯  c_q ]

We know from the rules of matrix multiplication that the jth column of the product ABC = (AB)C is given by ABc_j. So when we stack these columns we obtain

vec(ABC) = [ ABc_1 ]
           [ ABc_2 ]
           [   ⋮   ]
           [ ABc_q ]

Now we examine the right-hand side of (1.21). We have from the definitions of vec operator and Kronecker product

(Cᵀ ⊗ A) vec B = [ c_11 A  c_21 A  ⋯  c_p1 A ] [ b_1 ]
                 [ c_12 A  c_22 A  ⋯  c_p2 A ] [ b_2 ]
                 [   ⋮       ⋮           ⋮   ] [  ⋮  ]
                 [ c_1q A  c_2q A  ⋯  c_pq A ] [ b_p ]

The jth row of this partitioned matrix multiplication can be rearranged as follows

c_1j A b_1 + c_2j A b_2 + ⋯ + c_pj A b_p = A [ b_1  b_2  ⋯  b_p ] [ c_1j ] = ABc_j
                                                                  [ c_2j ]
                                                                  [  ⋮   ]
                                                                  [ c_pj ]


Inserting this result into the previous equation gives

(Cᵀ ⊗ A) vec B = [ ABc_1 ]
                 [ ABc_2 ]
                 [   ⋮   ]
                 [ ABc_q ]

which agrees with the expression for vec(ABC).

Establishing (1.22). Here we let A ∈ ℝ^{m×n}, B ∈ ℝ^{p×q}, C ∈ ℝ^{n×r}, and D ∈ ℝ^{q×s} so that A, C conform and B, D conform. Let c_1, …, c_r be the column vectors of matrix C. We know from the rules of matrix multiplication that the jth (block) column of the product (A ⊗ B)(C ⊗ D) is given by A ⊗ B times the jth (block) column of C ⊗ D, which is

[ A_11 B  ⋯  A_1n B ] [ c_1j D ]   [ (Σ_k A_1k c_kj) BD ]
[   ⋮           ⋮   ] [   ⋮    ] = [         ⋮          ] = (Ac_j) ⊗ (BD)
[ A_m1 B  ⋯  A_mn B ] [ c_nj D ]   [ (Σ_k A_mk c_kj) BD ]

Since this is the jth (block) column of (A ⊗ B)(C ⊗ D), the entire matrix is

(A ⊗ B)(C ⊗ D) = [ (Ac_1) ⊗ (BD)   (Ac_2) ⊗ (BD)   ⋯   (Ac_r) ⊗ (BD) ] = (AC) ⊗ (BD)

and the result is established.


and B e v xq, from the definiGivenA e


Establishing(1.23).
and cross product we have that
transpose
of
tion
AlnB
AllB A12B
A21B

A2nB

A22B

AmnB
A21B T
A22B T

Am2BT

AlnB T A2nB T

AmnBT

AllB T
A12BT

Establishing (1.24).  Apply (1.22) to the following product and obtain

    (A ⊗ B)(A^{-1} ⊗ B^{-1}) = (AA^{-1}) ⊗ (BB^{-1}) = I ⊗ I = I

and therefore

    (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
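As with (1.21), identities (1.22)-(1.24) can be checked on random matrices. This NumPy sketch is illustrative (not from the text); random square A and B are invertible with probability one, as (1.24) requires.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))
C = rng.standard_normal((3, 4))
D = rng.standard_normal((2, 5))

# (1.22): mixed-product property; A,C conform and B,D conform.
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
# (1.23): transpose distributes over the Kronecker product.
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
# (1.24): inverse distributes when A and B are invertible.
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))
```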

Eigenvalues, singular values, and rank of the Kronecker product.
When solving matrix equations, we will want to know about the rank
of the Kronecker product A ⊗ B. Since rank is closely tied to the singu-
lar values, and these are closely tied to the eigenvalues, the following
identities prove highly useful.

    eig(A ⊗ B) = eig(A) eig(B)         A and B square             (1.25)
    σ(A ⊗ B) = σ(A) σ(B)               nonzero singular values    (1.26)
    rank(A ⊗ B) = rank(A) rank(B)                                 (1.27)

Given our previous identities, these three properties are readily estab-
lished. Let A and B be square of order m and n, respectively. Let
(λ, v) and (μ, w) be eigenpairs of A and B, respectively. We have that
Av ⊗ Bw = (λv) ⊗ (μw) = λμ(v ⊗ w). Using (1.22) on Av ⊗ Bw then
gives

    (A ⊗ B)(v ⊗ w) = Av ⊗ Bw = λμ(v ⊗ w)

and we conclude that (nonzero) mn-vector (v ⊗ w) is an eigenvector of
A ⊗ B with product λμ as the corresponding eigenvalue. This establishes
(1.25). For the nonzero singular values, recall that the squares of the
nonzero singular values of a real matrix A are the nonzero eigenvalues of AA^T


(and A^T A). We then have, for σ(A) and λ(A) denoting nonzero singular
values and eigenvalues, respectively,

    σ(A ⊗ B) = [λ((A ⊗ B)(A ⊗ B)^T)]^{1/2} = [λ((AA^T) ⊗ (BB^T))]^{1/2}
             = [λ(AA^T) λ(BB^T)]^{1/2} = σ(A) σ(B)

which establishes (1.26). Since the number of nonzero singular values
of a matrix is equal to the rank of the matrix, we then also have (1.27).
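Properties (1.25) and (1.27) can also be checked numerically; this NumPy sketch is illustrative (not from the text). It verifies that every product of eigenvalues λᵢμⱼ appears in the spectrum of A ⊗ B, and that rank is multiplicative.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))

# (1.25): each eigenvalue of A kron B is a product lambda_i(A) * mu_j(B).
eigAB = np.linalg.eigvals(np.kron(A, B))
products = np.outer(np.linalg.eigvals(A), np.linalg.eigvals(B)).ravel()
for p in products:
    assert np.min(np.abs(eigAB - p)) < 1e-8   # p occurs in eig(A kron B)

# (1.27): rank is multiplicative over the Kronecker product.
assert (np.linalg.matrix_rank(np.kron(A, B))
        == np.linalg.matrix_rank(A) * np.linalg.matrix_rank(B))
```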

Properties (1.21)-(1.27) are all that we require for the material in
this text, but the interested reader may wish to consult Magnus and
Neudecker (1999) for a more detailed discussion of Kronecker products.

Solving linear matrix equations.  We shall find the properties (1.21)-
(1.27) highly useful when dealing with complex maximum-likelihood
estimation problems in Chapter 4. But to provide here a small illustra-
tion of their utility, consider the following linear matrix equation for
the unknown matrix X

    AXB = C

in which neither A nor B is invertible. The equations are linear in X, so
they should be solvable as some form of linear algebra problem. But
since we cannot operate with A^{-1} from the left, nor B^{-1} from the right,
it seems difficult to isolate X and solve for it. This is an example where
the linear equations are simply not packaged in a convenient form for
solving them. But if we apply the vec operator and use (1.21) we have

    (B^T ⊗ A) vec X = vec C

Note that this is now a standard linear algebra problem for the unknown
vector vec X. We can examine the rank, and linear independence of the
rows and columns of B^T ⊗ A, to determine the existence and unique-
ness of the solution vec X, and whether we should solve a least-squares
problem or minimum-norm problem. After solution, the vec X column
vector can then be restacked into its original matrix form X if desired.
Exercise 1.77 provides further discussion of solving AXB = C.
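A small NumPy illustration of this vec trick (not from the text; the rank-2 factors below are chosen only to make A and B non-invertible while keeping the problem consistent):

```python
import numpy as np

rng = np.random.default_rng(3)
# Build a consistent problem: pick X first, then set C = A X B, with
# rank-deficient (hence non-invertible) A and B.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 4))  # 4x4, rank 2
B = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 3))  # 3x3, rank 2
Xtrue = rng.standard_normal((4, 3))
C = A @ Xtrue @ B

# Apply the vec operator and (1.21): (B^T kron A) vec X = vec C.
M = np.kron(B.T, A)
x, *_ = np.linalg.lstsq(M, C.reshape(-1, order="F"), rcond=None)
X = x.reshape(4, 3, order="F")   # restack vec X into matrix form

assert np.allclose(A @ X @ B, C)   # a (non-unique) solution of AXB = C
```

Because B^T ⊗ A is rank deficient here, `lstsq` returns the minimum-norm solution; the residual is zero because the problem was constructed to be consistent.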
As a final example, in Chapter 2 we will derive the matrix Lyapunov
equation, which tells us about the stability of a linear dynamic system,

    A^T S + S A = -Q

in which matrices A and Q are given, and S is the unknown. One way
to think about solving the matrix Lyapunov equation is to apply the vec
operator to obtain

    (I ⊗ A^T + A^T ⊗ I) vec S = -vec Q

and then solve this linear algebra problem for vec S. Although this
approach is useful for characterizing the solution, given the special
structure of the Lyapunov equation, more efficient numerical solution
methods are available and coded in standard software. See the function
lyap(A',Q) in Octave or MATLAB, for example. Exercise 1.78 asks you
to solve a numerical example using the Kronecker product approach
and compare your result to the lyap function.
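The Kronecker product approach can be sketched in a few lines of NumPy (an illustration, not from the text; the stable 2 × 2 matrix A below is a made-up test case):

```python
import numpy as np

# Stable test system: eigenvalues of A have negative real parts.
A = np.array([[-1.0,  2.0],
              [ 0.0, -3.0]])
Q = np.eye(2)

# vec(A^T S + S A) = (I kron A^T + A^T kron I) vec S = -vec Q.
n = A.shape[0]
I = np.eye(n)
M = np.kron(I, A.T) + np.kron(A.T, I)
s = np.linalg.solve(M, -Q.reshape(-1, order="F"))
S = s.reshape(n, n, order="F")   # restack vec S into matrix form

assert np.allclose(A.T @ S + S @ A, -Q)   # S solves the Lyapunov equation
```

The coefficient matrix M is nonsingular here because its eigenvalues are the sums λᵢ(A) + λⱼ(A), which are all negative for a stable A.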

1.6 Exercises
Exercise 1.1: Inner product and angle in R^2

Consider the two vectors a, b ∈ R^2 shown in Figure 1.7 and let θ denote
the angle between them. Show the usual inner product and norm formulas

    (a, b) = Σ_i ai bi        ||a|| = (a, a)^{1/2}

satisfy the following relationship with the angle

    cos θ = (a, b) / (||a|| ||b||)

This relationship allows us to generalize the concept of an angle between
vectors to any inner product space.

Figure 1.7: Two vectors in R^2 and the angle between them.

Exercise 1.2: Scaling and vector norm

Consider the vector x ∈ R^2, whose elements are the temperature (in K) and pressure
(in Pa) in a reactor. A typical value of x would be

    x = [ 300
          1.0 × 10^6 ]

(a) Let

    y1 = [ 310            y2 = [ 300
           1.0 × 10^6 ]          1.2 × 10^6 ]

be two measurements of the state of the reactor. Use the Euclidean norm to
calculate the error ||y − x|| for the two values of y. Do you think that the
calculated errors give a meaningful idea of the difference between y1 and y2?

(b) Consider the formula

    ||x|| = ( Σ_{i=1}^{n} wi xi^2 )^{1/2}

where xi is the ith component of the vector x. Show that this formula is a norm
if and only if wi > 0 for all i. This is known as a weighted norm, with weight
vector w.

(c) Propose a weight vector that is appropriate for the example in part (a). Justify
your answer.

Exercise 1.3: Linear independence

Verify that the following sets are LI.

(a) e1 = [0, ···]^T, e2 = [2, 0, 1]^T, e3 = [1, 1, 1]^T

(b) e1 = [1 + i, 1 ···], e2 = [1 + 2i, 1 ···]

Hint: express α1 e1 + α2 e2 + α3 e3 = 0. Taking inner products of this equation with
e1, e2, and e3 yields three linear equations for the αi.

Exercise 1.4: Gram-Schmidtprocedure


Using Gram-Schmidt orthogonalization, obtain ON sets from the LI sets given in the
previous problem.
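For reference when attempting Exercises 1.4 and 1.5, here is a minimal NumPy sketch of the Gram-Schmidt procedure (illustrative, not from the text; the three LI vectors below are made-up examples, not the sets from Exercise 1.3):

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: return an orthonormal (ON) set spanning
    the same space as the given linearly independent vectors."""
    basis = []
    for v in vectors:
        w = v.astype(float).copy()
        for q in basis:
            w -= (q @ v) * q          # remove the component of v along q
        basis.append(w / np.linalg.norm(w))
    return basis

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
Q = np.column_stack(gram_schmidt(vs))
assert np.allclose(Q.T @ Q, np.eye(3))   # columns are orthonormal
```

If the input set is not LI, the norm of some residual w is zero and the division fails, which is exactly the failure mode Exercise 1.5 asks you to pinpoint.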

Exercise 1.5: Failure of Gram-Schmidt


The Gram-Schmidt process will fail if the initial set of vectors is not LI.
(a) Construct a set of three vectors in R3 that are not LI and apply Gram-Schmidt,
pinpointing where it fails.

(b) Similarly,in an n-dimensional space, no more than n LI vectors can be found.


Construct a set of four vectors in R3 and use Gram-Schmidt to show that if three
of the vectors are LI, then a fourth orthogonal vector cannot be found.

Exercise 1.6: Linear independence and expressing one vector as a linear
combination of others

We often hear that a set of vectors is linearly independent if none of the vectors in the
set can be expressed as a linear combination of the remaining vectors. Although the
statement is correct, as a definition of linear independence, this idea is a bit unwieldy
because we do not know a priori which vector(s) in a linearly dependent set is (are)
expressible as a linear combination of the others.

The following statement is a more precise variation on this theme. Given that the
vectors xi ∈ R^n, i = 1, ..., k, are linearly independent and the vectors {xi, a} are linearly
dependent, show that a can be expressed as a linear combination of the xi.

Using the definition of linear independence provided in the text, prove this state-
ment.

Exercise 1.7: Some properties of subspaces

Establish the following properties.

(a) The zero element is an element of every subspace.

(b) The span of any set of j elements of R^n is a subspace (of the linear space R^n).

(c) Except for the zero subspace, a subspace cannot have a finite, largest element.
Hence, every subspace, except the zero subspace, is unbounded.

Exercise 1.8: Some subspaces in 2-D and 3-D

(a) Consider the line in R^2

    S = { y | y = α [1, 1]^T, α ∈ R }

Draw a sketch of S. Show that S is a subspace.

(b) Next consider the shifted line

    S' = { y | y = y0 + α [1, 1]^T, α ∈ R }

in which y0 is a fixed vector not on S. Draw a sketch of S'. Show that S' is not
a subspace.

(c) Describe all of the subspaces of R^3.

Exercise 1.9: Permutation matrices

(a) Given the matrix

    P = [ 1 0 0
          0 0 1
          0 1 0 ]

show that PA interchanges the second and third rows of A for any 3 × 3 matrix.
What does AP do?

(b) A general permutation matrix involving p row exchanges can be written P =
Pp P_{p−1} ··· P2 P1 where Pi corresponds to a simple row exchange as above. Show
that P is orthogonal.

Exercise1.10: Special matrices


Consideroperations on vectors in R2.

(a) Constructa matrix operator that multiplies the horizontal (Xl) component of
a vectorby 2, but leaves its vertical component (x2) unchanged.
(b) Constructa matrix operator B that rotates a vector counterclockwise by an angle
of 2TT/3.

(c) Computeand draw ABx and BAX for x = 1


2

(d) Showthat B3 = I. With drawings, show how this makes geometric sense.


Exercise 1.11: Integral operators and matrices

Consider the integral operator K acting on a function x:

    K{x}(t) = ∫ k(t, s) x(s) ds

where k(t, s) is a known function called the KERNEL of the operator.

(a) Show that K is a linear operator.

(b) Read Section 2.4.1. Use the usual (i.e., unweighted) inner product on
the interval.

(c) An integral can be approximated as a sum, so the above integral operator can
be approximated like this:

    Ka{x(iΔt)} = Σ_j k(iΔt, jΔt) x(jΔt) Δt,    i = 1, ..., N

where Δt = 1/N. Show how this approximation can be rewritten as a standard
matrix-vector product. What is the matrix approximation to the integral oper-
ator?

Exercise 1.12: Projections and matrices

Given a unit vector n, use index notation (and the summation convention) to simplify
the following expressions:

(a) (nn^T)(nn^T)u for any vector u. Recalling that nn^T is the projection operator,
what is the geometric interpretation of this result?

(b) (I − 2nn^T)^2. What is the geometric interpretation of this result?

Exercise 1.13: Use the source, Luke

Someone in your research group wrote a computer program that takes an n-vector
input, x ∈ R^n, and returns an m-vector output, y ∈ R^m. All we know about the
function f is that it is linear. The code was compiled and now the source code has
been lost; the code's author has graduated and won't respond to our email. We need
to create the source code for function f so we can compile it for our newly purchased
hardware, which no longer runs the old compiled code. To help us accomplish this
task, all we can do is execute the function on the old hardware.

(a) How many function calls do you need to make before you can write the source
code for this function?

(b) What inputs do you choose, and how do you construct the linear function f from
the resulting outputs?


(c) To make matters worse, your advisor has a hot new project idea that requires
you to write a program to evaluate the inverse of this linear function, and has
asked you if this is possible. How do you respond? Give a complete answer
about the existence and uniqueness of x given y.

Exercise 1.14: The range and null space of a matrix are subspaces

Given A ∈ R^{m×n}, show that the sets R(A) and N(A) satisfy the properties of a sub-
space, (1.1), and therefore R(A) and N(A) are subspaces.

Exercise 1.15: Null space of A is orthogonal to range of A^T

Given A ∈ R^{m×n}, show that N(A) ⊥ R(A^T).

Exercise 1.16: Rank of a dyad

What is the rank of the n × n dyad uv^T?

Exercise 1.17: Partitioned matrix inversion formula

(a) Let the matrix A be partitioned as

    A = [ B  C
          D  E ]

in which B, C, D, E are suitably dimensioned matrices and B and E are square.
Derive a formula for A^{-1} in terms of B, C, D, E by block Gaussian elimination.
Check your answer with the CRC Handbook formula (Selby, 1973).

(b) What if B^{-1} does not exist? What if E^{-1} does not exist? What if both B^{-1} and
E^{-1} do not exist?

Exercise 1.18: The four fundamental subspaces

Find bases for the four fundamental subspaces associated with the following matrices

    [ 1 1 0 0
      0 1 0 1 ]

Exercise 1.19: Zero is orthogonal to many vectors

Prove that if

    (x, z) = (y, z)    for all z ∈ R^n

then x = y, or, equivalently, prove that if

    (u, v) = 0    for all v ∈ R^n

then u = 0.

Figure 1.8: Experimental measurements of variable y versus x.

Exercise 1.20: Existence and uniqueness


Find matrices A for which the number of solutions to Ax = b is
(a) 0 or 1, depending on b.
(b) ∞, independent of b.
(c) 0 or ∞, depending on b.

(d) 1, independent of b.

Exercise 1.21: Fitting and overfitting functions with least squares

One of your friends has been spending endless hours in the laboratory collecting data
on some obscure process, and now wants to find a function to describe the variable
y's dependence on the independent variable, x.

    x    0.00  0.22  0.44  0.67  0.89  1.11  1.33  1.56  1.78   2.00
    y    2.36  2.49  2.67  3.82  4.87  6.28  8.23  9.47  12.01  15.26

Not having a good theory to determine the form of this expression, your friend has
chosen a polynomial to fit the data.

(a) Consider the polynomial model

    y(x) = a0 + a1 x + a2 x^2 + ··· + an x^n

Express the normal equations for finding the coefficients ai that minimize the
sum of squares of errors in y.

(b) Using the x- and y-data shown above and plotted in Figure 1.8, solve the least-
squares problem and find the a that minimize

    Φ = Σ_{i=1}^{na} (y(xi) − yi)^2

in which na is the number of measurements and n is the order of the polynomial.
Do this calculation for all polynomials of order 0 ≤ n ≤ 9.

(c) For each n, also calculate the least-squares objective Φ and plot Φ versus n.

(d) Plot the data along with your fitted polynomial curves for each value of n. In
particular, plot the data and fits for n = 2 and n = 9 on one plot. Use the range
−0.25 ≤ x ≤ 2.25 to get an idea about how well the models extrapolate.

(e) Based on the values of Φ and the appearance of your plots, what degree poly-
nomial would you choose to fit these data? Why not choose n = 9 so that the
polynomial can pass through every point and Φ = 0?
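One way to attack part (b) is to form the Vandermonde matrix and solve the normal equations directly; this NumPy sketch (illustrative, shown for n = 2 only) uses the data tabulated in the exercise:

```python
import numpy as np

# Data from Exercise 1.21.
x = np.array([0.00, 0.22, 0.44, 0.67, 0.89, 1.11, 1.33, 1.56, 1.78, 2.00])
y = np.array([2.36, 2.49, 2.67, 3.82, 4.87, 6.28, 8.23, 9.47, 12.01, 15.26])

n = 2                                       # polynomial order
A = np.vander(x, n + 1, increasing=True)    # columns: 1, x, x^2
a = np.linalg.solve(A.T @ A, A.T @ y)       # normal equations (A^T A) a = A^T y
phi = np.sum((A @ a - y) ** 2)              # least-squares objective

assert phi < np.sum((y - y.mean()) ** 2)    # quadratic beats a constant fit
```

Looping this over 0 ≤ n ≤ 9 gives the Φ-versus-n plot requested in part (c); for large n, solving the normal equations becomes ill conditioned, which is part of the point of the exercise.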

Exercise 1.22: Least-squares estimation of activation energy

Assume you have measured a rate constant, k, at several different temperatures, T,
and wish to find the activation energy (divided by the gas constant), E/R, and the
preexponential factor, k0, in the Arrhenius model

    k = k0 e^{−E/RT}                                    (1.28)

The data are shown in Figure 1.9 and listed here.

    T (K)  300   325   350   375   400   425   450   475   500
    k      1.82  1.89  2.02  2.14  2.12  2.17  2.15  2.21  2.26

(a) Take logarithms of (1.28) and write a model that is linear in the parameters
ln(k0) and E/R. Summarize the data and model with the linear algebra problem

    Ax = b

in which x = [ln(k0), E/R]^T contains the parameters of the least-squares problem.
What are A and b for this problem?

(b) Find the least-squares fit to the data. What are your least-squares estimates of
ln(k0) and E/R?

(c) Is your answer unique? How do you know?

(d) Plot the data and least-squares fit in the original variables k versus T. Do you
have a good fit to the data?
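The linearization in part (a) can be sketched as follows in NumPy (illustrative, not from the text), using the data tabulated in the exercise:

```python
import numpy as np

# Data from Exercise 1.22.
T = np.array([300., 325., 350., 375., 400., 425., 450., 475., 500.])
k = np.array([1.82, 1.89, 2.02, 2.14, 2.12, 2.17, 2.15, 2.21, 2.26])

# Taking logarithms of (1.28): ln k = ln(k0) - (E/R)(1/T),
# which is linear in the parameters ln(k0) and E/R.
A = np.column_stack([np.ones_like(T), -1.0 / T])
b = np.log(k)
x, *_ = np.linalg.lstsq(A, b, rcond=None)
ln_k0, E_over_R = x

kfit = np.exp(ln_k0 - E_over_R / T)
assert np.max(np.abs(kfit - k) / k) < 0.08   # fit within a few percent
```

The columns of A (a constant column and −1/T) are linearly independent for distinct temperatures, which is what makes the answer in part (c) unique.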


Figure 1.9: Measured rate constant at several temperatures.

Exercise 1.23: Existence and uniqueness of linear equations

Consider the following partitioned A matrix, A ∈ R^{m×n}

    A = [ A1  0
          0   0 ]

in which A1 ∈ R^{p×p} is of rank p and p < min(m, n).

(a) What are the dimensions of the three zero matrices?

(b) What is the rank of A?

(c) What is the dimension of the null space of A? Compute a basis for the null space
of A.

(d) Repeat for A^T.

(e) For what b can you solve Ax = b?

(f) Is the solution for these b unique? If not, given one solution x1, such that
Ax1 = b, specify all solutions.

Exercise 1.24: Reaction rates from production rates

Consider the following set of reactions.

    CO + (1/2) O2 ⇌ CO2
    H2 + (1/2) O2 ⇌ H2O
    CH4 + 2 O2 ⇌ CO2 + 2 H2O
    CH4 + (3/2) O2 ⇌ CO + 2 H2O

(a) Given the species list, A = [CO  O2  CO2  H2  H2O  CH4], write out the
stoichiometric matrix, ν, for the reactions relating the four reaction rates to the
six production rates

    R = ν^T r                                    (1.29)

(b) How many of the reactions are linearly independent?

(c) In a laboratory experiment, you measured the production rates for all the species
and found

    Rmeas = [ ···  −1 ]^T

Is there a set of reaction rates rex that satisfies (1.29) exactly? If not, how do you
know? If so, find an rex that satisfies Rmeas = ν^T rex.

(d) If there is an rex, is it unique? If so, how do you know? If not, characterize all
solutions.

Exercise 1.25: Least-squares estimation

A colleague has modeled the same system as only the following three reactions

    CO + (1/2) O2 ⇌ CO2
    H2 + (1/2) O2 ⇌ H2O
    CH4 + 2 O2 ⇌ CO2 + 2 H2O

(a) How many of these reactions are linearly independent?

(b) In another laboratory experiment, you measured the production rates for all the
species and found

    Rmeas = [ ···  −4.5  ··· ]^T

Is there a set of reaction rates rex in this second model that satisfies (1.29) ex-
actly? If so, find an rex that satisfies Rmeas = ν^T rex. If not, how do you know?

(c) If there is not an exact solution, find the least-squares solution, rest. What is the
least-squares objective value?

(d) Is this solution unique? If so, how do you know? If not, characterize all solutions
that achieve this value of the objective function.

Exercise 1.26: Controllability

Consider a linear discrete-time system governed by the difference equation

    x(k + 1) = Ax(k) + Bu(k)                             (1.30)

in which x(k), an n-vector, is the state of the system, and u(k), an m-vector, is the
manipulatable input at time k. The goal of the controller is to choose a sequence of
inputs that force the state to follow some desirable trajectory.

(a) What are the dimensions of the A and B matrices?

(b) ... should redesign the system before trying to design a controller for it. This is
an ... A system is said to be CONTROLLABLE if n input values exist that can move
the system from any initial condition, x0, to any final state x(n). By using (1.30),
show that x(n) can be expressed as

    x(n) = A^n x0 + A^{n−1} B u(0) + A^{n−2} B u(1) + ··· + A B u(n−2) + B u(n−1)

Stack all of the u(k) on top of each other and rewrite this expression in partitioned-
matrix form,

    x(n) = A^n x0 + C [ u(n−1) ; u(n−2) ; ··· ; u(0) ]           (1.31)

What is the C matrix and what are its dimensions?

(c) What must be true of the rank of C for a system to be controllable, i.e., for there
to be a solution to (1.31) for every x(0) and x(n)?

(d) Consider the following two systems with 2 states (n = 2) and 1 input (m = 1)

    x(k + 1) = [ ··· ] x(k) + [ ··· ] u(k)        x(k + 1) = [ ··· ] x(k) + [ ··· ] u(k)

Notice that the input only directly affects one of the states in both of these
systems. Are either of these two systems controllable? If not, show which x(n)
cannot be reached with n input moves starting from x(0) = 0.
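The rank test from part (c) can be sketched in NumPy. The matrices A1, B1, A2, B2 below are hypothetical stand-ins, not the matrices printed in the exercise; they simply illustrate one controllable and one uncontrollable 2-state, 1-input pair:

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack the blocks B, AB, A^2 B, ..., A^(n-1) B column-wise.
    (Block ordering does not affect the rank test.)"""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.column_stack(blocks)

# Hypothetical examples: the input enters only the second state in both.
A1 = np.array([[1.0, 1.0], [0.0, 1.0]]); B1 = np.array([[0.0], [1.0]])
A2 = np.array([[1.0, 0.0], [0.0, 1.0]]); B2 = np.array([[0.0], [1.0]])

assert np.linalg.matrix_rank(controllability_matrix(A1, B1)) == 2  # controllable
assert np.linalg.matrix_rank(controllability_matrix(A2, B2)) == 1  # not controllable
```

In the second system A2 never couples the input into the first state, so the columns of C stay on a line and rank(C) < n.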

Exercise 1.27: A vector/matrix derivative

Consider the following derivative for A, C ∈ R^{n×n}, x, b ∈ R^n

    C = d(A x x^T b)/dx

or expressed in component form

    Cij = d(A x x^T b)i / dxj        i = 1, ..., n,  j = 1, ..., n

Find an expression for this derivative (C) in terms of A, x, b.


Exercise 1.28: Rank equality with matrix products

Given arbitrary B ∈ R^{m×n}, and full-rank A ∈ R^{m×m} and C ∈ R^{n×n}, establish the
following two facts

    rank(AB) = rank(B)        rank(BC) = rank(B)

Use these to show

    rank(ABC) = rank(B)

Exercise 1.29: More matrix products

Find examples of 2 by 2 matrices such that

(a) ···

(b) A^2 = −I, with A a real matrix,

(c) B^2 = 0, with no zeros in B,

(d) CD = −DC, not allowing CD = 0.

Exercise 1.30: Programming LU decomposition

Write a program to solve Ax = b using LU decomposition. It should be able to handle
matrices up to n = 10, read in A and b from data files, and write the solution x to a
file. Using this program, solve the problem where
1

-1

1 -1

Exercise1.31: Normal equations


Writethe linear system of equations whose solution x = (Xl,
2

+ 2XIX2 + 2X3)

)T minimizes

+ X2

Findthe solution x and the corresponding value of P (x).

Exercise 1.32: Cholesky decomposition

A symmetric matrix A can be factorized into LDL^T where L is lower triangular and D
is diagonal, i.e., only its diagonal elements are nonzero.

(a) Perform this factorization for the matrix

    A = [  2 −1  0
          −1  2 −1
           0 −1  2 ]

(b) If all the diagonal elements of D are positive, the matrix can be further factorized
into LL^T; this is called the CHOLESKY DECOMPOSITION of A. Find L for the matrix
of part (a).
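The exercise asks you to do this by hand, but for checking your work, here is an illustrative NumPy sketch of LDL^T by symmetric elimination (no pivoting; valid here because the tridiagonal matrix is positive definite), and its relation to the Cholesky factor:

```python
import numpy as np

A = np.array([[ 2., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  2.]])

# LDL^T by symmetric Gaussian elimination on a working copy S.
n = A.shape[0]
L = np.eye(n)
D = np.zeros(n)
S = A.copy()
for j in range(n):
    D[j] = S[j, j]
    L[j+1:, j] = S[j+1:, j] / D[j]                      # multipliers
    S[j+1:, j+1:] -= np.outer(L[j+1:, j], S[j, j+1:])   # eliminate column j

assert np.allclose(L @ np.diag(D) @ L.T, A)
# Cholesky factor: scale the columns of L by sqrt(D).
assert np.allclose(L * np.sqrt(D), np.linalg.cholesky(A))
```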


Exercise 1.33: A singular matrix

For the system

    A x = [ ···  4 ]^T

(a) Find the value of q for which elimination fails (i.e., no solution to Ax = b exists).
If you are thoughtfulful, you won't need to perform the elimination to find out.

(b) For this value of q, what happens to the first geometrical interpretation of the
equations?

(c) What happens to the second (superpositions of column vectors)?

(d) What value should replace 4 in b to make the problem solvable for this q?

Exercise 1.34: LU factorization of nonsquare matrices

(a) Find the LU factorization of A.

(b) If b = (1, p, q)^T, find a necessary and sufficient condition on p and q so that
Ax = b has a solution.

(c) Given values of p and q for which a solution exists, will the algorithm from
Section 1.3.2 solve it? If not, pinpoint the difficulty.

(d) Find the LU factorization of A^T.

(e) Use this factorization to find two LI solutions of A^T x = b, where b = (2, 5)^T.
Since there are fewer equations than unknowns in this case, there are infinitely
many solutions, forming a line in R^3. Are there any values of b for which this
problem has no solution?

Exercise 1.35: An inverse

Under what conditions on u and v does (I − αuv^T) = (I + αuv^T)^{-1}? Here α is an
arbitrary nonzero scalar.

Exercise 1.36: LU decomposition

Write the first step of the LU decomposition process of a matrix A as

    A' = (I − αuv^T) A

In other words, what are α, u, and v so that A'21 = 0?

Exercise 1.37: Newton-Raphson method

Write a program that uses the Newton-Raphson method to solve this pair of equations

    y (x ···) = 0
    (y + ···) tan x = 0

Do not reduce the pair of equations to a single equation. With this program, find at
least one solution.

Exercise 1.38: The QR decomposition

In this exercise, we construct the QR decomposition introduced in Section 1.2.4. Con-
sider an m × n matrix A with columns ai. Observe that if A = BC, with B an m × n
matrix and C an n × n matrix, where bi are the columns of B, then we can express each
column of A as a linear combination of the columns of B, as follows

    ai = [ b1  b2  ···  bn ] [ c1i ; c2i ; ··· ; cni ]

The ith column of A is a linear combination of all the columns of B, and the coefficients
in the linear combination are the elements of the ith column of matrix C. This result
will be helpful in solving the following problem. Let A be an m × n matrix whose
columns ai are linearly independent (thus m ≥ n). We know that using the Gram-
Schmidt procedure allows us to construct an ON set of vectors from the ai. Define a
matrix Q whose columns are these basis vectors, qi, where qi · qj = δij.

(a) Express each ai in the basis formed by the qi. Hint: because the set of qi are
constructed from the set of ai by Gram-Schmidt, a1 has a component only in
the q1 direction, a2 has components only in the q1 and q2 directions, etc.

(b) Use the above result to write A = QR, i.e., find a square matrix R such that each
column of A is expressed in terms of the columns of Q. You should find that R is
upper triangular.
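The construction in parts (a) and (b) can be sketched as code (an illustrative NumPy version, not from the text; the small test matrix is made up):

```python
import numpy as np

def qr_gram_schmidt(A):
    """Thin QR of A (assumed to have LI columns) by Gram-Schmidt:
    A = Q R with orthonormal columns in Q and upper-triangular R."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        w = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # coefficient of a_j along q_i
            w -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(w)
        Q[:, j] = w / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q, R = qr_gram_schmidt(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.allclose(R, np.triu(R))   # R is upper triangular, as part (b) predicts
```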

Exercise 1.39: Orthogonal subspace decomposition

Let S be an r ≤ n dimensional subspace of R^n with a basis {a1, a2, ..., ar}. Consider
the subspace S⊥, the orthogonal complement to S.

(a) Prove that S⊥ has dimension n − r. Do not use the fundamental theorem of
linear algebra in this proof because this result is used to prove the fundamental
theorem.

(b) Show that any vector x ∈ R^n can be uniquely expressed as x = a + b in which
a ∈ S and b ∈ S⊥.

Exercise 1.40: The QR and thin QR decompositions

For A ∈ R^{m×n} with independent columns we have used in the text what is sometimes
called the "thin" QR with Q1 ∈ R^{m×n} and R1 ∈ R^{n×n} satisfying

    A = Q1 R1

It is possible to "fill out" Q1 by adding the remaining m − n columns that span R^m. In
this case A = QR and Q ∈ R^{m×m} is orthonormal, and R ∈ R^{m×n}. In the "thin" QR, Q1
is the shape of A and R1 is square (of the smaller dimension n), and in the full QR, Q
is square (of the larger dimension m) and R is the shape of A.

(a) Is the "thin" QR unique?

(b) Show how to construct the QR from the thin QR. Is the full QR unique?


Exercise 1.41: Uniqueness of solutions to least-squares problems

Prove the following proposition

Proposition 1.19 (Full rank of A^T A). Given matrix A ∈ R^{m×n}, the n × n matrix A^T A
has full rank if and only if A has linearly independent columns.

Note that this proof requires our first use of the fundamental theorem of linear
algebra. Since most undergraduate engineers have limited experience doing proofs, we
provide a few hints.

1. The "if and only if" statement requires proof of two statements: (i) A^T A having
full rank implies A has linearly independent columns and (ii) A having linearly
independent columns implies A^T A has full rank.

2. The statement that S implies T is logically equivalent to the statement that not
T implies not S. So one could prove this proposition by showing (ii) and then
showing: (i') A not having linearly independent columns implies that A^T A is not
full rank.

3. The fundamental theorem of linear algebra is the starting point. It tells us
(among other things) that square matrix B has full rank if and only if B has
linearly independent rows and columns. Think about what that tells you about
the null space of B and B^T. See also Figure 1.1.

Exercise 1.42: A useful decomposition

Let A ∈ C^{n×n}, B ∈ C^{p×p}, and X ∈ C^{n×p} satisfy

    AX = XB        rank(X) = p

Show that A can be decomposed as

    A = Q [ T11  T12 ] Q^{-1}
          [  0   T22 ]

with T11 of order p and T22 of order n − p, in which eig(T11) = eig(B), and eig(T22) =
eig(A) \ eig(B), i.e., the eigenvalues of T22 are the eigenvalues of A that are not
eigenvalues of B. Also show that eig(B) ⊆ eig(A).
Hint: use the QR decomposition of X.

Exercise 1.43: The Schur decomposition


Prove that the Schur decomposition has the properties stated in Theorem 1.15.
Hint: the result is obviously true for n = 1. Use induction and the result of Exercise 1.42.

Exercise 1.44: Norm and matrix rotation

Given the following A matrix

    A = [ 0.46287  0.11526
          0.53244  0.34359 ]

invoking [u,s,v] = svd(A) in MATLAB or Octave produces

    u = [ -0.59540  -0.80343      s = [ 0.78328  0.00000      v = [ -0.89798  -0.44004
          -0.80343   0.59540 ]          0.00000  0.12469 ]          -0.44004   0.89798 ]


(a) What vector x of unit norm maximizes ||Ax||? How large is ||Ax|| for this x?

(b) What vector x of unit norm minimizes ||Ax||? How large is ||Ax|| for this x?

(c) What is the definition of ||A||? What is the value of ||A|| for this A?

(d) Denote the columns of v by v1 and v2. Draw a sketch of the unit circle traced
by x as it travels from x = v1 to x = v2 and the corresponding curve traced by
Ax.

(e) Let's find an A, if one exists, that rotates all x ∈ R^2 counterclockwise by θ
radians. What do you choose for the singular values σ1 and σ2? Choose v1 = e1
and v2 = e2 for the V matrix, in which ei, i = 1, 2 is the ith unit vector. What
do you want u1 and u2 to be for this rotation by θ radians? Form the product
USV^T and determine the A matrix that performs this rotation.

Exercise 1.45: Linear difference equation model

Consider the following discrete-time model

    x(k + 1) = Ax(k)

in which

    A = [  0.798  0.051        x0 = [ 1
          -0.715  1.088 ]             0 ]

(a) Compute the eigenvalues and singular values of A. See the Octave or MATLAB
commands eig and svd. Are the magnitudes of the eigenvalues of A less than
one? Are the singular values less than one?

(b) What is the steady state of this system? Is the steady state asymptotically stable?

(c) Make a two-dimensional plot of the two components of x(k) (phase portrait) as
you increase k from k = 0 to k = 200, starting from the x(0) given above. Is
x(1) bigger than x(0)? Why or why not?

(d) When the largest eigenvalue of A is less than one but the largest singular value
of A is greater than one, what happens to the evolution of x(k)?

(e) Now plot the values of x for 50 points uniformly distributed on a unit circle and
the corresponding Ax for these points. For the SVD corresponding to Octave
and MATLAB convention

    A = USV*

mark u1, u2, v1, v2, s1, and s2 on your plot. Figure 1.10 gives you an idea of the
appearance of the set of points for x and Ax to make sure you are on track.

Figure 1.10: Plot of Ax as x moves around a unit circle.

Exercise 1.46: Is the SVD too good to be true?

Given A ∈ R^{m×n} with rank(A) = r and the SVD of A = UΣV*, if we partition the first
r columns of U and V and call them U1 and V1 we have

    A = U1 Σ1 V1*

Then to solve (possibly in the least-squares sense) Ax = b we have

    U1 Σ1 V1* x = b

which motivates the pseudoinverse formula

    A+ = V1 Σ1^{-1} U1*

and the "solution"

    x = A+ b

If we form the residual for this "solution" we have

    r = Ax − b = A A+ b − b = U1 Σ1 V1* V1 Σ1^{-1} U1* b − b = U1 U1* b − b = Im b − b = 0

which seems to show that r = 0. We know that we cannot solve Ax = b for every
b and every A matrix, so something must have gone wrong. What is wrong with this
argument leading to r = 0?

Exercise 1.47: SVD and worst-case analysis

Consider the process depicted in Figure 1.11 in which u is a manipulatable input and
d is a disturbance. At steady state, the effects of these two variables combine at the
measurement y in a linear relationship

    y = Gu + Dd

The steady-state goal of the control system is to minimize the effect of d at the mea-
surement y by adjusting u. For this problem we have 3 inputs, u ∈ R^3, 2 disturbances,
d ∈ R^2, and 2 measurements, y ∈ R^2, and G and D are matrices of appropriate dimen-
sions. We have the following two singular value decompositions available

    G = USV^T        D = XEZ^T

075 -0.66

-0.66

-0.98
-0.19

0.75

-0.19
0.98

1.57
0.00
0.71
0.00

0.00
0.21
0.00
0.13

-0.89

0.37

-0.085

0.46

045 -0.81

094 -0.33

-0.33

0.94

(a) Can you exactly cancel the effect of d on y using u for all d? Why or why not?

(b) In terms of U,S, VI,X, E, Z, what input u minimizes the effect of d on y? In


other words, if you decide the answer is linear

u = Kd
What is K in terms of U, S, VI , X, E, Z? Give the symbolic and numerical results.

(c) What is the worst d of unit norm, i.e., what d requires the largest response in u?
What is the response u to this worst d?

Exercise 1.48: Worst-case disturbance

Consider the system depicted in Figure 1.11 in which we can manipulate an input u ∈
R^2 to cancel the effect of a disturbance d ∈ R^2 on an output y ∈ R^2 of interest. The
steady-state relationship between the variables is modeled as a linear relationship

    y = Gu + d

and y, u, d are in deviation variables from the steady state at which the system was
linearized. Experimental tests on the system have produced the following model pa-
rameters

    G = [ 2.857  3.125
          0.991  2.134 ]

If we have measurements of the disturbance d available, we would like to find the input
u that exactly cancels d's effect on y, and we would like to know ahead of time what
is the worst-case disturbance that can hit the system.


Figure 1.11: Manipulated input u and disturbance d combine to affect output y.

(a) Find the u that cancels d's effect on y.


(b) For d on the unit circle,plot the corresponding value of u.
(c) What d of norm one requires the largest control action u? What d of norm one
requires the smallest control action u? Give the exact values of dmax and dmin,
and the corresponding umax and umin.

(d) Assume the input is constrained to be in the box

    [ −1 ]  ≤  u  ≤  [ 1 ]
    [ −1 ]           [ 1 ]                               (1.32)

What is the size of the disturbance so that all disturbances less than this size
can be rejected by the input without violating these constraints? In other words,
find the largest scalar α such that

    if ||d|| ≤ α    then u satisfies (1.32)

Use your plot from the previous part to estimate α.

Exercise 1.49: Determinant, trace, and eigenvalues

Use the Schur decomposition of matrix A ∈ C^{n×n} to prove the following facts

    det A = ∏_{i=1}^{n} λi                               (1.33)

    tr A = Σ_{i=1}^{n} λi                                (1.34)

in which λi ∈ eig(A), i = 1, 2, ..., n.

Exercise 1.50: Repeated eigenvalues

The self-adjoint matrix

    A = [ 0 1 1
          1 0 1
          1 1 0 ]

has a repeated eigenvalue. Find the eigenvalues of the system and show that despite
the repeated eigenvalue the system has a complete orthogonal set of eigenvectors.

Exercise 1.51: More repeated eigenvalues

The non-self-adjoint matrix

0
013
002
000 21
0

also has repeated eigenvalues.

(a) Find the eigenvalues and eigenvectors (there are only two) of A.

(b) Denote the eigenvector corresponding to the repeated eigenvalue as v1 and the
other eigenvector as v4. The GENERALIZED EIGENVECTORS v2 and v3 can be
found by solving

    (A − λ1 I) v2 = v1        (A − λ1 I) v3 = v2

where λ1 is the repeated eigenvalue. Show that {v1, ..., v4} is necessarily an LI
set.

(c) Determine the set, construct the transformation matrix M, and show that J =
M^{-1} A M is indeed in Jordan form.

Exercise 1.52: Solution to a singular linear system

Consider a square matrix A that has a complete set of LI eigenvectors and a single zero
eigenvalue.

(a) Write the solution to Ax = 0 in terms of the eigenvectors of A.

(b) In the problem Ax = b, use the eigenvectors to determine necessary and suffi-
cient conditions on b for existence of a solution.

Exercise 1.53: Example of a singular problem

Consider the problem Ax = b, where

    A = [ 1 2 3
          1 2 3
          1 2 3 ]

(a) Perform LU decomposition on this matrix. Give L and U.

(b) Find two linearly independent vectors in the nullspace of A.

(c) Use the LU decomposition to find a solution when

    b = [ 4
          4
          4 ]

(d) This solution is not unique. Find another.

(e) Find the eigenvalues and eigenvectors of A. How are these related to your an-
swers?

Exercise 1.54: Linearly independent eigenvectors

Show that if A has n distinct eigenvalues, its eigenvectors are linearly independent.
This result is required to ensure the existence of Q^{-1} in A = QΛQ^{-1} in (1.11).
Hint: set

    Σ_i αi qi = 0

and multiply by (A − λ1 I)(A − λ2 I) ··· (A − λ_{n−1} I) to establish that αn = 0. With
αn = 0, what can you do next to show that α_{n−1} = 0? Continue this process.

Exercise 1.55: General results for eigenvalue problems

Prove the following statements:

(a) If A is nonsingular and has eigenvalues λi, the eigenvalues of A^(-1) are 1/λi.

(b) Let S be a matrix whose columns form a set of linearly independent but nonorthogonal basis vectors; the mth column is the vector um. Find a matrix S' whose columns u'n satisfy um' u'n = δmn. A pair of basis sets whose vectors satisfy this condition are said to be BIORTHOGONAL.

(c) Assume that A has a complete set of eigenvectors. Show that the eigenvectors of A and A' are biorthogonal.

(d) Show that if the eigenvectors of A are orthogonal, then AA' = A'A. Such matrices are called NORMAL. (The converse is also true (Horn and Johnson, 1985).)

(e) Show that the eigenvalues of A = -A' are imaginary and that its eigenvectors are orthogonal.

Exercise 1.56: Eigenvalues of a dyad

Let u and v be unit vectors in R^n, with u'v ≠ 0. What are the eigenvalues and eigenvectors of uv'?

Exercise 1.57: The power method for finding largest eigenvalues

Consider the matrix

A = | 0 0 1 |
    | 0 0 1 |
    | 1 1 1 |

(a) Let x0 = (1, 0, 0)' and consider the iteration procedure x(i+1) = A x(i). Perform several steps of this procedure by hand and observe the result.


(b) Can you understand what is happening here by writing x in the eigenvector basis? In particular, show that for a self-adjoint matrix with distinct eigenvalues, this iteration procedure yields the eigenvalue of largest absolute value and the corresponding eigenvector.

(c) Write an Octave or MATLAB function to perform this process on a real symmetric matrix, outputting the largest eigenvalue of A (to within a specified tolerance) and the corresponding eigenvector, scaled so that its largest component is 1. Present results for a test case. This is the POWER METHOD. It is much faster than finding all of the eigenvalues and can be generalized to other types of matrices. Google's "PageRank" algorithm is built around this method.
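Although the exercise asks for an Octave or MATLAB function, the normalized iteration can be sketched in Python with NumPy; the tolerance, iteration cap, and reuse of the 3 × 3 matrix from part (a) are illustrative choices, not prescribed by the text:

```python
import numpy as np

def power_method(A, tol=1e-10, maxit=1000):
    """Estimate the largest-magnitude eigenvalue of symmetric A by repeated
    multiplication, scaling the iterate so its largest component is 1."""
    x = np.ones(A.shape[0])
    lam = 0.0
    for _ in range(maxit):
        y = A @ x
        lam_new = y[np.argmax(np.abs(y))]   # scale factor estimates the eigenvalue
        x = y / lam_new
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam, x

A = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
lam, v = power_method(A)
```

For this matrix the iteration converges to the dominant eigenvalue 2 with eigenvector proportional to (1, 1, 2)'.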

Exercise 1.58: Markov chain models

Imagine that there are three kinds of weather: sunny, rainy, and snowy. Thus a vector w0 ∈ R^3 defines today's weather: w0 = [1, 0, 0]' is sunny, w0 = [0, 1, 0]' is rainy, and w0 = [0, 0, 1]' is snowy. Imagine that tomorrow's weather w1 is determined only by today's and, more generally, the weather on day n + 1 is determined by the weather on day n. A probabilistic model for the weather then takes the form

w(n+1) = T wn

where T is called a transition matrix and the elements of wn are the probabilities of having a certain type of weather on that day. For example, if w5 = [0.2, 0.1, 0.7]', then the probability of snow five days from now is 70%. The sequence of probability vectors on subsequent days, {w0, w1, w2, ...}, is called a MARKOV CHAIN. Because w is a vector of probabilities, its elements must sum to one, i.e., Σi wn,i = 1 for all n.

(a) Given that Σi wn,i = 1, what condition must the elements of T satisfy such that Σi w(n+1),i is also 1?

(b) Assume that T is a constant matrix, i.e., it is independent of n. What conditions on the eigenvalues of T must hold so that the Markov chain will reach a constant state w∞ as n → ∞? How is w∞ related to the eigenvectors of T?
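As a numerical illustration of parts (a) and (b), here is a Python sketch with a hypothetical column-stochastic transition matrix; the probabilities are invented for the example and are not taken from the text:

```python
import numpy as np

# Hypothetical transition matrix: each column sums to 1 (part (a)'s condition).
T = np.array([[0.7, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.5]])

w = np.array([1.0, 0.0, 0.0])     # start sunny
for _ in range(200):
    w = T @ w                     # w_{n+1} = T w_n

# The limit is the eigenvector of T with eigenvalue 1, normalized to sum to 1.
lams, V = np.linalg.eig(T)
v1 = np.real(V[:, np.argmin(np.abs(lams - 1.0))])
w_inf = v1 / v1.sum()
```

The iterated chain and the eigenvalue-1 eigenvector agree, consistent with part (b).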

Exercise 1.59: Real Jordan form for a real matrix with complex conjugate eigenvalues

For a 2 × 2 real matrix A with a complex conjugate pair of eigenvalues λ = σ ± iω, with eigenvectors v1 and v2:

(a) Derive the result that v1 = conj(v2).

(b) Write the general solution to dx/dt = Ax in terms of the real and imaginary parts of v1 and sines and cosines, so that the only complex numbers in the solution are the arbitrary constants.

(c) For the specific matrix

A = | -2 -2 |
    |  2 -2 |

show that the similarity transformation S^(-1) A S, where the columns of S are the real and imaginary parts of v1, has the form

S^(-1) A S = |  σ  ω |
             | -ω  σ |

This result can be generalized, showing how a real matrix with complex conjugate eigenvalues can be brought to a real block-diagonal form.

Exercise 1.60: Solving a boundary-value problem by eigenvalue decomposition

Consider the reaction

A ⇌ B ⇌ C

occurring in a membrane. At steady state the appropriate reaction-diffusion equations are

D_A d²c_A/dx² - k1 c_A + k-1 c_B = 0

D_B d²c_B/dx² + k1 c_A - k-1 c_B - k2 c_B + k-2 c_C = 0

D_C d²c_C/dx² + k2 c_B - k-2 c_C = 0

where the ki, k-i, i = 1, 2, are rate constants and the Dj, j = A, B, C, are the species diffusivities. The boundary conditions are

c_A = 1 at x = 0

dc_A/dx = dc_B/dx = dc_C/dx = 0 at x = 1

Convert this set of second-order equations into a set of first-order differential equations. Write a MATLAB or Octave code to find the solution to this problem in terms of eigenvalues and eigenvectors of the relevant matrix for a given set of parameters. Have the program plot the concentrations as functions of position. Show results for parameter values D_A = D_B = D_C = 20, k1 = k2 = 10, k-1 = k-2 = 0.1, and also for the same rate constants but with the diffusivities set to 0.05.

Exercise 1.61: Nullspaces of nonsquare matrices

Consider a nonsquare m × n matrix A. Show that A'A is symmetric positive semidefinite. If A were square we could determine its nullspace from the eigenvectors corresponding to zero eigenvalues. How can we determine the nullspace of a nonsquare matrix A? What about the nullspace of A'?

Exercise 1.62: Stability of an iteration

Consider the iteration procedure x(i + 1) = Ax(i), where A is diagonalizable.

(a) What conditions must the eigenvalues of A satisfy so that x(i) → 0 as i → ∞?

(b) What conditions must the eigenvalues satisfy for this iteration to converge to a steady state, i.e., so that x(i) → x(i + 1) as i → ∞?

Exercise 1.63: Cayley-Hamilton theorem

Suppose that A is an n × n diagonalizable matrix with characteristic equation

det(A - λI) = λ^n + a(n-1) λ^(n-1) + ··· + a1 λ + a0 = 0

(a) Show that

A^n + a(n-1) A^(n-1) + ··· + a1 A + a0 I = 0

This result shows that A satisfies its own characteristic equation; it is known as the Cayley-Hamilton theorem.

(b) Use the theorem to express A², A³, and A^(-1) as linear combinations of A and I.

Exercise 1.64: Solving the nonunique least-squares problem

We have established that the least-squares solution to Ax = b is unique if and only if A has linearly independent columns. Let's treat the case in which the columns are not linearly independent and the least-squares solution is not unique. Consider again the SVD for real-valued A.

(a) Show that all solutions to the least-squares problem are given by

x_ls = V1 Σ^(-1) U1' b + V2 z

in which z is an arbitrary vector.

(b) Show that the unique, minimum-norm solution to the least-squares problem is given by

x_ls = V1 Σ^(-1) U1' b

This minimum-norm solution is the one returned by many standard linear algebra packages. For example, this is the solution returned by Octave and MATLAB when invoking the shorthand command x = A \ b.
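The minimum-norm claim can be checked numerically; in this Python sketch the rank-deficient matrix is an invented example, and `numpy.linalg.lstsq` (which is SVD-based and returns the minimum-norm solution) supplies the reference answer:

```python
import numpy as np

# Rank-deficient example: third column = first + second (illustrative choice).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = np.sum(s > 1e-12 * s[0])                  # numerical rank
# Minimum-norm least-squares solution x = V1 inv(Sigma1) U1' b
x_min = Vt[:r].T @ ((U[:, :r].T @ b) / s[:r])

# Same solution from the library routine
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
```

Both vectors satisfy the normal equations A'A x = A'b and coincide with each other.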

Exercise 1.65: Propagating zeros in triangular matrices

When multiplying two partitioned (upper) triangular matrices, if the first one has k leading columns of zeros, and the second one has a 0_(p×p) matrix as the second element of the diagonal, show that the product is a triangular matrix with k + p leading columns of zeros. In pictures

| 0  *  * | | T3  *    * |   | 0 0  * |
| 0 T1  * | | 0  0pxp  * | = | 0 0  * |
| 0  0 T2 | | 0  0    T4 |   | 0 0 T5 |

in which Ti, i = 1, ..., 4, are arbitrary triangular matrices, T5 is triangular, and * represents arbitrary (full) matrices. This result is useful in proving the Cayley-Hamilton theorem in the next exercise.


Exercise 1.66: Cayley-Hamilton theorem holds for all matrices

Revisit Exercise 1.63 and establish that all matrices A ∈ C^(n×n) satisfy their characteristic equation. We are removing the assumption that A is diagonalizable, generalizing the Cayley-Hamilton theorem so that it holds also for defective matrices.

Hint: use the Schur form to represent A and the result of Exercise 1.65.

Exercise 1.67: Small matrix approximation

For x a scalar, consider the Taylor series for 1/(1 + x)

1/(1 + x) = 1 - x + x² - x³ + ···

which converges for |x| < 1.

(a) Using this scalar Taylor series, establish the analogous series for matrix X ∈ R^(n×n)

(I + X)^(-1) = I - X + X² - X³ + ···

You may assume the eigenvalues of X are unique. For what matrix X does this series converge?

(b) What is the corresponding series for (R + X)^(-1), in which R ∈ R^(n×n) is a full-rank matrix? What conditions on X and R are required for the series to converge?

Exercise 1.68: Matrix exponential, determinant, and trace

Use the Schur decomposition of matrix A ∈ C^(n×n) to prove the following fact

det e^A = e^(tr A)
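This identity is also easy to spot-check numerically, using a truncated Taylor series for the matrix exponential; the 3 × 3 test matrix below is an arbitrary choice with small norm so the series converges quickly:

```python
import numpy as np

def expm_taylor(A, nterms=40):
    """Matrix exponential by truncated Taylor series (adequate for small ||A||)."""
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, nterms):
        term = term @ A / k
        E = E + term
    return E

A = np.array([[0.2, 1.0, 0.0],
              [0.0, -0.5, 0.3],
              [0.1, 0.0, 0.4]])
lhs = np.linalg.det(expm_taylor(A))   # det e^A
rhs = np.exp(np.trace(A))             # e^(tr A)
```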

Exercise 1.69: Logarithm of a matrix

If A ∈ C^(n×n) is nonsingular, there exists a B ∈ C^(n×n) such that A = e^B, and B is known as the logarithm of A

B = ln A

If A is positive definite, B can be uniquely defined (the principal branch of the logarithm). Given this definition of the logarithm, if A ∈ C^(n×n) is nonsingular, show that

det A = e^(tr(ln A))    (1.35)

Exercise 1.70: Some differential equations, sines, cosines, and exponentials

(a) Solve the following vector, second-order ordinary differential equation with the given initial conditions for y ∈ R²

d²y/dt² = Ay,    y(0) = y0,    dy/dt(0) = y'0

Use the solution of the scalar version of this differential equation as your guide.

(b) We can always reduce a high-order differential equation to a set of first-order differential equations. Define x = dy/dt, let z = (y, x), and show that the above equation can be written as a single first-order differential equation

dz/dt = Bz

with z ∈ R⁴. What are B and the appropriate initial conditions z(0)? What is the solution to this problem?

(c) Plot, on a single graph, the trajectories of the two y components versus time for the given initial conditions.

(d) Show that the result of (a) is the same as the result of (b), even though the functions exp and cos are different.

Exercise 1.71: Bounding the matrix exponential

Given the bound for ||e^(At)|| in (1.17), establish the validity of the bound in (1.18).

Hints: first, for any k ≥ 0 and ε > 0, show that there exists a c > 0 such that for all t ≥ 0

t^k ≤ c e^(εt)

Use this result to show that for any ε > 0, N ∈ C^(n×n), there exists c > 0 such that for all t ≥ 0

||Nt||^k / k! ≤ c e^(εt)

Exercise 1.72: Strictly convex quadratic function and positive curvature

Consider the quadratic function

f(x) = (1/2)x'Ax + b'x + c

(a) Show that f(·) is strictly convex if and only if A > 0.

(b) For the quadratic function, show that if a minimizer of f(·) exists, it is unique if and only if A > 0. The text shows the "if" part for any strictly convex function. So you are required to show the "only if" part with the additional restriction that f(·) is quadratic.

(c) Show that f(·) is convex if and only if A ≥ 0.

Exercise 1.73: Concave functions and maximization

A function f(·) is defined to be (STRICTLY) CONCAVE (concave downward) if -f(·) is (strictly) convex (Rockafellar and Wets, 1998, p. 39). Show that a solution to max_x f(x) is unique if f(·) is strictly concave.


Exercise 1.74: Solutions to minmax and maxmin problems

Consider again the quadratic function f(x) = (1/2)x'Ax and the two games given in (1.19). Confirm that Figure 1.6 (c) corresponds to the A matrix.

(a) Show that x1 = x2 = 0 is the unique solution to both games in (1.19). Hint: with the outer variable fixed, solve the inner optimization problem and note that its solution exists and is unique. Then substitute the solution for the inner problem, solve the outer optimization problem, and note that its solution also exists and is unique.

(b) Show that neither of the following problems has a solution

max_{x2} min_{x1} f(x)        min_{x1} max_{x2} f(x)

in which we have interchanged the goals of the two players. So obviously the goals of the players matter a great deal in the existence of solutions to the game.

Exercise 1.75: Games with nonunique solutions and different solution sets

Sketch the contours for f(x) = (1/2)x'Ax with the following A matrix. What are the eigenvalues of A?

Show that x1 = x2 = 0 is still a solution to both games in (1.19), but that it is not unique. Find the complete solution sets for both games in (1.19). Establish that the solution sets are not the same for the two games.

Exercise 1.76: Who plays first?

When the solutions to all optimizations exist, show that

max_y min_x f(x, y) ≤ min_x max_y f(x, y)

This inequality verifies that the player who goes first, i.e., the inner optimizer, has the advantage in this noncooperative game. Note that the function f(·) is arbitrary, so long as the indicated optimizations all have solutions.

Exercise 1.77: Solving linear matrix equations

Consider the linear matrix equation

AXB = C    (1.36)

in which A ∈ R^(m×n), X ∈ R^(n×p), B ∈ R^(p×q), and C ∈ R^(m×q); we consider A, B, and C fixed matrices and X the unknown matrix. The number of equations is the number of elements in C. The number of unknowns is the number of elements of X. Taking the vec of both sides gives

(B' ⊗ A) vec X = vec C    (1.37)

We wish to explore how to solve this equation for vec X.

(a) For the solution to exist for all vec C, and be unique, we require that (B' ⊗ A) has linearly independent rows and columns, i.e., it is square and full rank. Using the rank result (1.27) show that this is equivalent to A and B being square and full rank.

(b) For this case show that the solution

vec X = (B' ⊗ A)^(-1) vec C

is equivalent to that obtained by multiplying (1.36) by A^(-1) on the left and B^(-1) on the right,

X = A^(-1) C B^(-1)

(c) If we have more equations than unknowns, we can solve (1.37) for vec X as a least-squares problem. The least-squares solution is unique if and only if B' ⊗ A has linearly independent columns. Again, use the rank result to show that this is equivalent to: (i) A has linearly independent columns, and (ii) B has linearly independent rows.

(d) We know that A has linearly independent columns if and only if A'A has full rank, and B has linearly independent rows if and only if BB' has full rank (see Proposition 1.19 in Exercise 1.41). In this case, show that the least-squares solution of (1.37)

vec X_ls = (B' ⊗ A)† vec C

is equivalent to that obtained by multiplying (1.36) by A† on the left and B† on the right,

X_ls = A† C B†

Note that the superscript † denotes the Moore-Penrose pseudoinverse discussed in Section 1.3.7.
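Part (b) can be verified numerically with `numpy.kron`; the random matrices below are illustrative, and the `vec` helper follows the column-stacking convention used in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
X_true = rng.standard_normal((3, 4))
C = A @ X_true @ B

# vec stacks columns; NumPy stores rows, so vec(M) = M.T.ravel()
vec = lambda M: M.T.ravel()

K = np.kron(B.T, A)                        # (B' ⊗ A) vec X = vec C
x = np.linalg.solve(K, vec(C))
X = x.reshape(B.shape[0], A.shape[1]).T    # un-vec: rebuild X column by column
```

Solving the 12 × 12 Kronecker system recovers X_true, matching X = A^(-1) C B^(-1).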

Exercise 1.78: Solving the matrix Lyapunov equation

Write a function S = your_lyap(A, Q) using the Kronecker product to solve the matrix Lyapunov equation

A'S + SA = -Q

Test your function with some A with negative eigenvalues and positive definite Q by comparing to the function lyap in Octave or MATLAB.

Bibliography

R. B. Bird, W. E. Stewart, and E. N. Lightfoot. Transport Phenomena. John Wiley & Sons, New York, second edition, 2002.

G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, Maryland, third edition, 1996.

N. J. Higham. Functions of Matrices: Theory and Computation. SIAM, Philadelphia, 2008.

R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.

C. C. Lin and L. A. Segel. Mathematics Applied to Deterministic Problems in the Natural Sciences. Macmillan, New York, 1974.

J. R. Magnus and H. Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley, New York, 1999.

J. Nash. Noncooperative games. Ann. Math., 54:286-295, 1951.

W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 1992.

R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, 1998.

S. M. Selby. CRC Standard Mathematical Tables. CRC Press, twenty-first edition, 1973.

G. Strang. Linear Algebra and its Applications. Academic Press, New York, second edition, 1980.

L. N. Trefethen and D. Bau III. Numerical Linear Algebra. Society for Industrial and Applied Mathematics, 1997.

C. F. Van Loan. The sensitivity of the matrix exponential. SIAM J. Numer. Anal., 14:971-981, 1977.

J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, Princeton and Oxford, 1944.

Ordinary Differential Equations

2.1 Introduction
Differential equations arise in all areas of chemical engineering. In this chapter we consider ORDINARY differential equations (ODEs), that is, equations that have only one independent variable. For example, for reactions in a stirred-tank reactor the independent variable is time, while in a simple steady-state model of a plug-flow reactor, the independent variable is position along the reactor. Typically, ODEs appear in one of two forms

dx/dt = f(x, t),    x ∈ R^n    (2.1)

or

d^n y/dx^n + a(n-1)(x) d^(n-1) y/dx^(n-1) + ··· + a1(x) dy/dx + a0(x) y = g(x),    y ∈ R    (2.2)

We have intentionally written the two forms in different notation, as the first form typically (but not always) appears when the independent variable is time, and the second form often appears when the independent variable is spatial position. These two forms usually have different boundary conditions. When t is the independent variable, we normally know the conditions at t = 0 (e.g., initial reactant concentration) and must solve for the behavior for all t > 0. This is called an INITIAL-VALUE PROBLEM (IVP). In a transport problem, on the other hand, we know the temperature, for example, at the boundaries and must find it in the interior. This is a BOUNDARY-VALUE PROBLEM (BVP).

2.2 First-order Linear Systems

2.2.1 Superposition Principle for Linear Differential Equations

An arbitrary linear differential equation can be written

Lu = g

where L is a linear differential operator (e.g., L = d/dt - A, where A is a matrix), u is the solution to be determined, and g is a given function. Section 1.2 introduced linear operators, the following general properties of which we now write in terms of L

L(u + v) = Lu + Lv

L(αu) = α(Lu)

Leaving aside for the moment the issue of boundary conditions, the following two properties follow directly from linearity (SUPERPOSITION).

1. Homogeneous problem. Let g = 0. If u1 and u2 are both solutions to Lu = 0, then αu1 + βu2 is also a solution, for any scalars α and β.

2. Inhomogeneous problem. Let u1 be a solution to Lu = g1 and u2 be a solution to Lu = g2. Then αu1 + βu2 is a solution to Lu = αg1 + βg2.

With regard to boundary conditions, linearity also implies the following.

3. Let u1 be a solution to Lu = g1 with boundary condition Bu = h1 on a particular boundary, where B is an appropriate operator, e.g., multiplication by a constant for a DIRICHLET boundary condition, a first derivative B = d/dx for a NEUMANN boundary condition, or a combination B = γ + δ d/dx for a ROBIN boundary condition. Let u2 solve Lu = g2 with boundary condition Bu = h2. Then αu1 + βu2 satisfies Lu = αg1 + βg2 with boundary condition Bu = αh1 + βh2.

These simple results are very powerful and will be implicitly and explicitly used throughout the book, as they allow complex solutions to be constructed as sums (or integrals) of simple ones.


2.2.2 Homogeneous Linear Systems with Constant Coefficients

General Results for the Initial-Value Problem

Consider (2.1), where t denotes time. The function f is often called a VECTOR FIELD; for each point x in the PHASE SPACE or STATE SPACE of the system, f(x) defines a vector giving the rate of change of x at that point. The system is called AUTONOMOUS if f is not an explicit function of t. The trajectory x(t) traces out a curve in the state space, starting from the initial condition x(0) = x0.

The most general linear first-order system can be written

dx/dt = A(t)x + g(t)    (2.3)

In the present section we further narrow the focus and consider only the linear, autonomous, homogeneous system

dx/dt = Ax,    x ∈ R^n, A ∈ R^(n×n)    (2.4)

where A is a constant matrix. Note that many dynamics problems are posed as second-order problems: if x is a position variable then Newton's second law takes the form d²x/dt² = F(x). Letting u1 = x, u2 = dx/dt, we recover a first-order system

du1/dt = u2

du2/dt = F(u1)
More generally, a single high-order differential equation can always be written as a system of first-order equations.

Unless A is diagonal, all of the individual scalar equations in the system (2.4) are coupled. The only practical way to find a solution to the system is to try to decouple it. But we already know how to do this: we use the eigenvector decomposition A = MJM^(-1), where J is the Jordan form for A (Section 1.4). Letting y = M^(-1)x be the solution vector in the eigenvector coordinate system, we write

dy/dt = Jy

If A can be completely diagonalized, then J = Λ = diag(λ1, λ2, ..., λn) and the equations in the y coordinates are completely decoupled. The solution is

yi(t) = e^(λi t) ci    or    y = e^(Λt) c


where c is a vector of arbitrary constants. For an initial-value problem where x(0) = x0 is a known vector, c = y(0) = M^(-1) x0. Recall from Section 1.5 that the matrix e^(At) is called the MATRIX EXPONENTIAL. It is defined for a general matrix A as

e^(At) = I + At + (1/2!) A²t² + (1/3!) A³t³ + ···

For a diagonal matrix Λ, e^(Λt) is simply a diagonal matrix with entries e^(λi t). Since yi(t) = e^(λi t) ci, we see that the eigenvalues of A determine the rates at which growth or decay occurs, and the eigenvectors (columns of M) determine the directions along which this growth or decay occurs. Converting back to the original coordinates, we have the general solution

x(t) = Σi ci e^(λi t) vi

where vi is the eigenvector corresponding to λi. This expression shows explicitly that the solution when A has a complete LI set of eigenvectors is a simple combination of exponential growth and decay in the directions defined by the eigenvectors.
An important general consequence of this result is that an initial condition x0 that lies on the line defined by the kth eigenvector leads to ci = α δik and thus to a solution x(t) = α e^(λk t) vk. This solution will never leave the line defined by the eigenvector vk. This line is thus an INVARIANT SUBSPACE for the dynamics: an initial condition that starts in an invariant subspace never leaves it. Similarly, each pair of eigenvectors defines a plane that is invariant, each triple defines a three-dimensional space that is invariant and so on.
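The eigenvector form of the solution is easy to confirm numerically; this Python/NumPy sketch (with an arbitrarily chosen diagonalizable A) compares x(t) = M e^(Λt) M^(-1) x0 against a truncated Taylor series for e^(At):

```python
import numpy as np

A = np.array([[-1.0, 1.0],
              [0.0, -2.0]])    # diagonalizable, eigenvalues -1 and -2
x0 = np.array([1.0, 1.0])
t = 0.7

# Eigenvector construction: x(t) = M exp(Lambda t) M^{-1} x0
lam, M = np.linalg.eig(A)
x_eig = (M * np.exp(lam * t)) @ np.linalg.solve(M, x0)

# Reference: e^{At} x0 from a truncated Taylor series of the matrix exponential
E = np.eye(2)
term = np.eye(2)
for j in range(1, 30):
    term = term @ (A * t) / j
    E = E + term
x_ref = E @ x0
```

The two constructions agree to rounding error, as superposition of the eigenmodes requires.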
A particularly relevant special case of an invariant plane arises when A has a complex conjugate pair of eigenvalues σ ± iω with corresponding eigenvectors v and conj(v); see Exercise 1.59. A solution with initial conditions in this subspace has the form

x(t) = c1 e^(σt) e^(iωt) v + c2 e^(σt) e^(-iωt) conj(v)

If the initial conditions are real, then c2 = conj(c1) (to cancel out the imaginary parts of the two terms in this equation). Equivalently, we can write

x(t) = 2 Re(c1 e^(σt) e^(iωt) v)

where Re denotes the real part of an expression. Now writing c1 = cr + i ci, v = vr + i vi, and e^(iωt) = cos ωt + i sin ωt, this can be written in real form as

x(t) = 2 e^(σt) [(cr cos ωt - ci sin ωt) vr - (ci cos ωt + cr sin ωt) vi]

Thus for real initial conditions, the invariant subspace corresponding to a pair of complex conjugate eigenvalues is the plane spanned by vr and vi.

If A cannot be diagonalized the situation is not as simple, but is still not really very complicated. We still have that dy/dt = Jy, but J is triangular rather than diagonal. Triangular systems have one-way coupling, so we can solve from the bottom up, back substituting as we go. To illustrate, we consider the case

J = | λ 1 |
    | 0 λ |

We can solve the equation dy2/dt = λ y2 first and then back substitute, getting an inhomogeneous problem for y1. The inhomogeneous term prevents the behavior from being purely exponential, and the general solution becomes (after converting back to the original coordinates)
x(t) = c1 e^(λt) v1 + c2 e^(λt) (v2 + t v1)    (2.5)

where v1 is the eigenvector corresponding to λ and v2 is the generalized eigenvector; compare with Example 1.13. The line defined by the eigenvector v1 is an invariant subspace, as is the plane defined by v1 and v2. However, the line defined by the generalized eigenvector is not invariant.

Note the t e^(λt) term that appears in (2.5). In initial-value problems, this term allows solutions to grow initially even when all of the eigenvalues have negative real parts. As t → ∞, though, the exponential factor dominates. Thus even when A is defective, its eigenvalues determine the long-time dynamics and, in particular, the stability. The issue of stability is addressed at length in Section 2.5; for the present we note that the steady state x = 0 of (2.4) is ASYMPTOTICALLY STABLE (initial conditions approach it as t → ∞) if and only if all the eigenvalues of A have negative real parts.
To summarize, the above results show that every homogeneous constant-coefficient problem dx/dt = Ax can be rewritten as dy/dt = Jy, where J has a block diagonal structure exemplified by the following template

J = | λ1  1   0   0  |
    | 0   λ1  0   0  |
    | 0   0   λ2  0  |
    | 0   0   0   λ3 |

The dynamics corresponding to each block are decoupled from those of all the others. The dynamics and the associated eigenvectors define invariant subspaces; the dynamics in each invariant subspace are decoupled from those in all the others.

others.

Dynamics of Planar Systems


Qualitative
22.3
system, there is a large range of Possible
Il-dimensional
In a general
real and complex, with positive or negeigenvalues,
of
combinations
simple and general classification of the
a
2,
=
n
For
parts.
ativereal
Such systems are called PLANAR,
bepossible.
is
dynamics
possible
occur on a simple plane (sometimes called
cause all of the dynamics

two eigenvectors (or an eigenvector and


by
defined
PLANE)
PHASE
the
Writing
generalized eigenvector,if A is defective).

= Ax =

thecharacteristicequationfor A is

Noticethat a + d = trA and ad bc = detA, which we call T and D,


respectively.Recallthat T = Al + and D = Al,2.In two dimensions,
the eigenvaluesare determined only by the trace and determinant of
the matrix. When Re(A1) < 0 and Re(2) < 0, any initialcondition de-

cays exponentially to the originthe origin is ASYMPTOTICALLY


STABLE.

Theseconditionsare equivalent to T < 0, D > 0.


Figure2.1 shows the dynamical regimes that are possible for the planar system as characterized by T and D; asymptotically stable steadystate solutionsoccupy the second quadrant,
excluding the axes. Each
regimeon Figure2.1 shows a small
plot of the dynamics on the phase

2.2

First-Order Linear Systems

103

[Figure 2.1: small phase-plane plots arranged in the trace-determinant plane, with regions labeled stable spiral, unstable spiral, stable node, unstable node, and unstable saddle.]

Figure 2.1: Dynamical regimes for the planar system dx/dt = Ax, A ∈ R^(2×2), parametrized in the determinant and trace of A; see also Strang (1986, Fig. 6.7).

plane in that regime; the axes correspond to the eigenvectors (or real and imaginary parts of the eigenvectors in the case of complex conjugates) and trajectories x(t) on this plane are shown with time as the parameter. The arrows on the trajectories indicate the direction of time. An important curve on this diagram is T² - 4D = 0, where the two eigenvalues are equal. This parabola is also the boundary between oscillatory solutions (SPIRALS on the phase plane) and exponential ones (NODES); a spiral arises from a complex conjugate pair of eigenvalues while a node arises from the case of two real eigenvalues with the same sign. In the lower half of the figure, D < 0, the eigenvalues are real and with opposite signs. The steady states in this regime are called SADDLE POINTS, because they have one stable direction and one unstable. Figure 2.2 shows the dynamic behavior that occurs on the boundaries between the different regions.
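The trace-determinant classification of Figure 2.1 can be encoded directly; this Python sketch is an illustration, and the example matrices are invented test cases:

```python
import numpy as np

def classify_planar(A, tol=1e-12):
    """Classify the origin of dx/dt = A x for a 2x2 real A by trace and determinant."""
    T, D = np.trace(A), np.linalg.det(A)
    if D < -tol:
        return "saddle"                     # real eigenvalues of opposite sign
    disc = T * T - 4.0 * D                  # eigenvalues are equal when disc = 0
    kind = "spiral" if disc < -tol else "node"
    if T < -tol:
        return "stable " + kind
    if T > tol:
        return "unstable " + kind
    return "center"                         # T ~ 0, D > 0: purely imaginary pair

examples = {
    "stable node": np.array([[-2.0, 0.0], [0.0, -1.0]]),
    "saddle": np.array([[1.0, 0.0], [0.0, -1.0]]),
    "stable spiral": np.array([[-1.0, 2.0], [-2.0, -1.0]]),
    "center": np.array([[0.0, 1.0], [-1.0, 0.0]]),
}
```

Each test matrix lands in the expected region of the trace-determinant plane.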

[Figure 2.2: boundary cases in the trace-determinant plane, including the neutral center (λ = ±iω), stable and unstable stars, and degenerate nodes.]

Figure 2.2: Dynamical behavior on the region boundaries for the planar system dx/dt = Ax, A ∈ R^(2×2); see also Strang (1986, Fig. 6.10).

2.2.4 Laplace Transform Methods for Solving the Inhomogeneous Constant-Coefficient Problem

Inhomogeneous constant-coefficient systems also can be decoupled by transformation into Jordan form: dx/dt = Ax + g(t) becomes dy/dt = Jy + h(t), where h(t) = M^(-1) g(t). Accordingly, once we understand how to solve the scalar inhomogeneous problem, we will have learned what we need to know to address the vector case. A powerful approach to solving inhomogeneous problems relies on the LAPLACE TRANSFORM.
Definition

Consider functions of time f(t) that vanish for t < 0. If there exists a real constant c > 0 such that f(t)e^(-ct) → 0 sufficiently fast as t → ∞, we can define the Laplace transform of f(t), denoted f̄(s), for all complex-valued s such that Re(s) ≥ c

f̄(s) = L(f(t)) = ∫₀^∞ e^(-st) f(t) dt,    Re(s) ≥ c    (2.6)

The inverse transform formula is given by

f(t) = (1/2πi) ∫ from c-i∞ to c+i∞ of e^(st) f̄(s) ds    (2.7)

Properties

1. The Laplace transform operator is linear. For every scalar α, and functions f(t), g(t), the following holds

L{α f(t) + g(t)} = α f̄(s) + ḡ(s)

The inverse transform is also linear

L^(-1){α f̄(s) + ḡ(s)} = α f(t) + g(t)

2. Transform of derivatives

L(df/dt) = s f̄(s) - f(0)

L(d²f/dt²) = s² f̄(s) - s f(0) - f'(0)

L(d^n f/dt^n) = s^n f̄(s) - s^(n-1) f(0) - s^(n-2) f'(0) - ··· - f^(n-1)(0)

3. Transform of integral

L(∫₀^t f(t') dt') = (1/s) f̄(s)

4. Derivative of transform with respect to s

d^n f̄(s)/ds^n = L((-t)^n f(t))


5. Time delay

L(f(t - a) H(t - a)) = e^(-as) f̄(s)

where the Heaviside or unit step function is defined as

H(t) = 0 for t < 0,  1 for t ≥ 0

6. Laplace convolution theorem

L(∫₀^t f(t - t') g(t') dt') = f̄(s) ḡ(s)

7. Final value theorem

lim as s → 0 of s f̄(s) = lim as t → ∞ of f(t)

if and only if s f̄(s) is bounded for all Re(s) ≥ 0

8. Initial-value theorem

lim as s → ∞ of s f̄(s) = lim as t → 0 of f(t)

We can readily compute the Laplace transform of many simple $f(t)$ by using the definition and performing the integral. In this fashion we can construct Table 2.1 of Laplace transform pairs. Such tables prove useful in solving differential equations. We next solve a few examples using the Laplace transform.
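As a quick numerical check of one such table entry, the defining integral (2.6) can be evaluated by quadrature and compared with the closed-form transform of $\sin\omega t$. The particular values of $s$ and $\omega$ below are arbitrary illustrative choices.

```python
# Numerically check the pair L{sin(omega*t)} = omega/(s^2 + omega^2)
# by evaluating the defining integral (2.6) at a real value of s.
import numpy as np
from scipy.integrate import quad

def laplace_numeric(f, s, T=200.0):
    """Approximate the Laplace integral from 0 to infinity (truncated at T)."""
    val, _ = quad(lambda t: np.exp(-s * t) * f(t), 0.0, T, limit=500)
    return val

omega, s = 3.0, 1.5
numeric = laplace_numeric(lambda t: np.sin(omega * t), s)
exact = omega / (s**2 + omega**2)
assert abs(numeric - exact) < 1e-6
```

The exponential factor $e^{-st}$ makes the truncation at $T = 200$ harmless here, since the integrand has decayed far below the quadrature tolerance by then.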

Example 2.1: Particle motion

Consider the motion of a particle of mass $m$ connected to a spring with spring constant $K$ and experiencing an applied force $F(t)$ as depicted in Figure 2.3.

Let $y$ denote the displacement from the origin and model the spring as applying force $F_s = -Ky$. Newton's equation of motion for this system is then

    m\frac{d^2 y}{dt^2} = F - Ky


    f(t)                \bar{f}(s)
    1                   1/s
    t                   1/s^2
    t^n                 n!/s^{n+1}
    cos(omega t)        s/(s^2 + omega^2)
    sin(omega t)        omega/(s^2 + omega^2)
    cosh(omega t)       s/(s^2 - omega^2)
    sinh(omega t)       omega/(s^2 - omega^2)
    e^{at}              1/(s - a)
    t e^{at}            1/(s - a)^2
    e^{at} cos(omega t) (s - a)/((s - a)^2 + omega^2)
    e^{at} sin(omega t) omega/((s - a)^2 + omega^2)

Table 2.1: Small table of Laplace transform pairs. A more extensive table is found in Appendix A.

Figure 2.3: Particle of mass m at position y experiences spring force -Ky and applied force F(t).


We require two boundary conditions for this second-order equation. If we assume the particle is initially at rest at the origin, then both $y$ and $dy/dt$ are specified at $t = 0$ and the boundary conditions are

    y(0) = 0,    \frac{dy}{dt}(0) = 0

If we divide by the mass of the particle we can express the model as

    \frac{d^2 y}{dt^2} + k^2 y = f

in which $k^2 = K/m$ and $f = F/m$. Take the Laplace transform of the model and find the position of the particle versus time, $y(t)$, for arbitrary applied force $f(t)$.

Solution

Taking the Laplace transform of the equation of motion and substituting in the two initial conditions gives

    s^2\bar{y}(s) - sy(0) - y'(0) + k^2\bar{y}(s) = \bar{f}(s)
    s^2\bar{y}(s) + k^2\bar{y}(s) = \bar{f}(s)

Solving this equation for $\bar{y}(s)$ gives

    \bar{y}(s) = \bar{f}(s)\,\frac{1}{s^2 + k^2}

We see the transform is the product of two functions of $s$. The inverse of each of these is available

    \mathcal{L}^{-1}(\bar{f}(s)) = f(t),    \mathcal{L}^{-1}\left(\frac{1}{s^2 + k^2}\right) = \frac{1}{k}\sin kt

The first follows by the definition of $\bar{f}(s)$ and the second follows from Table 2.1. Using the convolution theorem then gives

    y(t) = \frac{1}{k}\int_0^t f(t')\sin k(t - t')\,dt'


and we have the complete solution. We see that the particle position is a convolution of the forcing with the unforced oscillatory response. The reader may wish to check that this solution indeed satisfies the differential equation and both initial conditions as claimed.

Example 2.2: A forced first-order differential equation

Consider the first-order differential equation with forcing term

    \frac{dx}{dt} = ax + bu(t),    x(0) = x_0

Use the Laplace transform to find $x(t)$ for any forcing $u(t)$.
Solution

Taking the Laplace transform, substituting the initial condition, and solving for $\bar{x}(s)$, give

    s\bar{x}(s) - x_0 = a\bar{x}(s) + b\bar{u}(s)

    \bar{x}(s) = \frac{x_0}{s - a} + \frac{b\,\bar{u}(s)}{s - a}

We can invert the first term directly using Table 2.1, and the second term using the table and the convolution theorem giving

    x(t) = x_0 e^{at} + b\int_0^t e^{a(t-t')} u(t')\,dt'
We see the effect of the initial condition $x_0$ and the forcing term $u(t)$. If $a < 0$ so the system is asymptotically stable, the effect of the initial condition decays exponentially with time. The forcing term affects the solution through the convolution of $u$ with the time-shifted exponential.
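The closed-form solution above can be verified numerically for a particular forcing; the values of $a$, $b$, $x_0$, and $u(t) = \sin t$ below are arbitrary test choices.

```python
# Verify x(t) = x0 * e^{a t} + b * int_0^t e^{a(t-t')} u(t') dt'
# against direct integration of dx/dt = a x + b u(t).
import numpy as np
from scipy.integrate import quad, solve_ivp

a, b, x0 = -1.5, 2.0, 1.0
u = lambda t: np.sin(t)

def x_formula(t):
    integral, _ = quad(lambda tp: np.exp(a * (t - tp)) * u(tp), 0.0, t)
    return x0 * np.exp(a * t) + b * integral

sol = solve_ivp(lambda t, x: [a * x[0] + b * u(t)], (0.0, 4.0), [x0],
                rtol=1e-10, atol=1e-12, dense_output=True)

for t in [1.0, 4.0]:
    assert abs(x_formula(t) - sol.sol(t)[0]) < 1e-6
```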

Example 2.3: Sets of coupled first-order differential equations

Consider next the inhomogeneous constant-coefficient system (2.3), with $g(t) = Bu(t)$

    \frac{dx}{dt} = Ax + Bu(t),    x(0) = x_0

in which $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$. In systems applications, $x$ is known as the state vector and $u$ as the input variable vector. Use Laplace transforms to find $x(t)$.

Solution

Again taking the Laplace transform, substituting the initial condition, and solving for $\bar{x}(s)$ gives

    s\bar{x}(s) - x_0 = A\bar{x}(s) + B\bar{u}(s)
    (sI - A)\bar{x}(s) = x_0 + B\bar{u}(s)
    \bar{x}(s) = (sI - A)^{-1} x_0 + (sI - A)^{-1} B\bar{u}(s)

We next require the matrix version of the Laplace transform pair

    f(t) = e^{At},  A \in \mathbb{R}^{n\times n}    \bar{f}(s) = (sI - A)^{-1}

which can be checked by applying the definition of the Laplace transform. Using this result and the convolution theorem gives

    x(t) = e^{At}x_0 + \int_0^t e^{A(t-t')} B\, u(t')\,dt'

Notice we cannot move the constant matrix $B$ outside the integral as we did in the scalar case because the indices in the matrix multiplications must conform as shown below

    x(t)  =  e^{At} x_0  +  \int_0^t e^{A(t-t')}  B   u(t')  dt'
    (n x 1)  (n x n)(n x 1)      (n x n)      (n x m)(m x 1)
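The matrix-exponential solution can be checked numerically. The state matrix $A$, input matrix $B$, initial condition, and input $u(t) = \cos t$ below are arbitrary illustrative choices, and the convolution integral is evaluated by the trapezoid rule.

```python
# Verify x(t) = e^{At} x0 + int_0^t e^{A(t-t')} B u(t') dt' for a 2x2 example.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # arbitrary stable state matrix
B = np.array([[0.0], [1.0]])               # n x m with n = 2, m = 1
x0 = np.array([1.0, 0.0])
u = lambda t: np.array([np.cos(t)])        # scalar input

def x_formula(t, n=4001):
    """Evaluate the convolution solution by trapezoid-rule quadrature."""
    tp = np.linspace(0.0, t, n)
    vals = np.array([expm(A * (t - s)) @ (B @ u(s)) for s in tp])
    w = np.ones(n); w[0] = w[-1] = 0.5
    return expm(A * t) @ x0 + (tp[1] - tp[0]) * np.sum(w[:, None] * vals, axis=0)

sol = solve_ivp(lambda t, x: A @ x + B @ u(t), (0.0, 3.0), x0,
                rtol=1e-10, atol=1e-12, dense_output=True)
assert np.allclose(x_formula(3.0), sol.sol(3.0), atol=1e-5)
```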

2.2.5 Delta Function

The DELTA FUNCTION, also known as the Dirac delta function (Dirac, 1958, pp. 58-61) or the unit impulse, is an idealization of a narrow and tall "spike." Two examples of such functions are


    g_\alpha(x) = \frac{1}{\sqrt{4\pi\alpha}}\, e^{-x^2/4\alpha}        (2.8)

    g_\alpha(x) = \frac{\alpha}{\pi(\alpha^2 + x^2)}        (2.9)

where $\alpha > 0$. Setting $x = 0$ and then taking the limit $\alpha \to 0$ shows that $g_\alpha(0) \to \infty$, while setting $x = x_0 \neq 0$ and taking the same limit shows that for any nonzero $x_0$, $g_\alpha(x_0) \to 0$. These functions become infinitely high and infinitely narrow. Furthermore, they both have unit area

    \int_{-\infty}^{\infty} g_\alpha(x)\,dx = 1

A set of functions depending on a parameter and obeying the above properties is called a DELTA FAMILY. The delta function $\delta(x)$ is the limiting case of a delta family as $\alpha \to 0$. It has infinite height, zero width, and unit area. It is most properly thought of as a GENERALIZED FUNCTION or DISTRIBUTION; the mathematical theory of these objects is described in Stakgold (1998).

Operationally, the key feature of the delta function is that when integrated against a "normal" function $f(x)$ the delta function extracts the value of $f$ at the $x$ value where the delta function has its singularity

    \int_{-\infty}^{\infty} f(x)\delta(x)\,dx = \lim_{\alpha\to 0}\int_{-\infty}^{\infty} f(x)g_\alpha(x)\,dx = f(0)        (2.10)

The delta function also can be viewed as the generalized derivative of the discontinuous unit step or Heaviside function $H(x)$

    \delta(x) = \frac{dH(x)}{dx}

Also note that the interval of integration in (2.10) does not have to be $(-\infty, \infty)$. The integral over any interval containing the point of singularity of the delta function produces the value of $f(x)$ at the point of singularity. For example

    \int f(x)\delta(x - a)\,dx = f(a)    for all a \in \mathbb{R}

Finally, by changing the variable of integration we can show that the delta function is an even function

    \delta(-x) = \delta(x)
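The sifting behavior of a delta family is easy to see numerically: integrating the Gaussian family (2.8) against a smooth test function gives values approaching $f(0)$ as $\alpha$ shrinks. The test function $f(x) = \cos x + x^2$ below is an arbitrary choice.

```python
# Sifting property (2.10): integrals of f against the Gaussian delta
# family (2.8) approach f(0) = 1 as alpha -> 0.
import numpy as np
from scipy.integrate import quad

def g(x, alpha):
    return np.exp(-x**2 / (4 * alpha)) / np.sqrt(4 * np.pi * alpha)

f = lambda x: np.cos(x) + x**2

vals = []
for alpha in [1e-1, 1e-2, 1e-3]:
    # points=[0.0] tells quad where the narrow spike sits
    v, _ = quad(lambda x: f(x) * g(x, alpha), -10, 10, points=[0.0], limit=200)
    vals.append(v)

errors = [abs(v - f(0.0)) for v in vals]
assert errors[0] > errors[1] > errors[2]   # monotone improvement
assert errors[-1] < 1e-2
```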


Derivatives of the Delta Function

Doublet. An interesting property of the delta function is that it is also differentiable. The first derivative is termed the doublet or dipole, usually denoted $\delta'(x)$

    \delta'(x) = \frac{d\delta(x)}{dx}

Sometimes we see the dot notation $\dot{\delta}(x)$ to denote the doublet instead of $\delta'(x)$. If we perform integration by parts on the integral $\int f(x)\delta'(x)\,dx$, we find that the doublet selects the negative of the first derivative of $f$ evaluated at the location of the doublet's singularity

    \int_{-\infty}^{\infty} f(x)\delta'(x)\,dx = -f'(0)        (2.11)

Note the sign in this equation. We also find by changing the variable of integration that, unlike the delta function, or singlet, which is an even function, the doublet is odd

    \delta'(-x) = -\delta'(x)

Higher-order derivatives. Repeated integration by parts produces the following higher-order formulas for triplets, quadruplets, etc.

    \int f(x)\delta^{(n)}(x)\,dx = (-1)^n f^{(n)}(0),    n \geq 0

As with the singlet and doublet, we can change the variable of integration and shift the location of the singularity to obtain the general formula

    \int f(x)\delta^{(n)}(x - a)\,dx = (-1)^n f^{(n)}(a),    a \in \mathbb{R}

Finally, we can use the definition of the Laplace transform to take the transform of the delta function and its derivatives to obtain the transform pairs listed in Table 2.2.

2.3 Linear Equations with Variable Coefficients

2.3.1 Introduction

In many chemical engineering applications, equations like this one are encountered

    x^2\frac{d^2 y}{dx^2} + x\frac{dy}{dx} + (x^2 - \nu^2)y = 0        (2.12)

    f(t)              \bar{f}(s)
    \delta(t)         1
    \delta'(t)        s
    \delta^{(n)}(t)   s^n

Table 2.2: Laplace transform pairs involving $\delta$ and its derivatives.
This is called BESSEL'S EQUATION OF ORDER $\nu$, and arises in the study of diffusion and wave propagation via the Laplacian operator in cylindrical coordinates. Since the coefficients in front of the derivative terms are not constant, the exponential functions that solved constant-coefficient problems do not work here. Typically, variable-coefficient problems must be solved by power series methods or by numerical methods, as they have no simple closed-form solution. We focus here on second-order equations, as they arise most commonly in applications.

2.3.2 The Cauchy-Euler Equation

The CAUCHY-EULER equation, also called the EQUIDIMENSIONAL equation, has a simple exact solution that illustrates many important features of variable-coefficient problems and arises during the solution of many problems. The second-order Cauchy-Euler equation has the form

    a_0 x^2 y'' + a_1 x y' + a_2 y = 0        (2.13)

where $y' = dy/dx$. Its defining feature is that the term containing the $n$th derivative is multiplied by the $n$th power of $x$. Because of this, guessing that the form of the solution is $y = x^\alpha$ yields the quadratic equation

    a_0\alpha(\alpha - 1) + a_1\alpha + a_2 = 0

If this equation has distinct roots $\alpha_1$ and $\alpha_2$, then each root leads to a solution and thus the general solution is found

    y = c_1 x^{\alpha_1} + c_2 x^{\alpha_2}        (2.14)

For example, let $a_0 = 1$, $a_1 = 1$, $a_2 = -9$, yielding the equation $\alpha^2 - 9 = 0$, which has solutions $\alpha = \pm 3$. Thus the equation has two solutions of the form $y = x^\alpha$; the general solution is $y = c_1 x^3 + c_2 x^{-3}$. Note that this solution can blow up at $x = 0$; this singular behavior does not arise in constant-coefficient (linear) problems, but is frequently found in variable-coefficient problems.

In the case of a repeated root, the general solution does not take this form. However, given one solution $y_1(x)$ to a second-order linear problem, a second can be found in the form $y_2(x) = A(x)y_1(x)$. For example, let $a_0 = 1$, $a_1 = -1$, $a_2 = 1$, yielding $(\alpha - 1)^2 = 0$, with the repeated root $\alpha = 1$. Thus $y_1 = x$. We seek a second solution $y_2 = A(x)x$, which, upon substitution into the differential equation, yields

    A''x^3 + 2A'x^2 - A'x^2 - Ax + Ax = 0

which simplifies to

    A''x + A' = 0

Letting $A' = w$ leads to a simple first-order equation for $w$

    xw' + w = 0

so that $w = c/x$ and thus $A = c\ln x + d$, where $c$ and $d$ are arbitrary constants. Thus the general solution for this problem can be written

    y(x) = c_1 x + c_2 x\ln x = x(c_1 + c_2\ln x)

It can be shown in general that (second-order) Cauchy-Euler equations with repeated roots have the general solution

    y(x) = x^\alpha(c_1 + c_2\ln x)        (2.15)

2.3.3 Series Solutions and the Method of Frobenius

A general linear second-order problem can be written

    p(x)y'' + q(x)y' + r(x)y = 0        (2.16)

or

    y'' + \frac{q(x)}{p(x)}y' + \frac{r(x)}{p(x)}y = 0        (2.17)

If $q(x)/p(x)$ and $r(x)/p(x)$ are ANALYTIC, i.e., they have a convergent Taylor series expansion, at some point $x = a$, then $a$ is an ORDINARY POINT. Otherwise, $x = a$ is a SINGULAR POINT.

If $x = a$ is an ordinary point, there exist solutions in the form of power series

    y(x) = \sum_{n=0}^{\infty} c_n(x - a)^n        (2.18)

Two such solutions can be found, thus yielding the general solution. Letting $\rho$ be the distance between $a$ and the nearest singular point of the differential equation, which might be at a complex rather than a real value of $x$, the series converges¹ for $|x - a| < \rho$. Accordingly $\rho$ is called the RADIUS OF CONVERGENCE of the series. The exception to this is when a series solution truncates after a finite number of terms, i.e., $c_M = 0$ for $M > M_0$; in this case the sum is always finite for finite $x$.
Example 2.4: Power series solution for a constant-coefficient equation

Let $p(x) = 1$, $q(x) = 0$ and $r(x) = k^2$, resulting in the equation $y'' + k^2 y = 0$. Solve this by power series expansion.
Solution

We seek a solution by expanding around the ordinary point $a = 0$. For this simple example, every point is an ordinary point. Inserting the solution form, (2.18), into this equation yields

    \sum_{n=2}^{\infty} n(n - 1)c_n x^{n-2} + k^2\sum_{n=0}^{\infty} c_n x^n = 0

The two sums can be combined if we can make their lower limits the same. Thus we set $n = m + 2$ in the first series and $n = m$ in the second, obtaining

    \sum_{m=0}^{\infty}\left[(m + 2)(m + 1)c_{m+2} + k^2 c_m\right]x^m = 0

This can only hold if the term inside the square brackets is zero for all $m$, requiring that

    c_{n+2} = -\frac{c_n k^2}{(n + 2)(n + 1)}

¹A full understanding of convergence of power series requires knowledge of functions of complex variables; see, e.g., Ablowitz and Fokas (2003).

(where we have now reverted to using $n$ as the index). Leaving $c_0$ and $c_1$ arbitrary, we find that

    c_2 = -\frac{c_0 k^2}{2},    c_3 = -\frac{c_1 k^2}{3!}

    c_4 = -\frac{c_2 k^2}{4\cdot 3} = \frac{c_0 k^4}{4!},    c_5 = -\frac{c_3 k^2}{5\cdot 4} = \frac{c_1 k^4}{5!}

Absorbing a factor of $1/k$ into $c_1$ (recall that it is arbitrary), the series solution becomes

    y(x) = c_0\left(1 - \frac{k^2 x^2}{2!} + \frac{k^4 x^4}{4!} - \cdots\right) + c_1\left(kx - \frac{k^3 x^3}{3!} + \frac{k^5 x^5}{5!} - \cdots\right)

Note that this has two arbitrary constants $c_0$ and $c_1$, so it is the general solution. The two infinite series can be recognized as the Taylor expansions of two familiar functions, and we can thus rewrite the general solution as

    y(x) = c_0\cos kx + c_1\sin kx
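The recursion can be checked directly on a computer: with $c_0 = 1$, $c_1 = 0$ the partial sums of the series should reproduce $\cos kx$. The values of $k$ and $x$ below are arbitrary.

```python
# Example 2.4's recursion c_{n+2} = -k^2 c_n / ((n+2)(n+1)):
# with c0 = 1, c1 = 0 the partial sum reproduces cos(k x).
import numpy as np

k, x, N = 2.0, 0.7, 30
c = np.zeros(N)
c[0], c[1] = 1.0, 0.0
for n in range(N - 2):
    c[n + 2] = -k**2 * c[n] / ((n + 2) * (n + 1))

series = sum(c[n] * x**n for n in range(N))
assert abs(series - np.cos(k * x)) < 1e-12
```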

If $p(x) \to 0$ at some point $x = a$, the situation is more complex. We set $a = 0$ from now on for convenience. Now $q(x)/p(x)$ and $r(x)/p(x)$ are not analytic and $x = 0$ is called a SINGULAR POINT. If $x\,(q(x)/p(x))$ and $x^2\,(r(x)/p(x))$ are analytic, i.e., the singularity in $p(x)$ is not very strong, then the point is a REGULAR SINGULAR POINT. Observe that $x = 0$ is a regular singular point for the Cauchy-Euler equation. In fact, by multiplying (2.17) by $x^2$ and Taylor-expanding the coefficients, one can see that when the conditions for a regular singular point are satisfied, this general case reduces precisely to a Cauchy-Euler equation as $x \to 0$. This observation motivates the METHOD OF FROBENIUS, which seeks solutions of the form

    y(x) = x^\alpha\sum_{n=0}^{\infty} c_n x^n        (2.19)

The power series has the same convergence properties as described above for ordinary points.

Example 2.5: Frobenius solution for Bessel's equation of order zero

Bessel's equation (2.12) with $\nu = 0$ is

    xy'' + y' + xy = 0        (2.20)

Here $x = 0$ is a regular singular point. Solve by the method of Frobenius.

Solution

Observe that this equation can be written $x^2 y'' + xy' + (0 + x^2)y = 0$, so the corresponding Cauchy-Euler equation is thus $x^2 y'' + xy' + 0y = 0$. Seeking a solution $y = x^\alpha$ yields the repeated root $\alpha = 0$ and thus a general solution $y(x) = c_1 + c_2\ln x$. As we will see, this structure is reflected in the form of the solution to Bessel's equation.

Inserting the Frobenius solution form, (2.19), into (2.20) yields

    \sum_{n=0}^{\infty}(n + \alpha)(n + \alpha - 1)c_n x^{n+\alpha-1} + \sum_{n=0}^{\infty}(n + \alpha)c_n x^{n+\alpha-1} + \sum_{n=0}^{\infty} c_n x^{n+\alpha+1} = 0

To simplify this series, set $n = m + 2$ in the first two sums and $m = n$ in the third. Then set all the $m$'s back to $n$. This yields a summation starting at $n = -2$, which is fine as long as we make $c_{-2} = c_{-1} = 0$. The formula becomes

    \sum_{n=-2}^{\infty}\left[(n + \alpha + 2)^2 c_{n+2} + c_n\right]x^{n+\alpha+1} = 0

Since $x$ can vary, the equality can only hold if the terms in the brackets are all zero. This is the recursion formula for the coefficients $c_n$. The first term ($n = -2$) picks out the Cauchy-Euler behavior and is called the INDICIAL EQUATION. Since $c_{-2} = 0$, it reduces to $\alpha^2 c_0 = 0$. As we anticipated above with the corresponding Cauchy-Euler equation, this has the repeated root $\alpha = 0$. The general recursion relation for the coefficients reads

    c_{n+2} = -\frac{c_n}{(n + 2)^2}

Since $c_{-1} = 0$, all the coefficients with $n$ odd are zero. Therefore, only one of the two solutions to the problem has the form of (2.19), again, in parallel with the Cauchy-Euler analysis. With some rearrangements, this solution becomes

    y_1(x) = \sum_{n=0}^{\infty}\frac{(-1)^n}{(n!)^2}\left(\frac{x}{2}\right)^{2n}

This function has the special symbol $J_0(x)$ and is called the "Bessel function of the first kind and order zero." For general $\nu$, the solutions are denoted $J_\nu(x)$. A second solution can be found for this problem by variation of parameters; see Exercise 2.31. It is not of Frobenius form, having a logarithmic singularity as $x \to 0$ (again as anticipated from the solution to the corresponding Cauchy-Euler equation). It is called $Y_0(x)$ and is the "Bessel function of the second kind and order zero." Singular solutions for general $\nu$ are denoted $Y_\nu(x)$. The general solution is

    y(x) = c_1 J_0(x) + c_2 Y_0(x)        (2.21)

See Table 2.3 for a graph of functions $J_0$ and $Y_0$. Note that for comparison purposes, the table also shows the solution for the radial part of $\nabla^2 y \pm y = 0$ in rectangular, cylindrical, and spherical coordinates.
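The Frobenius series derived above can be summed numerically and compared with a library implementation of $J_0$; scipy provides this function as `scipy.special.j0`.

```python
# The Frobenius series y1(x) = sum_n (-1)^n (x/2)^{2n} / (n!)^2
# should agree with scipy's Bessel function J0.
import numpy as np
from scipy.special import j0
from math import factorial

def j0_series(x, N=25):
    return sum((-1)**n * (x / 2)**(2 * n) / factorial(n)**2 for n in range(N))

for x in [0.5, 2.4, 5.0]:
    assert abs(j0_series(x) - j0(x)) < 1e-10
```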
In the previous example, the indicial equation yielded a single repeated root and one solution of Frobenius form. Other cases are possible. Here are the possibilities and their consequences.

1. If the indicial roots are equal, only one Frobenius solution is obtained. This is what occurred in the above example.

2. If the roots differ by a noninteger constant, then each root leads to a solution and the general solution is obtained.

3. If the roots differ by an integer, then the (algebraically) larger root leads to a Frobenius solution and either

   (a) the smaller root also leads to a Frobenius solution and the general solution is obtained, or

   (b) the smaller root does not lead to a second solution of Frobenius form. A second solution can be found by reduction of order and has a logarithmic singularity just as in the Cauchy-Euler case.

2.4 Function Spaces and Differential Operators

2.4.1 Functions as Vectors

One of the main tasks of mathematical modeling is the exact or approximate representation of functions. Here we extend the ideas of vectors and bases into the regime where each vector is a function, so the space the vectors live in is a FUNCTION SPACE.


    Rectangular coordinates:  d^2y/dx^2 \pm y = 0;  solutions cos x, sin x
    Cylindrical coordinates:  (1/r) d/dr (r dy/dr) \pm y = 0;  solutions J_0(r), Y_0(r) (plus sign) and I_0(r), K_0(r) (minus sign)
    Spherical coordinates:    (1/r^2) d/dr (r^2 dy/dr) \pm y = 0;  solutions cos r/r, sin r/r

Table 2.3: The linear differential equations arising from the radial part of $\nabla^2 y \pm y = 0$ in rectangular, cylindrical, and spherical coordinates. Bessel functions ($J_0$, $Y_0$) and modified Bessel functions ($I_0$, $K_0$) are two linearly independent solutions in cylindrical coordinates for the plus and minus signs, respectively. The solutions in spherical coordinates are called spherical Bessel functions.


In the finite-dimensional space $\mathbb{C}^n$, the usual inner product of vectors $u$ and $v$ is simply the $n$-dimensional version of the dot product

    (u, v) = \sum_{i=1}^{n} \bar{u}_i v_i

For functions $u(x)$ and $v(x)$ in a domain $a \leq x \leq b$, a natural analog to this relation is

    (u(x), v(x)) = \int_a^b \bar{u}(x)v(x)\,dx

This is the usual inner product for functions defined on the interval $[a, b]$. From this inner product, we can obtain a norm

    \|u\| = \left(\int_a^b \bar{u}(x)u(x)\,dx\right)^{1/2}

Another inner product, which plays an important role shortly, is given by the formula

    (u, v)_w = \int_a^b \bar{u}(x)v(x)w(x)\,dx

where $w(x)$ is a so-called weight function and must be positive in $(a, b)$. Finally, with these definitions, a bounded function is one that satisfies

    \int_a^b \bar{u}(x)u(x)w(x)\,dx = \|u\|_w^2 < \infty

With these definitions in hand, we can define an important function space. The set of functions $u(x)$ that satisfy $(u, u) = \|u\|^2 < \infty$ with the usual inner product ($w = 1$) is the LEBESGUE SPACE $L_2(a, b)$. If we had used a nonunit weight function $w(x)$ in the inner product, we would have $L_{2,w}(a, b)$. Lebesgue spaces are examples of HILBERT SPACES. A Hilbert space is essentially identical to a space of vectors with infinitely many components, so that all of our intuition about directions, lengths and angles carries over from two dimensions into an infinite number of dimensions!

Basis Sets and Fourier Series

In a finite-dimensional space, any vector can be represented in an orthogonal basis $\{e_1, e_2, \ldots, e_n\}$ as

    v = \sum_i \frac{(e_i, v)}{(e_i, e_i)}\, e_i

The same is true in a Hilbert space, except that each basis vector is now a function $\phi_i(x)$ and the sum is infinite², e.g.,

    f(x) = \sum_i \frac{(\phi_i, f)}{(\phi_i, \phi_i)}\, \phi_i(x)

Two of the most important basis sets for $L_2$ are the trigonometric functions and the Legendre polynomials. Consider the space $L_2(-\pi, \pi)$, i.e., the Lebesgue space defined as above, except on the interval³ from $-\pi$ to $\pi$. The functions

    e^{ikx} = \cos kx + i\sin kx,    k = 0, \pm 1, \pm 2, \ldots

are in this space. In addition, they satisfy

    (e^{ikx}, e^{ilx}) = 2\pi\delta_{kl}

That is, they are orthogonal. A natural question, then, is whether this set can be used as a basis for $L_2(-\pi, \pi)$. Specifically, we examine the proposition that every function in $L_2(-\pi, \pi)$ can be represented as

    f(x) = \sum_{k=-\infty}^{\infty} c_k e^{ikx}        (2.22)

This is the trigonometric FOURIER SERIES representation of $f(x)$. The $c_k$ are the FOURIER COEFFICIENTS and are given by the standard formula for expansion of a vector in an orthogonal basis

    c_k = \frac{(e_k, f)}{(e_k, e_k)} = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx        (2.23)

The equality (2.22) cannot possibly hold at every point $x$ for every function $f(x) \in L_2(-\pi, \pi)$, simply because trigonometric functions are continuous and smooth, and functions in $L_2(-\pi, \pi)$ are allowed to have discontinuities. Distance in $L_2(-\pi, \pi)$ is not measured pointwise, however, but rather via the $L_2$ norm. To address the issue of the distance between a function and its Fourier series representation, consider the finite trigonometric series expansion

    P_K(x) = \sum_{k=-K}^{K} a_k e^{ikx}

²Depending on the specific situation, the sum's lower limit might be 0, 1, or $-\infty$.
³The interval $(0, 2\pi)$ might also be used.

and recall that in $L_2$, the distance between $f$ and $P_K$ is given by

    \|f - P_K\|^2 = \int_{-\pi}^{\pi}\left|f(x) - \sum_{k=-K}^{K} a_k e^{ikx}\right|^2 dx

We can now ask the question: given integer $K$, what coefficients $a_k$ minimize the $L_2$ distance between $f$ and $P_K$? It can be shown that the solution to this minimization problem is

    a_k = c_k

for $k = 0, \pm 1, \pm 2, \ldots, \pm K$, with the $c_k$ given by (2.23) (Gasquet and Witomski, 1999). Because the $c_k$ do not depend on the number of terms, $K$, if we decide to increase the order of the approximation, we do not need to recalculate the lower-order coefficients. We can now consider the truncated Fourier series

    f_K(x) = \sum_{k=-K}^{K} c_k e^{ikx}

The question of convergence of this series to the function $f$ is nontrivial; we state without proof that for functions in $L_2(-\pi, \pi)$

    \|f(x) - f_K(x)\| \to 0    as K \to \infty

The rate of convergence of $f_K$ to $f$ depends on the behavior of the Fourier coefficients $c_k$ as $|k| \to \infty$. Returning to (2.23) and integrating by parts

    2\pi c_k = (f, e^{ikx})        (2.24)
    = \int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx        (2.25)
    = \left[-\frac{1}{ik}f(x)e^{-ikx}\right]_{-\pi}^{\pi} + \frac{1}{ik}\int_{-\pi}^{\pi} f'(x)e^{-ikx}\,dx        (2.26)

Therefore $|c_k|$ decays at least as fast as $k^{-1}$ as $k \to \infty$. This is often written as $c_k = O(k^{-1})$: "$c_k$ is order $k^{-1}$." If, additionally, $f(\pi) = f(-\pi)$ and $f'(x)$ is differentiable, then the first term in (2.26) vanishes and we can repeat the integration by parts procedure on the remaining integral to conclude that $c_k = O(k^{-2})$. Iterating this argument, we can conclude that if $f(x)$ is $m$-times continuously differentiable in $(-\pi, \pi)$, i.e., the $m$th derivative $f^{(m)}$ is continuous, and that $f^{(j)}$ is periodic for all $j \leq m - 2$, then

    c_k = O(k^{-m})

Figure 2.4: Function $f(x) = \exp(-8(x/\pi)^2)$ and truncated trigonometric Fourier series approximations with K = 2, 5, 10. The approximations with K = 5 and K = 10 are visually indistinguishable from the exact function.

The case just discussed, in which $c_k = O(k^{-2})$, corresponds to $m = 2$. For infinitely smooth periodic functions, this argument implies that the Fourier coefficients decay faster than any finite negative power of $k$. This is called exponential or SPECTRAL convergence. Figure 2.4 shows truncated Fourier series approximations to the function $f(x) = \exp(-8(x/\pi)^2)$ with several values of $K$. Although this function is not exactly periodic, its function values and derivatives at $x = \pm\pi$ are extremely small, so convergence is rapid.

If $f(x)$ is discontinuous or $f(\pi) \neq f(-\pi)$, then $c_k = O(k^{-1})$ and convergence is very slow. The most obvious characteristic of Fourier series representations of discontinuous functions is the GIBBS PHENOMENON, the rapid oscillation of the truncated series $f_K$ in the vicinity of the discontinuity. For further discussion of the convergence of Fourier series see Gasquet and Witomski (1999) and Canuto, Hussaini, Quarteroni, and Zang (2006).
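These decay rates can be observed directly by computing the coefficients (2.23) by quadrature: for the nonperiodic function $f(x) = x$ the magnitudes fall off only like $1/k$, while for a smooth periodic function (here $e^{\cos x}$, an arbitrary choice) they fall off far faster.

```python
# Fourier coefficients (2.23) by quadrature: O(1/k) decay for f(x) = x
# versus rapid decay for the smooth 2*pi-periodic function exp(cos x).
import numpy as np
from scipy.integrate import quad

def c_k(f, k):
    re, _ = quad(lambda x: f(x) * np.cos(k * x), -np.pi, np.pi, limit=400)
    im, _ = quad(lambda x: -f(x) * np.sin(k * x), -np.pi, np.pi, limit=400)
    return (re + 1j * im) / (2 * np.pi)

# For f(x) = x, |c_k| = 1/k exactly
for k in [5, 20]:
    assert abs(abs(c_k(lambda x: x, k)) - 1.0 / k) < 1e-6

# For the smooth periodic function, c_20 is far below 1/20
ck_smooth = abs(c_k(lambda x: np.exp(np.cos(x)), 20))
assert ck_smooth < 1e-6
```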


Example 2.6: Fourier series of a nonperiodic function

What is the Fourier series expansion of $f(x) = x$?

Solution

Application of (2.26) immediately yields that

    c_k = \frac{i(-1)^k}{k},    k \neq 0

and $c_0 = 0$. Observe that $c_{-k} = \bar{c}_k$ (see Exercise 2.5), so we can write the series as

    f_K(x) = c_0 + 2\sum_{k=1}^{K}\left(\mathrm{Re}(c_k)\cos kx - \mathrm{Im}(c_k)\sin kx\right)

which in the present case reduces to

    f_K(x) = \sum_{k=1}^{K}\frac{2(-1)^{k+1}}{k}\sin kx

This series contains only sines, not cosines, reflecting the fact that the function $f(x) = x$ is odd. Figure 2.5 shows the approximation for K = 5, 10, and 50, which exhibits Gibbs phenomenon as expected for a nonperiodic function.
The plot remains essentially the same if the discontinuity is in the interior rather than on the boundary. For example, the function

    f(x) = \begin{cases} x + \pi, & -\pi \leq x < 0 \\ x - \pi, & 0 < x \leq \pi \end{cases}

is periodic (along with all its derivatives) but has a discontinuity at the origin. The Fourier series of this function is the same as that for the previous, except shifted by $\pi$

    f_K(x) = \sum_{k=1}^{K}\frac{2(-1)^{k+1}}{k}\sin k(x + \pi) = \sum_{k=1}^{K}\frac{-2}{k}\sin kx

For trigonometric Fourier series, Gibbs phenomenon occurs whether the discontinuity occurs on the boundary or in the interior of the domain.


Figure 2.5: Truncated trigonometric Fourier series approximation to f(x) = x, using K = 5, 10, 50. The wiggles get finer as K increases.

Implicitly, the trigonometric basis assumes that the function is periodic, with the period being the length of the interval. This is why the Gibbs phenomenon occurs if the boundary values of the function are not the same. Another basis that does not make this implicit assumption is given by the so-called LEGENDRE POLYNOMIALS. This basis can be constructed by performing Gram-Schmidt orthogonalization on the set $\{1, x, x^2, x^3, \ldots\}$. The first several of these polynomials, now in the space $L_2(-1, 1)$, the usual setting for polynomial basis functions, are

    P_0(x) = 1        (2.27)
    P_1(x) = x        (2.28)
    P_2(x) = (3x^2 - 1)/2        (2.29)

and subsequent polynomials follow from the recursion

    P_{j+1}(x) = \frac{2j + 1}{j + 1}\,x P_j(x) - \frac{j}{j + 1}\,P_{j-1}(x)        (2.30)

and the Legendre-Fourier series representation of a function is

    f_n(x) = \sum_{i=0}^{n}\frac{(P_i, f)}{(P_i, P_i)}\,P_i(x)

Note that the sum starts with the index $i = 0$, which is conventional for polynomial bases. As written, this basis is not orthonormal; instead each polynomial has been scaled so that its value is 1 at $x = 1$. The function $f(x) = x$ can be represented exactly, since $P_1(x) = x$. Convergence for Fourier series based on Legendre polynomials is analogous to that for trigonometric functions; in particular, spectral convergence is found for functions that have infinitely many derivatives, whether they are periodic or not. We refer the interested reader to Canuto et al. (2006) for detailed analysis.
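The three-term recursion and the orthogonality of these polynomials on $L_2(-1, 1)$ can be checked numerically; recall the standard normalization $(P_j, P_j) = 2/(2j + 1)$.

```python
# Build Legendre polynomials by the three-term recursion
# (j+1) P_{j+1} = (2j+1) x P_j - j P_{j-1}, then check (2.27)-(2.29)
# and orthogonality on (-1, 1) by trapezoid-rule integration.
import numpy as np

def legendre_eval(j, x):
    p_prev, p = np.ones_like(x), np.asarray(x, dtype=float)
    if j == 0:
        return p_prev
    for n in range(1, j):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p

x = np.linspace(-1, 1, 5)
assert np.allclose(legendre_eval(2, x), (3 * x**2 - 1) / 2)

xs = np.linspace(-1, 1, 20001)
dx = xs[1] - xs[0]
w = np.ones_like(xs); w[0] = w[-1] = 0.5   # trapezoid weights

def inner(f, g):
    return np.sum(w * f * g) * dx

inner23 = inner(legendre_eval(2, xs), legendre_eval(3, xs))
inner33 = inner(legendre_eval(3, xs), legendre_eval(3, xs))
assert abs(inner23) < 1e-8                  # (P2, P3) = 0
assert abs(inner33 - 2.0 / 7.0) < 1e-6      # (P3, P3) = 2/(2*3+1)
```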
Figure 2.6 shows Legendre-Fourier series approximations to the function $f(x) = \exp(-8x^2)$ truncated at $n + 1$ terms, i.e., including polynomials up to degree $n$. As with the trigonometric Fourier series approximation of this function, convergence is rapid. Figure 2.7 shows Legendre-Fourier series approximations to the unit step function $f(x) = H(x)$; because this function is discontinuous, the Legendre-Fourier series also displays Gibbs phenomenon.
The trigonometric and Legendre basis sets are very important, but there are many others that also are important and widely seen in applications. The following section introduces an entire class of equations, each of whose members generates a basis set.
2.4.2 Self-Adjoint Differential Operators and Sturm-Liouville Equations

When we studied linear algebra, we learned that self-adjoint matrix operators in $\mathbb{R}^n$ have special properties, namely that their eigenvalues are real and their eigenvectors form an orthogonal basis for $\mathbb{R}^n$. Self-adjoint differential operators also generate basis vectors (functions). Recall the definition of the adjoint $L^*$ of an operator $L$

    (Lu, v) = (u, L^*v)

Let us apply this definition to the operator $L = d/dx$ in the interval


Figure 2.6: Function f(x) = exp(-8x^2) and truncated Legendre-Fourier series approximations with n = 2, 5, 10.

$[0, 1]$ and the usual, i.e., uniformly weighted, inner product

    (Lu, v) = \int_0^1 u'(x)v(x)\,dx = u(1)v(1) - u(0)v(0) - \int_0^1 u(x)v'(x)\,dx

Since $L$ is here a first derivative, any differential equation involving it requires specification of one boundary condition. As an example, we require that $u(0) = 0$. Now the boundary term at $x = 0$ vanishes. Now observe that if we require that $v(1) = 0$, the boundary term at $x = 1$ also vanishes, leaving the result

    (Lu, v) = -\int_0^1 u(x)v'(x)\,dx = (u, L^*v)

where $L^* = -d/dx$. Therefore, if $L$ is $d/dx$, operating on functions that vanish at $x = 0$, then from the above equation, $L^* = -d/dx$, operating


Figure 2.7: Function f(x) = H(x) and truncated Legendre-Fourier series approximations with n = 10, 50, 100.

on functions that vanish at $x = 1$. The first derivative operator is not self-adjoint.
If, however, we let $L = d^2/dx^2$ and require that $u(0) = u(1) = 0$, then the same procedure (but using integration by parts twice) shows that $L^*$ is also $d^2/dx^2$ operating on the same domain. The second-derivative operator, therefore, with appropriate boundary conditions, is self-adjoint. More generally, consider a class of second-order differential operators called STURM-LIOUVILLE operators. These operators have the general form

    Lu = \frac{1}{w(x)}\left[\frac{d}{dx}\left(p(x)\frac{du}{dx}\right) + r(x)u\right]        (2.31)

in the domain $a < x < b$, with homogeneous boundary conditions

    \alpha u(a) + \beta u'(a) = 0,    \gamma u(b) + \delta u'(b) = 0

To avoid the possibility of singular points, $p(x)$ must be positive in the domain.

Furthermore, take the inner product to be

    (u, v)_w = \int_a^b u(x)v(x)w(x)\,dx

The function $w(x)$ here is the same as in (2.31). For this integral to be a proper inner product, we must require that $w(x) > 0$ in the domain. We now show that Sturm-Liouville operators are self-adjoint. Repeated integration by parts yields

    (Lu, v)_w = \int_a^b \frac{1}{w(x)}\left[\frac{d}{dx}\left(p\frac{du}{dx}\right) + r(x)u\right]v\,w\,dx        (2.32)

    = p(b)\left(u'(b)v(b) - u(b)v'(b)\right) - p(a)\left(u'(a)v(a) - u(a)v'(a)\right) + \int_a^b u\,\frac{1}{w(x)}\left[\frac{d}{dx}\left(p\frac{dv}{dx}\right) + r(x)v\right]w\,dx        (2.33)
If the boundary terms vanish, then this expression satisfies the self-adjointness condition $(Lu, v) = (u, Lv)$. This is the case if the above boundary conditions apply on both $u$ and $v$. The restriction on the boundary conditions can be relaxed if $p(x)$ vanishes at one or both boundaries, in which case only boundedness of the function and its derivative is required at that boundary. The latter case is called a singular Sturm-Liouville operator, because it has a singular point at the boundary or boundaries where $p$ vanishes. Finally, the boundary terms also vanish if $p(a) = p(b)$ and PERIODIC BOUNDARY CONDITIONS are imposed: $u(a) = u(b)$, $u'(a) = u'(b)$ and likewise for $v$.

Next consider the eigenvalue problem associated with the Sturm-Liouville operator⁴

    Lu + \lambda u = 0

As with all self-adjoint operators, the eigenvalues are real and the eigenvectors, now called eigenfunctions because they are elements of a function space, are orthogonal with respect to the inner product weighted by $w(x)$. Furthermore, and very importantly, there are an infinite number of eigenfunctions and they form a complete basis for $L_{2,w}(a, b)$. We next consider three Sturm-Liouville operators that produce some famous eigenfunctions that are popular choices for use as basis functions.

⁴This is the conventional form for writing differential eigenvalue problems. Unfortunately, it is different from the convention for algebraic problems.


Example 2.7: Generating trigonometric basis functions

Consider the operator $L = d^2/dx^2$, with boundary conditions $u(0) = u(l) = 0$. The eigenvalue problem for this operator is

    \frac{d^2 u}{dx^2} + \lambda u = 0,    u(0) = u(l) = 0        (2.34)

What are the eigenvalues and eigenfunctions?


Solution

This equation has the general solution

    u(x) = c_1\sin\sqrt{\lambda}\,x + c_2\cos\sqrt{\lambda}\,x

We have thus taken $\lambda \geq 0$: a negative value of $\lambda$ would lead to a general solution consisting of growing and decaying exponentials, which cannot satisfy homogeneous boundary conditions on both boundaries, as can be easily checked. The boundary condition $u(0) = 0$ requires that $c_2 = 0$. Setting $c_1 = 0$ leaves only the trivial solution $u = 0$, so to satisfy the remaining boundary condition, we require that

    \sin\sqrt{\lambda}\,l = 0

This is the characteristic equation for this eigenvalue problem; it has infinitely many roots

    \lambda_n = n^2\pi^2/l^2    for n = 1, 2, 3, \ldots, \infty

The case $n = 0$ does not result in an eigenvalue since $\sin 0 = 0$. Thus the eigenfunctions are

    u_n(x) = \sin\frac{n\pi x}{l}

with $(u_m, u_n) = \frac{l}{2}\delta_{mn}$. The result that Sturm-Liouville eigenfunctions form a basis for functions in $L_2(0, l)$ implies that we can write any function in that space as a Fourier series

    f(x) = \sum_{n=1}^{\infty} c_n\sin\frac{n\pi x}{l}

where

    c_n = \frac{\left(f(x), \sin\frac{n\pi x}{l}\right)}{\left(\sin\frac{n\pi x}{l}, \sin\frac{n\pi x}{l}\right)} = \frac{2}{l}\left(f(x), \sin\frac{n\pi x}{l}\right)

This is the FOURIER SINE SERIES of $f(x)$.
Now consider the same operator but with periodic boundary conditions $u(0) = u(l)$, $u'(0) = u'(l)$. The boundary terms in (2.33) also vanish in this case, because here $p(a) = p(b) = 1$. Now the solution to (2.34) is

    u = \exp\left(i\sqrt{\lambda}\,x\right)

which satisfies the periodicity requirement if $\sqrt{\lambda} = 2n\pi/l$ for any integer $n$. Thus the eigenfunctions of $d^2/dx^2$ with periodic boundary conditions in $(0, l)$ are

    u_n = \exp\left(i\frac{2n\pi x}{l}\right)

Taking $l = 2\pi$, we recover the first set of basis functions we considered in Section 2.4.1.

Example 2.8: Bessel's equation revisited

The operator

L = −(1/x) d/dx (x d/dx)

arises in many differential equations originating in problems in polar coordinates, e.g., diffusion in a cylinder. It has Sturm-Liouville form with w = p = x, r = 0. The eigenvalue problem for this operator can be written

−(1/x)(xu′)′ = λu

or, multiplying through by x², as

x²u″ + xu′ + λx²u = 0

What are its eigenfunctions and eigenvalues?

Solution

This is a variable-coefficient problem with a regular singular point at x = 0, so we can seek solutions by the method of Frobenius. Alternately, in the present case we can make the substitution z = x√λ, thus rewriting the equation as

z² d²u/dz² + z du/dz + z²u = 0

which is in fact Bessel's equation of order zero. We already found that this equation has the general solution u(z) = c₁J₀(z) + c₂Y₀(z), or, reverting to the original independent variable,

u(x) = c₁J₀(√λ x) + c₂Y₀(√λ x)

To complete the specification of the eigenvalue problem requires choosing the domain and imposing specific boundary conditions. Consider the domain 0 ≤ x ≤ l with the conditions that u be bounded at x = 0 and u(l) = 0; boundedness is all that is required at x = 0, since p(0) = 0. Boundedness requires that c₂ = 0, because Y₀ diverges logarithmically at the origin. Satisfaction of the boundary condition u(l) = 0 then requires that

J₀(√λ l) = 0

The top center plot of Table 2.3 shows J₀(x); the positions of its zeros determine the eigenvalues λ. The first several of these are at approximately x = 2.4, 5.5, 8.7, 11.8, ..., and are tabulated in many places, including Abramowitz and Stegun (1970). Thus λ₁ = (2.4/l)², etc. The functions

u_n(x) = J₀(√λ_n x)

form an orthogonal basis for L²_w(0, l). Referring again to Table 2.3, u₁ is the function J₀ scaled so that its first zero is at x = l, u₂ is the same function, but scaled so that its second zero is at x = l, etc.

Other boundary conditions could be chosen. For example, one could require u(a) = 0, u(b) = 0. In this case the eigenfunctions involve both J₀ and Y₀, and the eigenfunctions and eigenvalues are determined by the solution to the coupled nonlinear equations

J₀(√λ a) + c₂Y₀(√λ a) = 0
J₀(√λ b) + c₂Y₀(√λ b) = 0

Since c₁ is arbitrary, it has been set to unity for convenience. Here c₂ and λ are the unknowns. Solution of these highly nonlinear equations is nontrivial.
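The zeros quoted above can be reproduced without a special-function library. The NumPy-only sketch below (my illustration, not the authors' code) evaluates J₀ through its integral representation J₀(z) = (1/π)∫₀^π cos(z sin θ) dθ, brackets sign changes by scanning, refines each bracket by bisection, and then forms the eigenvalues λ_n = (z_n/l)².

```python
import numpy as np

def J0(z):
    # integral representation J0(z) = (1/pi) * integral of cos(z sin(theta)), midpoint rule
    th = (np.arange(4000) + 0.5) * np.pi / 4000
    return np.mean(np.cos(z * np.sin(th)))

def bisect(g, a, b, iters=80):
    # simple bisection; assumes g(a) and g(b) have opposite signs
    ga = g(a)
    for _ in range(iters):
        m = 0.5 * (a + b)
        if ga * g(m) <= 0.0:
            b = m
        else:
            a, ga = m, g(m)
    return 0.5 * (a + b)

# scan for sign changes of J0 on (0, 13] and refine each bracket
zgrid = np.linspace(0.1, 13.0, 1301)
vals = np.array([J0(z) for z in zgrid])
zeros = [bisect(J0, zgrid[i], zgrid[i + 1])
         for i in range(len(zgrid) - 1) if vals[i] * vals[i + 1] < 0]

l = 1.0
eigenvalues = [(z / l) ** 2 for z in zeros]
print(np.round(zeros, 3))  # approximately 2.405, 5.520, 8.654, 11.792
```

These agree with the tabulated values 2.4, 5.5, 8.7, 11.8 cited from Abramowitz and Stegun.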

Example 2.9: Legendre's differential equation and Legendre polynomials

Consider the Sturm-Liouville eigenvalue problem with p(x) = 1 − x², w(x) = 1, r(x) = 0 in the domain −1 < x < 1

(1 − x²)u″ − 2xu′ + λu = 0

It has regular singular points at x = ±1, while the origin is an ordinary point. Because p(x) = 0 at x = ±1, only boundedness at these points is required of the eigenfunctions. What are the eigenvalues and eigenfunctions?

Solution

Seeking a series solution around x = 0 reveals (Exercise 2.35) that, if λ = l(l + 1) with l ≥ 0 an integer, then one of the solutions is a Legendre polynomial P_l of degree l, and using the method of Frobenius one can learn that the other has logarithmic singularities at x = ±1, because the radius of convergence of a power series solution is given by the distance to the nearest singular point (Ablowitz and Fokas, 2003). Otherwise, there is no solution that is bounded at both x = 1 and x = −1. Therefore, the eigenvalues of (2.9) are λ = l(l + 1) with l = 0, 1, 2, ..., and the corresponding eigenfunctions are the Legendre polynomials P_l.

Legendre polynomials are the simplest of a broad class of ORTHOGONAL POLYNOMIALS that come from Sturm-Liouville eigenvalue problems and are orthogonal with respect to various weighted inner products.
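NumPy ships the Legendre polynomials in its numpy.polynomial package, which makes both claims easy to spot-check numerically: orthogonality on [−1, 1] with (P_m, P_n) = 2/(2n + 1)δ_mn, and the eigenrelation (1 − x²)P_l″ − 2xP_l′ + l(l + 1)P_l = 0. This is an illustrative sketch, not part of the text.

```python
import numpy as np
from numpy.polynomial import Legendre, legendre

# Gram matrix of the first five Legendre polynomials, by Gauss-Legendre quadrature
x, w = legendre.leggauss(20)  # exact for polynomial integrands of degree <= 39
P = [Legendre.basis(n)(x) for n in range(5)]
gram = np.array([[np.sum(w * P[m] * P[n]) for n in range(5)] for m in range(5)])
print(np.round(gram, 12))  # diagonal entries 2/(2n+1); off-diagonals vanish

# Eigenrelation for l = 3, lambda = l(l+1) = 12
P3 = Legendre.basis(3)
xs = np.linspace(-1.0, 1.0, 11)
resid = (1 - xs**2) * P3.deriv(2)(xs) - 2 * xs * P3.deriv(1)(xs) + 12 * P3(xs)
print(np.max(np.abs(resid)))  # residual of the ODE: essentially zero
```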

2.4.3 Existence and Uniqueness of Solutions

Homogeneous Boundary Conditions

Consider the nonhomogeneous second-order differential equation with the homogeneous boundary conditions

Lu = f   B₁u = 0   B₂u = 0    (2.35)

Define the null space of the operator

N(L) = {u | Lu = 0, B₁u = 0, B₂u = 0}

and the null space of the adjoint operator

N(L*) = {v | L*v = 0, B₁*v = 0, B₂*v = 0}

then the following theorem characterizes existence and uniqueness of solutions to (2.35) (Stakgold, 1998, pp. 210-211).

Theorem 2.10 (Alternative theorem). For the boundary-value problem in (2.35), we have the following two alternatives.

(a) Either N(L) contains only the zero function, in which case N(L*) contains only the zero function and (2.35) has exactly one solution for every f.

(b) Or N(L) contains n linearly independent functions, in which case N(L*) contains n linearly independent functions

N(L) = {u₁, u₂, ..., u_n}   N(L*) = {v₁, v₂, ..., v_n}

and (2.35) has a solution if and only if

(f, v_k) = 0   k = 1, 2, ..., n

and the general solution is

u(x) = u_p(x) + Σ_{k=1}^{n} α_k u_k(x)

in which u_p(x) is any particular solution and the α_k are arbitrary scalars.

Next we present two heat-conduction problems that display the two alternatives.

Example 2.11: Steady-state temperature profile with fixed end temperatures

Apply the alternative theorem to the steady-state heat-conduction problem with heat generation rate q(x) and specified end-temperature boundary conditions

−k d²T(x)/dx² = q(x)   T(0) = T₀   T(l) = T₁

What can you conclude about existence and uniqueness of the steady-state temperature profile?

Solution

First it is convenient to make the boundary conditions homogeneous by defining

u(x) = T(x) − T₀(l − x)/l − T₁x/l

and dividing by the thermal conductivity to give

Lu = f   B₁u = 0   B₂u = 0

in which f = −q/k and

L = d²/dx²   B₁u = u(0)   B₂u = u(l)

Next we compute N(L). Setting Lu = 0 gives

u(x) = ax + b

Applying the boundary conditions gives

B₁u = u(0) = b = 0   B₂u = u(l) = al = 0

so a = b = 0, and we see that u = 0 is the only element of N(L). We can therefore conclude that N(L*) also contains only the zero element, and the steady-state temperature profile exists and is unique for any heat-removal rate f.

Example 2.11 illustrates the first alternative in Theorem 2.10. The following example illustrates the second alternative.
Example 2.12: Steady-state temperature profile with insulated ends

Replace the fixed-temperature boundary conditions in Example 2.11 with insulated-end boundary conditions. What can you conclude about existence and uniqueness of the steady-state temperature profile for these boundary conditions? What is the physical interpretation of the existence condition? Why is the solution not unique?

Solution

The boundary conditions for insulated ends are

T_x(0) = 0   T_x(l) = 0

and since the boundary conditions are already homogeneous, we have

LT = f   B₁T = 0   B₂T = 0

in which f = −q/k and

L = d²/dx²   B₁T = T_x(0)   B₂T = T_x(l)

Next we compute N(L). Setting LT = 0 gives T(x) = ax + b as before. Applying the boundary conditions gives

B₁T = T_x(0) = a = 0   B₂T = T_x(l) = a = 0

and now we have that T(x) = b is in N(L). With these boundary conditions L has a one-dimensional null space consisting of the constant function. Normalizing this element gives {1} as the basis function for the one-dimensional null space N(L). Since the problem is self-adjoint, N(L*) is identical to N(L). Applying the alternative theorem, we conclude that a steady-state temperature profile exists only if

∫₀ˡ f(x)dx = 0

and the general solution is

T(x) = T_p(x) + α

where T_p is any particular solution. Since f corresponds to a rate of heat removal (or addition when f < 0) to the domain, the restriction on f provides the physically intuitive fact that if the ends are insulated, just as much heat must be removed from the domain as is added for a steady-state temperature to exist. For f satisfying this restriction, the general solution indicates that a constant can be added to any steady-state solution to provide another steady-state solution.
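The two conclusions of Example 2.12 survive discretization. The finite-difference sketch below (my illustration, assuming NumPy; not from the text) builds the operator d²/dx² with zero-flux rows at both ends: the constant vector lies in its null space, and a least-squares solve shows the discrete system is consistent only when the forcing has zero integral.

```python
import numpy as np

# Discrete analog of Example 2.12: T'' = f on (0, l) with T_x(0) = T_x(l) = 0.
n, l = 201, 1.0
h = l / (n - 1)
x = np.linspace(0.0, l, n)

A = np.zeros((n, n))
for i in range(1, n - 1):
    A[i, i - 1:i + 2] = [1.0 / h**2, -2.0 / h**2, 1.0 / h**2]
A[0, :2] = [-1.0 / h, 1.0 / h]      # one-sided row enforcing T_x(0) = 0
A[-1, -2:] = [1.0 / h, -1.0 / h]    # one-sided row enforcing T_x(l) = 0

null_resid = np.linalg.norm(A @ np.ones(n))
print(null_resid)  # 0: constants span the null space, as for L itself

def residual(f):
    # least-squares solve; the residual is zero only when the system is consistent
    b = f.copy()
    b[0] = b[-1] = 0.0  # homogeneous data in the boundary rows
    T = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.linalg.norm(A @ T - b)

res_solvable = residual(np.cos(np.pi * x / l))    # integral of f is zero
res_unsolvable = residual(np.sin(np.pi * x / l))  # integral of f is 2l/pi, nonzero
print(res_solvable, res_unsolvable)  # tiny vs O(1): the solvability condition
```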
Nonhomogeneous Boundary Conditions

Next consider the nonhomogeneous second-order problem for u(x) on x ∈ [a, b] with the nonhomogeneous boundary conditions

Lu = f   B₁u = γ₁   B₂u = γ₂    (2.36)

The null spaces of the operator and the adjoint are defined as in the case with homogeneous boundary conditions

N(L) = {u | Lu = 0, B₁u = 0, B₂u = 0}
N(L*) = {v | L*v = 0, B₁*v = 0, B₂*v = 0}

When we define the adjoint operator, we perform integration by parts

(Lu, v) − (u, L*v) = J(u, v)|ₐᵇ

In the integration by parts, we have that J(u, v) is linear in both u and v and involves lower-order derivatives of u, v evaluated at the ends of the interval. Setting J(u, v)|ₐᵇ to zero is what determines the adjoint boundary functionals

J(u, v)|ₐᵇ = 0   for all u such that B₁u = 0, B₂u = 0, and all v such that B₁*v = 0, B₂*v = 0

To find the solvability condition for the nonhomogeneous boundary conditions, we take the difference

(Lu, v_k) − (u, L*v_k) = J(u, v_k)|ₐᵇ

in which v_k is any element of the null space of the adjoint and u is the solution to (2.36). Then, because Lu = f and L*v_k = 0, we have

(f, v_k) = J(u, v_k)|ₐᵇ    (2.37)

Evaluating J(u, v_k) for u satisfying B₁u = γ₁ and B₂u = γ₂, and v_k satisfying B₁*v_k = 0, B₂*v_k = 0, gives the solvability conditions for the nonhomogeneous problem. The next example and Exercise 2.40 derive the solvability conditions for problems with nonhomogeneous boundary conditions.

Example 2.13: Steady-state temperature profile with fixed flux ends

Consider again Example 2.12, but replace the insulated ends with fixed, nonzero fluxes at the ends

T_x(x) = γ₁   x = 0
T_x(x) = γ₂   x = l

For what f does the solution exist?

Solution

This fully nonhomogeneous problem can be written as

LT = f   B₁T = γ₁   B₂T = γ₂

in which f = −q/k and

L = d²/dx²   B₁T = T_x(0)   B₂T = T_x(l)

The null space N(L) is unchanged, so the constant function {1} is the basis function, and the problem was shown to be self-adjoint, so N(L*) is one dimensional as well, with v₁(x) = 1. Next we compute J(u, v) for this problem. Integration by parts gives

(Lu, v) − (u, L*v) = J(u, v)|₀ˡ = (u′v − uv′)|₀ˡ

For T satisfying the boundary conditions and v_k in N(L*), we have

B₁T = T_x(0) = γ₁   B₂T = T_x(l) = γ₂
B₁*v₁ = dv₁/dx|₀ = 0   B₂*v₁ = dv₁/dx|ₗ = 0

Substituting these into J gives

J(T, v₁)|₀ˡ = v₁(l)T_x(l) − v₁(0)T_x(0) − (T dv₁/dx)|₀ˡ = γ₂ − γ₁

since v₁(x) = 1 and dv₁/dx = 0. Substituting this into the solvability condition, (2.37), gives

∫₀ˡ f(x)dx = γ₂ − γ₁

and the general solution remains

T(x) = T_p(x) + α

The restriction on f now stipulates that the net heat generation must exactly balance the heat removed through the two ends. Again, for f satisfying this restriction, a constant can be added to any steady-state solution to provide another steady-state solution.


Figure 2.8: Solution to the initial-value problem with nonhomogeneous boundary conditions; top figure shows u(t) with step introduced at t = 0, and bottom figure shows resulting du/dt with impulse at t = 0.

Nonhomogeneous Boundary Conditions Revisited

We can use the delta function and its derivatives introduced in Section 2.2.5 to streamline the treatment of the nonhomogeneous case. Basically we replace the nonhomogeneous boundary conditions with homogeneous ones, but then compensate for this change by adding appropriate impulsive terms to the forcing term of the differential equation. In this way, we have to recall only how to solve problems with homogeneous boundary conditions, and we can use Theorem 2.10 to analyze existence and uniqueness even when a problem has nonhomogeneous boundary conditions.

It is perhaps easiest to introduce the approach with an example. Let's say we are interested in solving the first-order nonhomogeneous differential equation, with forcing term f(t), and nonhomogeneous boundary (initial) condition

du/dt = f(t)   u(0) = γ

The solution is sketched in Figure 2.8. Imagine instead that we solve the problem with the homogeneous boundary condition u(0⁻) = 0, and we push the boundary slightly to the left of zero, to t = 0⁻. Now we wish to make the solution jump by γ just after time 0 to value u(0⁺) = γ so that it agrees with the solution to the problem with the nonhomogeneous boundary condition at t = 0. This idea is also sketched in Figure 2.8.

To make u(t) jump discontinuously by amount γ at t = 0, we require du/dt to have an impulse of strength γ at t = 0, which is γδ(t). Since du/dt = f(t), we introduce a modified forcing term f̃ and choose it to be

f̃(t) = f(t) + γδ(t)

We conjecture that solving the problem with this modified forcing term and homogeneous boundary condition should give us the solution to the problem with the original f and nonhomogeneous boundary condition. Let's check this conjecture. By inspection, the solution to the differential equation is obtained by integration

du/dt = f̃(t)

∫_{0⁻}^{t} du = ∫_{0⁻}^{t} f̃(τ)dτ

u(t) − u(0⁻) = ∫_{0⁻}^{t} f̃(τ)dτ

u(t) = ∫_{0⁻}^{t} f̃(τ)dτ

Note that this solution satisfies the homogeneous boundary condition u(0⁻) = 0 as desired. Now we substitute the definition of f̃ to obtain the solution of the original problem

u(t) = ∫_{0⁻}^{t} (f(τ) + γδ(τ))dτ

u(t) = γ + ∫₀ᵗ f(τ)dτ   t ≥ 0

By inspection, the last equation is indeed the solution to the original problem with forcing term f and nonhomogeneous boundary condition u(0) = γ.
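The impulse argument can be mimicked numerically by smearing γδ(t) into a tall, narrow rectangle of area γ. The sketch below (my illustration, assuming NumPy; f(t) = cos t is an arbitrary choice) integrates the modified forcing from the homogeneous condition u(0⁻) = 0 and recovers u(t) = γ + ∫₀ᵗ f(τ)dτ once t is past the smeared impulse.

```python
import numpy as np

# Approximate gamma*delta(t) by a rectangle of height gamma/eps on [0, eps).
gamma, eps = 2.0, 1e-4
t = np.linspace(0.0, 3.0, 300001)
dt = t[1] - t[0]

f = np.cos(t)                                  # arbitrary forcing term
delta_eps = np.where(t < eps, 1.0 / eps, 0.0)  # smeared impulse with unit area
ftilde = f + gamma * delta_eps                 # modified forcing term

u = np.cumsum(ftilde) * dt                     # u(t) = integral of ftilde, u(0-) = 0

# for t well past eps this should agree with gamma + sin(t)
mask = t > 10 * eps
err = np.max(np.abs(u[mask] - (gamma + np.sin(t[mask]))))
print(err)  # small: the impulse reproduces the jump of size gamma
```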
We can generalize this approach to cover any nonhomogeneity in the boundary conditions by adding appropriate impulsive forcing terms to the original problem's differential equation. We revisit Example 2.13 to illustrate this technique.

Example 2.14: Fixed flux revisited

Rederive the existence and uniqueness conditions for Example 2.13 using the alternative theorem, which applies only to homogeneous problems.

Solution

We replace the nonhomogeneous boundary conditions of Example 2.13 with the homogeneous version

B₁T = T_x(0⁻) = 0   B₂T = T_x(l⁺) = 0

In this example we require that T_x jump from zero to value γ₁ at the left boundary, x = 0. That requires an impulse γ₁δ(x) to be added to f so that T_xx sees an impulse and T_x sees a jump at x = 0. We also require T_x to jump from value γ₂ to zero as x passes through x = l at the right boundary. We add −γ₂δ(x − l) to f to cause T_x to jump by this amount. The modified f̃ is therefore5

f̃(x) = f(x) + γ₁δ(x) − γ₂δ(x − l)

5Note that if we had nonhomogeneous boundary conditions on T rather than T_x, we would require T_x to have an impulse and T_xx to have a doublet, and we would add γ₁δ′(x) − γ₂δ′(x − l) to f.

The null space N(L) is unchanged, and the problem is self-adjoint so this is also N(L*). The solvability condition applied to f̃ gives

(f̃, 1) = ∫_{0⁻}^{l⁺} (f(x) + γ₁δ(x) − γ₂δ(x − l))dx = ∫₀ˡ f(x)dx + γ₁ − γ₂ = 0

The last equation implies the solution exists for f satisfying

∫₀ˡ f(x)dx = γ₂ − γ₁

and the general solution remains

T(x) = T_p(x) + α

We see that we have reached the same solvability condition found in Example 2.13. By introducing f̃ and using homogeneous boundary conditions, we avoid the additional complication of introducing and evaluating J(u, v) as explained in Section 2.4.3. Evaluating J(u, v) is about the same work as determining the appropriate f̃. But using delta functions expands the applicability of Theorem 2.10, and allows this one theorem to cover both homogeneous and nonhomogeneous boundary condition cases, which is not an insignificant benefit.

Example 2.15: Nonhomogeneous boundary-value problem and the Green's function

The following second-order nonhomogeneous boundary-value problem arises in solving the transient wave equation for propagation of sound. We wish to solve the following BVP for u(x), x ∈ [0, l]

Lu = f   B₁u = 0   B₂u = 0

in which the second-order differential operator is Lu = d²u/dx² − k²u, and the two boundary functionals are B₁u = u(0), B₂u = u(l). The constant k is real and the function f(x) is an arbitrary forcing function.

(a) Take the Laplace transform of the BVP with the x variable playing the role of time. Note that the value of u(0) and u_x(0) shows up in the transform. Evaluate u(0) and leave u_x(0) as an unknown constant.

(b) Invert the transform to obtain u(x).

(c) Solve for u_x(0) using the solution in the previous part and the other boundary condition. Plug the expression for u_x(0) back into your solution to obtain the complete solution to the problem.

(d) Next express the solution as

u(x) = ∫₀ˡ G(x, ξ)f(ξ)dξ

The function G(x, ξ) is known as the Green's function for the nonhomogeneous problem.6 Write out the Green's function G(x, ξ) for this problem.

(e) Establish that the Green's function G(x, ξ) is symmetric for this boundary-value problem, i.e., G(x, ξ) = G(ξ, x). Hint: you may find the hyperbolic difference formula useful: sinh(a − b) = sinh a cosh b − cosh a sinh b.

6The Green's function concept is explored in greater detail in Chapter 3, Section 3.3.5.

Solution

(a) Taking the Laplace transform of the differential equation gives

s²ū(s) − su(0) − u_x(0) − k²ū(s) = f̄(s)

Setting u(0) = 0 and solving for ū gives

(s² − k²)ū = f̄ + u_x(0)

ū = f̄/(s² − k²) + u_x(0)/(s² − k²)

(b) Using the transform pair

sinh(kx)/k ⟷ 1/(s² − k²)

and the convolution theorem gives

u(x) = (1/k)∫₀ˣ sinh(k(x − ξ))f(ξ)dξ + u_x(0) sinh(kx)/k

(c) Evaluating the solution at x = l and solving for the unknown u_x(0) gives

u_x(0) = −(1/sinh(kl)) ∫₀ˡ sinh(k(l − ξ))f(ξ)dξ

Substituting u_x(0) into the previous solution gives

u(x) = (1/k)∫₀ˣ sinh(k(x − ξ))f(ξ)dξ − (sinh(kx)/(k sinh(kl))) ∫₀ˡ sinh(k(l − ξ))f(ξ)dξ

(d) Combining these two integrals into one gives

u(x) = ∫₀ˡ G(x, ξ)f(ξ)dξ

with

G(x, ξ) = (1/k)sinh(k(x − ξ)) − sinh(kx)sinh(k(l − ξ))/(k sinh(kl))   ξ < x

G(x, ξ) = −sinh(kx)sinh(k(l − ξ))/(k sinh(kl))   ξ > x

(e) We work on the first part of G(x, ξ) using the sinh difference formula

sinh(a − b) = sinh a cosh b − cosh a sinh b

We have for ξ < x that

(1/k)sinh(k(x − ξ)) − sinh(kx)sinh(k(l − ξ))/(k sinh(kl)) = (1/(k sinh(kl)))( sinh(kl)sinh(k(x − ξ)) − sinh(kx)sinh(k(l − ξ)) )

Using the sinh difference formula on the terms in parentheses

sinh(kl)sinh(k(x − ξ)) = sinh(kl)( sinh(kx)cosh(kξ) − cosh(kx)sinh(kξ) )

sinh(kx)sinh(k(l − ξ)) = sinh(kx)( sinh(kl)cosh(kξ) − cosh(kl)sinh(kξ) )

Canceling the cosh(kξ) terms gives

sinh(kl)sinh(k(x − ξ)) − sinh(kx)sinh(k(l − ξ)) = −sinh(kl)cosh(kx)sinh(kξ) + sinh(kx)cosh(kl)sinh(kξ)

Factoring out the sinh(kξ) term and using the difference formula again

sinh(kl)sinh(k(x − ξ)) − sinh(kx)sinh(k(l − ξ)) = sinh(kξ)( sinh(kx)cosh(kl) − cosh(kx)sinh(kl) ) = sinh(kξ)sinh(kx − kl) = −sinh(kξ)sinh(k(l − x))

Substituting this result into the equation for G(x, ξ) gives

G(x, ξ) = −sinh(kξ)sinh(k(l − x))/(k sinh(kl))   ξ < x

G(x, ξ) = −sinh(kx)sinh(k(l − ξ))/(k sinh(kl))   ξ > x

and we have established that G(x, ξ) = G(ξ, x); the Green's function for this operator is symmetric, a consequence of the self-adjointness of L in this case.
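The Green's function just derived can be validated numerically. The sketch below (assumes NumPy; the test forcing f(ξ) = sin(πξ/l) is my choice, for which u = −sin(πx/l)/((π/l)² + k²) solves the BVP exactly) checks both the symmetry and the integral representation.

```python
import numpy as np

k, l = 2.0, 1.0

def G(x, xi):
    # symmetric form from part (e): G = -sinh(k*min) sinh(k(l - max)) / (k sinh(kl))
    lo, hi = np.minimum(x, xi), np.maximum(x, xi)
    return -np.sinh(k * lo) * np.sinh(k * (l - hi)) / (k * np.sinh(k * l))

# midpoint quadrature nodes for the integral of G(x, xi) f(xi)
m = 4000
xi = (np.arange(m) + 0.5) * l / m
dxi = l / m
fvals = np.sin(np.pi * xi / l)

xs = np.linspace(0.1, 0.9, 9)
u_green = np.array([np.sum(G(x, xi) * fvals) * dxi for x in xs])
u_exact = -np.sin(np.pi * xs / l) / ((np.pi / l) ** 2 + k**2)

sym_err = np.max(np.abs(G(xs[:, None], xs[None, :]) - G(xs[None, :], xs[:, None])))
sol_err = np.max(np.abs(u_green - u_exact))
print(sym_err, sol_err)  # both near zero
```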

2.5 Lyapunov Functions and Stability


2.5.1 Types of Stability
Consider a system model of interest to be an autonomous initial-value problem

dx/dt = f(x)   x(0) = x₀    (2.38)

We are interested in the behavior of solutions to this system. Since the solution depends on the initial condition, we denote by φ(t; x) the solution to the initial-value problem at time t ≥ 0, which has value x at time t = 0. So the solution to the initial-value problem above is given by φ(t; x₀), t ≥ 0. But we are also interested in the solution as we vary the initial value x. Steady-state solutions to the model, if any exist, satisfy

f(x_s) = 0

We can always shift a steady state of interest to the origin by defining a new coordinate, x̃ = x − x_s, and f̃(x̃) = f(x̃ + x_s) so that

dx̃/dt = dx/dt = f(x) = f̃(x̃)

So we assume without loss of generality that x_s = 0, i.e., the origin is the steady state of interest. Unlike a linear system, when dealing with a nonlinear system, stability depends on the solution of interest, and we may have some solutions that are stable, while others are unstable. For a given linear system, the stability of all solutions is identical, and to reflect this special situation, we often refer to stability of the system, rather than stability of a solution.

Figure 2.9: Solution behavior; stability (left) and asymptotic stability (right).
There are several aspects to stability, and we define these next. The first most basic characteristic of interest is whether a small perturbation to x away from the steady-state solution results in a small subsequent deviation for all future times. The general term stability is commonly reserved for this most basic notion; we use the more precise term Lyapunov stability or stable in the sense of Lyapunov if we need to ensure that there is no confusion. The definition is as follows.

Definition 2.16 ((Lyapunov) Stability). The origin is (Lyapunov) stable if for every ε > 0, there exists δ > 0 such that ‖x‖ ≤ δ implies ‖φ(t; x)‖ ≤ ε for all t ≥ 0.

The stability concept is illustrated on the left side of Figure 2.9. A solution that is not stable is termed UNSTABLE. The next characteristic of interest is whether small perturbations to the initial state die away as time increases. The idea here is whether the origin attracts solutions starting nearby.
Definition 2.17 (Attractivity). The origin is attractive if there exists δ > 0 such that ‖x‖ ≤ δ implies that

lim_{t→∞} φ(t; x) = 0

Asymptotic stability is then the combination of these two properties.

Definition 2.18 (Asymptotic stability). The origin is asymptotically stable if it is (i) stable and (ii) attractive.

The right side of Figure 2.9 shows a representative solution trajectory when the origin is asymptotically stable.7 One might wonder why Lyapunov stability is a requirement of asymptotic stability, or even whether the origin can be attractive, and not Lyapunov stable. The answer is yes, the origin in a nonlinear system may be globally attractive and still not Lyapunov stable. The problem with these systems is that there exist starting points, arbitrarily close to the origin, for which the resulting trajectories become large before they asymptotically approach zero as time tends to infinity. Because we cannot bound how large the solution transient becomes by constraining the size of its initial value, we classify the origin as unstable.8 Note that the system must be nonlinear for a solution to be attractive and unstable. For linear systems, attractivity and asymptotic stability are identical; see Exercise 2.60.

7Asymptotic stability is probably the most common notion of stability that people have in mind, and sometimes it is referred to simply as stability. Of course, this usage may cause confusion because now the term stability is being used in two ways: as Lyapunov stability and as asymptotic stability; and one is a subset of the other.

8One is obviously free to define words as one pleases, but defining asymptotic stability in this way precludes a possible solution behavior that is not expected of "nice" or "stable" solutions. Regardless of terminology, the important point is to be aware that solutions can be globally attractive and not Lyapunov stable.

A stronger form of asymptotic stability known as exponential stability is often useful, especially when dealing with linear dynamics.

Definition 2.19 (Exponential stability). The origin is exponentially stable if there exists δ > 0 such that ‖x‖ ≤ δ implies that there exist c, λ > 0 for which

‖φ(t; x)‖ ≤ c‖x‖e^(−λt)

for all t ≥ 0.

We leave it as an exercise for the reader to show that the definition of exponential stability implies also Lyapunov stability.
2.5.2 Lyapunov Functions

Now we consider a scalar function of x, denoted V(x), whose characteristics are going to enable us to analyze the stability of the origin without requiring us to first solve completely the model dx/dt = f(x). The motivation for this class of functions is the role that mechanical energy plays in a mechanical system. Consider mechanical energy to be the sum of kinetic and potential energies, T and K, and let total energy be the sum of mechanical energy and internal energy

E_M = T + K   E = E_M + U

If we start an isolated mechanical system, such as the particle on a track depicted in Figure 2.10, at some system temperature with some initial kinetic and potential energies, and monitor the mechanical energy with time, we observe that although the total energy E is conserved, the mechanical energy E_M steadily drops as some of that form of energy is converted into heat by friction.9 The temperature of the system slowly increases due to the conversion of energy into heat, and the internal energy U of the system increases to maintain the total energy constant. If we define the height of the track at its lowest point as h = 0, we then have E_M = (1/2)mv² + mgh, and since h ≥ 0, m > 0, and v² ≥ 0, we have that E_M ≥ 0. The mechanical energy is therefore a scalar function satisfying

E_M ≥ 0   dE_M/dt ≤ 0

9This conversion of mechanical energy into heat is what causes the system's entropy to increase.

Figure 2.10: A simple mechanical system with total energy E, internal energy U, kinetic energy T = (1/2)mv², and potential energy K = mgh. The mechanical energy is E_M = T + K, and the total energy is E = E_M + U.

Because E_M decreases with time and is bounded below by zero, we expect that its only possible steady state is E_M = 0, and E_M = 0 implies both v = 0 and h = 0. So by analyzing the energy function E_M in this fashion, we conclude that the marble at rest at the bottom of the track is an asymptotically stable steady state, and we do not have to solve the complicated equations of motion of the system to deduce this fact.

We wish to generalize this concept, and the key idea is to define V(x) to be a nonnegative scalar function V : ℝⁿ → ℝ≥0, with a negative time derivative V̇(x(t)) ≤ 0. To compute the time derivative of V(x(t)), we apply the chain rule giving10

V̇ = (∂V/∂x)ᵀ dx/dt = (∂V/∂x)ᵀ f(x)    (2.39)

This generalization of mechanical energy is the concept of a Lyapunov function for the system dx/dt = f(x). A precise definition is as follows.

10See Appendix A for various notations for derivatives with respect to vectors. Some readers may be more familiar with this equation in the form V̇(x) = ∇V · f(x).

Definition 2.20 (Lyapunov function). Consider a compact (closed and bounded) set D ⊂ ℝⁿ containing the origin in its interior and let function V : ℝⁿ → ℝ≥0 be continuously differentiable and satisfy11

V(0) = 0 and V(x) > 0 for x ∈ D \ 0    (2.40)

V̇(x) ≤ 0 for x ∈ D    (2.41)

Then V(·) is a Lyapunov function for the system dx/dt = f(x).
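As a concrete instance of Definition 2.20 (my example, not the text's): for the damped oscillator dx₁/dt = x₂, dx₂/dt = −x₁ − cx₂ with c > 0, the energy-like choice V(x) = (x₁² + x₂²)/2 gives, by (2.39), V̇ = x₁x₂ + x₂(−x₁ − cx₂) = −cx₂² ≤ 0, so V is a Lyapunov function. The sketch below (assuming NumPy) integrates the system with a Runge-Kutta step and confirms V(x(t)) never increases.

```python
import numpy as np

c = 0.5  # damping coefficient

def f(x):
    # damped oscillator written as a first-order system
    return np.array([x[1], -x[0] - c * x[1]])

def rk4(x, h):
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x = np.array([1.0, 0.0])
h = 0.01
V = [0.5 * x @ x]  # V(x) = (x1^2 + x2^2)/2
for _ in range(5000):
    x = rk4(x, h)
    V.append(0.5 * x @ x)
V = np.array(V)

print(V[0], V[-1])         # V decays toward zero along the trajectory
print(np.max(np.diff(V)))  # never positive: V is nonincreasing, as (2.41) requires
```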
The big payoff for having a Lyapunov function for a system is the immediate stability analysis that it provides. We present next a few representative theorems stating these results. We mainly follow Khalil (2002) in the following presentation, and the interested reader may wish to consult that reference for further results on Lyapunov functions and stability theory. We require two fundamental results from real analysis to prove the Lyapunov stability theorems. The first concerns a nonincreasing function of time that is bounded below, which is a property we shall establish for V(x(t)) considered as a function of time. One of the fundamental results from real analysis is that such a function converges as time tends to infinity (Bartle and Sherbert, 2000, Theorems 3.3.2 and 4.3.11). The second result is that a continuous function defined on a compact (closed and bounded) set achieves its minimum and maximum values on the set. For scalar functions, i.e., f : ℝ → ℝ, this "extreme-value" or "maximum-minimum" theorem is a fundamental result in real analysis (Bartle and Sherbert, 2000, p. 130), and is often associated with Weierstrass or Bolzano. The result also holds for multivariate functions like the Lyapunov function V : ℝⁿ → ℝ≥0, which we require here, and is a highly useful tool in optimization theory (Mangasarian, 1994, p. 198) (Polak, 1997, Corollary 5.1.25) (Rockafellar and Wets, 1998, p. 11) (Rawlings and Mayne, 2009, Proposition A.7).

Theorem 2.21 (Lyapunov stability). Let V(·) be a Lyapunov function for the system dx/dt = f(x). Then the origin is (Lyapunov) stable.

Proof. Given ε > 0 choose r ∈ (0, ε] such that

B_r = {x ∈ ℝⁿ | ‖x‖ ≤ r} ⊆ D

The symbol B_r denotes a BALL of radius r. Such an r > 0 exists since D contains the origin in its interior. The sets D and B_r are depicted in Figure 2.11. Define α by

α = min_{x∈D, ‖x‖≥r} V(x)

11For two sets A and B, the notation A \ B is defined to be the elements of A that are not elements of B, or, equivalently, the elements of A remaining after removing the elements of B.

Figure 2.11: The origin and sets D, B_r, V_β (shaded), and B_δ.

Note that α is well defined because it is the minimization of a continuous function on a compact set, and α > 0 because of (2.40). Choose β ∈ (0, α) and consider the sublevel set

V_β = {x ∈ D | V(x) ≤ β}

Note that, as shown in Figure 2.11, sublevel sets do not need to be connected. Regardless, we can readily establish that V_β is contained in the interior of B_r as follows. A point p not in the interior of B_r has ‖p‖ ≥ r and therefore satisfies V(p) ≥ α due to α's definition, and is therefore not in the set V_β since β < α. Notice also that any solution starting in V_β remains in V_β for all t ≥ 0, which follows from (2.41) since V̇(x(t)) ≤ 0 implies that V(x(t)) ≤ V(x(0)) ≤ β for all t ≥ 0. A set with this property is called an INVARIANT SET, or sometimes a POSITIVE INVARIANT SET, to indicate that the set is invariant for time running in the positive direction. Next notice that V_β contains the origin in its interior since β > 0. Therefore we can choose δ > 0 such that the ball B_δ is contained in V_β. Therefore, if we choose initial x ∈ B_δ, we have for all t ≥ 0

‖x‖ ≤ δ implies φ(t; x) ∈ V_β ⊆ B_r, so ‖φ(t; x)‖ ≤ r ≤ ε

and Lyapunov stability is established.


Theorem 2.22 (Asymptotic stability). Let V(·) be a Lyapunov function for the system dx/dt = f(x). Moreover, let V(·) satisfy

V̇(x) < 0 for x ∈ D \ 0    (2.42)

Then the origin is asymptotically stable.

Proof. We conclude that the origin is stable from the previous proof. To prove asymptotic stability we need to show only that the origin is attractive. Since V(·) is continuous and vanishes only at zero, it is sufficient to establish that V(x(t)) goes to zero as t → ∞. As in the proof of Lyapunov stability, we choose δ > 0 such that trajectories satisfying ‖x(0)‖ ≤ δ remain in V_β ⊆ D. Since V̇(x(t)) ≤ 0 for all x(t), V(x(t)) is a nonincreasing function of time, and it is bounded below by zero. Therefore it converges. We need to show that it converges to zero. Assume the contrary, that V(x(t)) converges to some c > 0, and we establish a contradiction.

Consider the level set V_c = {x | V(x) = c}. This level set does not contain the origin, so we can choose d > 0 such that max_{‖x‖≤d} V(x) < c. Since V(x(t)) is nonincreasing and approaches c as t → ∞, we have that x(t) is outside B_d for all t ≥ 0. Next define γ as

γ = max_{d≤‖x‖≤r} V̇(x)

Note that γ is well defined because V̇(x) is continuous due to (2.39) and the fact that ∂V(x)/∂x and f(x) are continuous. We know γ < 0 due to (2.42). Therefore

V(x(t)) = V(x(0)) + ∫₀ᵗ V̇(x(τ))dτ ≤ V(x(0)) + γt

The right-hand side becomes negative for finite t for any x(0) ∈ B_δ, which contradicts nonnegativity of V(·), and we conclude c = 0 and V(x(t)) → 0, and hence x(t) → 0, as t → ∞.

Under the stronger assumption of Theorem 2.22, i.e., (2.42), establishing continuity of the solution φ(t; x) in t for all t ≥ 0 and all x in a sublevel set V_β contained in B_r also implies that the sublevel set V_β is connected. This follows because every point x ∈ V_β is then connected to the origin by a continuous curve φ(t; x) that remains in the positive invariant set V_β for all t ≥ 0.

Next we consider a further strengthening of the properties of the Lyapunov function to ensure exponential stability. We have the following result.

Theorem 2.23 (Exponential stability). Let V(·) be a Lyapunov function for the system dx/dt = f(x). Moreover, let V(·) satisfy for all x ∈ D

a‖x‖^σ ≤ V(x) ≤ b‖x‖^σ    (2.43)

V̇(x) ≤ −c‖x‖^σ    (2.44)

for some a, b, c, σ > 0. Then the origin is exponentially stable.

Proof. Consider an arbitrary r > 0 and, as in the proof of Theorem 2.21, a sublevel set V_β with r > 0 small enough so that V_β ⊆ D. Such an r exists since V(·) is continuous and V(0) = 0. We know that trajectories starting in V_β remain in V_β. The upper-bounding inequality on V(·) implies that ‖x‖^σ ≥ V(x)/b, which gives the bound on the time derivative of V

V̇(x) ≤ −c‖x‖^σ ≤ −(c/b)V(x)

Notice that the scalar time function v(t) = V(x(t)) satisfies the ODE inequality v̇ ≤ −(c/b)v and therefore v(t) ≤ v(0)e^(−(c/b)t). Translating this statement back to V(·) gives

V(x(t)) ≤ V(x(0))e^(−(c/b)t) for all t ≥ 0

Using the lower-bounding inequality for V(·) gives

a‖x(t)‖^σ ≤ V(x(t)) ≤ V(x(0))e^(−(c/b)t)

Using the upper-bounding inequality again gives for all x ∈ V_β and all t ≥ 0

‖x(t)‖ ≤ (b/a)^(1/σ) ‖x(0)‖ e^(−(c/(bσ))t)

We can choose δ > 0 such that the ball B_δ is contained in V_β as shown in Figure 2.11. We then have that for all ‖x(0)‖ ≤ δ

‖x(t)‖ ≤ c‖x(0)‖e^(−λt) for all t ≥ 0

in which c = (b/a)^(1/σ) > 0 and λ = c/(bσ) > 0, and exponential stability of the origin is established.

2.5.3 Application to Linear Systems

Lyapunov function analysis of stability can of course be applied to linear systems, but this is mainly for illustrative purposes. We have many ways to analyze stability of linear systems because we have the analytical solution available. The true value of Lyapunov functions lies in analysis of nonlinear systems, for which we have few general purpose alternatives. To build up some expertise in using Lyapunov functions, we consider again the linear continuous time differential equation

    dx/dt = Ax    x(0) = x₀    (2.45)

in which x ∈ ℝⁿ and A ∈ ℝⁿˣⁿ. We have already discussed in Section 2.2.2 the stability of this system and shown that x(t) = 0 is an

154    Ordinary Differential Equations

asymptotically stable steady state if and only if Re(eig(A)) < 0, i.e., all eigenvalues of A have strictly negative real parts. Let's see how we can construct a Lyapunov function for this system. Consider as a candidate

    V(x) = xᵀSx

in which S ∈ ℝⁿˣⁿ is positive definite, denoted S > 0. With this choice we have that V: ℝⁿ → ℝ≥0, which is the first requirement, i.e., V(0) = 0 and V(x) > 0 for x ≠ 0. We wish to evaluate the evolution of V with time as x evolves according to (2.45). Taking the time derivative of V gives

    (d/dt) V(x(t)) = (d/dt)(xᵀSx)
                   = (dx/dt)ᵀ S x + xᵀ S (dx/dt)
                   = xᵀAᵀSx + xᵀSAx
                   = xᵀ(AᵀS + SA)x

and the initial condition is V(0) = x₀ᵀSx₀. One means to ensure that V(x(t)) is decreasing with time when x ≠ 0 is to enforce that the matrix AᵀS + SA is negative definite. We choose some positive definite matrix Q > 0 and attempt to find a positive definite S that satisfies

    AᵀS + SA = −Q    (2.46)

so that

    dV/dt = −xᵀQx

Equation (2.46) is known as the matrix Lyapunov equation. It says that given a Q > 0, if we can find a positive definite solution S > 0 of (2.46), then V(x) = xᵀSx is a Lyapunov function for linear system (2.45), and the steady-state solution x = 0 is asymptotically (in fact, exponentially) stable. This requirement can be shown to be also necessary for the system to be asymptotically (exponentially) stable, which we verify shortly. We seem to have exactly characterized the stability of the linear system (2.45) without any reference to the eigenvalues of matrix A. Of course, since the condition on the eigenvalues as well as the condition on the matrix Lyapunov equation are both necessary and sufficient conditions for asymptotic stability, they must be equivalent. Indeed, we have the following result stating this equivalence.


Theorem 2.24 (Lyapunov function for linear systems). The following statements are equivalent (Sontag, 1998, p. 231).

(a) A is asymptotically stable, i.e., Re(eig(A)) < 0.

(b) For each Q ∈ ℝⁿˣⁿ, there is a unique solution S of the matrix Lyapunov equation

    AᵀS + SA = −Q

and if Q > 0 then S > 0.

(c) There is some S > 0 such that AᵀS + SA < 0.

(d) There is some S > 0 such that V(x) = xᵀSx is a Lyapunov function for the system ẋ = Ax.
Exercise 2.62 asks you to establish the equivalence of (a) and (b).
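Statement (b) of Theorem 2.24 is also easy to exercise numerically. The sketch below (not part of the text; it assumes SciPy is available) uses scipy.linalg.solve_continuous_lyapunov, which solves aX + Xaᵀ = q, so passing a = Aᵀ and q = −Q yields exactly (2.46):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # eigenvalues -1 and -2: asymptotically stable
Q = np.eye(2)

# solve_continuous_lyapunov(a, q) solves a@X + X@a.T = q;
# with a = A.T and q = -Q this is A.T@S + S@A = -Q, i.e., (2.46).
S = solve_continuous_lyapunov(A.T, -Q)

# S should be symmetric positive definite (statement (b) of Theorem 2.24).
assert np.allclose(S, S.T)
assert np.all(np.linalg.eigvalsh(S) > 0)

# Check Vdot = x^T (A^T S + S A) x = -x^T Q x for a sample state.
x = np.array([1.0, -0.5])
vdot = x @ (A.T @ S + S @ A) @ x
assert np.isclose(vdot, -(x @ Q @ x))
```

The matrix A here is merely a stable example chosen for illustration; any A with Re(eig(A)) < 0 would do.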
2.5.4 Discrete Time Systems

Next we consider discrete time systems modeled by

    x(k + 1) = f(x(k))    x(0) = x₀

in which the sample time k is an integer, k = 0, 1, 2, .... To streamline the presentation we assume throughout that f(·) is continuous on its domain of definition. Steady states are now given by solutions to the equation xs = f(xs), and we again assume without loss of generality that f(0) = 0 so that the origin is a steady state of the discrete time model. Discrete time models arise when time is discretized, as in digital control systems for chemical plants. But discrete time models also arise when representing the behavior of an iterative algorithm, such as the Newton-Raphson method for solving nonlinear algebraic equations discussed in Chapter 1. In these cases, the integer k represents the algorithm iteration number rather than time. We compress notation by defining the superscript + operator to denote the variable at the next sample time (or iteration), giving

    x⁺ = f(x)    x(0) = x₀    (2.47)

Notice that this notation also emphasizes the similarity with the continuous time model ẋ = f(x) in (2.38). We again denote solutions to (2.47) by φ(k; x) with k ≥ 0 that start at state x at k = 0. The discrete time definitions of stability, attractivity, and asymptotic stability of the origin are then identical to their continuous time counterparts given in

Definitions 2.16, 2.17, and 2.18, respectively, with integer k ≥ 0 replacing real-valued time t ≥ 0. In discrete time, the definition of exponential stability is modified slightly to the following.

Definition 2.25 (Exponential stability (discrete time)). The origin is exponentially stable if there exist c > 0, λ ∈ (0, 1), and δ > 0 for which ‖x‖ ≤ δ implies that

    ‖φ(k; x)‖ ≤ c ‖x‖ λᵏ  for all k ≥ 0

We see that λᵏ with λ < 1 is the characteristic rate of solution decay.
Lyapunov functions. The main difference in constructing Lyapunov functions for discrete time systems compared to those for continuous time systems is that we compare the value of V at two successive sample times, i.e., V(x(k + 1)) − V(x(k)). If this change is negative, then we have the analogous behavior in discrete time that we have when V̇ is negative in continuous time, i.e., V(x(k)) is decreasing when evaluated along the solution x(k). We define the ΔV notation

    ΔV(x) = V(f(x)) − V(x) = V(x⁺) − V(x)

to denote the change in V starting at state x and proceeding to successor state x⁺ = f(x). Another significant change is that we do not require differentiability of the Lyapunov function V(·) in discrete time since we do not require the chain rule to compute the time derivative. We do require continuity of V(·) at the origin, however. For consistency with the earlier continuous time results, we assume here that V(·) is continuous everywhere on its domain of definition.¹² The definition of the (continuous) Lyapunov function for discrete time is as follows.
Definition 2.26 (Lyapunov function (discrete time)). Consider a compact (closed and bounded) set D ⊆ ℝⁿ containing the origin in its interior and let V: D → ℝ≥0 be continuous on D and satisfy

    V(0) = 0 and V(x) > 0 for x ∈ D \ 0    (2.48)
    ΔV(x) ≤ 0 for x ∈ D    (2.49)

Then V(·) is a Lyapunov function for the system x⁺ = f(x).


¹²For those needing discontinuous V(·) for discrete time systems, see Rawlings and Mayne (2009, Appendix B) for the required extension.


Notice that ΔV(x) also is continuous on its domain of definition since both V(·) and f(·) are assumed continuous.

Theorem 2.27 (Lyapunov stability (discrete time)). Let V(·) be a Lyapunov function for the system x⁺ = f(x). Then the origin is (Lyapunov) stable.

Theorem 2.28 (Asymptotic stability (discrete time)). Let V(·) be a Lyapunov function for the system x⁺ = f(x). Moreover, let V(·) satisfy

    ΔV(x) < 0 for x ∈ D \ 0    (2.50)

Then the origin is asymptotically stable.

Theorem 2.29 (Exponential stability (discrete time)). Let V(·) be a Lyapunov function for the system x⁺ = f(x). Moreover, let V(·) satisfy for all x ∈ D

    a‖x‖^σ ≤ V(x) ≤ b‖x‖^σ    (2.51)
    ΔV(x) ≤ −c‖x‖^σ    (2.52)

for some a, b, c, σ > 0. Then the origin is exponentially stable.

The proofs of Theorems 2.27, 2.28, and 2.29 are essentially identical to their continuous time counterparts, Theorems 2.21, 2.22, and 2.23, respectively, with integer k replacing real t and the difference ΔV replacing the derivative V̇. An essential difference between continuous and discrete time cases is that the solution of the continuous time model φ(t; x) is continuous in t, while the solution of the discrete time model φ(k; x) has no continuity with index k since k takes on discrete values. Notice that in the proofs of the continuous time results, we did not follow the common practice of appealing to continuity of φ(t; x) in t, so the supplied arguments are valid for both continuous and discrete cases.

Linear systems. The time-invariant discrete time linear model is

    x⁺ = Ax    x(0) = x₀

and in analogy with the continuous time development, we try to find a Lyapunov function of the form V(x) = xᵀSx for some positive definite matrix S > 0. Computing the change in the Lyapunov function at state x gives

    ΔV(x) = V(x⁺) − V(x) = (Ax)ᵀS(Ax) − xᵀSx
          = xᵀ(AᵀSA − S)x


Choosing a positive definite Q > 0, if we can find S > 0 that satisfies

    AᵀSA − S = −Q    (2.53)

then we have succeeded in finding a V(·) with the desired properties: V(x) = xᵀSx > 0 and ΔV(x) = −xᵀQx < 0 for all x ≠ 0. Equation (2.53) is known as the discrete matrix Lyapunov equation. Exercise 2.63 asks you to state the discrete time version of Theorem 2.24, listing the connections between the solution of the discrete Lyapunov equation and the eigenvalues of A. These connections often come in handy when analyzing the stability of discrete linear systems.
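The discrete equation (2.53) can be checked the same way (again a sketch, not from the text; SciPy assumed): scipy.linalg.solve_discrete_lyapunov(a, q) solves aXaᵀ − X = −q, so passing a = Aᵀ gives (2.53), and V(x) = xᵀSx then decreases along trajectories of x⁺ = Ax:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.5, 0.2],
              [0.0, 0.8]])   # eigenvalues 0.5 and 0.8: inside the unit circle
Q = np.eye(2)

# solve_discrete_lyapunov(a, q) solves a@X@a.T - X = -q;
# with a = A.T this is A.T@S@A - S = -Q, i.e., (2.53).
S = solve_discrete_lyapunov(A.T, Q)
assert np.allclose(A.T @ S @ A - S, -Q)
assert np.all(np.linalg.eigvalsh(S) > 0)

# V(x) = x^T S x decreases along x+ = A x since DeltaV = -x^T Q x < 0.
x = np.array([1.0, 1.0])
for _ in range(5):
    xp = A @ x
    assert xp @ S @ xp < x @ S @ x
    x = xp
```

The example matrix A is an arbitrary Schur-stable choice used only for illustration.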

2.6 Asymptotic Analysis and Perturbation Methods


2.6.1 Introduction

Typical mathematical models have a number of explicit parameters. Often we are interested in how the solution to a problem depends on a certain parameter. Asymptotic analysis is the branch of applied mathematics that deals with the construction of precise approximate solutions to problems in asymptotic cases, i.e., when a parameter of the problem is large or small. In chemical engineering problems, small parameters often arise as ratios of time or length scales. Important limiting cases arise for example in the limits of large or small Reynolds, Péclet, or Damköhler numbers. In many cases, an analytical solution can be found, even if the problem is nonlinear. In others, the scaling behavior of the solution (e.g., the correct exponent for the power-law dependence of one quantity on another) can be found without even solving the problem. In still others, the asymptotic analysis yields an equation that must be solved numerically, but is much less complicated than the original model. The goal here is to provide a background on the basic concepts and techniques of asymptotic analysis, beginning with some notation and basic ideas about series approximations.

2.6.2 Series Approximations: Convergence, Asymptoticness, Uniformity

As this section deals extensively with how one function approximates another, we begin by introducing symbols that describe degrees of identification between different functions.


a = b    a is equal to b
a ∼ b    a is asymptotically equal to b (in some given/implied limit), i.e., a/b → 1
a ≈ b    a is approximately equal to b (in any useful sense)
a ∝ b    a is proportional to b

It is important to note that ∼ implies a limit process, while ≈ does not. In this section we will be careful to use the symbol ∼ in the precise manner defined here, though one must be aware that it often means different things in different contexts (and different parts of this book). Closely related to these symbols are ORDER SYMBOLS, which give a qualitative description of the relationships between functions in limiting cases. Consider a function f(ε) whose behavior we wish to describe relative to another function δ(ε) (a GAUGE FUNCTION). The order symbols "O", "o", and "ord" describe the relationships

    f(ε) = O(δ(ε)) as ε → 0  if lim_{ε→0} |f(ε)/δ(ε)| < ∞
    f(ε) = o(δ(ε)) as ε → 0  if lim_{ε→0} f(ε)/δ(ε) = 0
    f(ε) = ord(δ(ε)) as ε → 0  if f(ε) = O(δ(ε)) but not o(δ(ε))

In the latter case, f is said to be strictly order δ. Often, authors write "f(ε) ∼ δ(ε)" to mean "f(ε) = O(δ(ε))", though the latter only implies equality to within a multiplicative constant as ε → 0, while as defined here the former implies equality.
Asymptotic approximations take the form of series, the most familiar of which is the truncated Taylor series approximation that forms the basis of many engineering approximations. An infinite series

    f(x) = Σ_{n=0}^∞ fₙ(x)

CONVERGES at a particular value of x if and only if, for every ε > 0, there exists N₀ such that

    |f(x) − Σ_{n=0}^N fₙ(x)| < ε  for N > N₀

In contrast, an ASYMPTOTIC SERIES Σ_{n=0}^N fₙ(ε) satisfies

    lim_{ε→0} [f(ε) − Σ_{n=0}^M fₙ(ε)] / f_M(ε) = 0  for each M ≤ N

In words, the remainder is much smaller than the last term kept. This property is the source of the usefulness of asymptotic series. If this property is satisfied, we write

    f(ε) ∼ Σ_{n=0}^N fₙ(ε)  as ε → 0

In general, we do not care whether the series converges if we let N → ∞. Often it does not. The important point is that the finite sum (often just the first term or two) provides a useful approximation to a function for small ε. This is in stark contrast to convergent infinite series, which, although they converge, often require a large number of terms to be evaluated to obtain a reasonably accurate approximation.

We typically construct asymptotic series in this form

    f(ε) ∼ Σₙ aₙ δₙ(ε)    (2.54)

where

    δ₀(ε) ≫ δ₁(ε) ≫ δ₂(ε) ≫ ···

for small ε. We also require that aₙ = ord(1) as ε → 0. In practice, the δₙ are not generally known a priori, but must be determined as part of the solution procedure to satisfy the requirement that the coefficients aₙ be ord(1). This procedure is best illustrated by example as we do in several instances below. In principle, we can construct a series approximation with N as large as we like, as long as the aₙ remain ord(1) and δ_{N+1}(ε) ≪ δ_N(ε) at the value of ε of interest. However, the most common application of asymptotic analysis is to the construction of a one- or two-term approximation that captures the most important behavior as ε → 0.
As an example of the difference between convergent and asymptotic series, we look at the error function, written here as

    erf(x) = (2/√π) ∫₀ˣ e^(−t²) dt

By Taylor expanding the integrand around the origin and integrating term by term, a power series convergent for all x can be constructed for this function

    erf(x) = (2/√π) Σ_{n=0}^∞ (−1)ⁿ x^(2n+1) / ((2n+1) n!)

Although convergent, this expression may require many terms for reasonable accuracy to be obtained, especially when x is large. One could try setting w = 1/x and Taylor expanding e^(−1/w²) around w = 0. This leads to immediate difficulty because

    lim_{w→0} (dⁿ/dwⁿ) e^(−1/w²) = 0

for all n; this Taylor expansion is identically zero! This difficulty arises because e^(−x²) decays to zero faster than any negative power of x as x → ∞.

On the other hand, for x ≫ 1, an asymptotic series for the function may be constructed by repeated integration by parts (a common trick for the asymptotic approximation of integrals). This approximation is

    erf(x) = 1 − (2/√π) ∫ₓ^∞ e^(−t²) dt
           = 1 − e^(−x²)/(x√π) + (2/√π) ∫ₓ^∞ (e^(−t²)/(2t²)) dt

and, continuing the integration by parts,

    erf(x) ∼ 1 − (e^(−x²)/(x√π)) [1 − 1/(2x²) + 3/(2x²)² + O(x⁻⁶)]

If continued indefinitely, this series would diverge. The truncated series, however, is useful. In particular, the "leading order" term 1 − e^(−x²)/(x√π), the expression that includes the first correction for finite but large x, precisely indicates the behavior of erf(x) for large values of x. Furthermore, the truncated series can be used to provide accurate numerical values and is the basis of modern algorithms for doing so (Cody, 1969).
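The practical value of the truncated series is easy to demonstrate. The following sketch (not from the text; plain Python standard library only) sums the first few terms of the asymptotic series above and compares against math.erf:

```python
import math

def erf_asymptotic(x, n_terms=3):
    # Truncated large-x asymptotic series:
    # erf(x) ~ 1 - e^{-x^2}/(x sqrt(pi)) * (1 - 1/(2x^2) + 3/(2x^2)^2 - ...)
    # The k-th correction term is (-1)^k (2k-1)!! / (2x^2)^k.
    s = 0.0
    term = 1.0
    for k in range(n_terms):
        s += term
        term *= -(2*k + 1) / (2.0 * x * x)
    return 1.0 - math.exp(-x*x) / (x * math.sqrt(math.pi)) * s

# Just three terms give excellent accuracy at moderate x, whereas the
# convergent Taylor series at x = 4 needs dozens of terms with severe
# cancellation before settling down.
x = 4.0
approx = erf_asymptotic(x, n_terms=3)
assert abs(approx - math.erf(x)) < 1e-9
```

Note the contrast with convergence: adding ever more terms of the asymptotic series at fixed x would eventually make the approximation worse, not better.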
Now consider a function f of ε and some other parameter or variable, x

    f(x, ε) ∼ Σₙ aₙ(x) δₙ(ε)

If the approximation is asymptotic as ε → 0 for each fixed x, then we say it is POINTWISE ASYMPTOTIC. Now consider the particular case

    √(x + ε) ∼ √x + ε/(2√x)

This is pointwise asymptotic, but for fixed ε, the second term blows up as x → 0. So obviously, it cannot remain much smaller than the first term, which is our requirement for asymptoticness. The approximation is not UNIFORMLY VALID. Put another way

    lim_{x→0} lim_{ε→0} ε/√x ≠ lim_{ε→0} lim_{x→0} ε/√x

To be precise, a function u(x, ε) CONVERGES UNIFORMLY to u(x, 0) on the interval x ∈ [0, a], if, given E > 0, there is a D > 0 such that

    |u(x, ε) − u(x, 0)| < E,  for ε < D and all x ∈ [0, a]

Nonuniformity is a feature of many practical singular perturbation problems. A major challenge of asymptotic analysis is the construction of UNIFORMLY VALID approximations. We shall see a number of techniques for doing this.

2.6.3 Scaling, and Regular and Singular Perturbations

Before proceeding to discuss perturbation methods for differential equations, we introduce some important concepts in the context of algebraic equations. First, consider the quadratic equation

    x² + εx − 1 = 0,  ε ≪ 1    (2.55)

If ε = 0, x = ±1. We would like to characterize how these solutions are perturbed when 0 < ε ≪ 1. To do so, we posit a solution of the form (2.54)

    x(ε) = δ₀x₀ + δ₁(ε)x₁ + δ₂(ε)x₂ + o(δ₂)    (2.56)

where xᵢ = ord(1) (independent of ε) and the functional forms of δ₁(ε) and δ₂(ε) remain to be determined. Substituting into the quadratic and neglecting the small o(δ₂) terms yields

    δ₀²x₀² − 1 + 2δ₀δ₁x₀x₁ + εδ₀x₀ + 2δ₀δ₂x₀x₂ + δ₁²x₁² + εδ₁x₁ + ··· = 0    (2.57)


At ε = 0, the solution is x = x₀ = ±1. So we let δ₀ = 1 and for the moment consider the root x₀ = 1. Now (2.57) becomes

    2δ₁x₁ + ε + 2δ₂x₂ + δ₁²x₁² + εδ₁x₁ + ··· = 0    (2.58)

Observe that all but the first two terms are o(δ₁) or o(ε). Neglecting these, we would find that

    2δ₁x₁ + ε = 0    (2.59)

Since x₁ is independent of ε, we set δ₁ = ε, in which case x₁ = −1/2. Now (2.58) becomes

    2δ₂x₂ + ε²x₁² + ε²x₁ + εδ₂x₂ + ··· = 0    (2.60)

Now, since εδ₂ = o(δ₂), we neglect the term containing it and substitute x₁ = −1/2 to get

    −ε²/4 + 2δ₂x₂ = 0    (2.61)

for which there is a solution with x₂ = ord(1) if δ₂(ε) = ε² and x₂ = 1/8. Thus we have constructed an asymptotic approximation

    x = 1 − ε/2 + ε²/8 + O(ε³)    (2.62)

Observe that to determine δ₁(ε) and δ₂(ε), we have found a DOMINANT BALANCE: a self-consistent choice of δₖ(ε), where it is comparable in size to the largest term not containing a δₖ and where all the other terms containing δs and εs are smaller as ε → 0.

To find how the second root x₀ = −1 depends on ε, we use the lessons learned in the previous paragraph to streamline the solution process. That analysis suggests that δₖ(ε) = εᵏ, so we seek a solution

    x = −1 + εx₁ + ε²x₂ + O(ε³)

which upon substitution into (2.55) yields

    −ε − 2εx₁ + ε²x₁ + ε²x₁² − 2ε²x₂ + O(ε³) = 0

Since by assumption the xₖ are independent of ε, this expression can only hold in general if it holds power by power in ε. We have already zeroed out the ε⁰ term by setting x₀ = −1. The ε¹ and ε² terms yield

    ε¹:  −2x₁ − 1 = 0
    ε²:  −2x₂ + x₁ + x₁² = 0

Notice that there is one-way coupling between these equations: the equation for xₖ depends on xₗ with l < k. The solutions to these are x₁ = −1/2 and x₂ = −1/8, so the second root is

    x = −1 − ε/2 − ε²/8 + O(ε³)    (2.63)

In the limit ε → 0 both solutions (2.62) and (2.63) reduce to the solutions when ε = 0. Cases such as this are called REGULAR PERTURBATION problems. The situation is much more interesting when the limit ε → 0 is qualitatively different from ε = 0. Cases like this are called SINGULAR PERTURBATION problems. Consider the equation

    εx² + x − 1 = 0    (2.64)

When ε = 0, this has the unique exact solution x = 1, while for any ε ≠ 0, it has two solutions. This problem is singular because the small parameter multiplies the highest power in the equation: when the parameter is zero the polynomial becomes lower degree so it has one fewer root. To analyze this problem, we define a scaled variable X = x/δ where δ = δ(ε) and X = ord(1). Thus δ measures the size of x as ε → 0. Substitution into (2.64) yields

    εδ²X² + δX − 1 = 0

Now we examine the possibility of finding a dominant balance between different terms with various guesses for δ. If we let δ = 1, then the second and third of these terms balance as ε → 0 while the first is small. This scaling gives the root x = 1 + O(ε). If we let δ = o(1), then the first and second terms are small, while the third term is still ord(1). There is no balance of terms for this scaling. On the other hand, if we let δ = ε⁻¹, then we can get the first and second terms to balance. Applying this scaling yields

    X² + X − ε = 0

which clearly has the solution X = −1 + O(ε), or x = −ε⁻¹ + O(1). As ε → 0 the root goes off to infinity. Although the first term in (2.64) contains the small parameter, it can multiply a large number so that


the term overall is not small. This characteristic is typical of singular perturbation problems. A more subtle singular perturbation problem is

    (1 − ε)x² − 2x + 1 = 0    (2.65)

When ε = 0 this has the double root x = 1. When ε < 0 there are no real solutions, whereas when ε > 0 there are two. Clearly δ₀ = 1, x₀ = 1, so we seek a solution

    x = 1 + δ₁x₁ + δ₂x₂ + o(δ₂)

Substitution into (2.65) gives

    δ₁²x₁² + 2δ₁δ₂x₁x₂ + δ₂²x₂² − ε − 2εδ₁x₁ − 2εδ₂x₂ + ··· = 0

Since 1 ≫ δ₁ ≫ δ₂, we can conclude that δ₁²x₁² and ε are the largest (dominant) terms. These balance if δ₁² = ord(ε). Thus we set δ₁ = ε^(1/2), which implies that x₁ = ±1. So the solutions are

    x = 1 ± ε^(1/2) + O(ε)    (2.66)

As an exercise, find that δ₂ = ε and that the solutions to (2.65) can be written as an asymptotic series in powers of ε^(1/2).
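These expansions are easy to verify numerically. The sketch below (not from the text; it assumes NumPy) compares the two-term expansions for the regular problem (2.55) and the dominant-balance predictions for the singular problem (2.64) against roots computed with np.roots:

```python
import numpy as np

eps = 1e-3

# Regular problem (2.55): x^2 + eps*x - 1 = 0.
# Perturbation series: x = 1 - eps/2 + eps^2/8 and x = -1 - eps/2 - eps^2/8.
roots = np.sort(np.roots([1.0, eps, -1.0]).real)
x_minus = -1.0 - eps/2 - eps**2/8
x_plus = 1.0 - eps/2 + eps**2/8
assert abs(roots[0] - x_minus) < 1e-9
assert abs(roots[1] - x_plus) < 1e-9

# Singular problem (2.64): eps*x^2 + x - 1 = 0.
# Dominant balance gives x ~ 1 (regular root) and x ~ -1/eps (singular root).
roots = np.sort(np.roots([eps, 1.0, -1.0]).real)
assert abs(roots[1] - 1.0) < 2*eps          # regular root: 1 - eps + ...
assert abs(roots[0] - (-1.0/eps)) < 2.0     # singular root: -1/eps - 1 + ...
```

For (2.55) the exact roots happen to agree with the two-term series through O(ε³), so the agreement at ε = 10⁻³ is extremely close.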

2.6.4 Regular Perturbation Analysis of an ODE

One attractive feature of perturbation methods is their capacity to provide analytical, albeit approximate, solutions to complex problems. For regular perturbation problems, the approach is rather straightforward. As an illustration we consider the problem of a second-order reaction occurring in a spherical catalyst pellet, which we can model at steady state by

    (1/r²) d/dr (r² dc/dr) = Da c²    (2.67)

with c = 1 at r = 1 and c bounded at the origin. If D, R, k, and c_B are the diffusivity, particle radius, rate constant, and dimensional surface concentration respectively, then Da = k c_B R²/D is the DAMKÖHLER NUMBER. The problem is nonlinear, so a simple analytical solution is unavailable. An approximate solution for Da ≪ 1 can be constructed, however, using a regular perturbation approach.


co + ecl + dc 2
of the form c(r)
+
solution
a
seek
like
powers
yields
equating
Let c = Da. We
into (2.67) and
Substituting
O (3).

1 d 2dco

r 2dr dr
1 d 2dC1
r 2dr dr
1 d 2dC2r 2dr dr

-1
co(l)

co, Cl(l) = O

2c1c(), c2(1) = O

order has the same operator but difeach


at
solution
the
Observethat
solution at lower order. This Structureis
ferent "forcing"from the
problems. The solution at 0 is trivial:
perturbation
regular
of
typical

we have
co = 1 for all r. At 1,
1 d 2dC1

r2dr dr

(r 2 1) /6. The solution to the


The solution to this equation is Although this problem is nonlinear,
2problem is left to Exercise2.66.
a simple approximate closedthe regular perturbation method provides
form solution.
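As a quick symbolic sanity check (not in the text; it assumes SymPy is available), we can confirm that c₁ = (r² − 1)/6 satisfies the O(ε) problem, the surface boundary condition, and regularity at the origin:

```python
import sympy as sp

r = sp.symbols('r', positive=True)
c1 = (r**2 - 1) / 6

# O(eps) problem: (1/r^2) d/dr (r^2 dc1/dr) = c0^2 = 1, with c1(1) = 0.
lhs = sp.simplify(sp.diff(r**2 * sp.diff(c1, r), r) / r**2)
assert sp.simplify(lhs - 1) == 0           # ODE satisfied
assert c1.subs(r, 1) == 0                  # boundary condition at r = 1
assert sp.diff(c1, r).subs(r, 0) == 0      # bounded (symmetric) at the origin
```

The same pattern can be used to check a candidate c₂ once Exercise 2.66 is worked.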

2.6.5 Matched Asymptotic Expansions

The regular perturbation approach above provided an approximate solution for Da ≪ 1. We can also pursue a perturbation solution in the opposite limit, Da ≫ 1. Now letting ε = Da⁻¹ we have

    ε (1/r²) d/dr (r² dc/dr) = c²    (2.68)

If we naively seek a regular perturbation solution c = c₀ + εc₁ + O(ε²), the leading-order equation will be

    c₀² = 0

This has solution c₀ = 0, which satisfies the boundedness condition at r = 0 and makes physical sense for the interior of the domain because when Da ≫ 1, reaction is fast compared to diffusion so we expect the concentration in the particle to be very small. On the other hand, this solution cannot be complete, as it cannot satisfy the boundary condition c = 1 at r = 1. The inability of the solution to satisfy the boundary condition arises from the fact that the small parameter

ε multiplies the highest derivative in the equation. It is thus absent from the leading-order problem, so the arbitrary constants required to satisfy the boundary conditions are not available.

The resolution to this issue lies in proper scaling. Although ε is small, it multiplies a second derivative. If the gradient of the solution is large in some region, then the product of the small parameter and large gradient may result in a term that is not small. In the present case, we can use physical intuition to guess where the gradients are large. At high Da the reaction occurs rapidly, so we expect the concentration to be small in most of the catalyst particle. Near r = 1, however, reactant is diffusing in from the surroundings and indeed right at r = 1 the concentration must be unity. Thus we define a new spatial variable η = (1 − r)/Δ(ε), where Δ is a length scale that is yet to be determined. Applying this change of variable to (2.68) yields

    (ε/Δ²) (1/(1 − Δη)²) d/dη [(1 − Δη)² dc/dη] = c²    (2.69)

The first term contains εΔ⁻². If we take Δ = ε^(1/2) then this term is ord(1) as ε → 0 and can balance the term c² to yield a nontrivial solution. This scaling implies that near r = 1 the steepness of the concentration gradient scales as ε^(−1/2). Proceeding with this scaling, (2.69) becomes

    (1/(1 − ε^(1/2)η)²) d/dη [(1 − ε^(1/2)η)² dc/dη] = c²    (2.70)
Now we seek a perturbation solution of this rescaled problem: c(η) = c₀ + ε^(1/2)c₁ + O(ε). The choice of ε^(1/2) comes from the observation that the Taylor expansion (1 − ε^(1/2)η)² = 1 − 2ε^(1/2)η + O(ε) introduces corrections of O(ε^(1/2)). This gives the leading-order problem

    d²c₀/dη² = c₀²    (2.71)

Although this equation is nonlinear, it has a special form that facilitates solution.¹³ Let w = c₀′ where ′ denotes d/dη. Now we can write the second-order equation as the first-order system

    w′ = c₀²,    c₀′ = w

¹³If we had considered first-order kinetics instead, the solution would be simple.


As constructed, this system has the special property that

    dH/dη = 0,  where H = w²/2 − c₀³/3

since dH/dη = ww′ − c₀²c₀′ = wc₀² − c₀²w = 0. Therefore, curves on which w²/2 − c₀³/3 = K, where K is a constant, are solutions. As η becomes large, i.e., at positions much farther than distance ε^(1/2) from the interface, we expect the concentration and its gradient to go to zero, so we take K = 0, giving

    dc₀/dη = −√(2/3) c₀^(3/2)

The negative sign must be chosen so that c₀ decays with increasing η. This equation can be integrated and the boundary condition c₀(η = 0) = 1 applied to yield

    c₀(η) = (1 + η/√6)⁻²

In terms of the original variables this becomes

    c₀(r) = (1 + (1 − r)/√(6ε))⁻²    (2.72)

This decays to zero once 1 − r is larger than O(ε^(1/2)). Thus the concentration changes rapidly in a BOUNDARY LAYER with thickness of O(ε^(1/2)) that is located near the catalyst particle surface. Outside this thin boundary layer, in the interior of the particle, the concentration is very small, going to zero as ε → 0. One can carry this analysis to higher order terms. For example, the first effects of the particle shape on the result appear at O(ε^(1/2)), but it should be clear that the primary structure of the solution behavior has been captured by this leading-order solution.

This example is a simple instance of a singular perturbation method. The solution c₀ = 0 that we obtained before rescaling is called the OUTER SOLUTION. It is valid away from the boundary r = 1. The solution (2.72) that we obtained after rescaling is called the INNER SOLUTION. In this simple example the inner solution decays to zero, automatically matching the outer solution. In general, the outer solution is not simply a constant, and a MATCHING CONDITION must be imposed to properly connect the two solutions to one another. This process is the origin of the term MATCHED ASYMPTOTIC EXPANSIONS.
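A quick symbolic check (not in the text; SymPy assumed) confirms that the leading-order inner solution satisfies (2.71), the surface condition, and decay into the particle interior:

```python
import sympy as sp

eta = sp.symbols('eta', nonnegative=True)
c0 = (1 + eta/sp.sqrt(6))**(-2)

# Leading-order inner problem: c0'' = c0^2 with c0(0) = 1,
# and c0 -> 0 as eta -> oo (the matching/decay condition).
assert sp.simplify(sp.diff(c0, eta, 2) - c0**2) == 0
assert c0.subs(eta, 0) == 1
assert sp.limit(c0, eta, sp.oo) == 0
```

The decay as η → ∞ is exactly what makes matching to the zero outer solution automatic in this example.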


Matching can be accomplished with a number of different procedures. We describe here a simple approach that works for many problems. More sophisticated and general approaches are described in Hinch (1991). In the simple approach, we denote the outer solution as uₙ(x) and the inner solution as Uₙ(ξ), where x and ξ = x/Δ(ε) are the outer and inner variables, respectively. The simple matching procedure just requires that at each order n = 0, 1, ..., N

    lim_{x→0} uₙ(x) = lim_{ξ→∞} Uₙ(ξ)    (2.73)

In words, the inner limit of the outer solution equals the outer limit of the inner solution. In the example above, where the outer solution is zero and the inner solution decays to zero as η → ∞, this expression is satisfied trivially. In general, neither the inner nor the outer solution is valid throughout the entire domain, but the matching procedure provides a means to construct a uniformly valid solution. This so-called COMPOSITE SOLUTION is given by

    uₙᶜ(x) = uₙ(x) + Uₙ(ξ) − lim_{ξ→∞} Uₙ(ξ)    (2.74)

The last term avoids double counting of the overlapping parts of the two solutions. These ideas are illustrated in the following example.

Example 2.30: Matched asymptotic expansion analysis of the reaction equilibrium assumption

Consider the following reactions

    A ⇌ B  (k₁, k₋₁),    B → C  (k₂)

in which rate constants k₁, k₋₁ are much larger than the rate constant k₂, so the first reaction equilibrates quickly. In a batch system where c_A, c_B, and c_C are the concentrations, the governing equations are

    dc_A/dt = −k₁c_A + k₋₁c_B
    dc_B/dt = k₁c_A − k₋₁c_B − k₂c_B
    dc_C/dt = k₂c_B

The reaction equilibrium assumption takes c_A and c_B to be in equilibrium so that c_B = Kc_A where K = k₁/k₋₁. Further assume that k₋₁ is the largest rate constant. Initial concentrations in the reactor are c_A(0) = c_A0, c_B(0) = c_C(0) = 0.

(a) Find a proper nondimensionalization so that a systematic perturbation expansion can be performed.

(b) Use matched asymptotic expansions to show that the reaction equilibrium approximation corresponds to the leading-order outer solution of the kinetic equations. Also find the equations for the O(ε¹) terms in the outer solution.

(c) Find the leading-order inner solution for the dynamics on the fast time scale 1/k₋₁, match the inner and outer solutions, and find a composite solution that is uniformly valid for all time.

Solution

(a) Let u = c_A/c_A(0), v = c_B/c_A(0), w = c_C/c_A(0), so u(0) = 1, v(0) = w(0) = 0. Define a scaled "slow" time variable t_s = k₂t so that an O(1) change in t_s corresponds to a time interval of O(1/k₂), and define the small parameter ε = k₂/k₋₁. In these variables, the rate equations are

    du/dt_s = (1/ε)(−Ku + v)
    dv/dt_s = (1/ε)(Ku − v) − v
    dw/dt_s = v

Since c_C is determined completely by c_B we do not include its evolution in the following development.

(b) Multiplying the dimensionless equations by ε yields

    ε du/dt_s = −Ku + v
    ε dv/dt_s = Ku − v − εv

Assuming a power series form, the outer solution is obtained by letting

    u(t_s) = u₀(t_s) + εu₁(t_s) + O(ε²)
    v(t_s) = v₀(t_s) + εv₁(t_s) + O(ε²)


Substituting and considering only the terms of O(ε⁰) yields

    Ku₀ = v₀

for both of these equations. This is the reaction equilibrium assumption in dimensionless form. Although physically reasonable, observe that this assumption is not consistent with the initial conditions u(0) = 1, v(0) = 0. Similarly, because the time derivatives are multiplied by ε, they do not appear in the leading-order outer problem, so we do not have differential equations whose solutions include the arbitrary constants that are determined by the initial conditions. Keeping this issue in mind, we collect O(ε¹) terms to yield

    du₀/dt_s = −Ku₁ + v₁
    dv₀/dt_s = Ku₁ − v₁ − v₀

Although this equation is valid, it is not yet useful because we do not know the values of u₀ and v₀. To obtain these we consider the inner solution.

(c) The problem with the outer solution can be traced to the loss of the time-derivative terms. Recognizing that the derivatives can be large at short times because k₁ and k₋₁ are much larger than k₂, we define a new fast time scale t_f = k₋₁t = t_s/ε. Now t_f changes an O(1) amount in a dimensional time of about 1/k₋₁. Rewriting the equations with this new time scaling yields

    du/dt_f = −Ku + v
    dv/dt_f = Ku − v − εv

Now we seek an inner solution

    u(t_f) = U₀(t_f) + εU₁(t_f) + O(ε²)
    v(t_f) = V₀(t_f) + εV₁(t_f) + O(ε²)

Substituting and extracting the O(ε⁰) terms yields

    dU₀/dt_f = −KU₀ + V₀
    dV₀/dt_f = KU₀ − V₀

with initial conditions U₀(0) = 1, V₀(0) = 0. This coupled pair of equations could be solved, for example, by Laplace transforms, or more simply


by noting that d(U₀ + V₀)/dt_f = 0, so U₀ + V₀ = 1, to just solve

    dU₀/dt_f = −KU₀ + 1 − U₀

which has solution

    U₀ = 1/(1 + K) + (K/(1 + K)) e^(−(1+K)t_f)

Using this, we obtain

    V₀ = (K/(1 + K)) (1 − e^(−(1+K)t_f))

By analogy with the reaction-diffusion example above, this inner solution corresponds to a boundary layer in time, rather than space.

With inner and outer solutions in hand, we can use (2.73) to match them. The "outer limit" of the inner solution is

    lim_{t_f→∞} U₀ = 1/(1 + K),    lim_{t_f→∞} V₀ = K/(1 + K)

which satisfies the equilibrium assumption Ku = v. The inner limit of the outer solution is simply u₀(0), v₀(0), and using the previous result yields

    u₀(0) = 1/(1 + K),    v₀(0) = K/(1 + K)

Now we have initial conditions for the outer solution. Adding the two differential equations at O(ε¹) and differentiating the algebraic equation (reaction equilibrium result) at O(ε⁰) give

    du₀/dt_s + dv₀/dt_s = −v₀,    K du₀/dt_s = dv₀/dt_s

Solving these two equations for the two time derivatives gives

    du₀/dt_s = −(K/(1 + K)) u₀,    dv₀/dt_s = −(K/(1 + K)) v₀

Solving these with their respective initial (matching) conditions u₀(0) = 1/(1 + K), v₀(0) = K/(1 + K) gives the full leading-order outer solution

    u₀ = (1/(1 + K)) e^(−(K/(1+K)) t_s),    v₀ = (K/(1 + K)) e^(−(K/(1+K)) t_s)

and Perturbation Methods


AsymptoticAnalysis

173

[Figure 2.12 appears here: a plot of c_A(t)/c_A0 versus t showing the inner, outer, and composite approximations.]

Figure 2.12: Leading-order inner U₀, outer u₀, and composite solutions u₀ᶜ, for Example 2.30 with ε = 0.2, K = 1, and c_A0 = 1.

or, reverting to dimensional form

cA(t)/cA0 = (1/(1+K)) e^{-(K/(1+K)) k_2 t}
cB(t)/cA0 = (K/(1+K)) e^{-(K/(1+K)) k_2 t}

This is precisely the solution that would be obtained via uncritical
application of the reaction equilibrium approximation. Now we see this
approximation in more precise terms.

Finally, we construct a uniformly valid composite solution via
(2.74). To leading order in dimensional variables

cA(t)/cA0 = (1/(1+K)) e^{-(K/(1+K)) k_2 t} + (K/(1+K)) e^{-(1+K) k_2 t/ε}
cB(t)/cA0 = (K/(1+K)) e^{-(K/(1+K)) k_2 t} - (K/(1+K)) e^{-(1+K) k_2 t/ε}

Figure 2.12 shows the leading-order inner, outer, and composite
solutions for u(t) = cA(t)/cA0 with ε = 0.2 and K = 1.
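The accuracy of the composite solution is easy to check numerically. The sketch below (plain Python with a hand-rolled RK4 integrator; the rate constants k_1 = k_{-1} = 5, k_2 = 1 are illustrative choices giving K = 1 and ε = 0.2) integrates the full kinetics of A ⇌ B → C and measures the error of the leading-order composite approximation for cA/cA0:

```python
# Full kinetics of A <=> B -> C versus the leading-order composite solution.
# k1 = km1 = 5, k2 = 1 are illustrative values giving K = 1 and eps = 0.2.
import math

k1, km1, k2 = 5.0, 5.0, 1.0
K, eps = k1 / km1, k2 / km1

def rhs(cA, cB):
    # dimensional mass balances for cA and cB (with cA0 = 1)
    return (-k1 * cA + km1 * cB, k1 * cA - km1 * cB - k2 * cB)

def rk4(cA, cB, dt, nsteps):
    for _ in range(nsteps):
        a = rhs(cA, cB)
        b = rhs(cA + 0.5 * dt * a[0], cB + 0.5 * dt * a[1])
        c = rhs(cA + 0.5 * dt * b[0], cB + 0.5 * dt * b[1])
        d = rhs(cA + dt * c[0], cB + dt * c[1])
        cA += dt / 6 * (a[0] + 2 * b[0] + 2 * c[0] + d[0])
        cB += dt / 6 * (a[1] + 2 * b[1] + 2 * c[1] + d[1])
    return cA, cB

def composite_cA(t):
    # slowly decaying outer part plus the fast boundary-layer correction
    return (1 / (1 + K)) * math.exp(-K / (1 + K) * k2 * t) \
         + (K / (1 + K)) * math.exp(-(1 + K) * k2 * t / eps)

cA_num, _ = rk4(1.0, 0.0, 1e-4, 10000)   # integrate to t = 1
err = abs(cA_num - composite_cA(1.0))
print(err)
```

With these values the discrepancy at t = 1 is a few percent, consistent with the O(ε) error expected of a leading-order composite solution.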


2.6.6 Method of Multiple Scales

The method of matched asymptotic expansions deals with problems in
which different time or length scales dominate in different regions. In
many problems, however, dynamics occur concurrently on disparate scales,
a situation that calls for the method of multiple scales. Problems
amenable to this approach include dynamical systems with multiple natural
frequencies or decay times,14 nonlinear systems with widely separated
time scales, and problems of propagation (wavelike or diffusive) in
inhomogeneous media.

As an introduction to this approach, we consider a weakly damped
harmonic oscillator

ẍ + 2εẋ + ω²x = 0,   x(0) = 1,   ẋ(0) = 0

On physical grounds, we expect two time scales to act simultaneously
in this problem: harmonic oscillation, with natural period 2π/ω
(assumed to be ord(1)), and exponential decay, with time scale of
ord(1/ε). If we proceed naively, looking for a regular perturbation
solution x(t) = x0(t) + εx1(t) + O(ε²), we find that

ẍ0 + ω²x0 = 0,   x0(0) = 1,   ẋ0(0) = 0
ẍ1 + ω²x1 = -2ẋ0,   x1(0) = 0,   ẋ1(0) = 0

with solution

x(t) = cos ωt + ε((1/ω) sin ωt - t cos ωt) + O(ε²)
The equation at O(ε) has the same differential operator as does the
zeroth-order problem, but has a resonant forcing term proportional to
sin ωt that leads to the t cos ωt SECULAR TERM in the solution. When
t = ord(1/ε), this term destroys the asymptoticness of the expansion;
the approximation is not uniformly valid, failing at large times. The
method of multiple scales avoids this nonuniformity by explicitly
recognizing the existence of two time scales in the problem, by letting
t0 = t, t1 = εt and looking for a solution of the form x(t0, t1, ε). Now

dx/dt = ∂x/∂t0 + ε ∂x/∂t1

14For extensive application of the method in this context, see Nayfeh
and Mook (1979).


and we look for a solution of the form

x(t0, t1) = x0(t0, t1) + εx1(t0, t1) + O(ε²)

Defining D0 = ∂/∂t0 and D1 = ∂/∂t1, the leading-order equation becomes
a partial differential equation

D0²x0 + ω²x0 = 0

This has the solution x0 = A(t1) cos ωt0, where A(0) = 1, but is as yet
otherwise undetermined. At the next order, we have

D0²x1 + ω²x1 = 2ω (dA/dt1) sin ωt0 + 2ωA sin ωt0,   x1(0) = 0,   D0x1(0) = 0

Again, a resonant forcing term is present on the right-hand side. Unless
this is zero, a secular term again shows up in the solution and
the approximation will not be asymptotic. However, we now have the
possibility of eliminating this term. Notice that if

dA/dt1 + A = 0

the resonant term vanishes. This equation is called the SOLVABILITY
CONDITION or secularity condition or integrability condition. It is an
amplitude equation, determining the evolution of the amplitude of the
solution over the slow time scale t1. From the leading-order result, we
have that A(0) = 1, so A = e^{-t1}. At leading order, the solution
is therefore

x0 = e^{-εt} cos ωt

This is the type of solution we expect intuitively: a very slowly decaying
harmonic oscillation. A couple of final comments on this example: the
solution x1 is identically zero, but another resonance term shows up
in the equation for x2. This nonuniformity does not show up until
t = ord(1/ε²), by which time the amplitude has nearly decayed to zero,
but if desired, it could be eliminated by including a "superslow" scale
t2 = ε²t. This time scale arises because the damping causes a very
small (O(ε²)) change in the frequency of oscillation.
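A quick numerical check (a sketch with the illustrative values ε = 0.05, ω = 1) confirms that the multiple-scales result e^{-εt} cos ωt remains close to the true solution at t = ord(1/ε), where the regular perturbation solution has already failed:

```python
# Weakly damped oscillator: xdd + 2*eps*xd + w**2 * x = 0, x(0) = 1, xd(0) = 0.
import math

eps, w = 0.05, 1.0

def solve(t_final, dt=1e-3):
    # RK4 on the equivalent first-order system (x, v)
    x, v = 1.0, 0.0
    f = lambda x, v: (v, -2 * eps * v - w * w * x)
    for _ in range(int(round(t_final / dt))):
        a = f(x, v); b = f(x + 0.5*dt*a[0], v + 0.5*dt*a[1])
        c = f(x + 0.5*dt*b[0], v + 0.5*dt*b[1]); d = f(x + dt*c[0], v + dt*c[1])
        x += dt/6 * (a[0] + 2*b[0] + 2*c[0] + d[0])
        v += dt/6 * (a[1] + 2*b[1] + 2*c[1] + d[1])
    return x

t = 1.0 / eps                                  # where the naive expansion breaks down
x_true = solve(t)
x_ms = math.exp(-eps * t) * math.cos(w * t)    # multiple-scales approximation
x_naive = math.cos(w*t) + eps * (math.sin(w*t) / w - t * math.cos(w*t))
print(abs(x_ms - x_true), abs(x_naive - x_true))
```

At t = 1/ε the secular term makes the naive expansion substantially worse than the multiple-scales result.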
This simple example illustrates the procedure and resulting structure. The recurring theme is the existence of a secularity condition,
whose satisfaction requires the solution of an amplitude equation. This
amplitude equation determines the evolution of the system at its largest


scale. If the underlying problem is linear, so is the amplitude equation;
a nonlinear equation leads to a nonlinear amplitude equation. The
following example illustrates this.

Example 2.31: Oscillatory dynamics of a nonlinear system

From Section 2.2.2, we have a complete understanding of the linear
system ẋ = Ax. When A has complex conjugate eigenvalues σ ± iω,
the origin is a stable or unstable spiral depending on the sign of
σ. When |σ| ≪ |ω|, the growth or decay of solutions occurs on a
time scale much longer than the period of oscillation. In this situation
the method of multiple scales can be used to show very generally the
dynamics of the nonlinear system ẋ = Ax + N(x), where N(x) contains
no linear part. In this example we apply the method of multiple scales
to the system of equations

dx/dt = (σ  -ω; ω  σ) x + N2(x, x) + N3(x, x, x)

where σ = εu, with ε ≪ 1, while u and ω are ord(1). The steady state
x = 0 of this system is very weakly stable or unstable, depending on
the sign of u. Since the problem is nonlinear, finding the proper scaling
of x is an important part of the solution procedure. The oscillatory
nature of the linearized equation suggests that a solution can be found
in terms of amplitude ||x|| and phase φ.

Solution

Although we consider here a specific form for the nonlinearity, the
multiple-scales solution will lead to equations for r and φ whose general
structure is both extremely simple and extremely general. The time
scaling of this problem is similar to that of the linear example above,
so we consider a multiple-scales expansion with t0 = t, t1 = εt. The
time scale t0 reflects the time scale of the oscillation, while t1 reflects
the scale for growth or decay of the amplitude of the solution. To
determine the proper scaling of the solution amplitude, we let x = δX,
where X = ord(1) as ε → 0. Now the equation becomes

D0X + εD1X = (εu  -ω; ω  εu) X + δN2(X, X) + δ²N3(X, X, X)


where N2(X, X) and N3(X, X, X) are the quadratic and cubic terms written
in a form convenient for perturbation expansions. For general vectors
u = [u1, u2]ᵀ, v = [v1, v2]ᵀ, w = [w1, w2]ᵀ, the nonlinear terms
for this problem are

N2(u, v) = … ,   N3(u, v, w) = …

Any polynomial nonlinearity can be written as a sum of terms with this
structure.
If we tentatively let δ = ε and X = X0 + εX1 + O(ε²), then the problems
at O(ε⁰) and O(ε¹) become, respectively,

D0X0 - (0  -ω; ω  0) X0 = 0

and

D0X1 - (0  -ω; ω  0) X1 = uX0 - D1X0 + N2(X0, X0)

The solution at O(ε⁰) is

X0(t0, t1) = r(t1) [cos(ωt0 + φ(t1)), sin(ωt0 + φ(t1))]ᵀ

Turning to the O(ε) equation, the term N2(X0, X0) does not lead to
resonance because it is quadratic and thus contains no terms with
frequency ω. The solvability condition for this choice of scaling is thus

D1X0 = uX0

This equation is linear, leading to exponential growth on the time scale
t1 when u > 0 and eventually violating the scaling assumption x =
O(ε). Thus the choice δ = ε is not self-consistent.


Weneed a different guess for . The term N2 does not lead to resonanceat leading order, but the term N3 can. For example sin3 (Oto =
(3sinwto sin 3wto) /4. A balance between the linear term, which is
O(e) and this cubic term, which is O(3) would imply that = 61/2
Thus we seek a solution of the form x = El /2(Xo + El /2Xl + EX2 +
Withthis scaling the O(0) problem and its solution remain
the same as above, while the o (1/2)equation becomes
LXI =

Ordinary Differential
178

Equations

where

(Xo,Xo)
As noted above, M
solution can be found

contains no resonant terms. A Particular


(cos 20 + sin 20)

- (cos20 + sin 20))

Xl (to, tl)

Observe that Xl has frequency 2(0.


where O= (Oto((tl).
The leading-order amplitude and phase, r(t1) and φ(t1), have not
yet been determined, so we turn to the equation at O(ε), which
determines X2:

LX2 = uX0 - D1X0 + 2N2(X0, X1) + N3(X0, X0, X0)

For brevity, we denote the right-hand side as R. Resonance will occur
if R has terms proportional to [cos ωt0, sin ωt0]ᵀ or [-sin ωt0, cos ωt0]ᵀ,
which in general it does. To obtain the solvability conditions, we thus
require that R be orthogonal to these terms

∫_0^{2π/ω} [cos ωt0  sin ωt0] R dt0 = 0

∫_0^{2π/ω} [-sin ωt0  cos ωt0] R dt0 = 0

Omitting the detailed calculation, which involves elementary but
extensive trigonometric manipulations, we find that

dr/dt1 = ur + ar³
dφ/dt1 = br²

where a and b are constants determined by the nonlinearity given here.
This is a remarkably general result. These simple differential equations
govern the leading-order behavior for small ε, and their form
is completely insensitive to the nature of the nonlinearity; the entire
structure of N2 and N3 is distilled into the constants a and b. Furthermore,
even for a more general nonlinearity containing higher powers,


only the quadratic and cubic terms contribute. For example, a quartic
term does not appear until O(ε²) and thus does not contribute to R.
Because of its generality, this result is known as the normal or canonical
form for this class of nonlinear problems.15

The equation for the oscillation amplitude r is the most important.
It has steady-state solutions r = 0 and r = (-u/a)^{1/2}. Including the
scaling δ = ε^{1/2}, the latter solution becomes ||x|| = (-uε/a)^{1/2}. Therefore
real nontrivial solutions exist if uε/a < 0. We return to this example
and related ones in a more general context in Section 2.7.5.
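The amplitude prediction is easy to test on a concrete case. The sketch below does not use the particular N2 and N3 of this example; instead it takes the simple cubic nonlinearity N(x) = -(x1² + x2²)x (an illustrative choice for which the normal-form constants are a = -1 and b = 0), so the predicted steady amplitude is ||x|| = (uε)^{1/2}:

```python
# Weakly unstable spiral plus a stabilizing cubic term (illustrative choice):
# the normal form predicts ||x|| -> sqrt(u*eps/|a|) with a = -1 here.
import math

eps, u, w = 0.1, 1.0, 1.0

def f(x, y):
    s = x*x + y*y
    return (eps*u*x - w*y - x*s, w*x + eps*u*y - y*s)

x, y, dt = 0.01, 0.0, 0.01
for _ in range(20000):                      # RK4 to t = 200
    a = f(x, y); b = f(x + 0.5*dt*a[0], y + 0.5*dt*a[1])
    c = f(x + 0.5*dt*b[0], y + 0.5*dt*b[1]); d = f(x + dt*c[0], y + dt*c[1])
    x += dt/6 * (a[0] + 2*b[0] + 2*c[0] + d[0])
    y += dt/6 * (a[1] + 2*b[1] + 2*c[1] + d[1])

r_final = math.hypot(x, y)
print(r_final, math.sqrt(u * eps))          # numerical amplitude vs prediction
```

For this nonlinearity r obeys dr/dt = εur - r³ exactly, so the long-time amplitude matches the normal-form prediction to integrator accuracy.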

2.7 Qualitative Dynamics of Nonlinear Initial-Value Problems

2.7.1 Introduction

The dynamics of nonlinear differential equations can be extremely complex.
In this section we introduce a number of the issues that arise in
these systems. Questions that we address include:

• How do nonlinear systems differ from linear ones?
• What general qualitative (geometrical) structure can be found in
nonlinear systems?
• What kinds of steady-state and time-dependent behaviors are typical?
• How do solutions change as parameters change?


2.7.2 Invariant Subspaces and Manifolds

We begin with an introduction to the geometry of differential equations,
by describing invariant manifolds, regions of phase space in which
solutions to an equation remain for all time. We shall see that these regions
organize the dynamics of initial-value problems. For linear constant-coefficient
systems, thinking of the solution to ẋ = Ax in terms of the
eigenvectors leads toward a geometric view of solutions to differential
equations. An important point to notice is this: a point lying on a line

15Guckenheimer and Holmes (1983) give a general formula for construction of the
normal form, including explicit formulas for a and b, derived using a rigorous and
elegant method of nonlinear coordinate transformations.


defined by one of the eigenvectors vi of A never leaves this line and
never has left. These lines are invariant under the solution operator
e^{At}. Recall from Section 2.2.2 that if x(0) = cvi, then

x(t) = e^{At} c vi = c e^{λi t} vi   for all t

Thus we call the line defined by vi an invariant subspace of the
system. The most important invariant linear subspaces of a phase
space are defined as follows. Let u1, ..., uns be the (possibly
generalized) eigenvectors whose eigenvalues have negative real parts;
v1, ..., vnu be those whose eigenvalues have positive real parts; and
w1, ..., wnc those whose eigenvalues have zero real parts. Now we can
define three invariant subspaces

ES = span{u1, ..., uns}    stable subspace
EU = span{v1, ..., vnu}    unstable subspace
EC = span{w1, ..., wnc}    center subspace

An initial condition in ES will remain in ES and eventually decay to
zero, one in EU will remain in EU and grow exponentially with time,
and one in EC will remain in EC, staying the same magnitude or growing
with at most a polynomial time dependence. Figure 2.13 shows some
examples of these invariant subspaces in linear systems. In general,
a system with eigenvalues with zero real parts is not robust: a small
change in the system, e.g., the parameters, moves the eigenvalues off
the imaginary axis and the invariant subspace EC vanishes. A system
like this, for which an arbitrarily small change in the system changes
the qualitative behavior, is said to be STRUCTURALLY UNSTABLE. In
contrast, if all the eigenvalues have nonzero real parts, no qualitative
change occurs if the system is changed slightly. Such a system is said
to be STRUCTURALLY STABLE. Similarly, if a system linearized around
a steady state has no eigenvalues with zero real parts, the steady state
is said to be HYPERBOLIC. Otherwise it is nonhyperbolic.
So what happens when we allow nonlinearity to creep in? Consider a
nonlinear system in the vicinity of a steady state xs. Letting z = x - xs,
we can write the system ẋ = f(x) as

żi = (∂fi/∂xj)|_{z=0} zj + (1/2)(∂²fi/∂zj∂zk)|_{z=0} zj zk + O(||z||³)

or

ż = Jz + N2(z, z) + O(||z||³)

Figure 2.13: Examples of invariant subspaces for linear systems.

where Jij = (∂fi/∂xj)|_{z=0} is the Jacobian of f evaluated at xs and
N2(z, z) contains all terms that are quadratic in z. Since Jz = O(z) and
N2(z, z) = O(z²), the leading-order behavior for small z is determined
by the linearized system, as long as all the eigenvalues of J have nonzero
real parts, i.e., the steady state xs is hyperbolic. The rigorous and general
statement of this fact is called the HARTMAN-GROBMAN THEOREM
(Guckenheimer and Holmes, 1983). If there is an eigenvalue with zero
real part, then the linearized problem gives that

(d/dt)||z||² = 0

for some values of z, in which case the quadratic term N2(z, z) appears
at leading order.
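Hyperbolicity is straightforward to test numerically: build the Jacobian by finite differences and inspect the real parts of its eigenvalues. A minimal sketch for a planar system (a damped pendulum, used purely as an illustration) classifies the origin from the trace and determinant of J:

```python
import math

def f(x1, x2):
    # damped pendulum (illustrative): x1 = angle, x2 = angular velocity
    return (x2, -math.sin(x1) - 0.1 * x2)

def jacobian(f, xs, h=1e-6):
    # central-difference Jacobian at the steady state xs
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        xp, xm = list(xs), list(xs)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(*xp), f(*xm)
        for i in range(2):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

J = jacobian(f, (0.0, 0.0))
tr = J[0][0] + J[1][1]
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
disc = tr * tr - 4 * det
if disc < 0:                         # complex pair: real part is tr/2
    re_parts = (tr / 2, tr / 2)
else:                                # real eigenvalues
    s = math.sqrt(disc)
    re_parts = ((tr + s) / 2, (tr - s) / 2)
hyperbolic = all(abs(re) > 1e-8 for re in re_parts)
print(re_parts, hyperbolic)
```

Here both real parts are negative and nonzero, so the origin is a hyperbolic (stable) steady state and the linearization governs the local behavior.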

Restricting ourselves to the usual situation, when xs is hyperbolic,
we now generalize the ideas of the stable and unstable subspaces to
the nonlinear case. We define the STABLE AND UNSTABLE MANIFOLDS16
of xs as follows:

• The stable manifold Ws(xs) is the set of points that tend to xs as
t → +∞.
• The unstable manifold Wu(xs) is the set of points that tend to xs
as t → -∞.

16A manifold for our purposes is simply a curve or surface. We use the term because
Ws and Wu are not generally linear subspaces of R^n, while ES and EU are.


Figure 2.14: Invariant subspaces of the linearized system (a) and
invariant manifolds of the nonlinear system (b).

These have the same dimensions ns and nu as the subspaces ES and
EU of the linearized system, and are tangent to them at x = xs. The
relationship between them is shown in Figure 2.14. Convince yourself
that the definitions of Ws and Wu are equivalent to those given above
for ES and EU for a hyperbolic linear system.

For many interesting situations, a steady state of interest is stable;
there is no unstable manifold. Recall, however, that in the linear case
each individual eigendirection defines an invariant manifold, so ES contains
within it further invariant subspaces. This fact gives us a tool
for understanding the approach to a steady state and possibly for
constructing simplified models of the dynamics near a steady state. As
an example, consider the following pair of differential equations, with
ε ≪ 1:

ε dx1/dt = f1(x1, x2)
dx2/dt = f2(x1, x2)

Let [x1, x2]ᵀ = (0, 0) be a stable steady state. Furthermore, assume
that we have written the equations in coordinates where

J = (-1/ε  0; 0  -1)

Thus the eigenvalues are -ε⁻¹ and -1, with corresponding eigenvectors
uf = [1, 0]ᵀ and us = [0, 1]ᵀ. The "s" and "f" stand for "slow" and
"fast" respectively, because the dynamics in the us direction occur on

an O(1) time scale, while those in the uf direction occur on an O(ε)
time scale. We can thus define a "slow" subspace ESs and a fast subspace
ESf, with nonlinear extensions Wss and Wsf. Initial conditions
(sufficiently close to the origin) approach Wss in a time of O(ε), so
after this transient all the dynamics are along the slow manifold Wss. To
leading order in ε, Wss is defined by the equation f1(x1, x2) = 0. This is
the result we would get by just setting ε to zero above, and can be
found as an outer solution in a matched asymptotic expansions analysis.
Close to the origin, this equation can be rewritten x1 = h(x2), so
we can reduce the pair of equations to a single equation

dx2/dt = f2(h(x2), x2)

What we have just done is a form of the quasi-steady-state approximation
used in all areas of chemical engineering analysis. It illustrates a
very important and general property of initial-value problems: beyond
initial transients, solutions often evolve on a subspace or manifold that
has many fewer dimensions than the entire phase space. This fact is
both conceptually and computationally important. It means that the
behavior of large systems can often be understood by only considering
a few dimensions, and also that computations might be performed
with many fewer degrees of freedom than formally required.
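A small numerical experiment makes this collapse onto the slow manifold concrete. The functions below are illustrative choices, f1 = -x1 + x2² (so that h(x2) = x2²) and f2 = -x2; after the O(ε) transient, the full stiff system tracks x1 = h(x2) to within O(ε):

```python
# Fast-slow system: eps * dx1/dt = -x1 + x2**2, dx2/dt = -x2 (illustrative).
eps = 0.01

def f(x1, x2):
    return ((-x1 + x2 * x2) / eps, -x2)

x1, x2, dt = 2.0, 1.0, 1e-3      # start off the slow manifold
for _ in range(1000):            # RK4 to t = 1, well past the O(eps) transient
    a = f(x1, x2); b = f(x1 + 0.5*dt*a[0], x2 + 0.5*dt*a[1])
    c = f(x1 + 0.5*dt*b[0], x2 + 0.5*dt*b[1]); d = f(x1 + dt*c[0], x2 + dt*c[1])
    x1 += dt/6 * (a[0] + 2*b[0] + 2*c[0] + d[0])
    x2 += dt/6 * (a[1] + 2*b[1] + 2*c[1] + d[1])

print(abs(x1 - x2 * x2))         # distance from the slow manifold: O(eps)
```

The reduced model dx2/dt = f2(h(x2), x2) = -x2 reproduces x2 exactly here; the full simulation deviates from the manifold x1 = x2² only by an O(ε) correction.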

2.7.3 Some Special Nonlinear Systems

Gradient Systems

Imagine a small particle suspended in a viscous fluid, and moving under
the influence of a force that can be written as the gradient of a scalar
potential function U(x). This situation is described by

dx/dt = -∇U(x)    (2.75)

In general, systems of this form are called gradient systems. Recall
that the vector ∇U is always normal to surfaces of constant U, so
trajectories of this type of system are always moving "downhill" on the
"energy landscape" defined by U. In other words, the potential U is a
Lyapunov function for (2.75). The only steady states of gradient systems
are sources, saddle points, and sinks (can you show this?). A
two-dimensional example is shown in Figure 2.15. Some more insight
into the behavior of a gradient system is gained by asking how the
"potential energy" of a trajectory evolves with time. The rate of change of


Figure 2.15: Contours of an energy function U(x1, x2). Black arrows
denote directions of motion on the energy surface for a gradient
system, while gray ones denote motion for a Hamiltonian system.

U on a trajectory is

dU/dt = Σi (∂U/∂xi)(dxi/dt) = -Σi (∂U/∂xi)(∂U/∂xi) = -||∇U||² ≤ 0

with equality only at steady states, where ∇U = 0. So whatever the
trajectory of the vector equation for x, it satisfies this scalar equation,
showing that the rate of decrease of potential energy is the square of the
gradient of the potential. Trajectories roll downhill until they reach a
minimum in U. In a high-dimensional problem, the potential surface
can be very complex, with many minima, and saddle points where there
are many "downhill" directions for the system to choose from.
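The monotone decay of U is easy to observe in a simulation. The sketch below uses a double-well potential U = (x1² - 1)² + x2² (an illustrative choice, not the potential of Figure 2.15) and checks that U never increases along a forward-Euler gradient-descent trajectory:

```python
# Gradient system xdot = -grad U for an illustrative double-well potential.
def U(x1, x2):
    return (x1 * x1 - 1) ** 2 + x2 * x2

def grad_U(x1, x2):
    return (4 * x1 * (x1 * x1 - 1), 2 * x2)

x1, x2, dt = 0.3, 1.0, 1e-3
history = [U(x1, x2)]
for _ in range(10000):           # forward Euler suffices for a monotonicity check
    g1, g2 = grad_U(x1, x2)
    x1 -= dt * g1
    x2 -= dt * g2
    history.append(U(x1, x2))

monotone = all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
print(monotone, x1, x2)          # U decreases; trajectory ends near the minimum (1, 0)
```

Starting from x1 > 0, the trajectory rolls downhill into the basin of the minimum at (1, 0); a start with x1 < 0 would reach (-1, 0) instead.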

Hamiltonian Systems

Consider again the landscape of Figure 2.15, but now call the energy
function H rather than U. Now imagine a dynamical system where
trajectories are not normal to the contours of constant H but tangent to
them. To do so, we modify the gradient system:

dx1/dt = ∂H/∂x2,   dx2/dt = -∂H/∂x1    (2.76)

By the same exercise we performed above for U along trajectories, we
find

dH(x, t)/dt = 0

so the energy function H is conserved on trajectories that follow (2.76).

The above situation is a special case of a very general and important
class of equations. Consider a system of particles, e.g., molecules. For
each particle, there is a number of coordinates (typically three) that
describes the position of the particle, and for each coordinate there is
an associated momentum. We denote the full set of coordinates as q
and the momenta as p. The total (kinetic plus potential) energy H of
the system is a function of the positions and momenta and is called
the HAMILTONIAN. In the absence of friction (always true at the
atomic level), the sum of kinetic and potential energy is conserved, so

dH/dt = Σi ((∂H/∂qi)(dqi/dt) + (∂H/∂pi)(dpi/dt)) = 0

In general, this holds only if

dqi/dt = ∂H/∂pi,   dpi/dt = -∂H/∂qi

These equations are called HAMILTON'S EQUATIONS. A system whose
dynamics are described by a model of this form is said to be Hamiltonian
(Goldstein, 1980).

In addition to the property that H is constant along trajectories,
Hamiltonian systems have another important attribute: phase space
volume is conserved along trajectories. In other words, a "blob" of
initial conditions may deform and rotate with time, but it cannot shrink
or grow. We can see this by looking at the divergence17 in phase space of the
vector field for a Hamiltonian system:

∇·f = Σi ((∂/∂qi)(dqi/dt) + (∂/∂pi)(dpi/dt)) = Σi (∂²H/∂qi∂pi - ∂²H/∂pi∂qi) = 0

This result is known as LIOUVILLE'S THEOREM. In general, vector fields
with ∇·f = 0 are said to be conservative; if ∇·f < 0 the system is
dissipative.18 What is ∇·f for a gradient system?

An important class of conservative vector fields is the velocity fields
of incompressible flows. In two dimensions, equations for motion of
a fluid element are Hamiltonian, with the Hamiltonian function being
simply the stream function. A three-dimensional incompressible flow
field, although conservative, cannot generally be Hamiltonian. Why
not?
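Both properties are easy to verify by computing ∇·f with central differences. In the sketch below the Hamiltonian field comes from the pendulum Hamiltonian H = p²/2 - K cos q, and the gradient field from the convex potential U = q² + p² (both illustrative choices):

```python
import math

def divergence(f, q, p, h=1e-5):
    # central-difference estimate of d(f1)/dq + d(f2)/dp
    return ((f(q + h, p)[0] - f(q - h, p)[0]) / (2 * h)
          + (f(q, p + h)[1] - f(q, p - h)[1]) / (2 * h))

K = 2.0
hamiltonian = lambda q, p: (p, -K * math.sin(q))   # qdot = dH/dp, pdot = -dH/dq
gradient = lambda q, p: (-2 * q, -2 * p)           # xdot = -grad U, U = q^2 + p^2

div_h = divergence(hamiltonian, 0.7, 0.3)
div_g = divergence(gradient, 0.7, 0.3)
print(div_h, div_g)   # ~0 (conservative) and -4 (dissipative)
```

The Hamiltonian field is divergence-free everywhere, while the gradient field here has ∇·f = -∇²U = -4 at every point.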

Single Degree-of-Freedom Hamiltonian Systems

A mechanical system with only one degree of freedom, such as a particle
moving along a line or a pendulum restricted to swing in a single
plane, illustrates some of the important features of nonlinear differential
equations. In this case p and q are scalars. Often the Hamiltonian
can be written in this simple form

H = (1/2)p² + V(q)

Along any trajectory, H is constant, so we can solve for the momentum
in terms of the position

p = ±(2(H - V(q)))^{1/2}

Trajectories in phase space are thus symmetric across p = 0. Furthermore,
this formula can be used to construct the energy landscape, the
curves of H = constant on the (q, p) plane. From Hamilton's equations

17See Section 3.2.
18Grmela and Öttinger (1997) have developed a formalism for continuum models of
materials, in which the vector field is simply a sum of a Hamiltonian part and a gradient
part.


and the expression for p, we can see that steady states occur when
p = 0 and V′(q) = 0. For the pendulum, V(q) = -K cos q, where
K is a constant; the energy landscape and phase-plane trajectories are
shown in Figure 2.16 for K = 2. Note in particular the trajectories
that round the "hills" or "valleys," connecting two saddle points. These
special trajectories are called HETEROCLINIC ORBITS. Denoting the two
steady states involved as P and Q, the heteroclinic orbit is part of both
the unstable manifold of P and the stable manifold of Q. If the potential
energy is changed to V(q) = -(1/2)q² + (1/4)q⁴, the landscape and
trajectories are as shown in Figure 2.17. Now we have two trajectories
connecting a saddle point to itself, called HOMOCLINIC ORBITS. The
homoclinic orbit is part of both the unstable and stable manifold of
the steady state. Homoclinic and heteroclinic orbits are examples of
global features of a dynamical system, because their existence cannot
be deduced by only looking at behavior in a small neighborhood of a
particular point. Hamiltonian systems are not structurally stable; physically
we can understand this by noting that any dissipation of energy
leads to "downhill" motion on the energy landscape and the special
properties that H = constant on trajectories and ∇·f = 0 are lost.
Similarly, homoclinic and heteroclinic orbits are not structurally stable
features, but they remain important for general systems because
they can arise at particular points in parameter space, called GLOBAL
BIFURCATIONS (Guckenheimer and Holmes, 1983).
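Conservation of H along trajectories can be confirmed directly by integrating Hamilton's equations for the pendulum (K = 2, matching Figure 2.16). An accurate integrator keeps H essentially constant:

```python
import math

K = 2.0
H = lambda q, p: 0.5 * p * p - K * math.cos(q)
f = lambda q, p: (p, -K * math.sin(q))     # Hamilton's equations for this H

q, p, dt = 0.5, 0.0, 1e-3
H0 = H(q, p)
for _ in range(10000):                     # RK4 to t = 10
    a = f(q, p); b = f(q + 0.5*dt*a[0], p + 0.5*dt*a[1])
    c = f(q + 0.5*dt*b[0], p + 0.5*dt*b[1]); d = f(q + dt*c[0], p + dt*c[1])
    q += dt/6 * (a[0] + 2*b[0] + 2*c[0] + d[0])
    p += dt/6 * (a[1] + 2*b[1] + 2*c[1] + d[1])

drift = abs(H(q, p) - H0)
print(drift)   # tiny: RK4 is not exactly conservative, but the drift is very small
```

RK4 conserves H only approximately; for long-time Hamiltonian simulation, a symplectic method would keep the drift bounded rather than slowly accumulating.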

2.7.4 Long-Time Behavior and Attractors

A question of significant practical interest when studying a mathematical
model of a process is: what happens to the dynamics after a long
time, i.e., as t → ∞? In one- or two-dimensional phase spaces, the
possibilities are quite limited and we describe them essentially completely.
In three or more dimensions, very complex behavior is possible and we
shall only touch on the topic.

One Dimension

If x is a scalar, then the autonomous equation ẋ = f(x)
can always be written in gradient system form

dx/dt = -dV/dx


Figure 2.16: Energy landscape for a pendulum; H = (1/2)p² - K cos q;
K = 2.

where V(x) = -∫ f(x′) dx′. All initial conditions must end up at a
steady state, or roll downhill forever toward ±∞.
Two Dimensions: Planar Systems

Not every two-dimensional vector field can be written as the gradient
of a potential, so two-dimensional (or PLANAR19) systems are not
quite as restricted as one-dimensional ones. Nevertheless, they are
still fairly constrained by the topology of the plane. Let us write a
two-dimensional system as

dx/dt = f(x)

19Not all two-dimensional systems are planar. For example, consider a system whose
trajectories are restricted to the surface of a torus, i.e., a doughnut with a hole. This
surface cannot be mapped onto an unbounded plane. We discuss this case when
considering three-dimensional systems. On the other hand, it turns out that the surface of
a sphere can be mapped onto a plane, but the mapping is singular. Another nontrivial
two-dimensional surface is a Möbius strip.


Figure 2.17: Landscape for H = (1/2)p² - (1/2)q² + (1/4)q⁴.

where x = (x1, x2)ᵀ ∈ R². The steady states of this system are simply the
intersections of the curves f1(x1, x2) = 0 and f2(x1, x2) = 0. Near
these steady states, the behavior is described by the linearizations, if
the eigenvalues have nonzero real parts. In addition to steady states,
we know that closed trajectories (oscillations) can arise, as we saw in
the Hamiltonian examples described previously. Can anything else happen
as t → ∞? Can, for example, a periodic orbit have a figure-eight
shape? The answer to this is no; for trajectories in phase space to cross
would require two values of the vector field (f1, f2)ᵀ for the same point
(x1, x2)ᵀ, which cannot occur. This prohibition on trajectories crossing
applies in any number of dimensions, but in two dimensions it severely
constrains the possible behavior. One very important consequence of
the constraint is the POINCARÉ-BENDIXSON THEOREM.

Theorem 2.32 (Poincaré-Bendixson). If D is a closed region of R² and
a solution (x1(t), x2(t))ᵀ ∈ D for all t > t0, then the solution either is a
closed path, approaches a closed path as t → ∞, or approaches a fixed
point (steady state).


As an application of this theorem, consider the system

dx/dt = x - y - x(x² + 2y²)
dy/dt = x + y - y(x² + y²)

Transforming to polar coordinates gives

dθ/dt = 1 + (1/2) r² sin²θ sin 2θ
dr/dt = r(1 - r²(1 + (1/4) sin² 2θ))

From this form of the equations we see that the origin is the only steady
state and that it is unstable. So where do the trajectories go? Note that
dr/dt < 0 for all r > 1, so the trajectories must be bounded.
Furthermore, dr/dt ≥ 0 for all θ on the circle r = 2/√5 and dr/dt ≤ 0
for all θ on the circle r = 1, so all trajectories entering the annulus
between these two circles (let us call it D) never leave it. This region
is the area between the two gray circles in Figure 2.18. Since dθ/dt > 0
throughout D, there can be no steady states in this region. The
Poincaré-Bendixson theorem thus requires that there be at least one
closed path (periodic orbit) in this region. Numerical integration
reveals that for this problem there is one asymptotically
stable periodic orbit, which is also known as a LIMIT CYCLE. Part of
a trajectory that starts near the origin, as well as the limit cycle it
approaches, are shown in Figure 2.18.
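The Poincaré-Bendixson argument is easy to confirm numerically: integrating this system from an initial condition near the origin, the trajectory spirals outward and settles into the trapping annulus (a sketch with a hand-rolled RK4 integrator):

```python
# Trajectory of the planar system above, started near the unstable origin.
import math

def f(x, y):
    return (x - y - x * (x*x + 2*y*y), x + y - y * (x*x + y*y))

x, y, dt = 0.01, 0.0, 1e-3
for _ in range(50000):           # RK4 to t = 50
    a = f(x, y); b = f(x + 0.5*dt*a[0], y + 0.5*dt*a[1])
    c = f(x + 0.5*dt*b[0], y + 0.5*dt*b[1]); d = f(x + dt*c[0], y + dt*c[1])
    x += dt/6 * (a[0] + 2*b[0] + 2*c[0] + d[0])
    y += dt/6 * (a[1] + 2*b[1] + 2*c[1] + d[1])

r = math.hypot(x, y)
print(r)   # final radius lies in the trapping annulus D
```

After the transient, the radius of the numerically computed orbit stays between roughly 2/√5 and 1, consistent with the trapping-region argument.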

At this point we have seen two types of behavior that trajectories
may tend to as t → ∞: a steady state and a limit cycle. These are simple
examples of ATTRACTORS. A good working definition of an attractor is
the following.

An attractor A of a dynamical system is a set of points that
is invariant under time evolution of the system, and that is
the ultimate destination as t → ∞ of all initial conditions
that begin sufficiently near it, i.e., in a neighborhood U.20

For planar systems, the Poincaré-Bendixson theorem dictates that the
only attractors are steady states and limit cycles. Note that the two-dimensional
Hamiltonian systems discussed above also have periodic
orbits; these are not attractors because an initial condition close to one
such orbit does not approach it as t → ∞. The fact that trajectories of
Hamiltonian systems lie on constant energy surfaces precludes them
from having attractors.

20See, for example, Guckenheimer and Holmes (1983) for a discussion of various
definitions of attractors and the difficulties in developing a satisfactory general definition.


Figure 2.18: A limit cycle (thick dashed curve) and a trajectory (thin
solid curve) approaching it. The region D is bounded by the two gray
curves.

Three Dimensions

Trajectories in the three-dimensional phase space R³ are much less
topologically constrained than are those in one or two dimensions.
There is no three-dimensional analog of the Poincaré-Bendixson theorem
and thus no restriction that all attractors be either steady states
or periodic orbits. We look first at a simple, geometrically defined
example. Consider a torus (a donut-shaped surface) floating in three
dimensions and assume that all trajectories asymptotically approach the
surface of the torus, so that we only need consider what happens on
the torus itself. Further assume that there are no steady states on the
torus.

Figure 2.19: Periodic (left) and quasiperiodic (right) orbits on the
surface of a torus. The orbit on the right eventually passes through
every point in the domain.

Now any point on the torus can be represented by two angular
positions, θ ∈ [0, 2π) and φ ∈ [0, 2π), so we can represent the
phase space by a square where any trajectory that leaves one side of the
square reenters on the opposite side. Consider a very simple evolution
of these variables

dθ/dt = p,   dφ/dt = q

where p and q are constants. Eliminating time and integrating, we find
an explicit solution for the trajectories: θ = (p/q)(φ - φ0) + θ0, where
(θ0, φ0) is the value of (θ, φ) at a chosen value of t. Now since θ
and φ are in [0, 2π), the trajectory will return to (θ0, φ0) if θ - θ0 =
2πm when φ - φ0 = 2πn, where m and n are (as yet unspecified)
integers. This requires that qm = pn, which can only hold if p/q =
m/n for some pair of integers m and n. That is, p/q must be a rational
number; this situation is a form of resonance. Otherwise, the trajectory
will never repeat and will eventually pass through every point on the
torus! Such an orbit is called QUASIPERIODIC, because θ(t) and φ(t)
are individually time periodic, but the pair (θ(t), φ(t)) is not. Figure
2.19 shows trajectories for the cases p/q = 9/7 (left) and an irrational
value of p/q (right). The qualitative distinction should be clear. From
this example we see a new type of dynamical behavior, the quasiperiodic
orbit.
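The resonance condition can be checked directly from the explicit solution θ = pt mod 2π, φ = qt mod 2π. A sketch: for p/q = 9/7 the orbit closes exactly when φ has made seven full revolutions, while for an irrational ratio (√2 is used here as an illustration) the orbit never returns to its starting point:

```python
import math

TWO_PI = 2 * math.pi

def dist_to_start(p, q, t):
    # distance from (theta, phi) = (0, 0) on the torus, with wraparound
    wrap = lambda a: min(a % TWO_PI, TWO_PI - a % TWO_PI)
    return math.hypot(wrap(p * t), wrap(q * t))

# rational winding: p/q = 9/7 closes at t = 2*pi (9 and 7 full revolutions)
closed = dist_to_start(9.0, 7.0, TWO_PI)
print(closed)

# irrational winding: the minimum return distance over many turns stays finite
min_dist = min(dist_to_start(math.sqrt(2), 1.0, n * TWO_PI) for n in range(1, 200))
print(min_dist)
```

The rational orbit returns to its starting point to machine precision; the irrational one never does, and given enough turns it comes arbitrarily close to every point on the torus without ever repeating.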

Finally, we present one example of an even more complex type of
attractor that can occur in phase spaces of dimension 3 or higher. Consider
the system

dx/dt = -y - z
dy/dt = x + ay
dz/dt = b + z(x - c)

known as the RÖSSLER system. If a = b = 0.2, c = 1, the system
displays a limit cycle, as shown in Figure 2.20. If c = 5.7, the system
has the attractor shown in Figure 2.21. This attractor is
neither periodic nor quasiperiodic; in fact, nearby initial conditions will
follow similar paths but will eventually diverge from one another.
This property is known as SENSITIVITY TO INITIAL CONDITIONS and
is characteristic of CHAOTIC dynamics. Loosely speaking, an attractor
on which the dynamics are chaotic is called a STRANGE ATTRACTOR
(Guckenheimer and Holmes, 1983; Strogatz, 1994).
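Sensitivity to initial conditions can be demonstrated directly. The sketch below integrates the Rössler system at the chaotic parameter values from two initial conditions differing by 10⁻⁶ and measures their final separation; over 100 time units it grows by orders of magnitude while both trajectories remain bounded on the attractor:

```python
import math

def rossler(x, y, z, a=0.2, b=0.2, c=5.7):
    return (-y - z, x + a * y, b + z * (x - c))

def integrate(x, y, z, t_final, dt=0.01):
    for _ in range(int(round(t_final / dt))):   # RK4
        k1 = rossler(x, y, z)
        k2 = rossler(x + 0.5*dt*k1[0], y + 0.5*dt*k1[1], z + 0.5*dt*k1[2])
        k3 = rossler(x + 0.5*dt*k2[0], y + 0.5*dt*k2[1], z + 0.5*dt*k2[2])
        k4 = rossler(x + dt*k3[0], y + dt*k3[1], z + dt*k3[2])
        x += dt/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        y += dt/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        z += dt/6 * (k1[2] + 2*k2[2] + 2*k3[2] + k4[2])
    return (x, y, z)

p1 = integrate(1.0, 1.0, 1.0, 100.0)
p2 = integrate(1.0 + 1e-6, 1.0, 1.0, 100.0)    # nearby initial condition
sep = math.dist(p1, p2)
print(sep)   # far larger than the initial 1e-6 separation
```

The exponential growth rate of this separation is the largest Lyapunov exponent of the attractor; a positive value is the quantitative signature of chaos.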

2.7.5 The Fundamental Local Bifurcations of Steady States

We now have seen a variety of possible behaviors for nonlinear dynamical
systems: steady states, periodic orbits, quasiperiodic orbits,
strange attractors, heteroclinic orbits, homoclinic orbits.... Our focus
now shifts to understanding the ways in which the qualitative behavior
of a system changes as we change parameters. This branch of the theory
of differential equations is called BIFURCATION THEORY (Iooss and
Joseph, 1990).

We begin the discussion just by thinking generally about the steady
states of

dx/dt = f(x; μ)

where we now explicitly indicate the dependence of the vector field f
on the parameter μ. For definiteness, assume that derivatives of f of all
orders exist. Let xs(μ) be a steady state, i.e., f(xs(μ); μ) = 0. We can
determine from the linearization of f at xs whether this steady state is
hyperbolic. If it is, then we know, from the Hartman-Grobman theorem,
that a small change in μ does not change the qualitative behavior near
xs. Thus our attention focuses on behavior near values of μ where xs
is not hyperbolic, where the linearization has eigenvalues with zero
real part. This is where qualitative changes in the local behavior near
xs can occur.21 We denote a value of μ where xs is not hyperbolic as a
BIFURCATION POINT.

21If the system has a special structure, like a Hamiltonian, then additional conditions
must be satisfied for bifurcation to occur.

Ordinary Differential Equations

Figure 2.20: A limit cycle for the Rössler system, a = b = 0.2, c = 1.

Figure 2.21: A strange attractor for the Rössler system, a = b = 0.2, c = 5.7.

It may be productive to begin our examination of bifurcations with one-dimensional systems: x ∈ R¹, μ ∈ R¹. We shall see later how this discussion generalizes to higher-dimensional systems. Without loss of generality, we can specify that the bifurcation point is at μ = 0 and define a new dependent variable y = x − xs(0). Taylor expanding around y = 0, μ = 0 and using the facts that f = fx = 0 there give

ẏ = fμ μ + ½(fxx y² + 2fxμ yμ + fμμ μ²) + (1/6)(fxxx y³ + 3fxxμ y²μ + 3fxμμ yμ² + fμμμ μ³) + ···    (2.77)

Here the subscript denotes partial differentiation, fμ = ∂f/∂μ, etc., with all derivatives evaluated at the bifurcation point. We now examine the structure of solutions to this equation in the most important cases.

Saddle-Node Bifurcation

We begin with the most general case: the partial derivatives of f (other than fx) involved in the leading-order behavior are nonzero at the bifurcation point. This gives the GENERIC bifurcation behavior; the behavior that arises in the absence of special conditions on f. For small μ and y, the dominant balance in (2.77) is

ẏ = fμ μ + ½ fxx y²    (2.78)

This has steady states

y = ±(−2 fμ μ / fxx)^(1/2)

(To see this, check that when y = O(μ^(1/2)) the terms in (2.77) that we neglected to get (2.78) are small compared to the ones that we kept.) Therefore, depending on the sign of fμ/fxx, there are two real solutions for μ > 0 and none for μ < 0, or vice versa. The point μ = 0 is thus quite special in that on one side of it there are no steady states near y = 0 and on the other there are two. This type of bifurcation point is called variously a LIMIT POINT, TURNING POINT, or SADDLE-NODE bifurcation point. It arises when the conditions fx = 0, fμ ≠ 0, fxx ≠ 0 are satisfied. By rescaling, we can write the NORMAL FORM for this bifurcation as

ẏ = μ − y²    (2.79)

Figure 2.22: Bifurcation diagram for the saddle-node bifurcation. Every bifurcation of this type looks like this modulo a vertical and/or horizontal reflection across y = 0, μ = 0. The branch of stable solutions is the solid curve; the unstable branch is dashed.

Now the steady states are simply y = ±μ^(1/2); the positive root is stable and the negative unstable. When μ = 0, there is a single (repeated) root, which is stable from the right but not the left, and when μ < 0 there is no steady state, although trajectories that pass close to y = 0 move very slowly through that region. The time spent in the interval [−1, 1] is approximately π/(−μ)^(1/2). The BIFURCATION DIAGRAM associated with the saddle-node bifurcation is shown in Figure 2.22. It summarizes the position and stability of the steady states as μ changes.
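The slow passage near y = 0 for μ < 0 is easy to check numerically. For ẏ = μ − y² the exact transit time from y = 1 to y = −1 is (2/(−μ)^(1/2)) arctan(1/(−μ)^(1/2)), which approaches π/(−μ)^(1/2) as μ → 0⁻. The sketch below verifies this with a fourth-order Runge-Kutta integration; the step size and the value μ = −0.01 are illustrative choices.

```python
import math

def rk4_step(f, y, t, dt):
    # One classical fourth-order Runge-Kutta step for a scalar ODE.
    k1 = f(y, t)
    k2 = f(y + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(y + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(y + dt * k3, t + dt)
    return y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

mu = -0.01
f = lambda y, t: mu - y * y      # saddle-node normal form; no steady state for mu < 0

y, t, dt = 1.0, 0.0, 0.001
while y > -1.0:                  # march until the trajectory exits [-1, 1]
    y, t = rk4_step(f, y, t, dt), t + dt

T_exact = 2.0 / math.sqrt(-mu) * math.atan(1.0 / math.sqrt(-mu))
print(t, T_exact, math.pi / math.sqrt(-mu))
```

With μ = −0.01 the transit takes roughly 30 time units even though the trajectory covers only the interval [−1, 1]; almost all of that time is spent creeping through the neighborhood of y = 0 where the two steady states annihilated.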
Transcritical Bifurcation

In the above scenario, steady states exist only on one side or the other of the bifurcation point. What type of bifurcation do we expect to see if we know, on physical grounds, for example, that solutions exist on both sides of the bifurcation point? To capture this situation, we impose the additional condition that fμ = 0 at the bifurcation point. Now we find that y = O(μ) and the leading-order equation for the dynamics becomes

ẏ = ½(fxx y² + 2fxμ μy + fμμ μ²)

Figure 2.23: Bifurcation diagram for the transcritical bifurcation. Every bifurcation of this type looks like this modulo a vertical and/or horizontal reflection across y = 0, μ = 0. The stable branch is solid, and the unstable branch is dashed.

This has steady states

y = (1/fxx)(−fxμ ± (fxμ² − fxx fμμ)^(1/2)) μ

so the steady states are (locally) lines in the (y, μ) space, which cross at (y, μ) = (0, 0). Since steady states persist on both sides of the bifurcation point, this scenario is called a TRANSCRITICAL bifurcation. It arises when the conditions fx = 0, fμ = 0, fxμ ≠ 0, fxx ≠ 0 are satisfied. We can make the presentation simpler without loss of generality by setting fμμ = 0 and rescaling, which gives us the normal form for the transcritical bifurcation

ẏ = y(μ + ay),   a = ±1    (2.80)

We can show that the steady state y = 0 is stable when μ < 0 and unstable when μ > 0, and the nontrivial steady state y = −μ/a has the opposite stability characteristics. The solutions are sometimes said to "exchange stability" at the bifurcation point. The bifurcation diagram for the transcritical bifurcation is shown for a < 0 in Figure 2.23.

Figure 2.24: Bifurcation diagrams for the pitchfork bifurcation. Top: supercritical bifurcation, a = −1. Bottom: subcritical bifurcation, a = 1. The stable branches are solid, the unstable are dashed.

Pitchfork Bifurcation

Many physical problems have some symmetry that constrains the type of bifurcation that can occur. For example, for problems with a reflection symmetry, a one-dimensional model may satisfy the condition

f(x − xs; μ) = −f(−(x − xs); μ)

for all values of μ. With y = x − xs we have that f(y; μ) = −f(−y; μ), so y = 0 is always a solution and f is odd with respect to y = 0. Therefore, at a bifurcation point y = 0, μ = 0 we have that 0 = f = fxx = fxxxx = ··· and 0 = fμ = fμμ = ···. Our Taylor expansion becomes

ẏ = fxμ μy + (1/6) fxxx y³

thus y = O(μ^(1/2)). Rescaling, we find the normal form

ẏ = y(μ + ay²),   a = ±1    (2.81)

This has steady states y = 0 and y = ±(−μ/a)^(1/2). The steady states and their stability for this bifurcation are shown in Figure 2.24. For obvious reasons, this scenario is called a PITCHFORK BIFURCATION. It arises when the conditions f(y; μ) = −f(−y; μ), fx = 0, fxμ ≠ 0, fxx = 0, fxxx ≠ 0 are satisfied. If a = −1, then the nontrivial steady-state branch exists only for μ > 0 and is stable; this case is said to be SUPERCRITICAL. If a = +1, the nontrivial branch exists for μ < 0 and is unstable; this is the SUBCRITICAL case. Note that in the latter case, the linearly stable trivial branch will not be approached by initial conditions with magnitude greater than (−μ/a)^(1/2); so although small perturbations from the steady state y = 0 decay, larger ones grow.
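The supercritical case is easy to check numerically: for a = −1 and μ > 0, a small perturbation from y = 0 should grow and saturate at the nontrivial steady state y = μ^(1/2). A minimal forward-Euler sketch (the step size, parameter values, and initial condition are illustrative choices):

```python
import math

mu, a = 0.5, -1.0          # supercritical pitchfork; nontrivial states at y = +/- sqrt(mu)
y, dt = 0.01, 0.001        # small initial perturbation from y = 0
for _ in range(200000):    # integrate the normal form to t = 200
    y += dt * y * (mu + a * y * y)

print(y, math.sqrt(mu))    # y approaches sqrt(0.5), about 0.7071
```

Repeating the experiment with μ < 0 (or with a = +1 and a large enough initial condition) shows the other behaviors predicted by the normal form: decay to zero, or unbounded growth in the subcritical case.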
Hopf Bifurcation

In all of the above scenarios, solutions either monotonically approach a steady state or go off to ∞ (or more precisely, to where higher-order terms in the Taylor expansion are important). We now consider the case where we expect oscillatory behavior, i.e., where the linearized version of the problem has complex conjugate eigenvalues λ = σ ± iω. Obviously, we must move from one- to two-dimensional systems for this behavior to occur. As above, we expect a bifurcation when the steady state is nonhyperbolic, so σ = 0 and the eigenvalues of J are purely imaginary. In this instance, the steady-state solution persists on both sides of the bifurcation point, as long as ω ≠ 0 when σ is small. We let σ = εμ with μ = O(1) and write the model as

ẋ = Jx + N₂(x, x) + N₃(x, x, x) + O(|x|⁴)

The behavior of the linearized system is characterized by oscillation on a time scale of ω⁻¹ (which we assume remains finite as ε → 0), and slow growth or decay on an O(ε⁻¹) scale. In Example 2.31, we used the method of multiple scales to show that for small ε, balancing the linear growth terms with the nonlinearity requires that x = O(ε^(1/2)), as in the saddle-node and pitchfork cases above, and that the solution has the form of an O(1) oscillation with total phase ωt + φ(t) and slowly varying amplitude,

x(t) = ε^(1/2) r(t) × (oscillatory factor)

where the amplitude r and phase φ of the solution are given by

ṙ = εμr + aεr³    (2.82)
φ̇ = bεr²    (2.83)

The constants a and b are functions of the nonlinearity and of ω (Guckenheimer and Holmes, 1983; Iooss and Joseph, 1990). These equations comprise the normal form for the so-called HOPF BIFURCATION, the generic bifurcation connecting steady states (r = 0) to periodic orbits (r ≠ 0). Notice that the equation for r is identical in form to that for the pitchfork bifurcation. So if a < 0, we have a supercritical Hopf bifurcation, a transition with increasing μ from a stable steady state to a stable limit cycle whose amplitude is ε^(1/2)(−μ/a)^(1/2). For the subcritical case a > 0 there is a periodic solution, but it exists for μ < 0 and is unstable. Turning to the phase equation, we see that on the limit cycle, r² = −μ/a, so φ̇ = −bεμ/a and the frequency of the solution is ω − bεμ/a. It changes linearly as μ increases from zero, with a rate determined by b/a.
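These predictions can be checked on a concrete planar system. The sketch below integrates ẋ = μx − ωy + ax(x² + y²), ẏ = ωx + μy + ay(x² + y²), a standard Cartesian realization whose polar form is ṙ = μr + ar³, θ̇ = ω (this particular realization, with b = 0, is an illustrative choice, not a system from the text). With a < 0 and μ > 0 trajectories should settle onto a limit cycle of radius (−μ/a)^(1/2).

```python
import math

mu, omega, a = 0.25, 1.0, -1.0    # supercritical case: stable cycle of radius sqrt(-mu/a)

def rhs(state):
    x, y = state
    r2 = x * x + y * y
    return (mu * x - omega * y + a * x * r2,
            omega * x + mu * y + a * y * r2)

def rk4_step(state, dt):
    # Classical fourth-order Runge-Kutta step for the planar system.
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (c1 + 2 * c2 + 2 * c3 + c4)
                 for s, c1, c2, c3, c4 in zip(state, k1, k2, k3, k4))

state, dt = (0.05, 0.0), 0.01     # start near the (unstable) steady state
for _ in range(20000):            # integrate to t = 200
    state = rk4_step(state, dt)

r = math.hypot(*state)
print(r, math.sqrt(-mu / a))      # radius approaches sqrt(0.25) = 0.5
```

The small initial perturbation spirals outward and saturates at the predicted amplitude, just as the r equation of the normal form requires.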

2.8 Numerical Solutions of Initial-Value Problems

We have seen that for linear constant-coefficient problems, a complete theory exists and the general solution can be found in terms of eigenvalues and eigenvectors. For systems of order greater than four, however, there is no general, exact way to find the eigenvalues. So even in the most well-understood case, numerical approximations must be introduced to find actual solutions. The situation is worse in general, because no simple quantitative theory exists for nonlinear systems. Most of them need to be treated numerically right from the start. Therefore it is important to understand how numerical solutions of ODEs are constructed. Here we consider initial-value problems (IVPs). We focus on the solution of a single first-order equation, because the generalization to a system is usually apparent. The equation

ẋ = f(x, t)

can be formally integrated from a time t to a future time t + Δt to read

x(t + Δt) = x(t) + ∫ from t to t+Δt of f(x(t′), t′) dt′    (2.84)

The central issue in the numerical solution of IVPs is the approximate evaluation of the integral on the right-hand side of this equation. With a good approximation and a small enough time step Δt, the above formula can be applied repeatedly for as long a time interval as we like, i.e., x(Δt) is obtained from x(0), x(2Δt) is obtained from x(Δt), etc. We use the shorthand notation x^(k) ≡ x(kΔt).

2.8.1 Euler Methods: Accuracy and Stability

The three key issues in the numerical solution of IVPs are SIMPLICITY, ACCURACY, and STABILITY. We introduce each of these issues in turn, in the context of the so-called Euler methods.

The simplest formula to approximate the integral in (2.84) is the rectangle rule. This can be evaluated at either t or t + Δt, giving these two approximations for x(t + Δt)

x^(k+1) = x^(k) + Δt f(x^(k), t^(k))    (2.85)
x^(k+1) = x^(k) + Δt f(x^(k+1), t^(k+1))    (2.86)

The first of these approximations is the EXPLICIT or FORWARD Euler scheme, and the second is the IMPLICIT or BACKWARD Euler scheme. The explicit Euler scheme is the simplest integration scheme that can be obtained. It simply requires one evaluation of f at each time step. The implicit scheme is not as simple, requiring the solution of an algebraic equation (or system of equations) at each step. Both of these schemes are examples of SINGLE-STEP schemes, as they involve quantities at the beginning and end of only one time step.

To consider the accuracy of the forward Euler method, we rewrite it like this

x^(k+1) = x^(k) + Δt f(x^(k), t^(k)) + εΔt

where ε is the LOCAL TRUNCATION ERROR, the error incurred in a single time step. This can be determined by plugging into this expression the Taylor expansion of the exact solution

x(t^(k) + Δt) = x^(k) + ẋ^(k)Δt + ½ẍ^(k)Δt² + O(Δt³)

Since ẋ^(k) = f(x^(k), t^(k)), the first two terms on each side of this equation cancel, and we find that

ε = ½ẍ^(k)Δt + O(Δt²)

Thus ε → 0 as Δt → 0. The implicit Euler method obeys the same scaling. Since the error scales as Δt¹, the Euler methods are said to be "first-order accurate." Since the explicit method is simpler, is there any reason to use the implicit method? The answer is yes, and arises when we look at the third issue mentioned above, stability.

Consider a single linear equation ẋ = λx, so f(x, t) = λx. If Re(λ) < 0, then x(t) → 0 as t → ∞. It is not asking too much that a numerical approximation maintain the same property. The Euler approximations for this special case are

x^(k+1) = x^(k) + λΔt x^(k)
x^(k+1) = x^(k) + λΔt x^(k+1)

For the explicit Euler scheme, the iteration formula can be written in the general form x^(k+1) = G x^(k), where in the present case G = (1 + λΔt). We call G the GROWTH FACTOR or AMPLIFICATION FACTOR for the approximation. By applying this equation recursively from k = 0, we see that x^(k) = G^k x^(0), so if |G| > 1, then x^(k) → ∞ as k → ∞. Conversely, if |G| < 1, then x^(k) → 0 as k → ∞. Thus there is a NUMERICAL STABILITY CRITERION: |G| < 1. This is equivalent to G_R² + G_I² < 1, where subscripts R and I denote real and imaginary parts, respectively. For explicit Euler, G_R = 1 + λ_RΔt, G_I = λ_IΔt, yielding stability when

(1 + λ_RΔt)² + (λ_IΔt)² < 1

On a plane with axes λ_RΔt and λ_IΔt, this region is the interior of a circle centered at λ_RΔt = −1, λ_IΔt = 0. If λΔt is chosen to be within this circle, the time-integration process is numerically stable; otherwise it is not. If λ is real, instability occurs if λ > 0; this is as it should be, because the exact solution also blows up. But it also happens if λ < 0 and Δt > −2/λ, which leads to G < −1. This is pathological, because the exact solution decays. This situation is known as NUMERICAL INSTABILITY.

Figure 2.25: Approximate solutions to ẋ = −x using the explicit and implicit Euler methods with Δt = 2.1, along with the exact solution.

A numerically unstable solution is not a faithful approximation of the true solution. For a system of equations ẋ = Ax, numerical stability is obtained only if the time step satisfies the |G| < 1 criterion for all of the eigenvalues λᵢ. Observe that for systems with purely imaginary eigenvalues, i.e., purely oscillatory solutions, the explicit Euler method is never numerically stable.

Now consider the same analysis for the implicit Euler scheme. We can again write x^(k+1) = G x^(k), but now G = (1 − λΔt)⁻¹. Therefore

|G|² = G G̅ = (1 − 2λ_RΔt + |λ|²Δt²)⁻¹ < 1

whenever λ_R < 0. That is, if the exact solution decays, so does the approximation. The stability of this method is independent of Δt, so it is said to be ABSOLUTELY STABLE or A-stable.

Figure 2.25 shows plots of x(t) for the case λ = −1 starting from initial condition x₀ = 1 using the explicit and implicit Euler methods with Δt = 2.1, along with the exact solution e⁻ᵗ. The explicit Euler solution displays numerical instability, while the implicit Euler solution decays as the exact solution does.
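The behavior in Figure 2.25 can be reproduced in a few lines. For ẋ = λx the explicit Euler growth factor is G = 1 + λΔt and the implicit one is G = 1/(1 − λΔt); with λ = −1 and Δt = 2.1 these are −1.1 and 1/3.1, so the explicit iterates oscillate in sign and grow while the implicit iterates decay. A minimal sketch:

```python
lam, dt = -1.0, 2.1          # lambda*dt lies outside the explicit stability circle
x_exp = x_imp = 1.0
for _ in range(10):
    x_exp = x_exp + dt * lam * x_exp        # explicit (forward) Euler: G = 1 + lam*dt
    x_imp = x_imp / (1.0 - dt * lam)        # implicit (backward) Euler: G = 1/(1 - lam*dt)

print(x_exp, x_imp)   # explicit grows as (-1.1)^k; implicit decays as (1/3.1)^k
```

After ten steps the explicit iterate has magnitude (1.1)¹⁰ ≈ 2.6 even though the exact solution has decayed to e⁻²¹, while the implicit iterate is already essentially zero.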
2.8.2 Accuracy, Stability, and Stiff Systems

Say we have a differential equation model whose shortest time scale of interest is t_min. Obviously, we cannot choose a time step Δt that will jump right over the solution of interest, so accuracy requires that Δt < t_min. But if we use an explicit method, stability requires a time step smaller than the smallest time scale of the entire problem. For example, in a kinetics problem, this might be the reaction time for a free-radical intermediate whose kinetics are so fast that its concentration always remains near equilibrium. Problems where explicit methods require unreasonably small time steps are STIFF. Implicit methods are always used to solve such problems.

In general, for the problem ẋ = Ax we can write a single-step scheme as

x^(k+1) = G x^(k)

where G is now a matrix. For example, consider a 2 × 2 system ẋ = Ax in which the matrix A has eigenvalues −3 and −100, so its characteristic time scales are 1/3 and 1/100. In fact x₂(t) ∼ e⁻¹⁰⁰ᵗ, so it is negligible after only a very short time. The explicit Euler method must capture this time scale to remain stable. Specifically, G = I + ΔtA, whose eigenvalues are 1 − 3Δt and 1 − 100Δt, giving a stability limit Δt < 2/100. If implicit Euler is used instead, G = (I − ΔtA)⁻¹, whose eigenvalues are 1/(1 + 3Δt) and 1/(1 + 100Δt), which are both always less than one. Again, the implicit Euler method is always stable.
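Stiffness is easy to demonstrate with two decoupled modes having the eigenvalues above (a diagonal stand-in for A; any matrix with the same eigenvalues behaves identically for this purpose). With Δt = 0.03 > 2/100, explicit Euler blows up on the fast mode even though that mode is physically negligible, while implicit Euler remains stable:

```python
lams = (-3.0, -100.0)       # eigenvalues of the stiff system
dt, nsteps = 0.03, 200      # dt exceeds the explicit stability limit 2/100 = 0.02

x_exp = [1.0, 1.0]
x_imp = [1.0, 1.0]
for _ in range(nsteps):
    x_exp = [x + dt * lam * x for x, lam in zip(x_exp, lams)]      # G = 1 + lam*dt
    x_imp = [x / (1.0 - dt * lam) for x, lam in zip(x_imp, lams)]  # G = 1/(1 - lam*dt)

print(x_exp, x_imp)
# fast explicit mode: G = 1 - 3 = -2, so |x| = 2^200 -- explosive growth
# both implicit modes decay monotonically toward zero
```

The slow explicit mode is perfectly well behaved; it is the fast, already-decayed mode that destroys the computation and forces the tiny step size.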
2.8.3 Higher-Order Methods

The Euler methods are simple to implement and convenient for introducing the concepts of simplicity, accuracy, and stability, but they are not necessarily the most efficient schemes for solving real problems. For example, if an implicit method is required, the second-order ADAMS-MOULTON formula (AM2) is much preferable. This formula uses the trapezoid rule rather than the rectangle rule to evaluate the integral in (2.84) and therefore has second-order accuracy. The accuracy of an IVP method is usually given by a number p, the exponent in the expression error = O(Δt^p). Therefore, the Euler methods have p = 1 and AM2 has p = 2. The AM2 formula is

x^(k+1) = x^(k) + (Δt/2)(f^(k) + f^(k+1))

Like the backward Euler method, this formula requires the solution of an algebraic equation at each time step. Also like the backward Euler method, it is A-stable. It is preferable to the backward Euler method because it has higher accuracy, the same stability, and requires no more work. AM2 is widely used for stiff problems. Adams-Moulton formulas of arbitrary order are available, constructed by approximating the integrand in (2.84) by polynomial approximation. The higher-order formulas are not A-stable, however, and since they are expensive, they are rarely used (except in the context described later in this section).
The second-order ADAMS-BASHFORTH (AB2) method is an explicit method that also uses the trapezoid rule, but it extrapolates to the point f(x^(k+1), t^(k+1)) using current and past values of f. Denoting f(x^(k), t^(k)) by f^(k), AB2 approximates f^(k+1) by f^(k) + (f^(k) − f^(k−1)) (linear extrapolation), so it is a two-step scheme. Using this extrapolation in the trapezoid rule formula above yields

x^(k+1) = x^(k) + (Δt/2)(3f^(k) − f^(k−1))

The price that is paid for higher accuracy without more work is a stability limit that is twice as restrictive as the forward Euler limit, e.g., for real λ the stability limit is Δt < 1/|λ| instead of 2/|λ|. This stricter limit arises from the extrapolation that Adams-Bashforth uses, as seen in Figure 2.26. Adams-Bashforth formulas of arbitrary order also are available. The third-order formula, for example, uses f^(k), f^(k−1), and f^(k−2).

Stability can be improved by combining an explicit method for "predicting" x^(k+1) with an implicit method for "correcting" it. Such approaches are called PREDICTOR-CORRECTOR methods. Often the order of the predictor is chosen to be one less than that of the corrector.

We denote by APCn the scheme with an (n−1)st-order predictor combined with an nth-order corrector. The APC3 scheme is:

1. A predicted value of the solution at the next time step, denoted x̃^(k+1), is found with the second-order Adams-Bashforth formula

x̃^(k+1) = x^(k) + (Δt/2)(3f^(k) − f^(k−1))

2. This value is now corrected, using the implicit third-order Adams-Moulton formula

x^(k+1) = x^(k) + (Δt/12)(5f̃^(k+1) + 8f^(k) − f^(k−1))

where f̃^(k+1) = f(x̃^(k+1), t^(k+1)).

APC3 displays third-order accuracy with only one more function evaluation than explicit Euler and comparable stability. Figure 2.27 shows the stability regions for the APC2, APC3, and APC4 methods. If λΔt for each eigenvalue λ of the Jacobian of f is within the region, the method is stable. If the solutions are expected to be very smooth and function evaluations are expensive, the APC methods are very economical, because of their high-order accuracy with only two function evaluations per time step.

Adams predictor-corrector methods are multistep methods because they use information from prior time steps. RUNGE-KUTTA (RK) methods also have higher-order accuracy than Euler, but are one-step methods, a useful feature in situations where one may want to change the time step during the course of the integration. The simplest of these, RK2, uses the trapezoid rule to obtain second-order accuracy, extrapolating to f^(k+1) using a simple forward Euler step. Letting

k₁ = f(x^(k), t^(k))
k₂ = f(x^(k) + Δt k₁, t^(k) + Δt)

the trapezoid rule formula becomes

x^(k+1) = x^(k) + (Δt/2)(k₁ + k₂)

RK2 is in fact identical to APC2 (because a first-order Adams-Bashforth formula is simply an explicit Euler step), but RK4, the fourth-order

Figure 2.26: Stability regions for Adams-Bashforth methods applied to ẋ = λx; see also Canuto et al. (2006, Fig. D.1).

Runge-Kutta formula, has a larger stability limit than the corresponding APC4 method; see Figure 2.28. The RK4 formula is

x^(k+1) = x^(k) + (Δt/6)(k₁ + 2k₂ + 2k₃ + k₄)

in which

k₁ = f(x^(k), t^(k))
k₂ = f(x^(k) + (Δt/2)k₁, t^(k) + Δt/2)
k₃ = f(x^(k) + (Δt/2)k₂, t^(k) + Δt/2)
k₄ = f(x^(k) + Δt k₃, t^(k) + Δt)

If f were independent of x, this would reduce to the Simpson's rule formula. RK4 requires four function evaluations. Because they have better stability properties than APC formulas, Runge-Kutta methods are generally preferable for nonstiff problems unless evaluation of f is expensive. If f is stiff, AM2 is the method of choice.

Figure 2.27: Stability regions for Adams predictor-corrector methods applied to ẋ = λx; APCn uses an (n−1)st-order predictor and nth-order corrector; see also Canuto et al. (1988, Fig. 4.7).

2.9 Numerical Solutions of Boundary-Value Problems

2.9.1 The Method of Weighted Residuals

There are basically two ways to make a continuous problem, like an ODE, discrete. One is to choose a finite number of points (values of the independent variable) and find an approximate solution at those points. This is what we did to solve initial-value problems (IVPs). We picked a point a distance Δt from the current time step, and used various approximate integration techniques to find the solution at that point. This is a natural approach for IVPs, because the solution at each time depends only on the solution at the immediately previous time. The situation with boundary-value problems (BVPs) is different. In this case, the solution at any point is coupled to the solution at all other points in the interval because the boundary conditions are imposed at both ends of the interval (think of a diffusion problem). So if the

Figure 2.28: Stability regions for Runge-Kutta methods applied to ẋ = λx; see also Canuto et al. (2006, Fig. D.2).

solution at a point changes, so does the solution at the neighboring points. A natural way to take this fact into account is to approximate the solution as the sum of a finite number of functions, i.e., to choose a set of functions over the interval and represent the solution as a linear combination of those functions. A general and systematic approach to this approximation process is given by the METHOD OF WEIGHTED RESIDUALS (MWR).

Consider the linear ODE

Lu = f(x),   x ∈ [a, b]

We choose a set of TRIAL FUNCTIONS {φⱼ(x)} in which to represent the solution u(x) and let uₙ(x) be the approximate solution

uₙ(x) = Σⱼ₌₁ⁿ cⱼφⱼ(x)

For the moment we require that the solution u(x) and the trial functions satisfy homogeneous boundary conditions, though it is easy to
relax this requirement. As n → ∞, we expect uₙ to approach the exact solution. For finite n, we expect a finite error, or residual, R, which we define pointwise as

R = Luₙ − f

Obviously, if uₙ = u, then Luₙ = f, the equation is solved, and R = 0. In any case, we want R to be as small as possible. In what sense do we require R to be small? We choose a set of WEIGHT FUNCTIONS or TEST FUNCTIONS {ψᵢ(x)} and require that

(R, ψᵢ) = 0,   i = 1, 2, ..., n    (2.87)

This condition is equivalent to requiring that the residual be orthogonal to all of the test functions, with respect to the chosen inner product. We expect that an approximate solution uₙ(x) that satisfies these conditions will converge to the exact solution as n → ∞, because a function that is orthogonal to infinitely many basis functions must be zero. Using the expressions for R and uₙ, the condition becomes

Σⱼ₌₁ⁿ (Lφⱼ, ψᵢ) cⱼ = (f, ψᵢ),   i = 1, 2, ..., n

Setting

Aᵢⱼ = (Lφⱼ(x), ψᵢ(x))    (2.88)

and

bᵢ = (f(x), ψᵢ(x))    (2.89)

results in the linear algebraic system Aᵢⱼcⱼ = bᵢ. We know, of course, how to solve this. Once we have done so, we have the coefficients cⱼ in the series for uₙ and therefore we have our solution.

As yet, the trial and test functions have been left unspecified. We already have introduced several examples of trial functions and shall shortly see another. As for test functions, there are two common choices, which lead to two types of formulations:

1. Galerkin: ψᵢ(x) = φᵢ(x). If the trial functions are orthogonal, this approach simply forces the first n terms in the representation of R in the trial function basis to vanish.

2. Collocation: ψᵢ(x) = δ(x − xᵢ), where {xᵢ}, i = 1, 2, ..., n is a set of COLLOCATION POINTS. Since (R, δ(x − xᵢ)) = R(xᵢ), the collocation method simply requires the residual to be zero at the chosen set of points.

We introduce a number of specific MWR implementations using the model problem

d²u/dx² + u = 0,   u(0) = 0,   u(1) = 1    (2.90)

Since the boundary conditions are not homogeneous, let the new unknown be u − x, which we again denote by u. Now u(0) = u(1) = 0 and the equation becomes

d²u/dx² + u = −x    (2.91)

Galerkin Method

Finite element Galerkin method. In this method, the trial functions are low-order piecewise polynomials localized to small subsets of the domain, known as the elements, and are zero elsewhere. Consider the space L²(0, 1) and the set of functions φⱼ(x), where

φⱼ(x) = (x − xⱼ₋₁)/h,   xⱼ₋₁ ≤ x ≤ xⱼ
φⱼ(x) = (xⱼ₊₁ − x)/h,   xⱼ ≤ x ≤ xⱼ₊₁
φⱼ(x) = 0,   otherwise

with xⱼ = jh and h = 1/N. These functions are called "hat" functions and are shown in Fig. 2.29 for N = 2. Observe that φⱼ and φⱼ₊₁ are nonzero in overlapping regions; these regions are the "elements" to which the name of the method alludes. These functions are not orthogonal. Attractive features of this set are that the functions are spatially localized (important for multidimensional problems in complicated domains) and simple, and that the coefficients cⱼ are the actual values of the (approximate) solution at the points xⱼ: cⱼ = uₙ(xⱼ).

For (2.91), the boundary conditions u(0) = u(1) = 0 obviate the use of φ₀ and φ_N in the basis, since they do not satisfy the boundary conditions. In the Galerkin approach, ψᵢ(x) = φᵢ(x), so the weighted residual conditions become

(R, φᵢ) = 0,   i = 1, 2, ..., n

Figure 2.29: Hat functions for N = 2.

where n = N − 1. Thus

Aᵢⱼ = ∫₀¹ (−φᵢ′φⱼ′ + φᵢφⱼ) dx

which is nonzero only when |i − j| ≤ 1, and

bᵢ = ∫₀¹ (−x)φᵢ dx = −ih²

Note that integrating by parts is unnecessary if we are willing to deal with the delta function nature of φ″ for the hat functions. Now we have a linear algebra problem Aᵢⱼcⱼ = bᵢ, which can be solved by LU decomposition, for example. For this particular choice of basis, A has a special structure: only the diagonal elements and those just above and below the diagonal are nonzero. Such a matrix is called TRIDIAGONAL and can be LU decomposed quickly, i.e., in O(n) operations, since most of its entries are already zero. In general, an n × n matrix that only has O(n) nonzero elements is said to be SPARSE. Because the trial functions in this case are piecewise linear, the L² norm of the error decays rather slowly as n increases: ‖uₙ − u‖₂ = O(n⁻²). The maximum (L∞) error decays even more slowly

as n → ∞ (Hughes, 2000; Strang and Fix, 2008).

Figure 2.30: Approximate solutions to (2.91) using the finite element method with hat functions for N = 6 and N = 12. The exact solution also is shown.

Figure 2.30 shows finite element solutions for this problem, as well as the exact solution

u(x) = −x + csc(1) sin x
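The computation behind Figure 2.30 can be sketched in a few lines: assemble the tridiagonal Galerkin system above and solve it with the Thomas (tridiagonal LU) algorithm, then compare with the exact solution. The Thomas routine below is a generic textbook solver, not code from this book; N = 16 is an illustrative choice.

```python
import math

def solve_tridiagonal(sub, diag, sup, rhs):
    # Thomas algorithm: O(n) LU solve for a tridiagonal system.
    n = len(diag)
    d, r = diag[:], rhs[:]
    for i in range(1, n):
        m = sub[i - 1] / d[i - 1]
        d[i] -= m * sup[i - 1]
        r[i] -= m * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - sup[i] * x[i + 1]) / d[i]
    return x

N = 16
h = 1.0 / N
n = N - 1
# Hat-function Galerkin system: A_ij = int(-phi_i' phi_j' + phi_i phi_j) dx,
# b_i = int(-x phi_i) dx = -i h^2
diag = [-2.0 / h + 2.0 * h / 3.0] * n
off = [1.0 / h + h / 6.0] * (n - 1)
b = [-(i + 1) * h * h for i in range(n)]

c = solve_tridiagonal(off, diag, off, b)   # c_j = u_n(x_j)

exact = lambda x: -x + math.sin(x) / math.sin(1.0)
err = max(abs(c[i] - exact((i + 1) * h)) for i in range(n))
print(err)   # nodal error is small and shrinks as N grows
```

Doubling N drops the nodal error by roughly a factor of four, consistent with the O(n⁻²) convergence quoted above.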

The finite element method bears some similarities to FINITE DIFFERENCE methods, which instead of expanding solutions in basis functions, consider function values at distinct grid points in a domain and replace derivatives by difference formulas (Press, Teukolsky, Vetterling, and Flannery, 1992). For example, u′(x) can be approximated as

u′_f(xⱼ) = (u(xⱼ₊₁) − u(xⱼ))/h + O(h)    (2.92)

or

u′_b(xⱼ) = (u(xⱼ) − u(xⱼ₋₁))/h + O(h)    (2.93)

where xⱼ and h are defined as above. These two equations are known as FORWARD and BACKWARD difference formulas, respectively. The CENTRAL difference formula for the first derivative is given by

u′_c(xⱼ) = (u(xⱼ₊₁) − u(xⱼ₋₁))/(2h) + O(h²)    (2.94)

These formulas are easily verified by Taylor expansion. Combining the forward and backward difference formulas gives the central difference formula for the second derivative

u″_c(xⱼ) = (u(xⱼ₊₁) − 2u(xⱼ) + u(xⱼ₋₁))/h² + O(h²)    (2.95)

Using this formula to approximate the second derivative in (2.91) yields the following set of equations

(u(xⱼ₊₁) − 2u(xⱼ) + u(xⱼ₋₁))/h² + u(xⱼ) = −jh

with u(x₀) = u(xₙ₊₁) = 0. For comparison, writing the finite element formulation above in the same format gives

(u(xⱼ₊₁) − 2u(xⱼ) + u(xⱼ₋₁))/h² + (u(xⱼ₋₁) + 4u(xⱼ) + u(xⱼ₊₁))/6 = −jh

Observe that the term corresponding to the second derivative is identical in the two cases, as is the right-hand side. In many situations, finite difference and finite element formulations lead to similar sets of discretized equations. A great advantage of the finite element method, however, is its flexibility in dealing with multidimensional problems in complex geometries, as one does not need to develop multidimensional analogues of the difference formulas.
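The O(h²) accuracy of the central difference formula (2.95) can be confirmed directly: halving h should reduce the error by about a factor of four. A quick check on a smooth function (the test function sin x and the evaluation point are illustrative choices):

```python
import math

def second_diff(u, x, h):
    # Central difference approximation to u''(x), accurate to O(h^2).
    return (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)

u = math.sin               # u'' = -sin, so the exact value at x = 1 is -sin(1)
exact = -math.sin(1.0)
e1 = abs(second_diff(u, 1.0, 0.1) - exact)
e2 = abs(second_diff(u, 1.0, 0.05) - exact)
print(e1, e2, e1 / e2)     # ratio close to 4, confirming second-order accuracy
```

The same experiment with the one-sided formulas (2.92) and (2.93) gives a ratio near 2, the signature of first-order accuracy.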
Fourier-Galerkin method and eigenfunction expansion. Here, instead of the hat functions, we use the sine functions as trial and test functions, i.e., φⱼ(x) = sin jπx; we seek a solution in the form of a truncated Fourier sine series. In the present case these trial functions are eigenfunctions of L. Choosing the trial functions to be the eigenfunctions of the linear operator is called EIGENFUNCTION EXPANSION, and in this situation the matrix A defined by (2.88) becomes diagonal. For the example,

Aᵢᵢ = ½(1 − i²π²),   bᵢ = (−1)ⁱ/(iπ)

The diagonal nature of A makes the solution procedure for c simple once the above integrals have been performed:

cⱼ = bⱼ/Aⱼⱼ = 2(−1)ʲ / (jπ(1 − j²π²))

Because cⱼ ∼ j⁻³ for large j, the L² error decays algebraically with n. This error is smaller than for the finite element method, but not as small as it could be, because the solution to the problem is not a smooth periodic function, as the use of the Fourier basis implicitly assumes.
function, as the use of the Fourier basis implicitlyassumes.
Legendre-Galerkin method. The trigonometric functions used in the previous example were the eigenfunctions of a regular Sturm-Liouville problem. What happens if we instead use the eigenfunctions of a singular Sturm-Liouville problem, for example the Legendre polynomials? Our example problem is set in the domain (0, 1), while the natural domain for the Legendre polynomials is (−1, 1), so we change coordinates, letting z = 2x − 1, which gives the new equation

4 d²u/dz² + u = −(z + 1)/2

We let φⱼ(z) = Pⱼ₋₁(z), so

uₙ(z) = Σⱼ₌₀ⁿ⁻¹ cⱼ₊₁Pⱼ(z)

The Legendre polynomials do not satisfy the boundary conditions, so we need to use a slightly modified approach, called the GALERKIN TAU method:

1. Impose the weighted residual conditions only for i = 1, 2, ..., n − 2, yielding n − 2 equations for the n unknowns cⱼ.

2. Supplement these equations with expressions for the two boundary conditions on uₙ, i.e., uₙ(−1) = uₙ(1) = 0.

Now the first n − 2 rows of A and b contain the weighted residual equations, and the last two rows the equations needed to satisfy the boundary conditions. To construct the equations resulting from the weighted residuals, the following properties of Legendre polynomials are useful:

∫₋₁¹ Pⱼ(z)Pₖ(z) dz = (2/(2k+1)) δⱼₖ

Pⱼ′(z) = Σ (2k+1)Pₖ(z),   summed over k = 0, ..., j−1 with j − k odd

Pⱼ″(z) = Σ (k + ½)(j(j+1) − k(k+1))Pₖ(z),   summed over k = 0, ..., j−2 with j − k even

These can be derived from the recursion relations for the Legendre polynomials. For the sample problem, these results can be used to yield

Aᵢ₊₁,ⱼ₊₁ = 4(j(j+1) − i(i+1)),   j ≥ i + 2, j − i even
Aᵢ₊₁,ⱼ₊₁ = 2/(2i+1),   j = i
Aᵢ₊₁,ⱼ₊₁ = 0,   otherwise

for i = 0, ..., n − 3, j = 0, ..., n − 1, and

bᵢ₊₁ = −∫₋₁¹ ((z + 1)/2)Pᵢ(z) dz = −∫₋₁¹ ½(P₀(z) + P₁(z))Pᵢ(z) dz = −δᵢ₀ − (1/3)δᵢ₁

for i = 0, ..., n − 3. The expressions for the boundary conditions supply the final two rows of the system, with

bₙ₋₁ = bₙ = 0

We do not plot the comparison between approximate and exact solutions for this case, because even for n = 5, the two are visually indistinguishable. Rather, Figure 2.31 shows |cⱼ| versus j for n = 10. For j ≥ 4, the plot is nearly a straight line on a semilog plot, indicating that cⱼ decays exponentially with j.

Figure 2.31: Dependence of |cⱼ| on j for the Legendre-Galerkin approximation of (2.91) with n = 10.

This exponential or spectral convergence is characteristic of MWR methods that use trial functions chosen to be eigenfunctions of a singular Sturm-Liouville problem (Gottlieb and Orszag, 1977). For this reason these methods are often called SPECTRAL METHODS. The rapid convergence reflects the fact that the Galerkin approximation yields a solution very close to the truncated Fourier series of the exact solution in the trial function basis. The very high accuracy of spectral methods does come at a cost: the matrix A is not sparse, so it cannot generally be factorized in O(N) operations.


Collocation Method

Galerkin methods require evaluation of many integrals of products of trial functions. This fact is particularly cumbersome in nonlinear problems. In the collocation method, the integrals of (2.87) are simplified greatly by the fact that the test functions are delta functions. Another attractive feature of the collocation approach is that the solution can be directly represented by its values at the collocation points, rather than as coefficients in a series. To illustrate the structure of a collocation formulation, consider the trial function set {φ₁(x), φ₂(x), φ₃(x)} and three collocation points x₁, x₂, x₃. The approximate solution is thus

    u_n(x) = c₁φ₁(x) + c₂φ₂(x) + c₃φ₃(x)

The coefficients c₁ − c₃ are


uniquely determined if the values of u_n are known at three points, as we can see by writing in matrix form the equations for the values of u_n as

    [ φ₁(x₁)  φ₂(x₁)  φ₃(x₁) ] [ c₁ ]   [ u_n(x₁) ]
    [ φ₁(x₂)  φ₂(x₂)  φ₃(x₂) ] [ c₂ ] = [ u_n(x₂) ]
    [ φ₁(x₃)  φ₂(x₃)  φ₃(x₃) ] [ c₃ ]   [ u_n(x₃) ]

This equation can be written Sc = U, where S is the (invertible) transformation that relates the coefficient vector c with the vector of solution values at the collocation points U:

    U = [u_n(x₁)  u_n(x₂)  u_n(x₃)]ᵀ


We also can write the equations for du_n/dx at the collocation points

    [ φ₁′(x₁)  φ₂′(x₁)  φ₃′(x₁) ] [ c₁ ]   [ u_n′(x₁) ]
    [ φ₁′(x₂)  φ₂′(x₂)  φ₃′(x₂) ] [ c₂ ] = [ u_n′(x₂) ]
    [ φ₁′(x₃)  φ₂′(x₃)  φ₃′(x₃) ] [ c₃ ]   [ u_n′(x₃) ]

or S_d c = U′. Using the fact that c = S⁻¹U, we can write U′ = S_d S⁻¹ U, or U′ = D_n U, where D_n = S_d S⁻¹ is called the COLLOCATION DIFFERENTIATION MATRIX. With this formula, we can compute the derivative of the function u_n (evaluated at the collocation points) directly from the function values at the collocation points. All of the information about what basis functions have been used is absorbed into the operator D_n. Similarly, the second derivative matrix is simply D_n². Note that within

the space of functions that are spanned by the set of trial functions, the differentiation is exact. For example, if we use a polynomial basis, the derivative of any quadratic function is evaluated exactly by the collocation differentiation operator constructed above.
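The construction of D_n is a few lines of linear algebra. In the sketch below (Python/NumPy), the monomial trial set {1, x, x²} and the three collocation points are illustrative choices, not from the text; the script builds S, S_d, and D_n = S_d S⁻¹ and verifies the exactness property for a quadratic.

```python
import numpy as np

xc = np.array([0.0, 0.5, 1.0])                 # collocation points (illustrative)
S = np.vander(xc, 3, increasing=True)          # S[i, j] = phi_j(x_i) = x_i**j
Sd = np.column_stack([np.zeros(3), np.ones(3), 2 * xc])   # phi_j'(x_i)
Dn = Sd @ np.linalg.inv(S)                     # collocation differentiation matrix

u = 3 * xc**2 - 2 * xc + 1                     # a quadratic lies in the trial space
du = Dn @ u                                    # derivative values at the x_i
```

Since 3x² − 2x + 1 lies in the span of the trial functions, `Dn @ u` agrees exactly with 6x − 2 at the collocation points, and applying `Dn` again recovers the constant second derivative.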
The choice of collocation points depends on the basis functions and is based on the following idea. A weighted integral (inner product) of functions

    ∫ u(x)v(x)w(x) dx

can be approximated as a sum

    Σ_j w_j u(x_j)v(x_j)     (2.96)

where w_j ≠ w(x_j) in general. It can be shown that for certain choices of u(x) and v(x), the points x_j and weights w_j can be chosen so that (2.96) is exact. These points are the ideal choice for collocation points.
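This exactness is easy to demonstrate with Gauss-Legendre quadrature (a sketch in Python/NumPy; the two polynomials are arbitrary examples). With m points, products of polynomials up to total degree 2m − 1 are integrated exactly on (−1, 1).

```python
import numpy as np
from numpy.polynomial import Polynomial

m = 5
xg, wg = np.polynomial.legendre.leggauss(m)   # Gauss-Legendre points and weights

u = Polynomial([1, 0, 1])        # u(x) = 1 + x^2   (illustrative)
v = Polynomial([1, 0, 3])        # v(x) = 1 + 3x^2  (illustrative)

quad = np.sum(wg * u(xg) * v(xg))             # discrete inner product, as in (2.96)
uv = (u * v).integ()                          # antiderivative of the product
exact = uv(1.0) - uv(-1.0)                    # exact integral over (-1, 1)
```

Here the product has degree 4 ≤ 2m − 1 = 9, so the five-point sum matches the exact integral 88/15 to machine precision.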

For example, let u and v be periodic functions that can be written

    u(x) = Σ_{k=−n/2}^{n/2−1} û_k e^{ikx},   v(x) = Σ_{k=−n/2}^{n/2−1} v̂_k e^{ikx}

in the domain 0 ≤ x < 2π. Equation (2.96), modified to exclude the term j = n, which is redundant due to periodicity, yields the exact integral if x_j = 2πj/n and w_j = 2π/n. Similarly, if u and v are polynomials of degree n, (2.96) can be made exact using the GAUSSIAN INTEGRATION FORMULAS. Canuto et al. (2006) provide a detailed discussion.

Chebyshev collocation. Chebyshev polynomials are a particularly popular choice for the trial functions in the collocation method. These functions are the solutions to the Sturm-Liouville equation

    (1 − x²) d²y/dx² − x dy/dx + ν²y = 0     (2.97)

When ν is an integer, this equation always has a polynomial solution called a Chebyshev polynomial (of the first kind) T_ν(x); see Exercise 2.36. These polynomials form an orthogonal basis in the domain −1 < x < 1 with the weight function w(x) = (1 − x²)^{−1/2} and have the form

    T₀(x) = 1
    T₁(x) = x
    T_{ν+1}(x) = 2xT_ν(x) − T_{ν−1}(x)
As with Legendre polynomials, Chebyshev polynomials also arise from Gram-Schmidt orthogonalization of the set {1, x, x², ...}, but now using the weighted inner product. A particularly important property of Chebyshev's equation is that when using the coordinate transformation x = cos θ, it reduces to

    d²y/dθ² + ν²y = 0

and the Chebyshev polynomials become

    T_ν(θ) = cos(νθ)

in the domain −π ≤ θ ≤ π. In this domain, the optimal collocation points are uniformly spaced, which in the original domain −1 ≤ x ≤ 1 results in the points

    x_j = cos(πj/n),   j = 0, ..., n


These points are very closely spaced near x = ±1, making Chebyshev collocation an attractive approach for problems in which steep gradients near boundaries are expected. The differentiation operator is given by

    D_{n,lj} = (c̄_l/c̄_j)(−1)^{l+j}/(x_l − x_j),   l ≠ j
    D_{n,jj} = −x_j/[2(1 − x_j²)],   1 ≤ j ≤ n − 1
    D_{n,00} = (2n² + 1)/6 = −D_{n,nn}

where c̄_j = 1 + δ_{j0} + δ_{jn}.
As with the Legendre-Galerkin method, the natural setting for Chebyshev collocation is the domain (−1, 1). For our example problem, (2.91) transformed into this domain, the equations of the Chebyshev collocation approximation are

    4(D_n²U)_j + U_j = −(1 + x_j)/2,   j = 1, ..., n − 1

This gives n − 1 equations; the additional two equations come from the boundary conditions: U₀ = U_n = 0. This is a set of n + 1 algebraic equations in n + 1 unknowns and can be solved in the usual way. Because it uses orthogonal polynomials as trial functions, the Chebyshev collocation method also achieves the exponential convergence illustrated in the Legendre-Galerkin example.

2.10 Exercises
Exercise 2.1: A linear constant coefficient problem
Find the general solution to dx/dt = Ax, where

-1 -1
1 -1 -1
o

Express it so that only the arbitrary constants are (possibly) complex. You should be able to solve the problem without explicitly performing any similarity transformations, i.e., you should not need to invert any matrices.

Exercise 2.2: Phase plane dynamics of a linear problem


Find the general solution to dx/dt = Ax

where

    14  −16

Sketch the dynamics on the phase plane in the original coordinate system, being careful to show the invariant directions and the stability along those directions.

Exercise 2.3: Members of function spaces

Determine which of the following functions are in the linear space spanned by the set {1, sin 2x, cos 2x}:

1. cos²x
2. cos x (cos x + sin x)
3. 1 + sin²x
4. 1 + cos x

Hint: remember to look at the basic trigonometric identities.

Exercise 2.4: Weighted inner products and approximation of singular functions

Consider the function f(t) = 1/t in the interval (0, 1].

(a) Show that f(t) is not in L₂(0, 1), but that it is in the Hilbert space L₂,w(0, 1), where the inner product is given by

    (x, y)_w = ∫₀¹ x(t)y(t)w(t) dt

and w(t) = t².

(b) From the set {1, t, t², t³, t⁴}, construct a set of ON basis functions for L₂,w(0, 1). These are the first five Jacobi polynomials (Abramowitz and Stegun, 1970).

(c) Find a five-term approximation to 1/t with this inner product and basis. Plot
the exact function and five-term approximation. Computethe error betweenthe

exact and approximate solutions using the inner product aboveto define a norm.

This type of inner product is sometimes used in problemswherethe solution


is known to show a singularity. As your analysis will show,polynomialscan be
used to get a fairly good approximation except very near the singularity.
Hint:this problem is a good excuse to begin using a symbolicmanipulationprogram
like Mathematica. The calculations are not hard, but they are tedious and that is exactly

the kind of problem Mathematica is good at.
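The text suggests Mathematica; the Gram-Schmidt construction in (b) and the expansion coefficients in (c) can equally be scripted numerically. The sketch below (Python/NumPy) replaces symbolic integration with the exact monomial moments ∫₀¹ t^{k+2} dt = 1/(k+3); the normalization convention is an assumption.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def inner(a, b):
    """(x, y)_w = int_0^1 x(t) y(t) t^2 dt for polynomial coefficient arrays a, b."""
    c = P.polymul(a, b)
    k = np.arange(len(c))
    return float(np.sum(c / (k + 3)))        # int_0^1 t^(k+2) dt = 1/(k+3)

basis = []
for n in range(5):                            # Gram-Schmidt on {1, t, ..., t^4}
    v = np.zeros(n + 1)
    v[n] = 1.0
    for b in basis:
        v = P.polysub(v, inner(v, b) * b)     # remove projections on earlier members
    basis.append(v / np.sqrt(inner(v, v)))    # normalize

# coefficients of the five-term approximation of f(t) = 1/t:
# (1/t, phi_i)_w = int_0^1 phi_i(t) t dt, again exact monomial moments
coef = [float(np.sum(b / (np.arange(len(b)) + 2))) for b in basis]
```

The resulting basis is orthonormal in the weighted inner product, and `coef` gives the generalized Fourier coefficients for the singular function.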

Exercise 2.5: Fourier series of a real function

For a real function f(x) with Fourier series representation f(x) = Σ_k c_k e^{ikx}, show that the Fourier coefficients satisfy c_k = c̄_{−k}.

Exercise 2.6: Fourier series of a sawtooth function

Consider the "sawtooth" function in the domain 0 ≤ x ≤ 2π

    f(x) = x          if x < π
    f(x) = 2π − x     if x ≥ π

Find the Fourier coefficients for this function using the basis functions e^{ikx} and show that they decay as 1/k² as |k| → ∞.

Exercise 2.7: Fourier series of a square wave

Repeat the above exercise, but for the "square wave" function

    f(x) = 1      if x < π
    f(x) = −1     if x ≥ π

Avoid redoing all of the integrals by using the fact that this function is the derivative of the sawtooth (so its Fourier series is the derivative of the sawtooth Fourier series). Show specifically that the Fourier coefficients decay as 1/k as |k| → ∞. Use MATLAB to plot the 10-term approximation to this function, i.e., −10 ≤ k ≤ 10.
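These exercises suggest MATLAB; the same check can be sketched in Python/NumPy. The script below computes the sawtooth coefficients c_k by trapezoidal quadrature and confirms the 1/k² decay; the limiting constant k²|c_k| → 2/π for odd k is a property of this particular wave, worked out by hand.

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 20001)
f = np.where(x < np.pi, x, 2.0 * np.pi - x)       # sawtooth on [0, 2*pi]

def ck(k):
    """c_k = (1/2pi) int_0^{2pi} f(x) e^{-ikx} dx by the trapezoid rule."""
    g = f * np.exp(-1j * k * x)
    dx = x[1] - x[0]
    return np.sum(0.5 * (g[:-1] + g[1:])) * dx / (2.0 * np.pi)

decay = {k: k**2 * abs(ck(k)) for k in (1, 3, 5, 9)}   # should all be near 2/pi
```

Even-k coefficients vanish for this function, so the decay check uses odd k only.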

Exercise 2.8: Basis functions of the finite element method

Consider the hat functions described in Section 2.9.1.

(a) For N = 2, find the inner products (φ_i, φ_j), i, j = 0, ..., N. Is the set orthogonal?

(b) Approximate (in L₂(0, 1)) the function f(x) = 1 + x(1 − x) in terms of the hat functions with N = 2. That is, find and solve a linear system for the coefficients c_i in the expression f(x) ≈ Σ_i c_i φ_i(x).

Hint: use symmetry to save time evaluating integrals.

Exercise 2.9: Parseval's equality

Consider a function f(x), represented in an orthonormal basis as a generalized Fourier series: f(x) = Σ_i c_i φ_i(x), with c_i = (f(x), φ_i(x)).

(a) Show that ‖f‖² = Σ_i |c_i|². This result is called PARSEVAL'S EQUALITY.

(b) Consider a truncated approximation, f_n(x) = Σ_{i=0}^{n} c̃_i φ_i(x), where c̃_i might be different from c_i. Show that in fact, the truncation error ‖f − f_n‖ is smallest when c̃_i = c_i. This result shows that the generalized Fourier coefficients are the optimal coefficients for the truncated representation of f.

Exercise 2.10: Fourier series of a triangle wave

Consider the Fourier sine series approximation for the triangle wave depicted in Figure 2.32.

    f_M(x) = Σ_{n=1}^{M} a_n sin(nπx),   x ∈ [0, 1]

(a) Find the coefficients a_n, n = 1, 2, .... To save time you may find the following integral formulas useful

    ∫ (mx + b) sin(nπx) dx = −(mx + b)/(nπ) cos(nπx) + m/(nπ)² sin(nπx)

    ∫₀¹ sin(nπx) sin(mπx) dx = (1/2) δ_{nm},   n, m = 1, 2, ...

Figure 2.32: Triangle wave on [0, 1].
(b) Plot the function f_M(x) for M = 5, 10, 50 with parameter a = 0.1 to demonstrate convergence to f(x). How many terms are required to obtain good accuracy?

Exercise 2.11: Differentiating integrals

Use the Leibniz rule for differentiating integrals to solve the following two problems.

(a) Check that the solution to the differential equation

    dy/dt + ay = q(t)

with initial condition y(0) = y₀ is

    y(t) = y₀e^{−at} + ∫₀ᵗ e^{−a(t−t′)} q(t′) dt′

Remember to show the solution satisfies both the differential equation and initial condition.

(b) Derive a Leibniz rule for differentiating the double integral

    f(t) = ∫_{a(t)}^{b(t)} ∫_{c(t,p)}^{d(t,p)} h(t, p, s) ds dp

Your answer should not contain the derivatives of any integrals.

Exercise 2.12: Convolution theorem

(a) Use the definition of the Laplace transform to derive the convolution theorem

    L{ ∫₀ᵗ f(t′)g(t − t′) dt′ } = f̄(s)ḡ(s)

(b) Use the definition of the inverse Laplace transform to derive the convolution theorem going in the other direction

    L⁻¹{ f̄(s)ḡ(s) } = ∫₀ᵗ f(t′)g(t − t′) dt′

Which direction do you prefer and why?


Exercise 2.13: Final and initial-value theorems


Two useful theorems are the final- and initial-value theorems

    lim_{t→∞} f(t) = lim_{s→0} s f̄(s)

if and only if s f̄(s) is finite for all s such that Re(s) ≥ 0; otherwise lim_{t→∞} f(t) does not exist, and

    lim_{t→0} f(t) = lim_{s→∞} s f̄(s)

(a) The conditions on s f̄(s) for the final-value theorem are crucial. For the functions below, state which satisfy the conditions and give their final values.

1. 1/s
2. 1/s²
3. 1/(s(s − a)),   Re(a) > 0
4. 1/(s(s + a)),   Re(a) > 0

(b) What are the initial values, f (0+)?

(c) Invert each of the transforms to get f (t) and check your results.

Exercise 2.14: Network of four isomerization reactions

Consider the set of reversible, first-order reactions

taking place in a well-mixed, batch reactor. The reactions are all elementary reactions with corresponding first-order rate expressions. Let the concentrations of the species be arranged in a column vector

    c = [c_A  c_B  c_C  c_D  c_E]ᵀ

(a) Write the mass balance for the well-mixed, batch reactor of constant volume

    dc/dt = Kc

What is K for this problem?

(b) What is the solution of this mass balance for initial condition c(0) = c₀? What calculation do you do to find out if this solution is stable?

(c) Determine the rank of matrix K. Hint: focus on the rows of K. Justify your answer. From the fundamental theorem of linear algebra, what is the dimension of the null space of K?

(d) What is the condition for a steady-state solution of the model? Is the steady state unique? Why or why not?

Exercise 2.15: Network of first-order chemical reactions

Consider the generalization of Exercise 2.14 to the following set of n chemical reactions taking place in a well-mixed, batch reactor. The reaction rate for the ith reaction is first order. Let the concentrations of the species be arranged in a column vector

    c = [c_{A1}  c_{A2}  ⋯]ᵀ

(a) Write the mass balance for the well-mixed, batch reactor of constant volume

    dc/dt = Kc

(b) What is the solution of this mass balance for initial condition c(0) = c₀?

(c) What is the steady-state solution of the model? Is the steady state unique? Why or why not?

(d) What calculation would you do to decide if the steady state is stable?

Exercise 2.16: Using the inverse Laplace transform formula

Establish property 4 of the Laplace transform pair given in Section 2.2.4, which states

    L{t f(t)} = −df̄(s)/ds

This formula proves useful in Exercise 3.19.

Exercise 2.17: ODE review

Solve the following ODEs; unless boundary conditions are given, find the general solution:

(a) y′ = e^{x+y} (separable)

(b) y′ = y², y(0) = 1 (separable)

(c) (y − 2x)dy − (2y − x)dx = 0 (exact)

(d) (x² − y²)dy = 2xy dx (integrating factor)


(e) x dy + (y + eˣ)dx = 0 (integrating factor)

Exercise 2.18: General solution to a first-order linear system of ODEs

Find the general solution to y′ = Jy, where

oil

x 10

Exercise 2.19: A linear system: dynamics on the phase plane

Consider the system

-1

1 -1

x + h(t)

(a) Find the general solution to the homogeneous problem, i.e., with h(t) ≡ 0, and characterize its stability.

(b) Sketch the qualitative behavior of solutions on the x₁–x₂ plane. Where does this system fit on Figs. 2.1–2.2?

(c) Now solve the inhomogeneous problem with h(t) = (1, 1) and characterize its stability.

Exercise 2.20: Dynamics of a freely rotating rigid body

Consider the system of equations

    I₁ dω₁/dt = (I₂ − I₃) ω₂ω₃
    I₂ dω₂/dt = (I₃ − I₁) ω₃ω₁
    I₃ dω₃/dt = (I₁ − I₂) ω₁ω₂

with I₁ > I₂ > I₃ > 0. This set of equations describes the motion of a rigid body freely rotating in space. The I's are the moments of inertia of the body relative to each of the principal axes and the ω's are the angular velocities of the body with respect to those axes.

(a) If ω̄ = (ω̄₁, ω̄₂, ω̄₃) is a steady state of this system, find the linearized equation for the deviation δω = (δω₁, δω₂, δω₃) from the steady state.

(b) Find three steady states of the system that satisfy ω₁ + ω₂ + ω₃ = 1. Which are linearly stable?

(c) Sketch, in the (ω₁, ω₂, ω₃) phase space, the qualitative behavior of trajectories that begin near each of the steady states, using the linearized equations as your guide.

(d) Your results can be tested experimentally. The principal axes of a book are, in order of decreasing moment of inertia, the axis passing through the front and back covers, the right and left sides, and the top and bottom. Experimentally assess the stability of free rotation of a book with respect to these three axes. (You have to do something to keep the covers from flying open while the book spins.) Do the theory and experiment agree?
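The experiment in (d) can also be run in silico. The sketch below (Python/NumPy) integrates the rigid-body equations with hand-rolled RK4; the inertia values and initial perturbations are arbitrary choices, not from the text.

```python
import numpy as np

I1, I2, I3 = 3.0, 2.0, 1.0              # moments of inertia, I1 > I2 > I3 (illustrative)

def rhs(w):
    w1, w2, w3 = w
    return np.array([(I2 - I3) * w2 * w3 / I1,
                     (I3 - I1) * w3 * w1 / I2,
                     (I1 - I2) * w1 * w2 / I3])

def integrate(w, dt=0.01, nsteps=2000):
    """RK4 integration, returning the whole trajectory."""
    traj = [w.copy()]
    for _ in range(nsteps):
        k1 = rhs(w); k2 = rhs(w + 0.5 * dt * k1)
        k3 = rhs(w + 0.5 * dt * k2); k4 = rhs(w + dt * k3)
        w = w + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(w.copy())
    return np.array(traj)

mid = integrate(np.array([1e-3, 1.0, 1e-3]))     # spin near the intermediate axis
minor = integrate(np.array([1e-3, 1e-3, 1.0]))   # spin near the minor axis
```

The small perturbation in `mid` grows until the body tumbles, while `minor` stays in a small neighborhood of its initial state, consistent with the linearized analysis in (b).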

Exercise 2.21: Duffing's equation

DUFFING'S EQUATION describes the dynamics of an undamped buckling beam

    ẍ − λx + x³ = 0

where x is proportional to the displacement of the middle of the beam. When λ > 0 the beam buckles: the "unbuckled" state x = ẋ = 0 is unstable.

(a) The two nontrivial steady states are x = ±√λ, ẋ = 0. Find the eigenvalues of the linearizations around those states.
2.10 Exercises
(b) In this model there is no friction, so the total energy (kinetic plus elastic) is conserved:

    H = ẋ²/2 − λx²/2 + x⁴/4

A given initial condition will have a specified value of H, and the resulting trajectory must have the same value of H for all time, so a trajectory in phase space is a curve of constant H. Show that the trajectories near the two nontrivial steady states are closed curves and thus that the linearized equations give the correct qualitative picture.
Exercise 2.22: Predator-prey model

The following model describes a "predator-prey" system: species 1 eats the grass and species 2 eats species 1:

    ẋ₁ = x₁(1 − ⋯ x₂)
    ẋ₂ = ⋯

In this model, α > 0 and β > 1, and x₁ and x₂ represent the sizes of the prey and predator populations.

(a) There are three steady states to this model. Find them.

(b) Find the linear stability of each of the steady states. Since this is a 2-dimensional system, the trace and determinant criterion can be used.

(c) Draw the phase-plane behavior near each of these steady states.

Exercise 2.23: Cell in shear flow

The following differential equation arises from a model of a cell moving in a shear flow

    dθ/dt = A + cos 2θ

where θ is the orientation angle of the cell with respect to the flow direction and A is a parameter that is determined by the geometry and mechanics of the cell (Keller and Skalak, 1982).

(a) For A = 0, there are four steady states in the domain 0 ≤ θ < 2π. Find them and determine which ones are linearly stable.

(b) Draw the trajectories in phase space for A = 0, along with the steady states. Here phase space is simply the θ line, and since θ is periodic it can alternately be considered to be just a circle with unit radius.

(c) For A larger than a certain value, this equation has no steady-state solutions. What is that value? What do the phase-space dynamics look like, i.e., draw a picture, when A exceeds that critical value?

Exercise 2.24: Steady-state heat conduction in an annulus

Consider the steady-state conduction of heat in the solid annular region shown in Figure 2.33. There is uniform heat generation in the solid. The heat-generation rate is given by

    Q = s₀(1 + β(T − T₀))

Figure 2.33: Annulus with heat generation in the solid.

in which β is a dimensional constant. The inner wall of the annulus is insulated and the outer wall is at constant temperature T₀. The material has thermal conductivity k.

(a) Write the steady-state heat equation with the source term.

(b) Define dimensionless variables

    Θ = k(T − T₀)/(s₀R²),   ρ = r/R,   β̂ = βs₀R²/k

Show that the model reduces to

    (1/ρ) d/dρ (ρ dΘ/dρ) + β̂Θ = −1

with the boundary conditions dΘ/dρ = 0 at ρ = κ and Θ = 0 at ρ = 1.

(c) What is the complementary function?

(d) By inspection, what is a particular solution?

(e) Using the two boundary conditions, specify the two unknowns in the complementary function.

(f) Plot Θ(ρ) for κ = 0.8 and the values β̂ = 1, 3, 5, 7, 7.5.

Exercise 2.25: Existence of a positive steady-state temperature profile

Consider Exercise 2.24 again.

(a) Plot and compare the solution Θ(ρ) if you set κ = 0.8 and β̂ = 7.5. What happens as you increase β̂ in this problem?

(b) Look again at how you solve for the constants c₁, c₂. What are you assuming for this solution to exist?

(c) For κ values ranging from 0 to 0.99, find and plot the critical value of β̂ such that the solution for c₁, c₂ does not exist. If you exceed this critical value of β̂, what do you think happens in the transient heat-conduction problem?

Exercise 2.26: Flow through a porous medium in a tube

Brinkman's modification of Darcy's law for flow in porous media, applied to axial flow in a tube containing a porous medium, is

    (μ/r) d/dr (r dv_z/dr) − (μ/κ) v_z + ΔP/L = 0

    κ : permeability of the porous medium
    μ : viscosity of the fluid
    ΔP : pressure difference + gravity driving force
    v_z : z-component of the "superficial velocity" v
    R : radius of tube
    L : length of tube

Reasonable boundary conditions are v_z(R) = 0 and v_z(0) < ∞.
(a) Introduce a dimensionless velocity and radius

    φ = v_z μ L/(ΔP R²),   ρ = r/R

and rewrite the differential equation and the boundary conditions in terms of the dimensionless variables. How many dimensionless parameters does the new differential equation contain?

(b) Obtain a particular solution of the differential equation obtained in (a) by inspection.

(c) Obtain the solution of the homogeneous equation; it should contain two constants. One constant can be immediately evaluated from the boundary condition at ρ = 0. Why?

(d) Evaluate the remaining constant using the boundary condition at ρ = 1. Write the general solution φ(ρ) to the differential equation. Plot φ(ρ) for permeability κ/R² = 0.01, 0.1, 0.3, 1.0. Also include on this plot the velocity profile for Hagen-Poiseuille flow.

(e) Evaluate the average dimensionless velocity

    ⟨φ⟩ = ∫₀¹ φ(ρ) ρ dρ / ∫₀¹ ρ dρ

and show that it can be written in terms of the modified Bessel functions I₀ and I₁. Plot ⟨φ⟩ versus κ/R² with a log scale for the x-axis for 10⁻⁴ ≤ κ/R² ≤ 10².

(f) Show how the result in (e) simplifies in the limit of small permeability, κ.

(g) Show that in the limit as κ → ∞, the result in (e) approaches the result for flow in an empty tube (which is exactly Hagen-Poiseuille flow).

Exercise 2.27: Laguerre's equation

The ODE

    xy″ + (1 − x)y′ + λy = 0

where λ is a constant, is called Laguerre's equation. It arises in determining the wave function for the electrons of a hydrogen atom; the orbitals that you learned about in quantum mechanics (and thus the structure of the periodic table) emerge in part from its solutions.

(a) Show that x = 0 is a regular singular point.

(b) Determine the roots of the indicial equation and one solution to this problem.

(c) Show that when λ is a positive integer, this solution reduces to a polynomial. These polynomials are called the Laguerre polynomials.

Exercise 2.28: Hermite's equation

Hermite's differential equation is

    u″ − 2xu′ + 2ku = 0

Among other places, it arises in the solution of Schrödinger's equation for a particle in a potential well.

(a) Write Hermite's equation as Lu + λu = 0, where λ = 2k and L takes the standard form of a Sturm-Liouville operator with weight function w(x) = e^{−x²}. What are p(x) and r(x)?

(b) Consider the inner product

    (a, b)_w = lim_{C→∞} ∫_{−C}^{C} a(x)b(x)w(x) dx

where w(x) is as given above. What boundary conditions must we impose in the limit C → ∞ so that L is self-adjoint, i.e., so that (Lu, v)_w = (u, Lv)_w?

(c) The point x = 0 is an ordinary point for this equation. Find the general solution by series expansion around this point. Show that if k is an integer, one solution to the equation is a polynomial. These polynomials are known as the Hermite polynomials.

Exercise 2.29: Series solution

Find the general solution to the differential equation

    (x² − x)u″ − xu′ + u = 0

Start by seeking a solution of Frobenius form, expanding around x = 0.


Exercise 2.30: Another series solution

Find the general solution to

    5x²y″ + (x³ ⋯ ) = 0

Expand around x = 0 and keep terms up to quartic order.

Exercise 2.31: Bessel's equation: singular solution

The Bessel equation of order zero is

    x²y″ + xy′ + x²y = 0

and the associated Cauchy-Euler equation is

    x²y″ + xy′ = 0

(a) Find the general solution to this Cauchy-Euler equation.

(b) Motivated by this result, seek a second solution to the Bessel equation, of the form y₂(x) = J₀(x) ln x + g(x), where g(x) has a power series solution. It may be convenient to note that g is even and write it as g(x) = Σ_{n=0}^∞ c_n (x/2)^{2n}. Find the first two terms in the power series for g.

Exercise 2.32: Sturm-Liouville problem with mixed boundary condition

Consider the Sturm-Liouville eigenvalue problem

    u″ + λu = 0,   u(0) = 0,   u(1) + u′(1) = 0

Find the eigenfunctions of this problem and the nonlinear algebraic equation that determines the eigenvalues λ. (This equation cannot be solved analytically.) Draw a sketch that indicates that there will be an infinite number of these eigenvalues, and use your sketch to propose an approximation for the eigenvalues that is valid when the eigenvalues are large.
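For this problem the eigenfunctions are u = sin(√λ x), and the boundary condition at x = 1 gives the eigenvalue condition sin√λ + √λ cos√λ = 0, i.e., tan√λ = −√λ. The graphical argument can be checked numerically; the sketch below (Python, standard library only) brackets one root per branch of the tangent and bisects.

```python
import math

def g(s):
    # eigenvalue condition for u'' + lam*u = 0, u(0)=0, u(1)+u'(1)=0, with s = sqrt(lam)
    return math.sin(s) + s * math.cos(s)

def bisect(a, b, tol=1e-12):
    fa = g(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        if fa * g(m) <= 0.0:
            b = m
        else:
            a, fa = m, g(m)
    return 0.5 * (a + b)

# one root of tan(s) = -s lies in each interval ((k - 1/2) pi, (k + 1/2) pi)
roots = [bisect((k - 0.5) * math.pi + 1e-9, (k + 0.5) * math.pi - 1e-9)
         for k in range(1, 6)]
eigenvalues = [s * s for s in roots]
```

The kth root approaches (k − 1/2)π from above, so λ_k ≈ (k − 1/2)²π² for large k, which is the approximation the sketch in the exercise suggests.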

Exercise 2.33: A higher-order variable coefficient problem

Find the general solution to the third-order equation

    x³y‴ + 3x²y″ − 3xy′ = 0
Exercise2.34: A fourth-order variable coefficient ODE
The following differential equation arises in the analysis of time-dependent flow of a

polymericliquid

x 2D 2 -x 2 +2-2xD)

-2iD

where D = d/dx. This equation has solutions of Frobenius form. Find the roots of the indicial equation.

Exercise 2.35: Legendre's equation

Legendre's equation is

    (1 − x²)y″ − 2xy′ + l(l + 1)y = 0

Two linearly independent series solutions about x = 0 have the form

    y₁(x) = 1 − l(l+1) x²/2! + (l−2)l(l+1)(l+3) x⁴/4! − ⋯
    y₂(x) = x − (l−1)(l+2) x³/3! + (l−3)(l−1)(l+2)(l+4) x⁵/5! − ⋯

By examining the recursion relation, show that for every integer l, one of these series will truncate, becoming a polynomial. These are the Legendre polynomials.

Exercise 2.36: Chebyshev's equation

Chebyshev's equation is

    (1 − x²)u″ − xu′ + ν²u = 0

Its solutions are important in the approximation of functions and in numerical methods for boundary-value problems.

(a) Put this in the form of a Sturm-Liouville problem Lu + λu = 0 in the domain [−1, 1], with λ = ν² and w(x) = (1 − x²)^{−1/2}. What boundary conditions must u and u′ satisfy at x = ±1 for self-adjointness to hold?

(b) By expanding in a power series about x = 0, obtain two LI solutions of this equation. Show that when ν is a nonnegative integer, one of these is always a polynomial of degree ν. Because these satisfy a Sturm-Liouville problem, these polynomials form an orthogonal basis for L₂,w(−1, 1), with w(x) = (1 − x²)^{−1/2}.

(c) The points x = ±1 are regular singular points for this equation. As a first step toward finding the behavior of the solution near these points, find the roots of the indicial equation for a solution in Frobenius form expanded around x = 1.

Exercise 2.37: Laplace's equation as second-order, variable coefficient ODEs

Express the radial part of Laplace's equation ∇²y = 0 in the form a₂y″ + a₁y′ + a₀y = 0.

(a) What are a₀, a₁, a₂ for one-dimensional rectangular coordinates, cylindrical coordinates, and spherical coordinates?

(b) What are two linearly independent solutions for each coordinate system?

Exercise 2.38: How many solutions?

Consider the second-order differential equation

    d²u/dx² = 0

(a) How many linearly independent solutions exist for the single boundary condition u(0) = u(1)?

(b) How many linearly independent solutions exist for the single boundary condition u(0) = 0?

(c) How many linearly independent solutions exist for the two boundary conditions u(0) = 0 and u(1) = 0?

(d) What can you conclude about the second-order differential operator and its boundary conditions?

Exercise 2.39: Heat conduction with a temperature controller

Suppose we set up the heat conduction problem with a temperature controller that keeps the ends of the body at the same temperature.

(a) Identify the appropriate differential operator, L, and associated boundary functional, B₁, so that this problem can be written as

    LT = f,   B₁T = 0     (2.98)
(b) Notice that we do not have enough boundary conditions to expect to be able to solve (2.98) uniquely. Define the adjoint operator and adjoint boundary functionals so that

    (Lu, v) = (u, L*v)

for every admissible u(x) and v(x). Notice that since you are missing a boundary condition on u(x), you will require three boundary conditions on v(x). What are they?
(c) What are the null spaces of L and L* with their associated boundary conditions? For which f can (2.98) be solved? Is the solution unique? If not, what is the general form of the solution?

(d) Solve (2.98) using any method at your disposal. Laplace transforms would work, for example. Check your solution by substituting into the differential equation and boundary condition. Does your solution agree with the existence and uniqueness result you determined previously?

(e) What is the Green's function for this problem, i.e., identify the function g appearing in the T(x) solution as

    T(x) = ∫₀¹ g(x, ξ)f(ξ) dξ + terms not involving f

Exercise 2.40: Solvability conditions and general solution for a second-order differential operator

Consider the second-order differential operator

    Lu = d²u/dx² + u

and two boundary conditions

    B₁u = u(π) + u(0),   B₂u = −[du/dx(π) + du/dx(0)]

(a) Find the adjoint operator and boundary conditions, L*, B₁*, and B₂*.

(b) Find the null spaces N(L) and N(L*).

(c) For what f can you solve the nonhomogeneous problem

    Lu = f(x),   B₁(u) = γ₁,   B₂(u) = γ₂

answer: (f, sin x) = γ₁, (f, cos x) = γ₂

(d) For f satisfying this solvability condition, what is the general solution?

answer: u(x) = ∫₀ˣ f(ξ) sin(x − ξ) dξ + a cos x + b sin x

Exercise 2.41: Steady-state temperature profile

Solve the steady-state heat-conduction problems in Examples 2.11 and 2.12 using Laplace transforms:

    T_xx = f        T_xx = f,  T_x(0) = 0
Exercise 2.42: Heat-transfer boundary conditions

Consider the one-dimensional steady-state heat-conduction problem

    −k d²T(x)/dx² = Q̇(x),   i.e.,   d²T/dx² = f,   f = −Q̇/k

Consider Newton's law of cooling boundary conditions

    h₀(T_e0 − T(0)) = −kT_x(0)
    h₁(T(1) − T_e1) = −kT_x(1)

in which h₀, h₁ are the heat-transfer coefficients at the two ends, and T_e0, T_e1 are the temperatures providing the heat-transfer driving forces at the two ends.

(a) Write this problem as

    DT = f,   B₁T = γ₁,   B₂T = γ₂

What are D, B₁ and B₂, and γ₁ and γ₂?

(b) Solve for the steady-state temperature profile.

(c) For what f(x) does the solution exist? For these f(x), is the solution unique?

Exercise 2.43: Orthogonality of Sturm-Liouville eigenfunctions

Show that two eigenfunctions u₁ and u₂ of a Sturm-Liouville problem (pu′)′ + ru + λwu = 0 are orthogonal if the inner product weighted with w is used. Consider only zero boundary conditions u(a) = u(b) = 0. Multiply the equation for u₁ (setting λ = λ₁) by u₂, multiply the equation for u₂ (setting λ = λ₂ ≠ λ₁) by u₁, subtract and integrate over the interval. Use the boundary conditions and integration by parts to prove orthogonality.

Exercise 2.44: The convection-diffusion operator

For problems with convection and diffusion, an important differential operator is

    Lu = d²u/dx² + Pe du/dx

with boundary conditions u(0) = u(1) = 0. Pe is the PECLET NUMBER, measuring the relative importance of convection and diffusion.

(a) Find the adjoint of this operator, first with an inner product with a constant weight function w(x) = 1, and then with the weight function w(x) = exp(−Pe x).

(b) Solve the eigenvalue problem Lu + λu = 0 for arbitrary Pe. Hint: since the equation has constant coefficients, express the solution as y(x) = e^{iωx}. Plot the eigenfunction corresponding to the first (closest to zero) eigenvalue for Pe = 10.
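A quick numerical check of (b): substituting the exponential form gives eigenfunctions e^{−Pe x/2} sin(nπx) and eigenvalues λ_n = (nπ)² + Pe²/4. The sketch below (Python/NumPy) discretizes L with second-order central finite differences and compares; the grid size is an arbitrary choice.

```python
import numpy as np

Pe, N = 10.0, 400
h = 1.0 / N
# interior grid points x_1 ... x_{N-1}; u_0 = u_N = 0 built into the matrix
main = -2.0 / h**2 * np.ones(N - 1)
upper = (1.0 / h**2 + Pe / (2.0 * h)) * np.ones(N - 2)
lower = (1.0 / h**2 - Pe / (2.0 * h)) * np.ones(N - 2)
L = np.diag(main) + np.diag(upper, 1) + np.diag(lower, -1)   # u'' + Pe u'
lam = np.sort(np.linalg.eigvals(-L).real)                    # L u = -lam u
lam_exact = np.pi**2 + Pe**2 / 4.0                           # smallest eigenvalue
```

For Pe = 10 the smallest computed eigenvalue agrees with π² + 25 to the second-order accuracy of the discretization.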

Exercise 2.45: Testing a CSTR operating condition for stability²²

The reaction

    r = kc_A = k₀e^{−E/T} c_A

is carried out in a CSTR. The mass and energy balances are given by

    dc_A/dt = (c_{Af} − c_A)/τ − kc_A

    dT/dt = Q̇/(V_R ρ Ĉ_p) + (T_f − T)/τ − (ΔH_R/(ρĈ_p)) kc_A

Find the three steady states corresponding to the conditions in the following table. Determine whether each of these three steady states is stable or unstable.

²²See also Exercise 6.7 in Rawlings and Ekerdt (2012)

Parameter
E

f
CA
110


Value
7550
298

Units

kmol/m3

0
-2.09 x 10 8

4.48 x 106
4.19 x 103
103

J/kmol

vs

J/(kg K)
kg/m3

18 x 10-3

m3

60 x 10-6

m3/s

Exercise 2.46: Choosing an ODE solver

You are given the task of modeling the dynamics of a chemical reactor in which a large number of reactions are occurring. The rate constants for the reactions vary between 1 s⁻¹ and 10⁷ s⁻¹. Will you base your code on a fourth-order Runge-Kutta scheme, an explicit Euler scheme, or an Adams-Moulton scheme? Why?

Exercise 2.47: Numerical stability criterion for RK2

Derive the numerical stability criterion for integrating the single equation

    dx/dt = ax

with the second-order Runge-Kutta method. Allow a to be complex. Hint: the general solution to a linear constant-coefficient difference equation a_n x^{(n)} + a_{n−1} x^{(n−1)} + ⋯ = 0 is of the form x^{(n)} = cGⁿ.
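For ẋ = ax, one RK2 step gives x^{(n+1)} = (1 + w + w²/2)x^{(n)} with w = aΔt, so the method is stable when |1 + w + w²/2| ≤ 1; on the negative real axis this means −2 ≤ w ≤ 0. A quick numerical check (Python):

```python
# growth factor of second-order Runge-Kutta applied to dx/dt = a x, with w = a*dt
def growth(w):
    return abs(1 + w + w**2 / 2)

# stable on the negative real axis for -2 <= w <= 0
samples = {w: growth(w) for w in (-0.5, -1.0, -1.9, -2.0, -2.1)}
```

growth(−2.0) equals exactly 1, and the factor exceeds 1 just beyond that point; since `abs` accepts complex arguments, the same function maps out the full stability region in the complex w plane.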

Exercise 2.48: Dynamics of a nonlinear problem

Consider the pair of ODEs

    ẏ₁ = (1 − y₁) − 10 y₁² y₂
    ẏ₂ = −0.05 y₁² y₂

with initial conditions y₁(0) = 0.2, y₂(0) = 1.

(a) Find the Jacobian of the RHS at t = 0. Show, using the eigenvalues of the Jacobian, that you expect the problem to be stiff.

(b) Write a computer program to use the Adams-Moulton second-order method to solve the initial-value problem. Integrate the equations out to t = 20 and plot the solutions. Can you find any stability limit on the time step?

(c) Write a second-order Runge-Kutta program and attempt to use it for the above problem. What time step do you have to use to get a stable result?

(d) Modify your RK code to use variable time steps. Use the criterion that Δt < τ_min/5. Estimate τ_min from the values of yᵢ/ẏᵢ at each time step.

Exercise 2.49: Solutions of difference equations

When examining the numerical stability of integration schemes, as well as in many other situations, we run across the linear constant-coefficient difference equation

    a_M y_{n+M} + a_{M−1} y_{n+M−1} + ⋯ + a₀ y_n = 0     (2.99)

For example, y_n could be the value of y at the nth time step of some process.

(a) Show that this equation can be written in vector form

    x_{n+1} = Gx_n     (2.100)

What are x and G in terms of y and the a coefficients?

(b) Given the initial condition x₀, find the solution to this equation (i.e., x_n in terms of n and x₀) in the situation where G has distinct eigenvalues λᵢ.

(c) Repeat for the case where G has repeated eigenvalues.

(d) What is the general criterion for asymptotic stability of the steady state x = 0?

Exercise2.50: Numerical integration for undamped oscillations


initial-value

x = O?

problems = f (u) are important


second-order
for many applications.
f
(u)
case
q2udo the following:
Forthe specific
(a) Find the exact general solution.
(b) By letting u̇ = v, convert the equation to a pair of first-order equations and show that the forward Euler method is always unstable for integrating these.

(c) Consider the following numerical integration formula

u_{n+1} - 2u_n + u_{n-1} = Δt² f(u_n)

For f(u) = -q²u find a quadratic equation for the growth factor G for this method, i.e., look for solutions of the form u_{n+1} = G u_n. Up to what threshold (qΔt)² are the numerical solutions stable?

(d) By expanding all terms in Taylor series around time step n, find the local truncation error of this formula (the first power of Δt that does not cancel).
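For part (c), substituting u_n ∝ Gⁿ with f(u) = -q²u into the formula gives G² - (2 - s)G + 1 = 0 with s = (qΔt)², which can be probed numerically (Python sketch):

```python
import cmath

def max_growth(s):
    # Larger-magnitude root of G**2 - (2 - s)*G + 1 = 0, with s = (q*dt)**2
    b = -(2 - s)
    root = cmath.sqrt(b*b - 4)
    return max(abs((-b + root)/2), abs((-b - root)/2))
```

For 0 < s < 4 the roots are complex conjugates whose product is one, so |G| = 1 (neutral stability, as appropriate for an undamped oscillator); for s > 4 one root exceeds one in magnitude.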

Exercise 2.51: The velocity Verlet algorithm of molecular dynamics simulation

The VELOCITY VERLET ALGORITHM is very commonly used to perform numerical time integration for molecular dynamics simulations. Consider the numerical stability problem for a very simple case

ẋ = v
v̇ = ax

where a ∈ R.

(a) What property must a satisfy so that the true solution x = 0, v = 0 is stable?

(b) For this problem, the velocity Verlet algorithm becomes

x_{n+1} = x_n + v_n Δt + (1/2) a x_n Δt²
v_{n+1} = v_n + (1/2)(a x_n + a x_{n+1}) Δt

Put this expression in the form

x_{n+1} = G x_n

where x = (x, v)ᵀ.

(c) Find the criteria that aΔt² must satisfy for numerical stability of the method.
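A numerical check of part (c) (Python; the one-step matrix follows from the update formulas in part (b), nondimensionalized so that the only parameter is s = aΔt²):

```python
import cmath

def verlet_radius(s):
    # One-step matrix for velocity Verlet on xdot = v, vdot = a*x,
    # with dt = 1 and s = a*dt**2: [x, v](n+1) = G [x, v](n)
    g = [[1 + s/2, 1.0],
         [s*(1 + s/4), 1 + s/2]]
    tr = g[0][0] + g[1][1]
    det = g[0][0]*g[1][1] - g[0][1]*g[1][0]   # equals 1: the map is area preserving
    root = cmath.sqrt(tr*tr - 4*det)
    return max(abs((tr + root)/2), abs((tr - root)/2))
```

The spectral radius is one exactly for -4 <= aΔt² <= 0 (i.e., a = -ω² with ωΔt <= 2) and exceeds one otherwise.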

Exercise 2.52: Stability of predictor-corrector methods

Denote the general (up to fourth-order) predictor-corrector formulas for ẋ = λx as

x̂(n+1) = x(n) + w [p1 x(n) + p2 x(n-1) + p3 x(n-2) + p4 x(n-3)]
x(n+1) = x(n) + w [c1 x̂(n+1) + c2 x(n) + c3 x(n-1) + c4 x(n-2)]

in which w = λΔt. The coefficient vectors of the first four Adams-Bashforth predictors and Adams-Moulton correctors are

p{1} = [1, 0, 0, 0]               c{1} = [1, 0, 0, 0]
p{2} = (1/2) [3, -1, 0, 0]        c{2} = (1/2) [1, 1, 0, 0]
p{3} = (1/12) [23, -16, 5, 0]     c{3} = (1/12) [5, 8, -1, 0]
p{4} = (1/24) [55, -59, 37, -9]   c{4} = (1/24) [9, 19, -5, 1]

Show that combining the two steps gives

x(n+1) = x(n) [1 + wc1 + w(c1 w p1 + c2)] + x(n-1) [w(c1 w p2 + c3)]
       + x(n-2) [w(c1 w p3 + c4)] + x(n-3) [w c1 w p4]

Let z(n) = (x(n), x(n-1), x(n-2), x(n-3))ᵀ, and find the matrix G such that z(n+1) = G z(n).
The eigenvalues of G(w) then determine the stability of the method.

Exercise 2.53: Stability boundary of predictor-corrector methods


Given G(w) from the previous exercise, to map out the boundary of the stability region, consider ω = e^{iθ} for 0 ≤ θ ≤ 2π, so ω has unit magnitude, and solve the single algebraic equation det(G(w) - ωI) = 0 for the complex value w as a function of parameter θ. The stability boundary of the APC method then comprises these values of w. That is how Figure 2.27 was prepared, for example.

Now consider the class of predictor-corrector methods that use the same order in the predictor and corrector. Recall the methods in Figure 2.27 used a predictor with order one less than the corrector. Find the stability boundaries for first-order through fourth-order methods. Compare your calculated results to Figure 2.34. Contrast the stability results displayed in Figures 2.27 and 2.34. From a stability standpoint, which class of methods do you prefer and why?

You will need to increase the θ interval to [0, 4π] to close the stability boundary. Why do you suppose this increased interval is required? Consider mapping out the square root function on the unit circle using θ ∈ [0, 2π]. Does this boundary close? You will need to clip off some unstable regions made by loops in the boundary to match Figure 2.34.
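The boundary-mapping recipe is easy to illustrate in Python on a one-step method, where no clipping is needed: for second-order Runge-Kutta, G(w) = 1 + w + w²/2, and setting G(w) = e^{iθ} gives a quadratic in w for each θ.

```python
import cmath, math

boundary = []
for k in range(721):
    theta = 2*math.pi*k/720
    # Solve w**2/2 + w + (1 - e^{i*theta}) = 0 for w
    root = cmath.sqrt(1 - 2*(1 - cmath.exp(1j*theta)))
    boundary.extend([-1 + root, -1 - root])

# Every computed point maps back onto the unit circle, and the boundary
# crosses the real axis at w = -2.
worst = max(abs(abs(1 + w + w**2/2) - 1) for w in boundary)
leftmost = min(w.real for w in boundary)
```

For multistep predictor-corrector methods the same idea applies, except that det(G(w) - ωI) = 0 is a higher-degree polynomial in w and the spurious loops must be clipped as described above.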

Figure 2.34: Stability regions for Adams predictor-corrector methods in the Re(λΔt)-Im(λΔt) plane; APCn' uses nth-order predictor and nth-order corrector.

Exercise 2.54: Airy's equation

The equation

y'' + λxy = 0

arises in optics, quantum mechanics, and hydrodynamics, and is known as Airy's equation.

(a) Find an approximate power series solution to this problem (expanding around x = 0), keeping terms up to fourth order.

(b) Use this solution to approximate the first two eigenvalues λ of the equation with boundary conditions y(0) = y(1) = 0. Use Newton's method if necessary.

(c) Use the finite element method to construct an algebraic eigenvalue problem for the Airy equation. Find the approximate eigenvalues and eigenfunctions using six hat functions as the approximate basis and also using 12 functions. Are all of the eigenvalues of the algebraic problem good approximations of the exact eigenvalues? You may use Mathematica or a programming language, whichever you prefer.


Exercise 2.55: Applying Galerkin and collocation methods


Solve the problem

x²y'' + xy' + x²y = 0

using

(a) The Legendre-Galerkin method.

(b) The Chebyshev collocation method. Recall that the Chebyshev collocation points are x_j = cos(jπ/N), j = 0, 1, ..., N.

Exercise 2.56: Modeling a tubular reactor: convection, diffusion, and reaction

The equation

2u' = u'' + 1,  u(0) = 0,  u'(1) = 0

models the temperature profile in a tubular reactor in which an exothermic reaction occurs.

(a) Find the exact solution.

(b) Use the Galerkin tau method to construct an approximate solution. Use the Legendre polynomial basis set: φ0(x) = 1, φ1(x) = x, φ2(x) = (3x² - 1)/2. Sketch the solution and look at u'(0) and u(1) to compare the approximate and exact solutions.

Exercise 2.57: Converting a differential operator to an algebraic operator


Solve the eigenvalue problem

x²y'' + xy' + λx²y = 0,  y'(0) = y(1) = 0

using the Legendre-Galerkin method. You should be able to reduce this problem to a linear algebra problem of the form Ac + λBc = 0. Note that because of the boundary conditions, B will be singular, but A will not. How many basis functions do you need to compute the first three eigenvalues to four-digit accuracy? Plot the first four eigenfunctions. This is the eigenvalue problem for Bessel's equation of order zero. In the chapter, we showed that the eigenvalues of this problem are related to the roots of the Bessel function J0.

Exercise 2.58: An eigenvalue problem with finite elements

Solve the above problem again, using the finite element method with the "hat functions"
described in Section 2.9.1. Study how the approximation converges as the number of node points N increases. Also look at the computation time as a function of N.

Exercise 2.59: Chebyshev collocation for a nonlinear problem

Using the Chebyshev collocation technique, write an Octave or MATLAB program to solve the boundary-value problem (a steady-state reaction-diffusion problem)

εT'' + T - T³ = 0,  T(-1) = T(1) = 0

for ε = 0.05. Use the initial guess T = 1 to find a nontrivial solution. Study how the approximation converges as the number of collocation points N + 1 increases. Also look at the computation time as a function of N.
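A compact collocation-plus-Newton implementation (sketched here in Python rather than Octave, using only the standard library, so the linear solve is spelled out; the differentiation matrix is the standard Chebyshev Gauss-Lobatto construction):

```python
import math

def cheb(N):
    # Chebyshev Gauss-Lobatto points and differentiation matrix
    x = [math.cos(math.pi*j/N) for j in range(N + 1)]
    c = [(2.0 if j in (0, N) else 1.0)*(-1)**j for j in range(N + 1)]
    D = [[0.0]*(N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        for j in range(N + 1):
            if i != j:
                D[i][j] = (c[i]/c[j])/(x[i] - x[j])
        D[i][i] = -sum(D[i][j] for j in range(N + 1) if j != i)
    return D, x

def solve(A, rhs):
    # Gaussian elimination with partial pivoting
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k]/M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= f*M[k][j]
    y = [0.0]*n
    for i in reversed(range(n)):
        y[i] = (M[i][n] - sum(M[i][j]*y[j] for j in range(i + 1, n)))/M[i][i]
    return y

N, eps = 24, 0.05
D, x = cheb(N)
D2 = [[sum(D[i][k]*D[k][j] for k in range(N + 1)) for j in range(N + 1)]
      for i in range(N + 1)]
T = [1.0]*(N + 1)                       # initial guess T = 1
for _ in range(50):                     # Newton iteration on eps*T'' + T - T**3 = 0
    F = [eps*sum(D2[i][j]*T[j] for j in range(N + 1)) + T[i] - T[i]**3
         for i in range(N + 1)]
    J = [[eps*D2[i][j] + ((1 - 3*T[i]**2) if i == j else 0.0)
          for j in range(N + 1)] for i in range(N + 1)]
    for bc in (0, N):                   # boundary conditions T(+-1) = 0
        F[bc] = T[bc]
        J[bc] = [1.0 if j == bc else 0.0 for j in range(N + 1)]
    dT = solve(J, [-f for f in F])
    T = [Ti + di for Ti, di in zip(T, dT)]
    if max(abs(d) for d in dT) < 1e-12:
        break
step = max(abs(d) for d in dT)
```

With ε = 0.05, the converged T is near 1 in the interior and drops to zero through boundary layers of width roughly √ε at x = ±1.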

Exercise 2.60: Attractivity and asymptotic stability for linear systems

Show that asymptotic stability and attractivity are identical for the linear system ẋ = Ax.

Exercise 2.61: Stability and asymptotic stability for linear systems

Consider the linear system

dx/dt = Ax

(a) Is this system asymptotically stable? Why or why not?

(b) Is the system (Lyapunov) stable or unstable? Prove it.

(c) Generalize this example and provide a checkable condition to test for (Lyapunov) stability.

(d) Given this result, characterize the class of linear systems that are stable but not asymptotically stable.

Exercise 2.62: Lyapunov equation and linear systems

Establish the equivalence of (a) and (b) in Theorem 2.24.

Exercise 2.63: Discrete time Lyapunov function for linear systems

State the discrete time version of Theorem 2.24. Show that (a) and (b) are equivalent in the discrete time version.

Exercise 2.64: Nonsymmetric matrices and definition of positive definite

For real, square matrix S, consider redefining S > 0 to mean that xᵀSx > 0 for all x ∈ Rⁿ, x ≠ 0. We are removing the usual requirement that S is symmetric in the definition of positive definite in Section 1.4.4.

(a) Define the matrix B = (S + Sᵀ)/2. Show that B is symmetric and xᵀBx = xᵀSx for all x ∈ Rⁿ. Therefore S > 0 (new definition) if and only if B is positive definite (standard definition).

(b) What happens to the connection between this new definition of S > 0 and the eigenvalues of S? Consider statement 1. from Section 1.4.4

S > 0 if and only if λ > 0, λ ∈ eig(S)

Does this statement remain valid? If so prove it. If not, provide a counterexample.

Exercise2.65: Stabilities of a linear system


Consider the linear, time-invariant system ẋ = Ax. Characterize the class of A matrices for which the systems exhibit the following forms of stability.

(a) Stable (in the sense of Lyapunov).

(b) Attractive.

(c) Asymptotically stable.

(d) Exponentially stable.

(e) Which of these forms of stability are equivalent for the linear, time-invariant system?

Exercise 2.66: Extending a regular perturbation solution to higher order

For the regular perturbation solution of (2.67) presented in Section 2.6.4, compute the next term in the series, c2(r).

Exercise 2.67: QSSA as the outer solution in a two-time-scale singular perturbation

Consider the following simple reaction mechanism taking place in a well-mixed, constant-volume batch reactor

A →(k1) B →(k2) C

and assume k2 ≫ k1 so B is a low-concentration species for which we wish to examine the QSSA.

(a) Solve A's material balance and show

cA = cA0 e^{-k1 t}

Apply the usual QSSA approach, set RB = 0, and show that

cBs = (k1/k2) cAs

The concentration of C is always available if desired from the total species balance

cCs(t) = cA(0) + cB(0) + cC(0) - cAs(t) - cBs(t)

(b) The B species has two-time-scale behavior. On the fast time scale, it changes rapidly from initial concentration cB0 to the quasi-steady-state value for which RB ≈ 0. Divide B's material balance by k2, define the fast time-scale time as τ = k2 t, and obtain for B's material balance

dcB/dτ = ε k1 cA - cB

in which ε = 1/k2. We wish to find an asymptotic solution for small ε. We try a series expansion in powers of ε for the inner solution (fast time scale)

cBi = Y0 + εY1 + ε²Y2 + ...

The initial condition, cBi(0) = cB0, must be valid for all ε, which gives for the initial conditions of the Yn

Y0(0) = cB0,  Yn(0) = 0, n ≥ 1

Substitute the series expansion into B's material balance, collect like powers of ε, and show the following differential equations govern the Yn

ε⁰:  dY0/dτ = -Y0
ε¹:  dY1/dτ = k1 cA - Y1
εⁿ:  dYn/dτ = -Yn,  n ≥ 2

(c) Solve these differential equations and show

Y0 = cB0 e^{-τ}
Y1 = (k1 cA0/(k1/k2 - 1)) (e^{-τ} - e^{-k1 τ/k2})
Yn = 0,  n ≥ 2

Because Yn vanishes for n ≥ 2, show you obtain the exact solution for the B concentration for all ε by using the first two terms.

(d) Next we analyze B's large-time-scale behavior, also called the outer solution. Divide B's material balance by k2 again but do not rescale time and obtain

ε dcB/dt = ε k1 cA - cB

Expand cB again in a power series of ε

cBo = B0 + εB1 + ε²B2 + ...

Substitute the power series into the material balance and collect like powers of ε to obtain the following equations

ε⁰:  B0 = 0
ε¹:  dB0/dt = k1 cA - B1
ε^{n+1}:  dBn/dt = -B_{n+1},  n ≥ 1

Solve these equations and show

B1 = k1 cA

So we see the zero-order outer solution is cBo = 0, which is appropriate for a QSSA species, but a rather rough approximation.

(e) Show that the classic QSSA analysis is the first-order outer solution.

(f) To obtain a uniform solution valid for both short and long times, we add the inner and outer solution and subtract any common terms. Plot the uniform zeroth-order and first-order solutions for the following parameter values: k2 = 10. Compare to the exact solution and the first-order outer solution (QSSA solution).

(g) Show that the infinite-order uniform solution is also the exact solution.
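The comparison requested in part (f) is easy to sketch (Python; the values k1 = 1, k2 = 10, cA0 = 1, cB0 = 0 are illustrative choices, not necessarily the book's). After the fast transient, the exact and QSSA profiles differ by O(k1/k2):

```python
import math

k1, k2 = 1.0, 10.0
cA0, cB0 = 1.0, 0.0

def cB_exact(t):
    # Exact solution of dcB/dt = k1*cA0*exp(-k1*t) - k2*cB, cB(0) = cB0
    return (cB0*math.exp(-k2*t)
            + k1*cA0/(k2 - k1)*(math.exp(-k1*t) - math.exp(-k2*t)))

def cB_qssa(t):
    return (k1/k2)*cA0*math.exp(-k1*t)   # cBs = (k1/k2)*cA

# After the boundary layer, the ratio approaches k2/(k2 - k1)
ratio = cB_exact(1.0)/cB_qssa(1.0)
```

Near t = 0 the QSSA value is badly wrong (it misses the fast transient entirely), which is exactly why the uniform inner-plus-outer solution is needed.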

Exercise 2.68: QSSA and matching conditions in singular perturbation

Consider again Exercise 2.67 with a slightly more complex reaction mechanism

A ⇌(k1, k-1) B →(k2) C

and assume that either k-1 ≫ k1 or k2 ≫ k1 (or both) so B is again a low-concentration species for which we wish to examine the QSSA. Notice that either k-1 or k2 may be large with respect to the other without invalidating the QSSA assumption for B. Only if k-1 ≫ k1, k2 is the reaction equilibrium assumption also valid for this mechanism.

(a) Apply the QSSA on species B and show

cAs = (cA0 + cB0/(1 + K2)) e^{-k1 K2 t/(1+K2)}
cBs = (k1/(k-1 (1 + K2))) cAs

in which K2 = k2/k-1.

(b) With this mechanism, both the A and B species have two-time-scale behavior. On the fast time scale, we use a series expansion for both cA and cB. Let the inner solution be given by

cAi = X0 + εX1 + ε²X2 + ...
cBi = Y0 + εY1 + ε²Y2 + ...

in which the small parameter ε is the inverse of the largest rate constant in the mechanism. In the following we assume k-1 is largest and ε = 1/k-1. Define K2 = k2/k-1 and we assume that K2 is order unity or smaller. If K2 were large we should have chosen ε = 1/k2 as the small parameter. Collect terms of like power of ε (with τ = k-1 t the fast time) and show

ε⁰:  dX0/dτ = Y0,            dY0/dτ = -(1 + K2) Y0
ε¹:  dX1/dτ = -k1 X0 + Y1,   dY1/dτ = k1 X0 - (1 + K2) Y1
εⁿ:  dXn/dτ = -k1 X_{n-1} + Yn,   dYn/dτ = k1 X_{n-1} - (1 + K2) Yn

What are the initial conditions for the Xn and Yn variables?


(c) Solve these for the zero-order inner solution and show

X0 = cA0 + (cB0/(1 + K2)) (1 - e^{-(1+K2)τ})
Y0 = cB0 e^{-(1+K2)τ}

(d) Next we construct the outer solution valid for large times. Postulate a series expansion of the form

cAo = A0 + εA1 + ε²A2 + ...
cBo = B0 + εB1 + ε²B2 + ...

Substitute these into the A and B material balances and show

ε⁰:  B0 = 0
εⁿ:  dA_{n-1}/dt = -k1 A_{n-1} + Bn,   dB_{n-1}/dt = k1 A_{n-1} - (1 + K2) Bn,   n ≥ 1

(e) Solve these and show for zero order

A0(t) = A0(0) e^{-k1 K2 t/(1+K2)},  B0 = 0

Again we see that to zero order, the B concentration is zero after a short time. Note also that, unlike in Exercise 2.67, we require an initial condition for the outer solution An differential equations. By matching the inner and outer solutions, we obtain the missing initial condition

lim_{τ→∞} X0(τ) = lim_{t→0} A0(t)

In other words, the long-time solution (steady state) on the fast time scale is the short-time solution (initial condition) on the slow time scale. Using this matching condition

A0(0) = cA0 + cB0/(1 + K2)

(f) Find also the first-order solution, B1, and show that the QSSA solution corresponds to the zero-order outer solution for cA and the first-order outer solution for cB.

Exercise 2.69: Michaelis-Menten kinetics as QSSA


Consider the enzyme kinetics

E + S ⇌(k1, k-1) ES,  ES →(k2) P + E

in which the free enzyme E binds with substrate S to form bound substrate ES in the first reaction, and the bound substrate is converted to product P and releases free enzyme in the second reaction. This mechanism has become known as Michaelis-Menten kinetics (Michaelis and Menten, 1913), but it was proposed earlier by Henri (1901). If the rates of these two reactions are such that either the free or bound enzyme is present in small concentration, the mechanism is a candidate for model reduction with the QSSA.

Assume k1 ≫ k-1, so E is present in small concentration. Apply the QSSA and show that the slow time scale model reduces to a first-order, irreversible decomposition.

(a) For a well-stirred batch reactor, show the total enzyme concentration satisfies

cE(t) + cES(t) = cE(0) + cES(0)

(b) Find an expression for the QSS concentration of E. What is the corresponding concentration of ES?

(c) Show the rate expression for the reduced model's single reaction is

r = k cS/(1 + K cS),  k = k2 K E0,  K = k1/(k-1 + k2),  E0 = cE(0) + cES(0)     (2.101)

which depends solely on the substrate concentration. The inverse of the constant K is known as the Michaelis constant. The production rates of reactant S and product P in the reduced model are then simply

RS = -r,  RP = r

(d) Plot the concentrations versus time for the full model and QSSA model for the following values of the rate constants and initial conditions

k1 = 5,  k-1 = 1,  k2 = 10
cE(0) = 1,  cES(0) = 0,  cS(0) = 50,  cP(0) = 0

Exercise 2.70: Michaelis-Mentenkinetics as reaction equilibrium


Consider again the enzyme kinetics given in Exercise 2.69.

E+ S

kl

ES

k2

k-1 >
Nowassume the rate constants satisfy 1<1,
scale
of
the
time
the
second
equilibrium on
reaction.

so that the first


reactionis
at

(a) Find the equilibrium concentrations of E and ES

(b) Showthe production rate of P is given by

Rp =

kcs
1 +1<1cs

k = k2KIE()

= kl/k-l

(2.102)

in which 1<1is the equilibrium constant for the first reaction.


Notice
is identical to the production rate of P given in the QSSAapproach. thisform
son, these two assumptions for reducing enzyme kinetics are oftenForthisreamistakenly
labeled as the same approach.
It is interesting to note that in their original work in 1913, Michaelis
and Menten
proposed the reaction equilibrium approximation to describe enzymekinetics,
in
which the second step is slow compared to the first step (MichaelisandMenten,
1913). Michaelisand Menten credit Henri with proposing this mechanismto
explain the experimental observations that (i) production rate of P increaseslin-

early with substrate at low substrate concentration and (ii) production rate ofPis

independent of substrate concentration at high substrate concentration(Henri,


1901).

The QSSAanalysis of enzyme kinetics was introduced by Briggsand Haldane


in 1925, in which the enzyme concentration is assumed small comparedto the
substrate (Briggsand Haldane, 1925). Since that time, the QSSAapproachhas
become the more popular explanation of the observed dependenceof substrate
in the production rate of product Rp in 2.101 and 2.102 (Nelson and Cox,2000).

The reader should be aware that either approximation may be appropriatedepending on the values of the rate constants and initial conditions. Althoughboth

2.10 Exercises
247

reduced models give the same


form for
quite different in other respects.
the production
Finally,
rate of P, they
particular k-l
for some
k2, both the
are often
values
Qss assumption
of rate
constants, in
and the reaction
equilib(c) show that the slow-time-scale
reduced model
for the reaction
equilibriumas-

tri
the following rate expressions
tri =

tr2 =

tr2

kcs

k = k2KIEo

kcs

= kl/k-l

Notice here we have not reduced the number


of reactions;we still have
reactions, but as before we have reduced the
two
number of rate constantsfrom
to
1<2)
1<-1,
two
(k, 1<1).The first rate
three (kl,
expression here depends on Cs
than
rather
CE
only
and
Cs as in the previous QSSA
reduction. Thereforethe
production rates of E, ES, and S depend on CEas well as
cs. Only the production
rate of P (Rp = tr2) loses the CEdependence.
(d) Plot the concentrations versus time for the full model and reaction equilibrium
model for the following values of the rate constants and initialconditions.
kl = 0.5

CE(O)= 20

k 2 = 0.5

CES(O) = 10

cs(0) = so

cp(0) = O

Recall that you must modify the initial conditions for the slow-time-scalemodel

by equilibrating the first reaction from these startingvalues.

Exercise2.71: Asymptotic expansion of an integral


Findan asymptotic expansion of the integral

f(x) =
integration by parts. Showthat the approxfor large positive values of x. Use repeated

imationis asymptotic as x

00.

always power series


Exercise2.72: Asymptotic series are not
to the two solutions of
Find the leading-order approximations

xe X= E
for

x =
1. Seek solutions of the form

where (E)

1.
1 and one where () >

one
find two dominantbalances:

Ordinary Differential
248

eigenvalue problems
Perturbed
Exercise2.73:
problem
eigenvalue
Considerthe

whereA

Hint:

matrix and
is an n x n

Ax + B(x) = Ax

B(x) and x are Il-vectors. Assume that the

and uniqueness
existence
the
roiew

theory for linear equations.

analysis of a problem with a pitchfork


Multiple-scales
bifur
Exercise2.74:
cation

Consider the system

of equations

1/2 2
cy

e l / 2 xy

are both ord( 1). (They have already been scaled by 112.)perform
Assumethat x and y
letting to = t, ti = el/ 2t, t2 = ct. Show that

the solvability

expansion,
a multiple-scales
conditions require that

yo
to

yo
tl
dyo = RYO+ Yo
dt2

solutions of the amplitude equation for yo?


when to > 1. What are the steady-state
1and +1.
between
Sketchthe steady states as varies

Exercise2.75:Degenerate pitchfork bifurcation


Considerthe one-dimensional system

wheref (x; u)

and fxxx

0 at x = 0. Although this equationhas

the correct symmetry to display a pitchfork bifurcation, (2.81) does not hold because
fxxx = O.

(a) Derivethe correct normal form in this case and draw the correspondingbifurcation diagram(s).

(b) Nowlet fxxx be nonzero, but very small. How are the above bifurcationdiagrams modified?

Exercise2.76: Multiplescales to determine stability of a time-periodicsolution

Consider the stability of a periodic orbit of a nonlinear system. Let XP(t) = xp(t + T)
be a time-periodicsolution of the differential
equation
Nowlet x = xp(t) +

1.

2.10

Exercises

(a) show that

the linearized

equation for
z

whereA(t) = A(t + T) is a mat

249
takes the

A(t)z

rix operator

(b) The DAMPEDMATHIEU EQUATION


is a
time-periodic coefficients.

With

form

time-periodic

coefficients.
It is (writtenparticular case
as a single of a linear
+
second-orderequation With
+ (0 2 +
equation)
ecos2t)x =
< < 1,u =
Letting =
0
determine the stability of the ord
point z = O.
1/2. (Although this equation
Showthat
put
in
in second-order
the form z Ois stable
a solution of the form x(to, to form.) Use time scales
easiertowork
= A(tl) COs
to t,
to +
andassume
+
Exercise2.77: Oscillator with slowly

varying

Usethe multiple-scales approach with ti = t,


solutionto the problem of an oscillator with
2 d2Y

frequency

=
to find
slowlyvarying the leading-ordergeneral
frequency

Assumethat w (t) > O in the domain of interest.


Show that a
leading-order
the form yo r(tl)
will not work,
but that a solution solutionof
yo
form
r(tl)
general
more
of the slightly
will. You
see
the
that
quantity r2(k) is independent
from
scalesresult
of tl, to leading the multipleorder:it is a so-

Exercise 2.78: Multiple-scales solution to a nonlinear


oscillator problem
Usethe method of multiple scales to find a leading-order
solutionto the nonlinear
oscillationproblem
k + (x 2

Usetimescales to = t,

+ x = 0,

x(0) = 1,

= 0

= ct.

Exercise2.79: Synchronization of oscillators


Huygenswas the first to observe that two oscillators (mechanicalclocksin his case)

whose natural frequencies (01 and (02 are close but not identical can be synchronized

has since
("phaselocked") if they are coupled to one another. Suchsynchronization
been observed in a diverse range of applications, including coupledchemicalreactors.
Asimplemodel for a pair of coupled oscillators is
I

(01 + Kl sin(02

01)

sin(1- 02)

Thus these equations


where and 02 are the phase variables for the two oscillators.
when the phase difference
describe trajectories on a torus. Synchronization occurs
of to determine

Analyze the dynamics


= 02 - 01 attains a stable steady-state value.
Drawthe bifurcation
synchronized.
are
oscillators
the range of parameters in which the
synchronized
system passes from the
the
as
torus
the
on
happens
diagram. Draw what

to theunsynchronized state.

Bibliography

M. J. Ablowitz and A. S. Fokas. Complex Variables: Introductio


n and
tions. Cambridge University Press, Cambridge, 2003.
AliCQ.
M.Abramowitz and I. A. Stegun. Handbook of Mathematical Fu
nctions.
Bureau of Standards, Washington, D.C., 1970.
National

R. G. Bartle and D. R. Sherbert. Introduction to Real Analysis.


Sons, Inc., New York, third edition, 2000.

John Wiley

C. M. Bender and S. A. Orszag. Advanced Mathematical Methods


for Scientists
and Engineers. I. Asymptotic Methods and Perturbation

Theory. Springer
_

Verlag, New York, 1999.

G. E. Briggs and J. B. S. Haldane. A note on the kinetics of enzyme


action.
Biochem.J., 19:338-339, 1925.

C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang. Spectral Methods


in
Fluid Dynamics. Springer-Verlag, Berlin, 1988.

C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang. SpectralMethods:


Fundamentals in Single Domains. Springer-Verlag, Berlin, 2006.

W. J. Cody. Rational Chebyshev approximations for the error function.Math.


Comp.,

1969.

P. A. M. Dirac. Principles of quantum mechanics. Oxford, ClarendonPress,


fourth edition, 1958.
M. V. Dyke. Perturbation Methods in Fluid Mechanics. Parabolic Press, Stanford,

CA,annotated edition, 1975.

C. Gasquet and P. Witomski. Fourier Analysis and Applications. SpringerVerlag, New York, 1999.

H. Goldstein. Classical Mechanics. Addison-Wesley, Reading, Massachusetts,


second edition, 1980.
D. Gottlieb and S. A. Orszag. Numerical Analysis of Spectral Methods:Theory
and Applications. SIAM,Philadelphia, 1977.
NewJerM. D. Greenberg. Foundations of Applied Mathematics. Prentice-Hall,

sey, 1978.

250

Bibliography

M.Grmela and H.-c. ttinger.

Dynamics
fluids. 1.Development of a
and
general
formalism. thermodynamics
of complex
Phys.Rev.
E,
J. Guckenheimer and P. Holmes.
Nonlinear
and Bifurcations of vector
Oscillations,
Fields.
Springer
Dynamical
Verlag, New
systems
York, New
York,
M.V. Henri. Thorie
gnrale de
l'action de
quelques

E. J. Hinch. Perturbation Methods.

Cambridge

M. W. Hirsch and S. Smale. Differential

diastases.

University Press,

Equations,

Comptes

Cambridge,

Dynamical Systems

and

T. J. R. Hughes. The Finite Element


Method. Dover,
Mineola, New York,
2000.
E. L. Ince. Ordinary Differential Equations.
Dover Publications
Inc., New York,

G. looss and D. D. Joseph. Elementary


stability and Bifurcation
Springer-Verlag, Berlin, second edition,
Theory.
1990.
S. R. Keller and R. Skalak. Motion of a
tank-treading ellipsoidal
particle in a
shear-flow. J. Fluid Mech., 120.27-47,1982.
J. Kevorkian and J. D. Cole. Multiple Scale and
Singular Perturbation Methods.
Springer-Verlag, New York, 1996.
H. K. Khalil. Nonlinear Systems. Prentice-Hall, Upper Saddle
River, NJ, third
edition, 2002.
O. Mangasarian. Nonlinear Programming. SIAM,Philadelphia, PA,
1994.

L. Michaelis and M. L. Menten. Die Kinetik der Invertinwirkung. Biochem.Z,

49:333-369, 1913.

A. H. Nayfeh and D. T. Mook. Nonlinear Oscillations. John Wiley& Sons, New


York, 1979.
A. W. Naylor and G. R. Sell. Linear Operator Theory in Engineering and Science.

Springer-Verlag, New York, 1982.


D. L. Nelson and M. M. Cox. Lehninger Principles of Biochemistry. Worth Pub-

lishers, New York, third edition, 2000.


E. Polak. Optimization: Algorithms and ConsistentApproximations.Springer
Verlag, New York, 1997.

BibliogtQh

252

ess

Cambridge, 1992.

Analysis and
Ekerdt. Chemical Reactor
Design Fund
G.
J.
and
second
J. B. Rawlings
WI,
edition,
Madison,
2012.
Q.
Publishing,
mentals. Nob Hill
Control: Theory
Q. Mayne. Model Predictive
and Design
J. B. Rawlings and D.
2009.
WI,
Madison,
Nob Hill Publishing,
J.-B.Wets. VariationalAnalysis. springer-Verlag,
R. T. Rockafellar and R.
1998

E.D. Sontag. Mathematical Control


edition, 1998.

Theory. Springer-Verlag, New York,


second

I. Stakgold. Green's Functions and Boundary Value Problems. John Wiley&


Sons, New York, second edition, 1998.
G. Strang. Introduction to Applied Mathematics. Wellesley-Cambridge press
Wellesley, MA, 1986.

G. Strang and G. J. Fix. An Analysis of the Finite Element Method.


Wellesley.
Cambridge Press, Cambridge,
MA, 2008.

S. H. Strogatz. Nonlinear Dynamics and Chaos: With


Applications to Physics,
Biology,Chemistryand Engineering. Westview Press,
Cambridge, MA,1994.

3
Vector Calculus and Partial
Differential
Equations

3.1 Vector and Tensor Algebra


3.1.1 Introduction
Manyof the partial differential equations (PDEs)
that we encounter as
biological
chemical and

engineers arise from field


equations such as the
Navier-Stokesequations of fluid

dynamics or the Schrdinger


of quantum mechanics. These equations govern quantities equation
(velocity,
wavefunction) that vary with position in three-dimensional
physical
space. In general, such a quantity is known as a FIELD.
Therefore,this
chapterbegins with a

discussion of the properties of vectorsand


related objects (tensors) in physical space. In general,a TENSOR
is an
objectthat has an intrinsic geometric definition,independentof coordinate system. It may be a velocity vector, a dot product between two
vectors (a scalar) or, as we shall see, even a linear operator.

3.1.2 Vectors in Three Physical Dimensions


In this chapter, we consider only vectors in three-dimensionalphysical
spaceand following convention in the physics and engineeringliterature, represent these vectors using bold type. We begin with a brief
reviewof vectors, tensors and their algebra. For now, let us consider
only a Cartesian basis for the space, with position independent,orthonormal basis vectors el, e2, e3. Any vector u can be represented as

u = Ei=l Cliei,or, using the summation convention,uiei. In CARTESIANTENSOR


notation, we streamline the notation even further, denotingthe vector as ui. The unsummed index i on indicatesthat u is a
253

VectorCalculus and Partial Differentia/

254

EquQti0hs

u; = uati.
a vector is Ilull =
of
length
The
vector. The
two vectors is determined by the dot
degree
between
of alignment

UiVj(ei ej)

U V = UV

Usingsome elementary

utt'i = llull llvll


coso

geometry, it can be shown that


1

(llu112

+ llv112

v can be expressed without referring


This result shows that u
lengths of vectors. Therefore,to a
the
to
only
but
system,
ordinate
thedot

coordinate system; it is a GEOMETRIC


product is independent of
INVARI.
of

ANT.Recallthat the inner product


the dot product.

Chapter 1 is the generalization


of

In Chapter 1 we also introduced the outer product between two

tors, also called the DIRECT PRODUCT or DYADIC PRODUCT. The outer

product between vectors u and v is the DYADIuv. A dyad is a SECONDa quantity that incorporates information regarding two
TENSOR:
ORDER
directions. (Avector, which has one magnitude and one direction,is a

first-ordertensor). A dyad can act as a linear operator

(uv) w = u(v w)
Similarly,

Notethat uv

w (uv) = (w u)v
vu. Based on this definition, we can write uv out,

includingbasis vectors

UV

In Cartesiantensor notation, uv is denoted as uiVj (the presenceof


the basis vectors ei and ej is implied by the presence of the twosub-

scripts). Whena dyad operates on something, the rightmost index(and


basis vector) is involved. An example of a useful dyad is the projection

operator, where is a unit vector. The product () v is the


componentof the vector v in the direction. You can checkthisby

applyingthe definitionof the outer product.


1As noted in
Chapter 1, sometimes

v.
the dyad uv is denoted by uvT or u

3,1 vector

and Tensor Algebra

255

general second-order tensor T can be written as a linear


combina-

of the
0011

basis dyads eiej

Tijetej

tensor notation, the summations and base


vectors are imIn artesian can denote the tensor
by
its
we
component
matrix Tij. The
pliedand
T

v
between
a
=
u
second-order
tensor and a vector is
dotproduct
=
TijVj.
Similarly,
u = v T is, in Cartesian
anothervector:
coordiThe second-order identity tensor is
vjTji.
=
nates:
denoted and
property a = a

= a for all a. In Cartesian


is simply the Kroneckerdelta coorij, or
elel
+
e2e2
+
e3e3.
=
equivalently
Alsoimportant is the cross product, u x v. Recallthat, while the

satisfiesthe
dinateS,the ij component of

dotproduct is a scalar, the cross product is a vector, with magnitude

ullllvllsino and direction orthogonal to both u and v and deter-

minedby the "right-hand rule." The cross product is not commutative:


vx u. Because of the invocation of the right-hand rule in

its definition,the cross product is strictly speakinga PSEUDOVECTOR,


becauseits definition is affected by the handedness of the coordinate
systemin which it is computed.
It is useful to view the cross product as a matrix-vector multiplication.Usingthe Cartesian components
o

113

VI

-112

Wecan write the cross product more compactly if we introduce the


followingoperator, called the LEVI-CIVITA
SYMBOL
1,

Ejk -1,
0,

ijk = 123,231 or 312


ijk = 132,321 or 213
i = j, i = k or j = k

Thisis the Cartesian coordinate representation of the


ALTERNATING

UNITTENSOR
or PERMUTATION
TENSORE. As with the cross-product
itself,qjk is not actually a tensor, but rather a pseudotensor,
because
itsdefinitionis based on the
use of right-handed Cartesian coordinates.
Nowthe operator (ux)
can be written EijkUj. This quantity has two free

and Partial Differe


Vector Ca/CUlUS
ntial

256

indices, so it is a

cross product as

A useful identity

EquQti0hs

second-order pseudotensor. Finally, we can


CijkUjVk

Writethe

involving Eijk is
int jl

ijkklm

whicharises in the computation of double cross products such


(b x c). Sincethe Kronecker delta is not handedness dependent,the
doublecross product between three vectors is a true Vector.

3.2 Vector Calculus: Differential Operators and Integral Theorems

3.2.1 Divergence, Gradient, and Curl
Consider a vector that is a function of position, v(x), a VECTOR FIELD. Physically, this vector field could be a fluid velocity (mass flux) or an electric current (charge flux), for example. An important physical consideration is the total flow into or out of a closed region. We denote this region as V, its boundary surface as S, and the outward unit normal vector to S as n, as illustrated in Figure 3.1. The volume of V is Vol(V) = ∫_V dV. If v is a flux of some quantity, then n·v dS is the amount of that quantity crossing the boundary element dS per unit time, and thus

    (1/Vol(V)) ∫_S n·v dS

is the amount of that quantity leaving V per unit time, per unit volume. Now let the region be centered at a position x₀ and let V shrink to zero around that point. The DIVERGENCE of v at point x₀ is defined by

    div v = lim_{Vol(V)→0} (1/Vol(V)) ∫_S n·v dS    (3.1)

Thus the divergence of v measures the amount per unit volume that leaves the point x₀. This definition is independent of coordinates, so the divergence is a scalar.
For a scalar field φ(x) there is an analogous quantity, the GRADIENT of φ, defined by

    grad φ = lim_{Vol(V)→0} (1/Vol(V)) ∫_S n φ dS    (3.2)

Figure 3.1: Volume V shrinking to zero size around a point x₀.

Given a unit vector s, the quantity s·grad φ is the derivative of φ along the s direction, i.e., the DIRECTIONAL DERIVATIVE. The gradient of φ is thus a vector whose direction shows the direction of the maximum change in φ and whose magnitude is the magnitude of that change.
The final important operation, the CURL, measures the rotation of a vector field v at a point. It is defined by

    curl v = lim_{Vol(V)→0} (1/Vol(V)) ∫_S n × v dS

Because of the cross product involved in its definition, curl v is a pseudovector.
The above definitions of div, grad, and curl are independent of coordinate system and illustrate the concepts underlying them, but to actually work with these operators we need coordinate systems. All three of the above operations can be expressed in terms of the GRADIENT operator, ∇, also called "nabla" or "del," which is also sometimes denoted ∇_x. In Cartesian coordinates, it is given by

    ∇ = e₁ ∂/∂x₁ + e₂ ∂/∂x₂ + e₃ ∂/∂x₃

or in Cartesian tensor notation

    ∇ = e_i ∂/∂x_i

The presence of the basis vector e_i is required because of the otherwise unrepeated index i. The divergence, gradient, and curl operators are then given by

    div v = ∇·v = ∂v_i/∂x_i
    grad φ = ∇φ = e_i ∂φ/∂x_i
    curl v = ∇ × v = e_i ε_ijk ∂v_k/∂x_j

Another extremely important operator is the LAPLACIAN operator div grad, given by

    div grad = ∇·∇ = ∂²/∂x_i∂x_i

The most common notation for the Laplacian operator is ∇². Unfortunately, this notation is somewhat misleading, implying that the operator is grad grad rather than div grad. Some literature uses the symbol Δ for the operator. We follow engineering convention and use ∇².

3.2.2 The Gradient Operator in Non-Cartesian Coordinates

In many applications, Cartesian coordinates are not the most practical for solving a problem.² We are familiar with cylindrical and spherical coordinate systems, but there are many others, including bipolar and parabolic systems. We consider here only orthogonal coordinate systems; the basis vectors may change from point to point, but at each point they are mutually orthogonal. We denote an arbitrary set of orthogonal coordinates by u₁, u₂, u₃ and the (orthonormal) base vectors by e_{u₁}, e_{u₂}, e_{u₃}. The most important distinction between Cartesian and other coordinate systems is the actual distance traversed in moving from one coordinate line to another. For example, in Cartesian coordinates (x₁, x₂, x₃) = (x, y, z), the distance between the coordinate lines y = 1 and y = 2, keeping x and z fixed, is always 1. But in cylindrical coordinates, (u₁, u₂, u₃) = (r, θ, z), the distance traveled going from θ = 1 to θ = 2 (at constant r and z) depends on r! This dependence is quantified in the SCALE FACTORS for a coordinate system, defined by

    h_i = ( Σ_{j=1}^{3} (∂x_j/∂u_i)² )^{1/2}

²Appendix A of Bird, Stewart, and Lightfoot (2002) contains a great deal of useful information about this topic. Tensor analysis is not restricted to orthogonal coordinate systems; if you want to learn about tensor analysis in general coordinates, some good references are Aris (1962); Block (1978); Simmonds (1994); Bird, Armstrong, and Hassager (1987).
This quantity determines the distance traversed in moving along the coordinate curve: the distance covered in moving du_i along coordinate curve i is h_i du_i. For example, in cylindrical coordinates, it is easy to compute that h₁ = 1, h₂ = r, h₃ = 1. If we normalize each coordinate tangent vector by the scale factor, then we can write the basis vectors in terms of the Cartesian unit vectors e_j

    g_i = (1/h_i) Σ_{j=1}^{3} (∂x_j/∂u_i) e_j

Note that despite the notation, the number h_i is not a component of a vector but rather is a property of the particular coordinate system under consideration.
For any orthogonal coordinate system, we can now write the gradient operator as

    ∇ = g_i (1/h_i) ∂/∂u_i

(summation implied). In general, the g_i depend on position. The importance of this fact becomes clear when we consider operators like the Laplacian

    ∇·∇ = (g_i·g_j)/(h_i h_j) ∂²/∂u_i∂u_j + (g_i/h_i)·(∂/∂u_i)(g_j/h_j) ∂/∂u_j

The second term in this expression does not appear in Cartesian coordinates, where the base vectors are independent of position. In terms of the scale factors, the derivative of a basis vector with respect to position can be written as follows

    ∂g_j/∂u_k = g_k (1/h_j) ∂h_k/∂u_j − δ_jk Σ_{i=1}^{3} (g_i/h_i) ∂h_j/∂u_i

Summation is not implied, as u_k is not a component of a tensor.
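The scale-factor definition lends itself to a direct numerical check. The sketch below (our own construction, not from the text) estimates h_i for cylindrical coordinates by central finite differences of the Cartesian coordinates with respect to (r, θ, z), recovering h = (1, r, 1):

```python
import math

def scale_factor(xyz_of_u, u, i, h=1e-6):
    # h_i = sqrt( sum_j (dx_j/du_i)^2 ), by central finite differences
    up, um = list(u), list(u)
    up[i] += h
    um[i] -= h
    dx = [(a - b) / (2 * h) for a, b in zip(xyz_of_u(up), xyz_of_u(um))]
    return math.sqrt(sum(d * d for d in dx))

def cylindrical(u):
    # Cartesian position as a function of (r, theta, z)
    r, th, z = u
    return (r * math.cos(th), r * math.sin(th), z)

r = 1.7
u = (r, 0.6, -0.3)
assert abs(scale_factor(cylindrical, u, 0) - 1.0) < 1e-8   # h_r = 1
assert abs(scale_factor(cylindrical, u, 1) - r) < 1e-8     # h_theta = r
assert abs(scale_factor(cylindrical, u, 2) - 1.0) < 1e-8   # h_z = 1
```

The same function applies unchanged to any orthogonal coordinate system for which the map to Cartesian coordinates is known.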

Example 3.1: Gradient (del) and Laplacian operators in polar (cylindrical) coordinates

(a) Without referring to Cartesian coordinates at all, derive a formula for the gradient (del) operator in polar coordinates, shown in Figure 3.2, so that one obtains for the differential of a scalar function φ

    dφ = ∇φ · dx

in which dx is the differential of the position vector in polar coordinates.

Figure 3.2: Polar coordinates (r, θ) and unit vectors e_r and e_θ.

(b) Using this formula for ∇, derive the formula for the Laplacian in polar coordinates.

(c) Finally, check these two results by relating them to Cartesian coordinates using the h_i and g_i formulas given previously.

Solution

(a) As shown in Figure 3.2 we have for the differential of position

    dx = dr e_r + r dθ e_θ

From the definition of partial derivative, we have the formula for the total differential of an arbitrary function φ(r, θ)

    dφ = (∂φ/∂r) dr + (∂φ/∂θ) dθ

We substitute ∇φ = e_r a₁ + e_θ a₂ and solve for a₁, a₂, the two vector components of ∇φ

    dφ = ∇φ·dx = (e_r a₁ + e_θ a₂)·(dr e_r + r dθ e_θ) = a₁ dr + a₂ r dθ

Comparing the two sides, we have

    a₁ = ∂φ/∂r,  a₂ = (1/r) ∂φ/∂θ

which gives for ∇ in polar coordinates

    ∇ = e_r ∂/∂r + e_θ (1/r) ∂/∂θ    (3.4)

(b) Next we use the definition of the Laplacian to obtain

    ∇² = ∇·∇ = ( e_r ∂/∂r + e_θ (1/r) ∂/∂θ ) · ( e_r ∂/∂r + e_θ (1/r) ∂/∂θ )

Taking the derivatives, and noting the dot product e_r·e_θ = 0 because the unit vectors are orthogonal, gives

    ∇² = ∂²/∂r² + (1/r) e_θ·(∂e_r/∂θ) ∂/∂r + (1/r²) e_θ·(∂e_θ/∂θ) ∂/∂θ + (1/r²) ∂²/∂θ²

Now we require the derivatives of the unit vectors with respect to (r, θ). As shown in Figure 3.2 these are given by (see also Exercise 3.2)

    ∂e_r/∂r = 0,  ∂e_θ/∂r = 0,  ∂e_r/∂θ = e_θ,  ∂e_θ/∂θ = −e_r    (3.5)

Substituting these derivatives into the previous result and collecting the nonzero terms gives

    ∇² = ∂²/∂r² + (1/r) ∂/∂r + (1/r²) ∂²/∂θ²

Note that we can combine the first two terms for an equivalent form

    ∇² = (1/r) ∂/∂r ( r ∂/∂r ) + (1/r²) ∂²/∂θ²    (3.6)

(c) The partial derivatives of the coordinates are

    ∂x/∂r = cos θ,  ∂y/∂r = sin θ,  ∂x/∂θ = −r sin θ,  ∂y/∂θ = r cos θ

Substituting into the previously given formulas for h_i and g_i gives

    h₁ = 1,  h₂ = ( (−r sin θ)² + (r cos θ)² )^{1/2} = r
    g₁ = cos θ e_x + sin θ e_y = e_r
    g₂ = −sin θ e_x + cos θ e_y = e_θ

We then have

    ∇ = g₁ (1/h₁) ∂/∂r + g₂ (1/h₂) ∂/∂θ = e_r ∂/∂r + e_θ (1/r) ∂/∂θ

which agrees with (3.4).
For the Laplacian, we require the derivatives of g₁, g₂

    ∂g₁/∂θ = g₂,  ∂g₂/∂θ = −g₁,  ∂g₁/∂r = ∂g₂/∂r = 0

The g₁ term in the Laplacian formula vanishes upon substituting the various derivatives, and the g₂ term produces the additional term (1/r)∂/∂r, giving

    ∇² = ∂²/∂r² + (1/r) ∂/∂r + (1/r²) ∂²/∂θ²

which agrees with (3.6).
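Result (3.6) can also be checked numerically. The sketch below (our own check, with hypothetical function names) applies the polar Laplacian by central finite differences to φ = x²y, whose Cartesian Laplacian is exactly 2y:

```python
import math

def phi_xy(x, y):
    # test function; its Cartesian Laplacian is exactly 2*y
    return x * x * y

def phi_polar(r, th):
    return phi_xy(r * math.cos(th), r * math.sin(th))

def polar_laplacian(f, r, th, h=1e-4):
    # (1/r) d/dr (r df/dr) + (1/r^2) d2f/dth2, via central differences
    d2r = (f(r + h, th) - 2 * f(r, th) + f(r - h, th)) / h**2
    dr = (f(r + h, th) - f(r - h, th)) / (2 * h)
    d2t = (f(r, th + h) - 2 * f(r, th) + f(r, th - h)) / h**2
    return d2r + dr / r + d2t / r**2

r, th = 1.3, 0.8
exact = 2 * r * math.sin(th)    # Laplacian of x^2 y, with y = r sin(theta)
assert abs(polar_laplacian(phi_polar, r, th) - exact) < 1e-5
```

Any smooth test function works equally well; the finite-difference error is O(h²).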

Table 3.1 collects expressions for the gradient and Laplacian operators in Cartesian, cylindrical, and spherical coordinate systems. The convention used for the angles θ and φ in spherical coordinates is shown in Figure 3.3.

Figure 3.3: The orthonormal unit vectors in spherical coordinates.

Cartesian:
    ∇ = e₁ ∂/∂x₁ + e₂ ∂/∂x₂ + e₃ ∂/∂x₃
    ∇² = ∂²/∂x₁² + ∂²/∂x₂² + ∂²/∂x₃²

Cylindrical:
    ∇ = e_r ∂/∂r + e_θ (1/r) ∂/∂θ + e_z ∂/∂z
    ∇² = (1/r) ∂/∂r (r ∂/∂r) + (1/r²) ∂²/∂θ² + ∂²/∂z²

Spherical:
    ∇ = e_r ∂/∂r + e_θ (1/r) ∂/∂θ + e_φ (1/(r sin θ)) ∂/∂φ
    ∇² = (1/r²) ∂/∂r (r² ∂/∂r) + (1/(r² sin θ)) ∂/∂θ (sin θ ∂/∂θ) + (1/(r² sin²θ)) ∂²/∂φ²

Table 3.1: Gradient and Laplacian operators in Cartesian, cylindrical, and spherical coordinates.

3.2.3 The Divergence Theorem

The divergence theorem concerns the integral of the divergence of a vector field v(x) in a region V. It is central to many aspects of transport phenomena, where conservation laws are written for a control volume, and the divergence theorem plays a key role in their development.
To illustrate the arguments underlying the divergence theorem without digressing too far, we will prove a limited version of it. Consider the two-dimensional "volume" V_A shown in Figure 3.4, whose "surface" consists of three pieces, S₁, S₂, and S₃, and whose outward unit normal is n_A. In this domain

    ∫_{V_A} ∇·v dV = ∫_{V_A} ( ∂v_x/∂x + ∂v_y/∂y ) dV
        = ∫₀^{y₁} ∫₀^{x_c(y)} (∂v_x/∂x) dx dy + ∫₀^{x₁} ∫₀^{y_c(x)} (∂v_y/∂y) dy dx
        = ∫₀^{y₁} ( v_x(x_c, y) − v_x(0, y) ) dy + ∫₀^{x₁} ( v_y(x, y_c) − v_y(x, 0) ) dx

Since n = −e_y on S₁ and n = −e_x on S₂, the first two terms in the last expression can be simplified

    ∫_{V_A} ∇·v dV = ∫_{S₁} n·v dS + ∫_{S₂} n·v dS + ∫₀^{y₁} v_x(x_c, y) dy + ∫₀^{x₁} v_y(x, y_c) dx    (3.7)

To simplify the remaining two terms, observe that they both correspond to integrals along the surface (a curve in two dimensions) S₃. They can be combined into one by converting the first term into an integral over x, noting that dy = (dy_c/dx) dx and changing the limits of integration appropriately

    ∫₀^{y₁} v_x(x_c(y), y) dy = ∫_{x₁}^{0} v_x(x, y_c(x)) (dy_c/dx) dx = −∫₀^{x₁} v_x(x, y_c(x)) (dy_c/dx) dx

so that the two remaining terms combine as

    ∫₀^{x₁} ( −(dy_c/dx) e_x + e_y )·v dx

Figure 3.4: A two-dimensional volume for evaluation of the integral of the divergence. Differential elements dS and dx are shown.

On S₃ the normal vector can be written

    n = ( −(dy_c/dx) e_x + e_y ) / ( 1 + (dy_c/dx)² )^{1/2}

and dS = ( 1 + (dy_c/dx)² )^{1/2} dx (see Figure 3.4), so this integral becomes

    ∫_{S₃} n·v dS

Combining this result with (3.7), we find that

    ∫_{V_A} ∇·v dV = ∫_{S₁} n·v dS + ∫_{S₂} n·v dS + ∫_{S₃} n·v dS    (3.8)

Finally, consider what happens if we extend the integral to the larger domain V that includes both V_A and a contiguous subdomain V_B, with normal vector n_B, as shown in Figure 3.5. By the same arguments given above,

    ∫_{V_B} ∇·v dV = ∫_{S_B} n_B·v dS    (3.9)

where S_B is the boundary of V_B, one piece of which is the shared side S₂.

Figure 3.5: Two contiguous subdomains.

The two domains V_A and V_B share one side S₂; on this side n_A = −n_B. Adding (3.8) and (3.9) and recognizing that the integrals over S₂ cancel — anything leaving V_A via S₂ is entering V_B — we have that

    ∫_V ∇·v dV = ∫_S n·v dS    (3.10)

where S is the boundary of the entire domain and n its outward unit normal. By piecing together domains like these and repeating the arguments used here, (3.10) can be seen to hold for any closed domain on the plane.
Equation (3.10) is the divergence theorem. By extending these elementary arguments it can be shown to be valid for arbitrary bounded domains in an arbitrary number of dimensions. In one dimension it reduces to the FUNDAMENTAL THEOREM OF INTEGRAL CALCULUS: ∫_a^b (df/dx) dx = f(b) − f(a). It is extremely important in a wide variety of contexts as it relates behavior in the interior of a domain to behavior on its boundary. Finally, one can see that the definition of the divergence operator mirrors this result for an infinitesimal domain.
In Cartesian tensor notation, the divergence theorem is

    ∫_V (∂v_i/∂x_i) dV = ∫_S n_i v_i dS
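A small numerical illustration of (3.10) (our own sketch, not from the text): for v = (x², y²) on the unit square, the volume integral of ∇·v = 2x + 2y and the boundary flux integral both equal 2.

```python
def divergence_integral(n=400):
    # Volume integral of div v = 2x + 2y over the unit square (midpoint rule)
    h = 1.0 / n
    return sum((2 * (i + 0.5) * h + 2 * (j + 0.5) * h) * h * h
               for i in range(n) for j in range(n))

def flux_integral(n=400):
    # Boundary integral of n.v for v = (x^2, y^2); v vanishes on x = 0
    # and y = 0, so only the sides x = 1 and y = 1 contribute.
    h = 1.0 / n
    right = sum(1.0 * h for j in range(n))   # n.v = v_x(1, y) = 1
    top = sum(1.0 * h for i in range(n))     # n.v = v_y(x, 1) = 1
    return right + top

assert abs(divergence_integral() - flux_integral()) < 1e-6
```

The midpoint rule is exact for the linear integrand here, so the two sides agree to round-off.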

By replacing the vector by a scalar or by a second-order tensor in this expression, the related results can be found (now expressed in vector notation)

    ∫_V ∇φ dV = ∫_S n φ dS    (3.11)

    ∫_V ∇·T dV = ∫_S n·T dS    (3.12)

A closely related result is the multidimensional version of LEIBNIZ'S RULE, which we state without proof here. Consider the time derivative of an integral over a volume that is moving with time, e.g., a fluid element in a velocity field. If a point x on the boundary is moving with a velocity q(x), then Leibniz's rule states that

    d/dt ∫_{V(t)} m(x, t) dV = ∫_V (∂m/∂t) dV + ∫_S m n·q dS

The second term in this formula appears only if the volume is moving or changing shape with time and represents the net amount that is swept into V because of the motion of its boundaries.

Example 3.2: The divergence theorem and conservation laws

Conservation laws can be written for many quantities. Important examples include mass, energy, chemical species, and probability. Consider a quantity A that satisfies a conservation law in some arbitrary region of space V with boundary S and outward unit normal n. The density (amount per unit volume) of A is ρ_A and the flux (amount per unit area per unit time) is F_A. We allow for the possibility that A is created or destroyed within the volume, with rate R_A having units of amount of A per unit volume per unit time. If A is a chemical species, then R_A is a volumetric reaction rate of production of A. The conservation law for A can thus be written for the domain V as follows

    d/dt ∫_V ρ_A dV = −∫_S n·F_A dS + ∫_V R_A dV    (3.13)

The left-hand side is the rate of accumulation of A in the domain. The first term on the right-hand side is the net rate of entry of A into the domain across its boundary and the final term is the net rate of production of A via sources or sinks of A within the domain. Use the divergence theorem to write a conservation statement for A that is valid at every point in the domain.

Solution

The divergence theorem allows the surface integral to be written as a volume integral

    ∫_S n·F_A dS = ∫_V ∇·F_A dV

Furthermore, because V is time independent,

    d/dt ∫_V ρ_A dV = ∫_V (∂ρ_A/∂t) dV

Substituting these two equations into (3.13) yields

    ∫_V (∂ρ_A/∂t) dV = −∫_V ∇·F_A dV + ∫_V R_A dV

Since all terms in this equation are volume integrals, they can be combined

    ∫_V ( ∂ρ_A/∂t + ∇·F_A − R_A ) dV = 0

Because the volume V is arbitrary, the only way that this equation can be satisfied in general is if its integrand vanishes at every point within V. That is

    ∂ρ_A/∂t = −∇·F_A + R_A    (3.14)

This is the general pointwise statement of the conservation law for A.
To be more specific, let A be a chemical species. Its molar density, or concentration, will be denoted c_A. Chemical species are transported by molecular diffusion and flow; if the species is dilute the flux of A can be written F_A = c_A v − D_A ∇c_A, where v is the velocity field for the fluid in which A is dissolved, and D_A is the diffusivity of the species. Now (3.14) becomes

    ∂c_A/∂t = −∇·(c_A v) + D_A ∇²c_A + R_A    (3.15)

This is a partial differential equation for the spatial and temporal distribution of a chemical species. If U and L are characteristic scales for the fluid velocity v and domain size, respectively, then the relative importance of convection and diffusion is estimated by the PECLET NUMBER Pe = UL/D_A.
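The pointwise balance (3.14) has a useful discrete analog: if a numerical scheme updates a concentration field using differences of fluxes, total A changes only through boundary fluxes and sources. The sketch below (our own illustration, with zero boundary flux, no source, and pure diffusion) shows the resulting exact conservation of the total amount:

```python
def diffuse(c, D=1.0, dx=0.1, dt=0.001, steps=1000):
    # Explicit flux-form update of dc/dt = -dF/dx with F = -D dc/dx.
    # Zero flux is imposed at both ends, so total A must be conserved.
    c = list(c)
    for _ in range(steps):
        flux = [0.0] + [-D * (c[i + 1] - c[i]) / dx
                        for i in range(len(c) - 1)] + [0.0]
        c = [c[i] - dt * (flux[i + 1] - flux[i]) / dx for i in range(len(c))]
    return c

c0 = [1.0 if 4 <= i <= 6 else 0.0 for i in range(20)]
c1 = diffuse(c0)
assert abs(sum(c1) - sum(c0)) < 1e-10   # total amount of A is conserved
assert max(c1) < max(c0)                # diffusion smooths the initial peak
```

The step size satisfies D dt/dx² = 0.1, well within the explicit stability limit of 1/2.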

3.2.4 Further Integral Relations and Adjoints

GREEN'S IDENTITIES are special cases of the divergence theorem that are useful for working with multidimensional integrals over quantities involving differential operators other than the divergence. Green's first identity is the divergence theorem for the case where v is replaced by u∇v, where u and v are now scalars

    ∫_V (∇u·∇v + u∇²v) dV = ∫_S u∇v·n dS    (3.16)

Green's second identity comes from writing Green's first identity with u and v exchanged and subtracting this expression from Green's first identity as written above

    ∫_V (u∇²v − v∇²u) dV = ∫_S (u∇v − v∇u)·n dS    (3.17)

Finally, GREEN'S FORMULA comes from replacing v in the original expression by uv, where u is a scalar and v a vector

    ∫_V (∇u·v + u∇·v) dV = ∫_S uv·n dS    (3.18)

In one dimension, Green's formula reduces to the expression for integration by parts.
The above theorems all deal with the divergence and its closest relatives, the gradient and the Laplacian. The final results are instead for the curl. In two dimensions, ∇ × v reduces to (∂v_y/∂x − ∂v_x/∂y) e₃. GREEN'S THEOREM shows how the integral of this over an area A can be reduced to an integral over the (closed) boundary curve C

    ∫_A (∂v_y/∂x − ∂v_x/∂y) dA = ∮_C (v_x dx + v_y dy)

The proof of this result closely follows what we did above with the divergence theorem. STOKES'S THEOREM is more general, applying to any bounded orientable ("two-sided") curved surface A floating in three dimensions, with boundary curve C

    ∫_A n·(∇ × v) dA = ∮_C v·t dC

Here t is the unit vector tangent to the boundary C, pointing in the direction in which the integration around C is being performed. The orientability condition precludes surfaces like a Möbius strip. The vector n is a unit normal vector to the surface A. Since the surface does not enclose a three-dimensional volume, however, inward and outward normals are not defined, and the choice of the direction of n determines the direction of the integration path for C. For example, if S were a region on a sheet of paper and n pointed up out of the paper, then the integration path around C is counterclockwise.
One important application of the above results is in the determination of the adjoints to div, grad, and curl. First, we define the relevant inner products. Let

    ⟨u, v⟩ = ∫_V uv dV

if u and v are (real) scalars, and

    ⟨u, v⟩ = ∫_V u·v dV

if they are vectors. In our earlier discussion of adjoints, we used integration by parts to help us compute them; in multiple dimensions, Green's formula and identities are the appropriate replacements. For example, using Green's formula, (3.18), we can easily find that, with u(S) = 0 (Dirichlet boundary conditions)

    ⟨∇u, v⟩ = −⟨u, ∇·v⟩

Thus the adjoint of grad (with Dirichlet boundary conditions) is −div. Similarly, rearranging Green's second identity we find that

    ⟨∇²u, v⟩ = ∫_S (u∇v − v∇u)·n dS + ⟨u, ∇²v⟩

If we impose the same homogeneous boundary conditions on u and v, then u∇v = v∇u on the boundary. Thus the boundary term vanishes, leaving

    ⟨∇²u, v⟩ = ⟨u, ∇²v⟩

Therefore, with such boundary conditions the Laplacian operator is self-adjoint. This fact has important implications for the solution of partial differential equations that involve the Laplacian.
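The discrete analog of this self-adjointness is worth seeing once: with homogeneous Dirichlet boundaries, the 1D second-difference matrix is symmetric, so ⟨Lu, v⟩ = ⟨u, Lv⟩ holds exactly. This is our own illustration, not the book's:

```python
import math

def laplacian_matrix(n):
    # 1D second-difference matrix with homogeneous Dirichlet boundaries
    return [[-2.0 if i == j else 1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

n = 8
L = laplacian_matrix(n)
u = [math.sin(i + 1.0) for i in range(n)]
v = [math.cos(2.0 * i) for i in range(n)]
# Symmetry of the matrix ...
assert all(L[i][j] == L[j][i] for i in range(n) for j in range(n))
# ... implies the discrete self-adjointness relation <Lu, v> = <u, Lv>
assert abs(dot(matvec(L, u), v) - dot(u, matvec(L, v))) < 1e-12
```

Self-adjointness is what guarantees real eigenvalues and orthogonal eigenvectors for this matrix, mirroring the Sturm-Liouville theory used throughout the following sections.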

3.3 Linear Partial Differential Equations: Properties and Solution Techniques

3.3.1 Classification and Canonical Forms for Second-Order Partial Differential Equations

Many general properties of partial differential equations can be introduced with this second-order equation in two dimensions

    a u_xx + 2b u_xy + c u_yy = g(x, y)    (3.19)

where x, y ∈ ℝ and u_x = ∂u/∂x, etc. For the moment x and y are not necessarily position variables — they are simply the independent variables for the problem. The coefficients a, b, and c are real and constant, though the latter restriction can be relaxed. Now consider the question of whether there exists a change of independent variables

    ξ = ξ_x x + ξ_y y,  η = η_x x + η_y y

that can simplify the left-hand side of this equation. Here ξ_x, ξ_y, η_x, η_y are constants and ξ_x η_y − ξ_y η_x must be nonzero for the coordinate transformation to be invertible. Applying the chain rule yields that

    (a ξ_x² + 2b ξ_x ξ_y + c ξ_y²) u_ξξ + 2(a ξ_x η_x + b(ξ_x η_y + ξ_y η_x) + c ξ_y η_y) u_ξη
        + (a η_x² + 2b η_x η_y + c η_y²) u_ηη = g(ξ, η)    (3.20)

If b² − ac > 0, then (3.19) is said to be HYPERBOLIC.³ In this case, we can find real constants ξ_x, ξ_y, η_x, η_y such that the coefficients multiplying u_ξξ and u_ηη in (3.20) vanish, leaving the simpler differential equation

    u_ξη = g(ξ, η)    (3.21)

This is the canonical, or simplest, form for a hyperbolic partial differential equation. Lines ξ = constant and η = constant are called CHARACTERISTICS for the equation. The WAVE EQUATION

    u_tt − c² u_xx = 0    (3.22)

has this form, with ξ = x − ct, η = x + ct. We present the general solution to this equation in Section 3.3.6.
If b² − ac < 0, then (3.19) is ELLIPTIC. No real coefficients ξ_x, ξ_y, η_x, η_y exist that will make the coefficients of u_ξξ and u_ηη vanish. All is not lost, however. Instead one finds complex conjugate characteristics ξ = φ + iψ and η = φ − iψ = ξ̄. Using ξ' = (ξ + η)/2 and η' = (ξ − η)/2i as new coordinates, the coefficient of u_ξ'η' can be made to vanish, leading to the form

    u_ξ'ξ' + u_η'η' = g    (3.23)

The left-hand side of this equation is the two-dimensional Laplacian operator. At steady state, (3.15) above reduces to this form in two spatial dimensions. If g is a function only of x and y, this equation is called the POISSON EQUATION. If g = 0, it is called the LAPLACE EQUATION.
The borderline case b² − ac = 0 leads to the PARABOLIC equation

    u_ηη = g    (3.24)

The standard example of a parabolic equation is the transient species conservation equation, (3.15), in one spatial dimension, which we can write

    u_t + v u_x = D u_xx + R_A

The Schrödinger equation is also parabolic. Elliptic and parabolic equations are treated extensively in the sections below.
The classification of partial differential equations into these categories plays an important role in the mathematical theory of existence of solutions for given boundary conditions. Fortunately, the physical settings commonly encountered by engineers generally lead to well-posed mathematical problems for which we do not need to worry about these more abstract issues. Therefore we now proceed to the presentation of classical solution approaches, many of which are insensitive to the type of equation encountered.

³The nomenclature introduced in this section arises from an analogy with conic sections defined by the equation ax² + 2bxy + cy² + dx + ey + f = 0. If they exist, real solutions to this equation are hyperbolas, ellipses, or parabolas, depending on whether b² − ac is positive, negative, or zero.
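The discriminant test is mechanical enough to encode directly. A minimal sketch (our own helper, not from the text) for classifying an equation of the form a u_xx + 2b u_xy + c u_yy = g:

```python
def classify(a, b, c):
    # Discriminant test for a u_xx + 2 b u_xy + c u_yy = g
    disc = b * b - a * c
    if disc > 0:
        return "hyperbolic"
    if disc < 0:
        return "elliptic"
    return "parabolic"

# Wave equation u_tt - u_xx = 0: coefficients (1, 0, -1)
assert classify(1.0, 0.0, -1.0) == "hyperbolic"
# Laplace equation u_xx + u_yy = 0: coefficients (1, 0, 1)
assert classify(1.0, 0.0, 1.0) == "elliptic"
# Heat equation u_t = D u_xx has only one second-order term: (D, 0, 0)
assert classify(1.0, 0.0, 0.0) == "parabolic"
```

For variable coefficients the same test applies pointwise, and an equation may change type across the domain.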

3.3.2 Separation of Variables and Eigenfunction Expansion with Equations Involving ∇²

The technique of SEPARATION OF VARIABLES is perhaps the most familiar classical technique for solving linear partial differential equations. It arises in problems in transport, electrostatics, quantum mechanics, and many other applications. The technique is based on the superposition property of linear problems (Section 2.2.1) as well as the following conditions:

1. We can seek a solution of the form u(x₁, x₂, x₃, ...) = X₁(x₁)X₂(x₂)X₃(x₃)··· to a PDE with independent variables x₁, x₂, x₃, ....

2. The boundaries of the domain are coordinate surfaces, and the boundary conditions for the PDE can also be written in the above form.

3. A distinct ODE can be derived from the original PDE for each function X_i(x_i).

4. Using superposition, a solution satisfying the boundary conditions can be constructed from an infinite series of solutions to these ODEs. This condition implies that separation of variables is useful primarily for equations involving self-adjoint partial differential operators such as the Laplacian, in which case eigenfunctions of various Sturm-Liouville problems provide bases for representing the solutions.

Consider a problem with three independent variables in which two of them, say x₂ and x₃, lead to Sturm-Liouville problems with eigenfunctions Y_k(x₂) and Z_l(x₃), k, l = 0, 1, 2, .... The basis functions for the x₂ and x₃ directions are thus Y_k(x₂)Z_l(x₃). The solutions to the problem in the inhomogeneous direction are then coefficients in the series, and the total solution has this form

    u = Σ_{k=0}^{∞} Σ_{l=0}^{∞} X_kl(x₁) Y_k(x₂) Z_l(x₃)

We illustrate the method with several examples.

Example 3.3: Steady-state temperature distribution in a circular cylinder

Consider a circular cylinder with unit radius and an imposed temperature profile u_s(θ) on its surface. The steady-state temperature profile u(r, θ) is a solution to LAPLACE'S EQUATION ∇²u = 0, in polar coordinates

    (1/r) ∂/∂r ( r ∂u/∂r ) + (1/r²) ∂²u/∂θ² = 0    (3.25)

with u bounded at the origin and satisfying u(1, θ) = u_s(θ). As described above, seek a solution u(r, θ) = R(r)Θ(θ).

Solution

Plugging into the equation and simplifying yields

    r (rR')' / R = −Θ''/Θ

where R' = dR/dr and Θ' = dΘ/dθ. Notice that the LHS of the equation is a function only of r and the RHS a function only of θ. The only way for the two sides to be equal is for them both to equal a constant, c. This observation gives us a pair of ODEs

    r (rR')' − cR = 0    (3.26)

    Θ'' + cΘ = 0    (3.27)

The constant c is as yet unspecified.
Equation (3.27) satisfies periodic boundary conditions Θ(θ) = Θ(θ + 2π), Θ'(θ) = Θ'(θ + 2π); it is a Sturm-Liouville eigenvalue problem with eigenvalue c. This has solutions Θ_k(θ) = e^{ikθ} for all integers k, with the corresponding eigenvalue c = k². So in fact we have found not a single solution, but a family of solutions; a basis for functions in the θ direction.
Now consider the equation for R(r), setting c = k². A little manipulation puts the equation in this form

    r² R'' + r R' − k² R = 0

This is a Cauchy-Euler equation, with k as a parameter and solutions R_k = A_k r^k + B_k r^{−k}. To satisfy the boundedness condition at r = 0, only the solution with a positive exponent can remain, so R_k = a_k r^{|k|}. Since every integer k gives a solution, we can use the superposition principle to write

    u(r, θ) = Σ_{k=−∞}^{∞} a_k r^{|k|} e^{ikθ}

This is a Fourier series, using the Sturm-Liouville eigenfunctions Θ_k(θ) as basis functions. The coefficients a_k come from the boundary condition. At r = 1,

    Σ_{k=−∞}^{∞} a_k e^{ikθ} = u_s(θ)

We can extract the coefficients a_k from this formula by using the orthogonality of the Sturm-Liouville basis functions: take inner products

    ∫₀^{2π} ( Σ_{k=−∞}^{∞} a_k e^{ikθ} ) e^{−ilθ} dθ = ∫₀^{2π} u_s(θ) e^{−ilθ} dθ

Letting c_l = ( ∫₀^{2π} u_s(θ) e^{−ilθ} dθ ) / 2π, this process simply gives us that a_k = c_k. That is, the (known) Fourier coefficients of the boundary temperature determine the Fourier coefficients in the cylinder, so

    u(r, θ) = Σ_{k=−∞}^{∞} c_k r^{|k|} e^{ikθ}
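This series is easy to evaluate numerically. In the sketch below (our own check; the names are not from the text) the Fourier coefficients c_k are computed by quadrature, and the boundary data u_s(θ) = cos θ is used because its exact interior solution is simply r cos θ:

```python
import cmath, math

def fourier_coeff(us, k, n=400):
    # c_k = (1/2pi) * integral of us(theta) e^{-ik theta} dtheta
    return sum(us(2 * math.pi * j / n)
               * cmath.exp(-1j * k * 2 * math.pi * j / n)
               for j in range(n)) / n

def u_disk(us, r, th, kmax=10):
    # u(r, theta) = sum_k c_k r^{|k|} e^{ik theta}
    return sum(fourier_coeff(us, k) * r ** abs(k) * cmath.exp(1j * k * th)
               for k in range(-kmax, kmax + 1)).real

r, th = 0.6, 1.1
assert abs(u_disk(math.cos, r, th) - r * math.cos(th)) < 1e-8
```

The rectangle rule is spectrally accurate for periodic integrands, so even the modest quadrature here reproduces c₁ = c₋₁ = 1/2 essentially exactly.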

Example 3.4: Transient diffusion in a slab

The transient diffusion of heat or a chemical species in one direction is governed by the transient diffusion equation, also called the heat equation

    ∂u/∂t = D ∂²u/∂x²    (3.28)

Consider the initial and boundary conditions u(x, 0) = 0, u(0, t) = 0, u(ℓ, t) = u_c for 0 < x < ℓ, i.e., the initial concentration in the domain is zero and at t = 0 the right end of the domain is exposed to a known concentration u = u_c. Seek a separation of variables solution u(x, t) = X(x)T(t).

Solution

Using the form u(x, t) = X(x)T(t), (3.28) becomes

    X T' = D X'' T

where again ' denotes the derivative of a function with respect to its independent variable. Rearranging yields

    (1/D) T'/T = X''/X

Observing that this expression equates a function of t to a function of x, we again conclude that each side of it must be constant

    T' = cDT    (3.29)

    X'' = cX    (3.30)

A simple change of variable solves this problem. We let u = u_s(x) + v(x, t) and choose u_s to satisfy the inhomogeneous boundary conditions, in which case v satisfies homogeneous boundary conditions v(0, t) = 0, v(ℓ, t) = 0 at x = 0 and x = ℓ. A particularly convenient choice is u_s = u_c x/ℓ, which is the steady-state solution to this problem. Thus v(x, t) is the deviation from the steady state. Substituting into (3.28) and observing that ∂u_s/∂t = 0 and ∂²u_s/∂x² = 0 yields

    ∂v/∂t = D ∂²v/∂x²

with v(x, 0) = −u_s, v(0, t) = 0, v(ℓ, t) = 0. Now letting v(x, t) = X(x)T(t) and repeating the above steps, we find that the problem for X is a true Sturm-Liouville problem, including the homogeneous boundary conditions X(0) = X(ℓ) = 0. The eigenvalues are c = −k², where now k = nπ/ℓ for positive integer n, and the eigenfunctions are sin(nπx/ℓ). Equation (3.29) is an initial-value problem. Its solutions, parametrized by n, are

    T_n(t) = T_n(0) e^{−n²π²Dt/ℓ²}

so the overall solution again has the Fourier series form

    v(x, t) = Σ_{n=1}^{∞} T_n(0) e^{−n²π²Dt/ℓ²} sin(nπx/ℓ)    (3.31)

The initial conditions T_n(0) are determined from the initial condition v(x, 0) = −u_s by setting t = 0 in (3.31) and taking its inner product with basis function sin(mπx/ℓ)

    ⟨−u_s, sin(mπx/ℓ)⟩ = Σ_{n=1}^{∞} T_n(0) ⟨sin(nπx/ℓ), sin(mπx/ℓ)⟩

Thus

    T_m(0) = −( ∫₀^ℓ (u_c x/ℓ) sin(mπx/ℓ) dx ) / ( ∫₀^ℓ sin²(mπx/ℓ) dx ) = (−1)^m (2u_c)/(mπ)

The final exact solution is thus

    u(x, t) = u_c x/ℓ + Σ_{n=1}^{∞} (−1)^n (2u_c)/(nπ) e^{−n²π²Dt/ℓ²} sin(nπx/ℓ)    (3.32)

At short times t ≪ ℓ²/D, this series converges very slowly because of the n⁻¹ decay of the Fourier coefficients T_n(0) of the initial condition. In this situation, alternate approaches that approximate the domain as semi-infinite are more appropriate, because the heat or solute has only had time to spread over a short distance from the boundary; see Exercises 3.23 and 3.36. As t increases, the exponential decay term becomes smaller and the series converges more rapidly.
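Series (3.32) is straightforward to evaluate directly; the sketch below (our own check, with ℓ = D = u_c = 1) confirms the limiting behavior just described:

```python
import math

def u_slab(x, t, D=1.0, ell=1.0, uc=1.0, nterms=200):
    # Eq. (3.32): linear steady profile plus the transient Fourier series
    s = uc * x / ell
    for n in range(1, nterms + 1):
        s += ((-1) ** n * 2 * uc / (n * math.pi)
              * math.exp(-n**2 * math.pi**2 * D * t / ell**2)
              * math.sin(n * math.pi * x / ell))
    return s

# Long times: the solution relaxes to the linear steady profile uc*x/ell
assert abs(u_slab(0.5, 10.0) - 0.5) < 1e-9
# Short times: the interior has barely felt the boundary at x = ell
assert u_slab(0.5, 0.01) < 0.01
# The boundary values hold for t > 0
assert abs(u_slab(0.0, 0.5)) < 1e-12
assert abs(u_slab(1.0, 0.5) - 1.0) < 1e-12
```

At t = 0.01 the exponential factor already truncates the series effectively after a few dozen terms, but at much smaller t many more terms would be needed, which is the slow-convergence issue noted above.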

With these two examples, one can see a pattern emerging. Separation of variables leads to at least one direction that presents a Sturm-Liouville problem whose eigenfunctions are a useful basis for representing the solution. In the second example, a change of variable was required to find a direction with the homogeneous boundary conditions required of a Sturm-Liouville eigenvalue problem. The following example presents a situation where the problem must first be split into subproblems that have homogeneous boundary conditions.

Example 3.5: Steady-state diffusion in a square domain

Solve Laplace's equation ∇²u = 0 in a unit square domain 0 < x < 1, 0 < y < 1, with boundary conditions u = 200 on x = 0 and y = 0, u = 300 on x = 1, and u = 500 on y = 1, as shown in Figure 3.6(a).

Solution

As stated, there are no homogeneous directions. Now we split the solution into three pieces: u(x, y) = U(x, y) + V(x, y) + W(x, y), where U, V, and W all satisfy Laplace's equation, but with conveniently chosen boundary conditions that sum to the boundary conditions for the original problem, as illustrated in Figure 3.6(b). The problem for U is trivial because all the boundaries have the same value of 200; thus U = 200. The problem for V has homogeneous boundary conditions everywhere except x = 1, while that for W has homogeneous boundary conditions at x = 0 and x = 1; aside from a multiplicative constant, it is just a π/2 rotation of the problem for V. The solution to the W problem (to within a multiplicative constant) is Exercise 3.32. From it the solution to the V problem can be found, so the solution for u = U + V + W is complete.
Vproblem can be found so the solution for u = U + V + W is complete,

Figure 3.6: Laplace's equation in a square domain. (a) Original problem. (b) Three subproblems whose solutions sum to the solution of the original problem.
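The decomposition can be checked numerically. A useful target: by superposing four rotated copies of the one-hot-side problem, the value at the center of the square must be the average of the four boundary values, (200 + 200 + 300 + 500)/4 = 300. The sketch below (our own construction; the series for the one-hot-side subproblem is the standard Fourier sine solution, not a formula quoted from the text) verifies this:

```python
import math

def edge_problem(val, x, y, nterms=60):
    # Laplace solution on the unit square equal to `val` on the side y = 1
    # and zero on the other three sides (Fourier sine series in x).
    s = 0.0
    for n in range(1, nterms + 1):
        bn = 2 * val * (1 - (-1) ** n) / (n * math.pi)
        s += bn * math.sin(n * math.pi * x) \
                * math.sinh(n * math.pi * y) / math.sinh(n * math.pi)
    return s

def u_total(x, y):
    # U = 200 everywhere; V is 100 on x = 1; W is 300 on y = 1
    V = edge_problem(100.0, y, x)   # swapped arguments rotate the hot side
    W = edge_problem(300.0, x, y)
    return 200.0 + V + W

assert abs(u_total(0.5, 0.5) - 300.0) < 1e-6
```

The series converges geometrically at interior points, so 60 terms are far more than needed at the center.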

Example 3.6: Eigenfunction expansion for an inhomogeneous problem

Solve the Poisson equation

    u_xx + u_yy = f(x, y)

in a unit square with Dirichlet boundary conditions, which models a steady-state distribution given a source f(x, y) distributed within the domain.

Solution

Separation of variables does not work for this problem (try it), but a version of eigenfunction expansion does. Think of this problem as a linear algebra problem Lu = f. Here L is self-adjoint, so the solutions to the eigenvalue problem Lw + λw = 0 form an orthogonal basis and allow us to diagonalize L. We can express u and f in this basis, and since L becomes diagonal we can easily solve for u.
To perform this procedure in the present case, we need to solve

    w_xx + w_yy + λw = 0

in the unit square with w = 0 on the boundary. We can solve this problem by separation of variables: it gives Sturm-Liouville problems in both x and y, and yields eigenfunctions w_mn(x, y) = sin mπx sin nπy with (real) eigenvalues λ_mn = π²(m² + n²), for all integer pairs mn. Now to solve the Poisson equation, we express u and f in terms of the eigenfunctions

    u = Σ_{mn} u_mn w_mn(x, y),  f = Σ_{mn} f_mn w_mn(x, y)

Since f is known, f_mn = ⟨f, w_mn⟩ / ⟨w_mn, w_mn⟩, where the inner product in this case is just the integral over the square. Now since w_xx + w_yy = −λ_mn w_mn, we can write −λ_mn u_mn = f_mn, which we can solve immediately to give u_mn = −f_mn/λ_mn, so

    u(x, y) = −Σ_{mn} (f_mn / λ_mn) sin mπx sin nπy
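The procedure can be exercised end to end on a source with a known answer: for f = sin 2πx sin 3πy only the (2, 3) mode is excited, so u = −f/(13π²) exactly. The sketch below (our own check; f_mn is computed by midpoint quadrature rather than analytically) reproduces this:

```python
import math

def poisson_square(f, x, y, mmax=4, nq=60):
    # u = -sum_mn f_mn sin(m pi x) sin(n pi y) / (pi^2 (m^2 + n^2)),
    # with f_mn = 4 <f, w_mn> computed by midpoint quadrature
    u = 0.0
    h = 1.0 / nq
    for m in range(1, mmax + 1):
        for n in range(1, mmax + 1):
            fmn = 4 * sum(f((i + .5) * h, (j + .5) * h)
                          * math.sin(m * math.pi * (i + .5) * h)
                          * math.sin(n * math.pi * (j + .5) * h) * h * h
                          for i in range(nq) for j in range(nq))
            u -= fmn * math.sin(m * math.pi * x) * math.sin(n * math.pi * y) \
                     / (math.pi**2 * (m**2 + n**2))
    return u

f = lambda x, y: math.sin(2 * math.pi * x) * math.sin(3 * math.pi * y)
exact = -f(0.3, 0.4) / (math.pi**2 * (2**2 + 3**2))
assert abs(poisson_square(f, 0.3, 0.4) - exact) < 1e-3
```

For a general source, all the f_mn would be nonzero and the series truncation (here mmax = 4) would control the accuracy.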

In some situations, a separation of variables solution can be obtained via multiple approaches. For example, the Laplacian operator in cylindrical coordinates can be written

    ∇² = L_r + (1/r²) L_θ + L_z

where

    L_r = (1/r) ∂/∂r ( r ∂/∂r ),  L_θ = ∂²/∂θ²,  L_z = ∂²/∂z²

Given appropriate homogeneous boundary conditions, all three of these are Sturm-Liouville operators, so depending on the boundary conditions, there may be the possibility of more than one method of solution. The following example illustrates this situation.

Example3.7: Steady diffusion in a cylinder. eigenfunction expansion

andmultiple solution approaches


ConsiderLaplace's equation in a cylindrical domainwithboundary conditionsu(r,z = 0) = 1, u(r = l,z) = 0, u(r,z = 1) = 0. That is, the
bottomis heated, and the top and side are cooled. Solvethis equation
in twodifferent ways:

Vector Calculus and Partial Differential

280

that depend on r.
(a) Usingbasisfunctions
that depend on z.
(b) Usingbasis functions
Solution

We could proceed by seeking a separable solution of the form R(r)Z(z)
as above. Instead we directly impose a series form for the solution.

(a) For the current problem there is no θ-dependence, and we first
seek a solution that uses basis functions in the r-direction, i.e.,
eigenfunctions of Lr. This is a singular Sturm-Liouville operator
(p(r) = r), so only boundedness is required at the origin, and the
boundary condition at r = 1 is homogeneous. Referring back to Example
2.8, we recognize that the eigenfunctions of Lr are the Bessel
functions of order zero, so we can seek a solution

    u(r, z) = Σn un(z) J0(√λn r)

where

    √λn ≈ 2.4, 5.5, 8.7, 11.8, ...  (the zeros of J0)

To simplify notation, let kn = √λn. Substituting this solution form
into Laplace's equation and using the fact that LrJ0(knr) =
−kn²J0(knr) yields

    d²un/dz² − kn²un = 0

Because of the bounded domain, it is convenient to represent the
solution to this problem as

    un(z) = an cosh knz + bn sinh knz

so

    u(r, z) = Σn (an cosh knz + bn sinh knz) J0(knr)             (3.33)

At z = 0, u = 1. Taking the inner product, i.e., the weighted
integral from r = 0 to r = 1 of (3.33) evaluated at z = 0, with
J0(kmr),

    am = (1, J0(kmr)) / (J0(kmr), J0(kmr))                       (3.34)


Evaluation of these and related integrals is facilitated by the
following general results for Bessel functions with integer n and
arbitrary k:

    d/dx [xⁿJn(kx)] = xⁿk Jn−1(kx)                               (3.35)

    d/dx [xⁿYn(kx)] = xⁿk Yn−1(kx)                               (3.36)

    x d/dx [Jn(kx)] + nJn(kx) = xk Jn−1(kx)                      (3.37)

    x d/dx [Yn(kx)] + nYn(kx) = xk Yn−1(kx)                      (3.38)

    x d/dx [Jn(kx)] − nJn(kx) = −xk Jn+1(kx)                     (3.39)

    x d/dx [Yn(kx)] − nYn(kx) = −xk Yn+1(kx)                     (3.40)

    J−n(kx) = (−1)ⁿJn(kx)                                        (3.41)

    Y−n(kx) = (−1)ⁿYn(kx)                                        (3.42)

    ∫0^1 Jn²(kx) x dx = (1/2) Jn+1²(k)   if Jn(k) = 0            (3.43)

Using the first and last of these expressions, one can find that

    ∫0^1 J0(kmr) r dr = (1/km) J1(km)                            (3.44)

    ∫0^1 J0²(kmr) r dr = (1/2) J1²(km)                           (3.45)
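The integral identities (3.44) and (3.45) are easy to confirm by quadrature; this is a sketch using scipy (the choice of the third root of J0 is arbitrary):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import j0, j1, jn_zeros

# The identities require k_m to be a root of J0.
km = jn_zeros(0, 3)[2]   # third root, approximately 8.654

lhs44, _ = quad(lambda r: j0(km*r)*r, 0.0, 1.0)
lhs45, _ = quad(lambda r: j0(km*r)**2*r, 0.0, 1.0)

print(abs(lhs44 - j1(km)/km))        # (3.44): essentially zero
print(abs(lhs45 - 0.5*j1(km)**2))    # (3.45): essentially zero
```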

The boundary condition u = 0 at z = 1 requires that

    bn = −an cosh kn / sinh kn

Using these results, the solution is

    u(r, z) = Σn an (cosh knz − (cosh kn/sinh kn) sinh knz) J0(knr)

with an given by (3.34), (3.44), and (3.45).

(b) Now we seek a solution using basis functions in the z-direction.
To obtain homogeneous boundary conditions in z, let u = (1 − z) + v,
where v satisfies Laplace's equation (because ∇²(1 − z) = 0) with the
homogeneous boundary conditions v(r, 0) = v(r, 1) = 0 and with
v(1, z) = z − 1. We could proceed by seeking a solution
v(r, z) = R(r)Z(z). Instead we will directly impose a Fourier series
form for the solution based on the eigenfunctions sin nπz of Lz

    v(r, z) = Σn vn(r) sin nπz

Substituting this solution form into Laplace's equation leads to

    Σn [ (1/r) d/dr (r dvn/dr) − n²π²vn(r) ] sin nπz = 0

Taking the inner product of this equation with sin mπz, invoking
orthogonality, and changing m to n yields

    (1/r) d/dr (r dvn/dr) − n²π²vn(r) = 0

This is called the MODIFIED BESSEL EQUATION OF ORDER ZERO. It differs
from Bessel's equation by the sign in front of the second term. Its
solution can be found by the method of Frobenius; the general
solution is

    vn(r) = an I0(nπr) + bn K0(nπr)

The functions I0 and K0 are the MODIFIED BESSEL FUNCTIONS of order
zero; they are shown in Table 2.3. The function K0 has a logarithmic
singularity at the origin, so for boundedness we require that bn = 0.
The coefficients an are found by imposing the boundary condition at
r = 1 and again taking the inner product with an eigenfunction

    an = ((z − 1), sin nπz) / (I0(nπ)(sin nπz, sin nπz)) = −2/(nπ I0(nπ))

The solution in final form is

    v(r, z) = Σn an I0(nπr) sin nπz

and u(r, z) = (1 − z) + v(r, z).
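As a numerical sanity check (a sketch, not from the text), both series representations can be evaluated at an arbitrarily chosen interior point. Combining (3.34) with (3.44) and (3.45) gives an = 2/(kn J1(kn)) for part (a), and the two answers agree:

```python
import numpy as np
from scipy.special import j0, j1, jn_zeros, i0

r, z = 0.4, 0.5            # arbitrary interior point
n = np.arange(1, 31)       # truncation level chosen for convenience

# (a) Fourier-Bessel series in r; note that
# cosh(k z) - (cosh k/sinh k) sinh(k z) = sinh(k(1-z))/sinh k.
k = jn_zeros(0, 30)
ua = np.sum(2.0/(k*j1(k)) * np.sinh(k*(1.0 - z))/np.sinh(k) * j0(k*r))

# (b) Fourier series in z: u = (1-z) + sum a_n I0(n pi r) sin(n pi z),
# with a_n = -2/(n pi I0(n pi)).
ub = (1.0 - z) + np.sum(-2.0/(n*np.pi*i0(n*np.pi)) * i0(n*np.pi*r)
                        * np.sin(n*np.pi*z))

print(ua, ub)   # the two representations agree to many digits
```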

In spherical coordinates, the Laplacian operator can be written

    ∇² = Lr + (1/r²)LΩ

where

    Lr = (1/r²) ∂/∂r (r² ∂/∂r)

    LΩ = (1/sin θ) ∂/∂θ (sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂φ²

It often is useful to rewrite the first of these in this form

    Lrf = (1/r) ∂²/∂r² (rf)                                      (3.46)

Accordingly, the introduction of a new variable g = rf often is
useful.

Example 3.8: Transient diffusion from a sphere

Consider the transient diffusion of a chemical species out of a
sphere with radius R into uniform surroundings where the species
concentration vanishes, i.e.,

    ∂u/∂t = DLru

with u(r, 0) given and u(r → ∞, t > 0) = 0.

Solution

Spherically symmetric problems like this can be solved using the
eigenfunctions of Lr. The eigenvalue problem Lry + m²y = 0 is related
to the SPHERICAL BESSEL'S EQUATION

    d/dx (x² dy/dx) + (m²x² − n(n + 1)) y = 0

in the specific case λ = m² and n = 0. Its solutions are the
SPHERICAL BESSEL FUNCTIONS of order zero, which are simply

    y(x) = a sin(mx)/x + b cos(mx)/x

These functions are orthogonal with respect to an inner product with
weight function w(r) = r². This factor arises naturally in the
differential volume element in spherical coordinates. The eigenvalues
m², and coefficients a and b are determined as usual by the
(homogeneous/boundedness) boundary conditions. For example, for
diffusion in a sphere, boundedness at the origin will require that
b = 0.
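A quick numerical illustration (a sketch, not from the text): for a unit sphere with u = 0 imposed at r = 1, the bounded eigenfunctions are sin(nπr)/r, and the r²-weighted orthogonality is easy to confirm:

```python
import numpy as np
from scipy.integrate import quad

# Inner product of sin(m pi r)/r and sin(n pi r)/r with weight r^2,
# integrated over the unit sphere's radial coordinate.
def ip(m, n):
    f = lambda r: (np.sin(m*np.pi*r)/r)*(np.sin(n*np.pi*r)/r)*r**2
    return quad(f, 0.0, 1.0)[0]

print(ip(1, 2), ip(2, 3))   # off-diagonal entries: essentially zero
print(ip(2, 2))             # diagonal entry: 1/2
```

Note that the r² weight exactly cancels the 1/r factors, which is why the diagonal value is the familiar 1/2 of the sine series.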

Example 3.9: Temperature field around a sphere in a linear gradient

Consider the steady-state temperature field T(r, θ) that surrounds a
sphere of radius R, e.g., a spherical inclusion in a solid material,
with no heat flux into it. Thus we are solving

    0 = LrT + (1/r²)LηT

with boundary conditions

    ∇T → G ez as r → ∞,    ∂T/∂r = 0 at r = R

Solution

Axisymmetric diffusion problems involving the Laplacian in spherical
geometries are naturally treated by expansion in the eigenfunctions
of the polar-angle operator. If we make the substitution η = cos θ,
the θ part of the Laplacian becomes

    Lη = d/dη [(1 − η²) d/dη] = (1 − η²) d²/dη² − 2η d/dη

and the eigenvalue problem can be written as

    Lηw + λw = 0

This is Legendre's differential equation, see Example 2.9. Its
eigenvalues are λ = n(n + 1) for nonnegative integers n and its
eigenfunctions are the Legendre polynomials Pn(η). Substituting the
solution form

    T(r, η) = Σn Tn(r)Pn(η)                                      (3.47)

into the governing equation, recalling that LηPn = −n(n + 1)Pn, and
using the orthogonality of the Legendre polynomials yields

    r²LrTn − n(n + 1)Tn = 0


Rewriting this as

    r² d²Tn/dr² + 2r dTn/dr − n(n + 1)Tn = 0                     (3.48)

we recognize it as a Cauchy-Euler equation with solution

    Tn(r) = an rⁿ + bn r^(−(n+1))                                (3.49)

First consider the boundary condition at infinity. We can rewrite
this as T → Gz + T∞ = rGP1(η) + T∞, where T∞ is arbitrary; we have
not specified the temperature anywhere, only its gradient. Comparing
this form to the series solution (3.47), we see that a0 = T∞,
a1 = G, and an = 0 for n > 1.

At r = R,

    dT/dr = Σn [ n an R^(n−1) − (n + 1) bn R^(−(n+1)−1) ] Pn(η) = 0

Because of the orthogonality of the Pn(η), this sum must vanish term
by term

    n an R^(n−1) − (n + 1) bn R^(−(n+1)−1) = 0

Using the known values of the an,

    b0 = 0,    a1 = G ⇒ G − 2b1R^(−3) = 0 ⇒ b1 = GR³/2,    bn = 0 for n > 1

The final result is

    T = T∞ + G (r + R³/(2r²)) cos θ                              (3.50)
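The final result (3.50) can be verified symbolically; this sketch (assuming SymPy is available) checks that it satisfies the axisymmetric Laplace equation and the no-flux condition at r = R:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
R, G, Tinf = sp.symbols('R G T_infty', positive=True)
T = Tinf + G*(r + R**3/(2*r**2))*sp.cos(th)

# Axisymmetric Laplacian in spherical coordinates: Lr + (1/r^2) L_theta.
lap = sp.diff(r**2*sp.diff(T, r), r)/r**2 \
    + sp.diff(sp.sin(th)*sp.diff(T, th), th)/(r**2*sp.sin(th))

print(sp.simplify(lap))                        # 0: T is harmonic
print(sp.simplify(sp.diff(T, r).subs(r, R)))   # 0: no flux at r = R
```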


Example 3.10: Domain perturbation; heat conduction around a nearly
spherical object

Consider the problem of heat conduction outside an object described
in spherical coordinates by

    r = R(θ) = 1 + εP2(cos θ)

where P2(x) = (3x² − 1)/2 is the quadratic Legendre polynomial. The
shape is slightly elongated at the poles and narrower at the equator
than a sphere, but has the same surface area. Use a regular
perturbation approach based on the smallness of the deviation of the
surface from spherical.

Solution

This example illustrates the technique of DOMAIN PERTURBATION. This
approach is applicable to problems where the possibly unknown
boundary shape is a small perturbation from a shape for which a
closed form (e.g., separation of variables or Fourier transform)
solution can be obtained. This approach is sometimes also used in
numerical solution approaches to simplify the domain shape. In the
present example, the choice of the Legendre polynomial simplifies the
calculation, but the solution procedure would be similar, though more
tedious, with a more complicated surface shape, as long as the
deviation from a sphere is uniformly small.

The equation and boundary conditions are

    ∇²T = 0,    T(r = R(θ)) = 1,    T → 0 as r → ∞

Because the boundary is not a constant-coordinate surface, separation
of variables (in spherical coordinates) cannot be used to find an
exact solution. Nevertheless, a perturbation approach can be used to
impose an asymptotically exact boundary condition at r = 1. This is
done by expanding the boundary condition in a Taylor series around
r = 1:

    T|r=1 + (R − 1) ∂T/∂r|r=1 + (1/2)(R − 1)² ∂²T/∂r²|r=1 + ··· = 1

Inserting the particular expression for the boundary shape:

    T|r=1 + εP2(cos θ) ∂T/∂r|r=1 + (1/2)ε²P2²(cos θ) ∂²T/∂r²|r=1 + O(ε³) = 1

Note that this boundary condition is imposed at r = 1, permitting use
of separation of variables. There is no indication that a singular
perturbation approach is necessary, so we posit a regular expansion
T(r, θ) = T0(r, θ) + εT1(r, θ) + ε²T2(r, θ) + ···. The governing
equation at each order is simply Laplace's equation, with boundary
conditions obtained by collecting powers of ε in the expanded
boundary condition.

Using the fact that axisymmetric decaying solutions to Laplace's
equation have the form

    Σi ci Pi(cos θ)/r^(i+1)

we find that the solutions at each order are:

    T0 = 1/r

    T1 = P2(cos θ)/r³

    T2 = (2/5)(1/r) + (4/7)P2(cos θ)/r³ + (36/35)P4(cos θ)/r⁵

Given these solutions, we can find that the dimensionless heat flux
from the object is

    Q = ∫0^2π ∫0^π (−∂T/∂r) r² sin θ dθ dφ = 4π(1 + (2/5)ε² + O(ε³))

where the leading term 4π corresponds to the heat flux from a sphere.
Thus the change in heat flux from the sphere is proportional to the
square of the deviation of the surface from spherical. Notice that
the entire solution procedure is valid, and the heat flux the same,
if ε < 0, so the object is actually a slightly flattened sphere.
Therefore both prolate and oblate deviations from a spherical shape
increase the heat flux.
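The perturbation solution can be verified symbolically. The following sketch (assuming SymPy, and collecting the three orders of the solution into a single expression) checks that the boundary condition T = 1 on r = R(θ) holds through O(ε²):

```python
import sympy as sp

eps, r, x = sp.symbols('epsilon r x')   # x stands for cos(theta)
P2 = sp.legendre(2, x)
P4 = sp.legendre(4, x)

# T0 + eps*T1 + eps^2*T2 from the expansion above.
T = (1 + sp.Rational(2, 5)*eps**2)/r \
    + (eps + sp.Rational(4, 7)*eps**2)*P2/r**3 \
    + sp.Rational(36, 35)*eps**2*P4/r**5

# Evaluate T on the perturbed surface r = 1 + eps*P2 and expand in eps.
bc = sp.series(T.subs(r, 1 + eps*P2), eps, 0, 3).removeO()
print(sp.simplify(bc - 1))   # 0: the O(1), O(eps), O(eps^2) terms all match
```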

3.3.3 Laplace's Equation, Spherical Harmonics, and the Hydrogen Atom

Schrödinger's equation for the wave function Ψ(x, t) of a particle
exposed to a potential energy field V(x) is, in dimensionless form,

    i ∂Ψ/∂t = −∇²Ψ + V(x)Ψ                                       (3.51)

We will consider the case of a spherically symmetric potential V(r),
whose form we will specify later, so it is natural to work in
spherical coordinates. The solutions of this equation have a very
rich structure that encompasses many features of systems with
spherical symmetry.

In contrast to the previous couple of examples, we will allow the
separation of variables procedure to again guide us. To begin, let
Ψ(x, t) = f(t)Φ(x), where the temporal and spatial variables are
separated but not (yet) the individual coordinate directions.
Inserting this form into (3.51) and rearranging yields

    i (1/f) df/dt = (−∇²Φ + VΦ)/Φ = E                            (3.52)

where E is a constant. Thus

    −∇²Φ + V(x)Φ = EΦ                                            (3.53)

The solution to (3.52) is

    f(t) = f0 e^(−iEt)

Equation (3.53) has the form of an eigenvalue problem where the
eigenvalue E is a dimensionless energy. This must be real so that Ψ
does not vanish at past or future times. Now, since Lφ = ∂²/∂φ² with
periodic boundary conditions has eigenfunctions e^(imφ), we let
Φ(r, η, φ) = u(r, η)e^(imφ) for any integer m. As above, we have let
η = cos θ. Equation (3.53) becomes

    Lru + (1/r²)Lηu − (m²/(r²(1 − η²)))u + (E − V(r))u = 0       (3.54)

We now write u(r, η) = R(r)P(η). Substitution into (3.54) and
rearrangement to group terms dependent only on r and η yields

    (1/R) r²LrR + r²(E − V(r)) = −(1/P)LηP + m²/(1 − η²) = c

Therefore

    r²LrR + r²(E − V(r))R = cR                                   (3.55)

    −LηP + (m²/(1 − η²))P = cP                                   (3.56)

Equation (3.56) describes the angular behavior of the solutions. For
m = 0, it reduces to Legendre's differential equation, whose bounded
solutions we know to be the Legendre polynomials. For m ≠ 0 it is the
ASSOCIATED LEGENDRE DIFFERENTIAL EQUATION. Seeking a power series
solution reveals that this equation has bounded solutions in
−1 ≤ η ≤ 1 only if c = l(l + 1), where l ≥ |m| is an integer. For
m = 0, 1, ..., l, these solutions are the ASSOCIATED LEGENDRE
POLYNOMIALS Plm(η) and Pl,−m(η), where

    Plm(η) = (1 − η²)^(m/2) (d^m/dη^m) Pl(η),    Pl,−m = Plm     (3.57)

Recapitulating, the products Plm(cos θ)e^(imφ) describe the angular
dependence of the solution. Suitably normalized and denoted
Ylm(θ, φ), these products are called SURFACE SPHERICAL HARMONICS, or
sometimes just SPHERICAL HARMONICS; they are the eigenfunctions of
the angular part of the Laplacian

    LΩYlm = −l(l + 1)Ylm                                         (3.58)

Each eigenvalue l has 2l + 1 corresponding eigenfunctions Ylm with
m = −l, ..., l. The normalized functions have the form

    Ylm(θ, φ) = sqrt( ((2l + 1)/(4π)) (l − m)!/(l + m)! ) Plm(cos θ) e^(imφ)   (3.59)

and satisfy orthonormality with respect to integration over the
surface of the unit sphere

    ∫0^2π ∫0^π Ylm(θ, φ) Y*np(θ, φ) sin θ dθ dφ = δln δmp        (3.60)

The functions Ylm for l = 4 are shown in Figure 3.7. Surface
spherical harmonics are widely used to represent functions on the
surface of a sphere.
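The normalization (3.59) and orthonormality (3.60) are easy to confirm numerically. This sketch builds Ylm directly from scipy's associated Legendre routine (the quantum numbers chosen below are arbitrary):

```python
import numpy as np
from scipy.special import lpmv, factorial
from scipy.integrate import dblquad

# Surface spherical harmonic built from (3.59); lpmv(m, l, x) is the
# associated Legendre function P_l^m(x).
def Y(l, m, theta, phi):
    norm = np.sqrt((2*l + 1)/(4*np.pi) * factorial(l - m)/factorial(l + m))
    return norm * lpmv(m, l, np.cos(theta)) * np.exp(1j*m*phi)

# Inner product over the unit sphere, as in (3.60).  The imaginary
# part integrates to zero, so only the real part is kept.
def inner(l1, m1, l2, m2):
    f = lambda phi, theta: np.real(Y(l1, m1, theta, phi)
                                   * np.conj(Y(l2, m2, theta, phi))
                                   * np.sin(theta))
    return dblquad(f, 0.0, np.pi, 0.0, 2*np.pi)[0]

print(inner(2, 1, 2, 1))   # ≈ 1 (normalization)
print(inner(2, 1, 3, 1))   # ≈ 0 (orthogonality)
```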

Returning to (3.55) for the r-dependence, consider first the case
E = V(r) = 0, in which (3.53) becomes the Laplace equation ∇²Φ = 0.
Equation (3.55) and its solution reduce to (3.48) and (3.49),
respectively, with n replaced by l. Thus the general solution to
∇²Φ = 0, expressed in spherical coordinates, is

    Φ = Σ(l=0..∞) Σ(m=−l..l) (alm r^l + blm r^(−(l+1))) Ylm(θ, φ)   (3.61)


[Figure 3.7 appears here; panels labeled m = 0, 1, 2, ...]

Figure 3.7: From left to right, real parts of the surface spherical
harmonics for l = 4.

Equation (3.50) is a particular case of this solution. Terms
r^l Ylm(θ, φ) and r^(−(l+1)) Ylm(θ, φ) are called the growing and
decaying SOLID SPHERICAL HARMONICS, respectively.

Now consider the case of an electron "orbiting" a proton, i.e., a
hydrogen atom, where the potential energy is the Coulomb potential

    V(r) = −1/r

As boundary conditions, we require that Ψ is bounded at r = 0 and
that it vanishes as r → ∞. If the latter condition is not satisfied,
the electron is not bound to the proton and we do not have an atom.
Equation (3.46) motivates the substitution w(r) = rR(r) into (3.55),
yielding

    d²w/dr² + (E − V(r) − l(l + 1)/r²) w = 0

As r → ∞ we can approximate this as w″ + Ew = 0, suggesting that we
seek a solution w(r) = F(r)e^(−βr) where β = √(−E). This result
indicates that E < 0 for a bound electron. Without going into the
details (with which we are now largely familiar), seeking a Frobenius
solution F(r) = r^α g(r) and requiring that Fe^(−βr) → 0 as r → ∞
leads to α = l + 1 and requires that g(r) be a truncated power
series, i.e., a polynomial. Inspecting the recursion relation for the
power series, one finds, in close analogy to the results in Chapter 2
regarding Legendre and other orthogonal polynomials, that it will
truncate at degree n′ if β = 1/(2(l + n′ + 1)). The solutions, which
we denote Rln′, can be written in terms of ASSOCIATED LAGUERRE
POLYNOMIALS (Merzbacher, 1970; Winter, 1979).

Defining the PRINCIPAL QUANTUM NUMBER n = l + n′ + 1, this expression
becomes

    E = −1/(4n²)

This determines very well the energy levels of a hydrogen atom. The
eigenfunctions of (3.53) are Φnlm(x) = Ylm(θ, φ)Rln′(r) and are
characterized by the QUANTUM NUMBERS n, l, and m; l is called the
ANGULAR MOMENTUM QUANTUM NUMBER, and n′ the RADIAL QUANTUM NUMBER.
Since the eigenvalues E depend only on l + n′, various combinations
of l and n′ have the same energy. The same is true for m: all
eigenfunctions with the same n have the same energy. The s, p, d, and
f atomic orbitals correspond to l = 0, 1, 2, 3, respectively. Since
E < 0, when n = 1 only l = 0 states, s orbitals, can exist. This is
the ground state or lowest-energy state of the hydrogen atom. When
n = 2, states with l = 1 (p orbitals) can also exist, and so on. Thus
we see in this analysis the basic features of the electronic
structure of the hydrogen atom.

3.3.4 Applications of the Fourier Transform to PDEs

In Section 2.4.1 we saw that functions in a finite domain could be
represented as

    f(x) = Σk ck e^(ikx),    ck = (f, e^(ikx)) / (e^(ikx), e^(ikx))

The FOURIER TRANSFORM generalizes this idea to an unbounded domain.
First some definitions: the Fourier transform f̂(k) of a function
f(x) is given by

    f̂(k) = ∫−∞^∞ f(x)e^(−ikx) dx = F{f(x)}

This is the analogue of the expression for ck in a bounded domain;
because periodicity is no longer required over a finite interval, k
can be any real number rather than needing to be an integer. The
INVERSE FOURIER TRANSFORM is the analogue of the Fourier series
representation of f

    f(x) = (1/2π) ∫−∞^∞ f̂(k)e^(ikx) dk

These operations are mappings from "x-space" to "k-space" and vice
versa. Here are some useful properties of Fourier transforms, which
are easily derived from the definition:

1. Derivative property

       F{df(x)/dx} = ik F{f(x)} = ik f̂(k)                       (3.62)

2. Integral property

       F{∫x0^x f(x′) dx′} = f̂(k)/(ik) + c δ(k)                  (3.63)

   where c depends on the lower limit x0 of the integration.

3. Shift in x

       F{f(x − a)} = e^(−ika) f̂(k)                              (3.64)

4. Shift in k

       F{e^(iax) f(x)} = f̂(k − a)                               (3.65)

5. Scaling

       F{f(αx)} = (1/|α|) f̂(k/α)                                (3.66)

   where α is a real scalar.

6. Behavior upon exchanging variables: if f̂(k) is the Fourier
   transform of f(x), then F{f̂(x)} = 2πf(−k). This property is
   useful for extending the usefulness of lists or tables of
   transforms, like the one in the following paragraph.

7. Convolution theorem: the CONVOLUTION of two functions G and h is

       u(x) = ∫−∞^∞ G(x − x′)h(x′) dx′

   This is often written u = G * h. The CONVOLUTION THEOREM states
   that

       û(k) = Ĝ(k)ĥ(k)                                          (3.67)

   A convolution in x-space is a product in k-space.

These properties help us to solve PDEs.
Fourier transforms of some important functions are:

1. f(x) = δ(x): f̂(k) = 1. A spike localized in space has equal
   components at every wavelength.

2. f(x) = 1: f̂(k) = 2πδ(k). Conversely, a spike located at
   wavenumber zero is smeared all over space.

3. f(x) = e^(ilx): f̂(k) = 2πδ(k − l). A spike at k = l corresponds
   to a sinusoid of wavenumber l.

4. f(x) = 1 for −L < x < L and zero elsewhere: f̂(k) = (2 sin kL)/k.

5. f(x) = e^(−b|x|), b > 0: f̂(k) = 2b/(b² + k²).

6. f(x) = e^(−x²/4a): f̂(k) = √(4πa) e^(−ak²).

   The Fourier transform of a Gaussian is a Gaussian. If a is large,
   the function decays very quickly as |k| increases, so the Gaussian
   in k-space is very localized. Because a appears in the denominator
   in x-space, however, the function is very spread out in x. The
   opposite is true if a is small, with the balance holding at
   a = 1/2. Here and here only is the spread of the function the same
   in k and x. As a → 0, (1/(2√(πa))) e^(−x²/4a) → δ(x), in which
   case this property reduces to the first result on the list:
   f(x) = δ(x), f̂(k) = 1.

Example 3.11: Derivation of a Fourier transform formula

Let f(x) = e^(−b|x|) with b > 0. Find its Fourier transform.

Solution

    f̂(k) = ∫−∞^∞ e^(−b|x|) e^(−ikx) dx
         = ∫−∞^0 e^((b−ik)x) dx + ∫0^∞ e^(−(b+ik)x) dx
         = 1/(b − ik) + 1/(b + ik)
         = 2b/(b² + k²)

We illustrate the use of Fourier transforms to solve PDEs with some
examples.

Example 3.12: Transient diffusion in an unbounded domain; one and
multiple dimensions

(a) Consider transient diffusion

        ut = Duxx                                                (3.68)

    in the one-dimensional infinite domain (−∞, ∞) with initial
    condition u(x, 0) = u0(x), where u0(x) is known but otherwise
    arbitrary. Use the Fourier transform in x to find the solution.

(b) Extend this result to three dimensions, with initial condition
    u(x, 0) = u0(x). Do so by first considering a δ-function initial
    condition u0(x) = δ(x) = δ(x)δ(y)δ(z) and noting that it can be
    incorporated into the governing equation as a point source in
    space and time

        ut = D∇²u + δ(t)δ(x)                                     (3.69)

Solution

(a) Taking the Fourier transform of the equation and applying the
derivative property yields

    F{ut = Duxx}  ⇒  ût(k, t) = −k²Dû(k, t)

This gives us an ODE for each value of k, with initial condition
û0(k). The solution is simply û(k, t) = û0(k)e^(−Dk²t). The inverse
Fourier transform puts this back in physical space. Consider the
evolution of a delta function, whose Fourier transform is simply
û0(k) = 1. Now

    u(x, t) = F⁻¹{e^(−Dtk²)} = (1/(2√(πDt))) e^(−x²/4Dt)         (3.70)

Thus at any time, the temperature field that starts as a delta
function is a Gaussian distribution, with height 1/(2√(πDt)) and
width √(4Dt). An important extension of this result comes from the
observation that any function can be written as a superposition of
delta functions

    u0(x) = ∫−∞^∞ u0(x′)δ(x − x′) dx′                            (3.71)

Thus the solution can be written as the superposition

    u(x, t) = ∫−∞^∞ u0(x′) (1/(2√(πDt))) e^(−(x−x′)²/4Dt) dx′    (3.72)
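Equation (3.72) can be exercised numerically. The sketch below (assuming NumPy; the Gaussian initial condition and parameter values are arbitrary choices) evolves u0 both by multiplying by e^(−Dk²t) in k-space and by quadrature of the superposition integral, and compares both with the exact self-similar solution:

```python
import numpy as np

D, t = 0.7, 0.05
x = np.linspace(-10.0, 10.0, 1024, endpoint=False)
dx = x[1] - x[0]
u0 = np.exp(-x**2)       # Gaussian initial condition

# Spectral evolution: multiply each Fourier mode by exp(-D k^2 t).
k = 2*np.pi*np.fft.fftfreq(x.size, d=dx)
u_fft = np.real(np.fft.ifft(np.fft.fft(u0)*np.exp(-D*k**2*t)))

# Direct quadrature of the superposition integral (3.72).
kernel = np.exp(-(x[:, None] - x[None, :])**2/(4*D*t))/(2*np.sqrt(np.pi*D*t))
u_conv = kernel @ u0 * dx

# A Gaussian stays Gaussian: u = exp(-x^2/(1+4Dt))/sqrt(1+4Dt).
u_exact = np.exp(-x**2/(1 + 4*D*t))/np.sqrt(1 + 4*D*t)
print(np.max(np.abs(u_fft - u_exact)), np.max(np.abs(u_conv - u_exact)))
```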

(b) Now the THREE-DIMENSIONAL FOURIER TRANSFORM will be introduced:

    f̂(kx, ky, kz) = F3D{f(x, y, z)}
                  = ∫∫∫ f(x, y, z) e^(−ikxx) e^(−ikyy) e^(−ikzz) dx dy dz

Here Fourier transforms have been applied in all three spatial
coordinate directions. Defining the WAVEVECTOR k = (kx, ky, kz),

    f̂(k) = ∫ f(x) e^(−ik·x) dx

Similarly

    f(x) = (1/(2π)³) ∫ f̂(k) e^(ik·x) dk

The results presented above for one-dimensional transforms can be
used to generate formulas for multidimensional transforms. For
example

    F3D{∇f} = ik f̂(k)

    F3D{∇ · v} = ik · v̂(k)

    F3D{∇²f} = −k²f̂(k),    k² = kx² + ky² + kz²

Taking the three-dimensional Fourier transform of (3.69) yields

    ût + k²Dû = δ(t)

which is easily solved (by Laplace transform in time, for example) to
yield

    û(t) = e^(−k²Dt)

Applying the inverse three-dimensional transform formula yields

    u(x, t) = (1/(2π)³) ∫ e^(−k²Dt) e^(ik·x) dk
            = [ (1/2π) ∫−∞^∞ e^(−kx²Dt) e^(ikxx) dkx ]
              [ (1/2π) ∫−∞^∞ e^(−ky²Dt) e^(ikyy) dky ]
              [ (1/2π) ∫−∞^∞ e^(−kz²Dt) e^(ikzz) dkz ]
            = (1/(2√(πDt))) e^(−x²/4Dt) · (1/(2√(πDt))) e^(−y²/4Dt)
              · (1/(2√(πDt))) e^(−z²/4Dt)
            = (1/(4πDt)^(3/2)) e^(−r²/4Dt)

where r = |x|. Using this result, (3.72) generalizes to an arbitrary
three-dimensional initial condition u0(x):

    u(x, t) = ∫ u0(x′) (1/(4πDt)^(3/2)) e^(−|x−x′|²/4Dt) dx′

Example 3.13: Steady diffusion from a wall with an imposed
concentration profile

Solve the steady-state diffusion or heat conduction problem

    uxx + uyy = 0

in the half-plane −∞ < x < ∞, 0 < y < ∞, with boundary conditions
u(x, 0) = u0(x) and u(x, y) bounded as y → ∞.

Solution

Based on our experience with the previous example, we begin by
considering the boundary condition u(x, 0) = δ(x). Taking the Fourier
transform of the equation and boundary condition in the x-direction
(the problem is not unbounded in y) yields

    −k²û(y) + ûyy(y) = 0,    û(0) = F{δ(x)} = 1

Requiring that the solution be bounded as y → ∞, this has the
solution

    û(y) = e^(−|k|y)

Now the inverse transform of this solution must be found. Recall that
from the point of view of the Fourier transform and its inverse, the
variable y is a constant (we are only transforming in the
x-coordinate). Combining the transform of e^(−b|x|) with the
exchange-of-variables and scaling properties, we have that

    F⁻¹{e^(−|k|y)} = (1/π) y/(y² + x²)

Given this solution, we can use (3.71) and the superposition
principle for linear problems to determine that, given an arbitrary
boundary condition u0(x),

    u(x, y) = ∫−∞^∞ u0(x′) (1/π) y/(y² + (x − x′)²) dx′          (3.73)

Observe that (3.73) has the form of a convolution, i.e.,
û(k) = Ĝ(k)û0(k) with Ĝ(k) = e^(−|k|y). Thus the solution arises
directly from the convolution theorem.
3.3.5 Green's Functions and Boundary-Value Problems

Overview

The transient diffusion problem we solved in Example 3.12 gave us an
example of a GREEN'S FUNCTION, a solution to a differential equation
with a point source forcing.⁴ We saw in that example that the
solution for an arbitrary initial distribution u(x, 0) = u0(x) could
be written as a convolution of u0 and the Green's function. Exercise
3.36 extends that result. In the present section we will develop the
basic theory of Green's functions, with a particular focus on
boundary value problems.

Consider a linear boundary value problem

    Lu = f(x)                                                    (3.74)

⁴In quantum mechanics in particular, a Green's function for a
transient problem like this one is called a PROPAGATOR, since it
propagates a δ-function initial condition forward in time.

with boundary conditions that may be inhomogeneous, specified in
general. The Green's function G(x, x0) for the operator L is the
solution to

    LG(x, x0) = δ(x − x0)                                        (3.75)

That is, G(x, x0) is the solution for a point source placed at
arbitrary position x0 within the domain of interest. The discussion
below reveals what boundary conditions G should satisfy. For the
present we will consider Green's functions for self-adjoint problems
and Sturm-Liouville operators. As a specific initial example, recall
(2.33) from Section 2.4.2:

    ∫a^b [ (1/w(x)) d/dx (p du/dx) + r(x)u ] v w(x) dx
        − ∫a^b [ (1/w(x)) d/dx (p dv/dx) + r(x)v ] u w(x) dx
        = p(b)(u′(b)v(b) − u(b)v′(b)) − p(a)(u′(a)v(a) − u(a)v′(a))

Letting v(x) = G(x, x0), this becomes

    (Lu, G) − (u, LG) = p(b)(u′(b)G(b, x0) − u(b)G′(b, x0))
        − p(a)(u′(a)G(a, x0) − u(a)G′(a, x0))

Applying (3.74) and (3.75) in the two inner products gives us that

    (f, G) − (u, δ(x − x0)) = p(b)(u′(b)G(b, x0) − u(b)G′(b, x0))
        − p(a)(u′(a)G(a, x0) − u(a)G′(a, x0))

The inner product (u, δ(x − x0)) evaluates to u(x0)w(x0), so
rearranging leads to

    u(x0) = (1/w(x0)) [ ∫a^b f(x)G(x, x0)w(x) dx
        − p(b)(u′(b)G(b, x0) − u(b)G′(b, x0))
        + p(a)(u′(a)G(a, x0) − u(a)G′(a, x0)) ]

Finally, we specify boundary conditions. For example, we can set
inhomogeneous Dirichlet boundary conditions u(a) = ua, u(b) = ub.
Because we are not specifying homogeneous boundary conditions here,
the operator L is said to be FORMALLY self-adjoint; true
self-adjointness for a differential operator requires that we impose
homogeneous boundary conditions such that the boundary terms vanish.
In this case the boundary values u′(a) and u′(b) are not specified.
If we require G to satisfy homogeneous Dirichlet boundary conditions
G(a, x0) = G(b, x0) = 0, however, then the unknown boundary values u′
do not appear and we arrive at a solution for u in terms of f, G and
the boundary conditions

    u(x0) = (1/w(x0)) [ ∫a^b f(x)G(x, x0)w(x) dx
        + ub p(b)G′(b, x0) − ua p(a)G′(a, x0) ]                  (3.76)

Therefore, given the solution G(x, x0) to the problem
LG = δ(x − x0), G(a, x0) = G(b, x0) = 0, we can find the solution to
Lu = f for any f through (3.76). Note that (3.76) is closely
analogous to the solution A⁻¹b of the algebraic problem Ax = b, with
G playing the role of A⁻¹. Example 2.15 shows a derivation of this
formula for a specific problem. Because that example already imposes
homogeneous Dirichlet boundary conditions, reworking it with
u(x) = G(x, x0) and f(x) = δ(x − x0) would directly yield the Green's
function for the Dirichlet problem.
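A minimal one-dimensional illustration of (3.76) (a sketch, not from the text): take L = d²/dx² on [0, 1] with w = p = 1, r = 0 and homogeneous Dirichlet conditions, for which the Green's function is piecewise linear:

```python
import numpy as np

# G satisfies G'' = delta(x - x0) with G(0, x0) = G(1, x0) = 0; its
# slope jumps by 1 at x = x0.
def G(x, x0):
    return np.where(x < x0, x*(x0 - 1.0), x0*(x - 1.0))

# Solve u'' = f for f = 1; with ua = ub = 0 the boundary terms in
# (3.76) vanish and u(x0) is just the integral of f G.  The exact
# solution is u(x) = x(x - 1)/2.
x, dx = np.linspace(0.0, 1.0, 20001, retstep=True)
x0 = 0.3
g = G(x, x0)                                 # f(x) G(x, x0) with f = 1
u_x0 = dx*(g.sum() - 0.5*(g[0] + g[-1]))     # trapezoidal quadrature
print(u_x0, 0.3*(0.3 - 1.0)/2)               # both ≈ -0.105
```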

The above discussion focused on a Sturm-Liouville problem, which is
formally self-adjoint. For a non-self-adjoint operator, the Green's
function for the adjoint operator satisfies

    L*G*(x, x1) = δ(x − x1)                                      (3.77)

along with appropriate homogeneous boundary conditions. In general,
the position of the source is arbitrary, which is why we let its
position here be x1, which is generally distinct from x0. From the
definition of the adjoint

    (LG(x, x0), G*(x, x1)) = (G(x, x0), L*G*(x, x1))

(we have chosen homogeneous boundary conditions on G and G* so the
boundary terms vanish), and inserting (3.75) and (3.77), yields

    (δ(x − x0), G*(x, x1)) = (G(x, x0), δ(x − x1))

This reduces to simply

    G*(x0, x1) = G(x1, x0)                                       (3.78)

This result is the analog of the matrix adjoint result: the inverse
of the adjoint is the adjoint of the inverse.

Green's Function Solution to the Poisson Equation

Green's identities provide the foundation for developing solutions
based on Green's functions in multiple dimensions. We focus here on
the solution of the Poisson equation

    −∇²u = f(x)

with boundary conditions specified below. The Green's function of
interest here satisfies⁵

    −∇²G(x, x0) = δ(x − x0)                                      (3.79)

Green's second identity, (3.17), with v replaced by G, is

    ∫V (u(x)∇²G(x, x0) − G(x, x0)∇²u(x)) dV(x)
        = ∮S (u(x)∇G(x, x0) − G(x, x0)∇u(x)) · n dS(x)

where we have written the differential volume and surface elements as
explicit functions of x to remind us that it is the independent
variable. Inserting the Poisson equation and (3.79), and evaluating
the integral containing the δ-function, yields

    u(x0) = ∫V G(x, x0)f(x) dV(x)
        + ∮S (G(x, x0)∇u(x) − u(x)∇G(x, x0)) · n dS(x)           (3.80)

If u satisfies Dirichlet boundary conditions u = us on S, then
requiring that G = 0 on S yields a solution for u

    u(x0) = ∫V G(x, x0)f(x) dV(x) − ∮S us(x) ∇G(x, x0) · n dS(x)   (3.81)

⁵We put a negative sign in front of the Laplacian here so that,
physically, the term f(x) represents a source of heat, chemical
species, etc., and thus the Green's function represents a point
source. Some authors do not use the negative sign.

where ∂G/∂n ≡ n · ∇G. A Green's function satisfying homogeneous
Dirichlet boundary conditions is sometimes called a GREEN'S FUNCTION
OF THE FIRST KIND. If u satisfies Neumann boundary conditions
∂u/∂n = js, then we apply homogeneous Neumann boundary conditions
∂G/∂n = 0 to the Green's function, in which case the solution for u
is

    u(x0) = ∫V G(x, x0)f(x) dV(x) + ∮S G(x, x0)js(x) dS(x)       (3.82)

and G is a GREEN'S FUNCTION OF THE SECOND KIND.

Evaluating the solutions (3.81) or (3.82) requires us to determine
the solution to −∇²G = δ(x − x0) with the appropriate boundary
conditions. To do this, it is useful to let G be written as the sum
of two parts: G = G∞ + GB. In this sum, G∞ is called the FREE-SPACE
GREEN'S FUNCTION. It is a solution to the equation LG∞ = δ(x − x0)
in an unbounded domain, and contains the singular behavior induced by
the presence of the point source. The boundary correction GB
satisfies LGB = 0 (the singular behavior is contained in G∞), and is
determined by the requirement that G satisfy specific boundary
conditions on S. We will find G∞ and GB for L = −∇² in two
dimensions.

For the purpose of obtaining the free-space Green's function, we
place the source at the origin: x0 = 0. Because the δ-function has no
angular dependence, we will seek a two-dimensional solution to
−∇²G∞(x) = δ(x) that is only a function of r. Therefore, at every
point in the domain except the origin, G∞(r) satisfies the equation

    (1/r) d/dr (r dG∞/dr) = 0

The solution to this is simple

    G∞(r) = c1 ln r + c2

We set c2 = 0; any constant component of the solution can be
incorporated into GB. To find c1 we first integrate the equation
−∇²G∞ = δ(x) over any volume V (area in this case) containing the
origin

    −∫V ∇²G∞ dV = ∫V δ(x) dV = 1

Recalling that ∇² = ∇ · ∇ and applying the divergence theorem to the
left-hand side of this expression yields that

    −∮S n · ∇G∞ dS = 1

The integral is simple to evaluate if we let V be a circle of radius
r surrounding the origin, in which case

    −∮S n · ∇G∞ dS = −2πr (dG∞/dr) = −2πc1 = 1

Therefore

    c1 = −1/(2π)

Letting r = |x − x0|, the free-space Green's function for −∇² in two
dimensions becomes

    G∞(x − x0) = −(1/(2π)) ln |x − x0|                           (3.83)

To determine GB, the shape of the domain and the boundary conditions
must be specified. We will take the domain to be the half-plane
−∞ < x < ∞, 0 < y < ∞ and seek a solution that vanishes as y → ∞.
In the case of Dirichlet boundary conditions, GB satisfies ∇²GB = 0
with GB = −G∞ on y = 0. We can solve this problem using the "method
of images." Since G∞ represents the field due to a point source at
the position x0 = (x0, y0), if we place a point sink (an "image" or
"reflection" of the source) at x0′ = (x0, −y0), symmetry shows us
that the field due to the source-sink combination will be zero at
y = 0 (Figure 3.8). Therefore we set

    GB = (1/(2π)) ln |x − x0′|

This satisfies ∇²GB = 0 in y > 0 because the sink is in the image
region y < 0. Thus the total Green's function is given by

    G = G∞ + GB = −(1/(2π)) ln |x − x0| + (1/(2π)) ln |x − x0′|
      = −(1/(2π)) ln ( |x − x0| / |x − x0′| )
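The image construction can be verified numerically; this sketch (assuming NumPy, with an arbitrarily placed source) checks that G vanishes on y = 0 and is harmonic away from the source:

```python
import numpy as np

# Half-plane Green's function with source at (x0, y0) and image sink
# at (x0, -y0): G = -(1/2pi) ln(|x - x0|/|x - x0'|).
x0, y0 = 0.4, 0.7
def Gfun(x, y):
    d  = np.hypot(x - x0, y - y0)   # distance to the source
    di = np.hypot(x - x0, y + y0)   # distance to the image sink
    return -np.log(d/di)/(2*np.pi)

xb = np.linspace(-5, 5, 11)
print(np.max(np.abs(Gfun(xb, 0.0*xb))))   # 0 on the boundary y = 0

# Five-point Laplacian at a point away from the source: should vanish.
h = 1e-3
xp, yp = 1.2, 1.5
lap = (Gfun(xp + h, yp) + Gfun(xp - h, yp) + Gfun(xp, yp + h)
       + Gfun(xp, yp - h) - 4*Gfun(xp, yp))/h**2
print(abs(lap))                            # ≈ 0 (to truncation error)
```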

Finally, the solution, (3.81), becomes

    u(x0, y0) = −(1/(2π)) ∫0^∞ ∫−∞^∞ ln ( |x − x0| / |x − x0′| ) f(x, y) dx dy
        + (1/π) ∫−∞^∞ (y0 us(x) / ((x − x0)² + y0²)) dx

If f(x, y) = 0, this solution reduces to what we found using Fourier
transforms in Example 3.13. For the solution with Neumann boundary


Figure 3.8: A source (as indicated by the +) in the physical domain
at position (x0, y0) and an image sink (−) at (x0, −y0). The shaded
region is outside the physical domain. Because the source and sink
have equal magnitude and opposite sign, and are the same distance
from the plane y = 0, the fields due to them cancel out on that line.

conditions, (3.82), GB would have to satisfy ∂GB/∂n = −∂G∞/∂n on
y = 0. In this case GB is the field due to an image source, rather
than a sink, at position x0′.

The simple geometry used here required only one "image point" to
satisfy the boundary conditions. Nevertheless, the geometry does not
need to be much more complicated to require many or even an infinite
number of image points. The infinite strip, −∞ < x < ∞, 0 < y < 1,
requires an infinite number of image points, since the image point we
use to satisfy, say, the boundary condition at y = 0 will change the
field at y = 1, which must be compensated by another image point, and
so on ad infinitum. As a practical matter, often using one image for
each of the two boundaries provides an adequate approximation.

Boundary Integral Formulation of the Laplace Equation

Equations (3.81) and (3.82) require the availability of the solution
for the Green's function with the appropriate boundary conditions and
in the domain of interest. In some cases, as we saw above, this
solution is available in closed form, but often it is not. To address
this situation, we step back to (3.80). In developing this equation,
the boundary conditions on G have not yet been specified. For
example, it is valid if we let G = G∞, which has a simple closed-form
solution, (3.83). Using this choice and letting f(x) = 0 so that we
are considering the Laplace equation, (3.80) becomes

    u(x0) = ∮S [ G∞(x, x0) ∂u(x)/∂n − u(x) ∂G∞(x, x0)/∂n ] dS(x)   (3.84)

Vector Calculus and Partial Differe


ntial

304

EquQti0hs

Above,we have taken xo to be a point within the domain


solution to this equation could then be inserted into (3.84)t
solution at any point within the domain. We will derive
for the case where the domain is the interior of a bounded volume
Thereis an important subtlety in doing this, which arises

fact that GN/n changes sign as xo crosses from one Sidfrom the
boundaryto the other. Considera vertical boundary define
line x = 0 (with the outward normal pointing to the right) and
let
xo = (xo,y). Takingthe limit xo 0 corresponds to approaching
the point (0,y) on the boundary
G00(x, xo)

xo-0

lim

G00(x, xo)

1
xo-0

xo

27Txo + Y2

sgn(xo)(y)
2

where the last step is accomplished by recognizing (1/π)|x0|/(x0² + y²) as a delta family; see Section 2.5. Thus this term is singular as x0 approaches the boundary, and the sign depends on the side from which it approaches. Using this result, and recalling that here x0 is approaching the boundary from the left (interior),

\[ \lim_{x_0 \to S} \int_S u(x)\,\frac{\partial G_\infty(x, x_0)}{\partial n}\, dS(x) = \mathrm{PV}\!\int_S u(x)\,\frac{\partial G_\infty(x, x_0)}{\partial n}\, dS(x) - \frac{u(x_0)}{2} \tag{3.85} \]

Here the integral on the boundary must be evaluated in the sense of its CAUCHY PRINCIPAL VALUE
\[ \mathrm{PV}\!\int_S u(x)\,\frac{\partial G_\infty(x, x_0)}{\partial n}\, dS(x) = \lim_{\epsilon \to 0} \int_{S - S_\epsilon} u(x)\,\frac{\partial G_\infty(x, x_0)}{\partial n}\, dS(x) \]
where S_ε is the portion of S within a tiny radius ε of x0.


Finally, inserting (3.85) into (3.84) yields that for points x0 on the boundary
\[ \frac{1}{2}\,u(x_0) = \int_S G_\infty(x, x_0)\,\frac{\partial u(x)}{\partial n}\, dS(x) - \mathrm{PV}\!\int_S u(x)\,\frac{\partial G_\infty(x, x_0)}{\partial n}\, dS(x) \tag{3.86} \]
3.3 Linear Partial Differential Equations: Properties and Solution Techniques 305

If Dirichlet boundary conditions u = g are imposed, then the left-hand side and the second integral are known, and the boundary values of ∂u/∂n are determined by the solution of this integral equation. If u is imposed on some part of the boundary and ∂u/∂n on the remainder, then ∂u/∂n is an unknown on the part of the boundary where u is imposed, and vice versa. Closed-form solutions to (3.86) can be obtained in special cases, but its importance goes beyond these. On a fundamental level, it shows that Laplace's equation, a partial differential equation, can be reformulated as an integral equation whose domain is the boundary of the original domain. On a practical level, it forms the basis of an important computational approach to solving the Laplace equation and related problems, the BOUNDARY ELEMENT METHOD. In this approach, the integrals in (3.86) are discretized, leading to a system of linear algebraic equations whose unknowns are the values of u and ∂u/∂n at points on the boundary.
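As a quick numerical check of the boundary integral identity (3.84), we can evaluate the integral for a known harmonic function on the unit circle and compare with its interior value. The test function u = x, the interior point, and the quadrature resolution are illustrative choices, not taken from the text; a sketch in Python:

```python
# Check of (3.84) on the unit circle using the 2-D free-space Green's
# function G = -ln(r)/(2*pi). The harmonic test function u = x and the
# interior point x0 are illustrative choices.
import numpy as np

N = 400                               # boundary quadrature points
theta = 2 * np.pi * np.arange(N) / N  # periodic trapezoid rule
xb, yb = np.cos(theta), np.sin(theta) # boundary points (unit circle)
nx, ny = xb, yb                       # outward unit normal on the circle

u = xb                                # u = x is harmonic
dudn = xb                             # du/dn = du/dr = cos(theta) at r = 1

x0, y0 = 0.3, 0.2                     # interior evaluation point
rx, ry = xb - x0, yb - y0
r2 = rx**2 + ry**2
G = -np.log(np.sqrt(r2)) / (2 * np.pi)
dGdn = -(rx * nx + ry * ny) / (2 * np.pi * r2)

dS = 2 * np.pi / N                    # arc length element
u_at_x0 = np.sum(G * dudn - u * dGdn) * dS
print(u_at_x0)                        # should be close to u(x0) = 0.3
```

Because the integrand is smooth and periodic, the trapezoid rule converges rapidly here; this is the continuous analog of the discretization used in the boundary element method.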

3.3.6 Characteristics and D'Alembert's Solution to the Wave Equation

The wave equation
\[ u_{tt} = c^2 \nabla^2 u \tag{3.87} \]
governs wave propagation in many physical contexts, including electromagnetic waves (light), vibrations of strings and membranes, and sound propagation. In one spatial dimension, the equation is

\[ u_{tt} = c^2 u_{xx} \tag{3.88} \]
which was introduced in Section 3.3.1 as an archetypal hyperbolic equation. Following the change of variable procedure introduced there, we find that ξ = x − ct and η = x + ct. Rewriting (3.88) in these coordinates yields (3.21) with g = 0. We can easily integrate this twice to find the general solution of the wave equation
\[ u(x,t) = F_1(\xi) + F_2(\eta) = F_1(x - ct) + F_2(x + ct) \]

It says that any solution is a superposition of a right-moving and a left-moving wave. Usually, we want to understand the wave equation as an initial-value problem, so we look at two cases of initial conditions and then combine them to get a general result.


First, consider the initial condition u(x, 0) = u0(x), but no initial velocity. At t = 0 the above general solution and its time derivative become
\[ u(x, 0) = u_0(x) = F_1(x) + F_2(x) \]
\[ u_t(x, 0) = 0 = -c F_1' + c F_2' \]
The latter equation integrates to yield F1 = F2, and using this fact in the first equation gives F1 = F2 = u0/2. Thus the solution for these initial conditions is
\[ u(x, t) = \frac{1}{2}\left[ u_0(x - ct) + u_0(x + ct) \right] \]
The initial condition splits immediately into two identical waves, one traveling to the right and one to the left. These waves have the same shape, but half the amplitude, of the initial condition. In contrast to the parabolic heat equation u_t = u_xx, which smooths discontinuous initial conditions as illustrated in Example 3.12, no smoothing occurs in the wave equation. If an initial condition contains a discontinuity at a point x, this will simply propagate along the characteristic directions ξ = constant, η = constant.
Now consider a struck string rather than a plucked one. The initial condition is u(x, 0) = 0, u_t(x, 0) = v0(x) ≠ 0. There is no initial deformation, but there is an initial velocity. Now at t = 0 we have
\[ u(x, 0) = 0 = F_1(x) + F_2(x) \]
\[ u_t(x, 0) = v_0(x) = -c F_1' + c F_2' \]
This tells us that F1' = −F2' and that v0 = 2cF2'. We can integrate this to find that
\[ F_2(x + ct) = \frac{1}{2c} \int_0^{x+ct} v_0(s)\, ds \]
Similarly
\[ F_1(x - ct) = -\frac{1}{2c} \int_0^{x-ct} v_0(s)\, ds \]
The solution is F1 + F2, which is
\[ \frac{1}{2c} \int_{x-ct}^{x+ct} v_0(s)\, ds \]

Figure 3.9: An initially right-traveling wave in the domain x < 0 reflecting across a wall where u = 0, as solved using superposition of a left-traveling "image" with opposite sign. (The figure shows the physical domain x < 0, with the pulse at x0, and the image domain, with the image pulse at -x0.)

The complete solution to the initial-value problem is the sum of the above two cases. This is D'Alembert's solution
\[ u(x, t) = \frac{1}{2}\left[ u_0(x - ct) + u_0(x + ct) \right] + \frac{1}{2c} \int_{x-ct}^{x+ct} v_0(s)\, ds \]

We have only considered the very simplest hyperbolic equation here. For example, if the coefficients a, b, c depend on position, then the characteristics are curved. The references contain extensive information about more complex hyperbolic problems.
Because the wave equation is linear, we can superpose multiple solutions to form another solution. As an application of this fact, imagine a pulse traveling rightward toward a boundary at x = 0, at which the boundary condition is u = 0. At time t = 0, the pulse is centered at x = x0. To understand this situation, recall Figure 3.8 and the "method of images" analysis of Section 3.3.5. Applying the same idea here, we place an "image" pulse of the same shape but opposite sign at the position x = -x0 (which is outside the physical domain) and make it move leftward as shown in Figure 3.9. Now the real and image pulses will eventually overlap, and by symmetry they will satisfy u = 0 at x = 0. Once the "image" pulse enters the physical domain, it is no longer an image, but a component of the true solution. The implication of this construction is that when a wave hits a boundary where no deformation is allowed, it reflects but with a change of sign. What happens if the boundary condition is u_x = 0?
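The image construction just described can be sketched numerically; the Gaussian pulse shape and the starting position below are illustrative choices, not from the text:

```python
# Sketch of the image construction for wave reflection at a wall where
# u = 0 at x = 0. A right-going pulse starting at x0 < 0 is superposed
# with an opposite-sign, left-going image starting at -x0.
import numpy as np

c, x0 = 1.0, -3.0
pulse = lambda s: np.exp(-s**2)          # illustrative pulse shape

def u(x, t):
    # real pulse moving right plus negative image moving left
    return pulse(x - x0 - c * t) - pulse(x + x0 + c * t)

# By symmetry, u vanishes at the wall x = 0 for all times:
t = np.linspace(0.0, 8.0, 50)
wall = np.max(np.abs(u(0.0, t)))
print(wall)                              # at round-off level
```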


3.3.7 Laplace Transform Methods

Next we illustrate the solution of several linear PDEs with Laplace transforms. For a user with some experience, Laplace transforms are possibly the most powerful method for solving linear, low-dimensional PDEs in closed form. After taking the Laplace transform of a PDE, usually with respect to the time variable, the result is a linear ODE in the transform function. We can often solve this ODE. To perform the inverse transform, we then require some inverse formulas for transform functions with singularities.⁶ We develop these inverse formulas next and then solve some example PDEs.

Let the transform function
\[ \bar{f}(s) = \frac{p(s)}{q(s)} \]
have singularities at the zeros of q(s), which is assumed to have m simple zeros
\[ q(s_n) = 0, \qquad n = 1, 2, \ldots, m \]
The inverse of this Laplace transform is given by the following formula
\[ f(t) = \sum_{n=1}^{m} a_n e^{s_n t}, \qquad a_n = \frac{p(s_n)}{q'(s_n)} \tag{3.89} \]

which is usually called the Heaviside expansion theorem. When p(s) and q(s) are polynomials, the coefficients a_n can be derived using partial fractions. But the result applies to more general cases, as we require in the two examples below, where q(s) = s sinh√(s + k) and q(s) = sinh s.⁷ When the zeros of q(s) are higher than first order, f(t) is a linear combination of products of polynomials and exponentials of time, and the coefficients are more complex. Let the zero s_n have order r_n, n = 1, 2, ..., m. Then the inverse is given by
\[ f(t) = \sum_{n=1}^{m} \left( \sum_{i=0}^{r_n - 1} a_{ni}\, t^i \right) e^{s_n t} \tag{3.90} \]

⁶The singularities of complex-valued functions are poles, branch points, and essential singularities (Levinson and Redheffer, 1970). The order of a zero s_n is the smallest integer i such that q(s)/(s − s_n)^i is nonzero in the limit s → s_n, and a simple zero is a first-order zero. So we are assuming here that the function f̄(s) has m simple poles.
⁷We are in good company. Heaviside also used the expansion for the case of q(s) = sinh xs (Vallarta, 1926) (Heaviside, 1899, p. 88).

The coefficients a_{ni}, for i = 0, 1, ..., r_n − 1, n = 1, 2, ..., m, are given by
\[ a_{ni} = \frac{\phi_n^{(r_n-1-i)}(s_n)}{i!\,(r_n-1-i)!}, \qquad \phi_n(s) = (s - s_n)^{r_n}\, \bar{f}(s) \]
in which \phi_n^{(i)}(s_n) denotes the ith derivative of \phi_n(s) evaluated at s = s_n. For students with a background in complex variables, Exercise A.2 provides some hints to establish (3.90) (and hence also (3.89)), which requires inverting the Laplace transform by performing the contour integral (2.7).
Next we use Laplace transforms to solve the reaction-diffusion equation and the wave equation. We will see that the transform in both problems has only simple zeros, and we will use (3.89) for calculating the inverse.
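A minimal numerical illustration of (3.89): for a rational transform with simple poles (an example chosen here for illustration, not taken from the text), the Heaviside expansion reproduces the known inverse:

```python
# Heaviside expansion (3.89) for fbar(s) = 1/((s+1)(s+2)), i.e.
# p(s) = 1 and q(s) = s^2 + 3s + 2 with simple zeros at -1 and -2.
import numpy as np

poles = np.array([-1.0, -2.0])           # simple zeros of q(s)
p = lambda s: 1.0
dq = lambda s: 2 * s + 3                 # q'(s)

def f(t):
    # f(t) = sum_n p(s_n)/q'(s_n) * exp(s_n t)
    return sum(p(s) / dq(s) * np.exp(s * t) for s in poles)

# compare with the known inverse exp(-t) - exp(-2t)
t = np.linspace(0.0, 5.0, 11)
err = np.max(np.abs(f(t) - (np.exp(-t) - np.exp(-2 * t))))
print(err)
```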

Example 3.14: Reaction and diffusion in a membrane
The following model describes diffusion through a membrane in which component A decomposes by a first-order reaction. The membrane initially has zero concentration of A. At t = 0 the concentration at the side of the membrane at x = 0 is abruptly raised to concentration c̄_A and the other side is maintained at zero concentration.
\[ \text{PDE:}\quad \frac{\partial c_A}{\partial t} = D_A \frac{\partial^2 c_A}{\partial x^2} - K c_A, \qquad 0 \le x \le L, \quad t \ge 0 \]
\[ \text{IC:}\quad c_A(x, 0) = 0 \]
\[ \text{BC1:}\quad c_A(0, t) = \bar{c}_A \qquad \text{BC2:}\quad c_A(L, t) = 0 \]

(a) Define the dimensionless variables
\[ c = \frac{c_A}{\bar{c}_A}, \qquad z = \frac{x}{L}, \qquad \tau = \frac{t D_A}{L^2} \]
and show that the model reduces to
\[ \text{PDE:}\quad \frac{\partial c}{\partial \tau} = \frac{\partial^2 c}{\partial z^2} - kc \]
\[ \text{IC:}\quad c(z, 0) = 0 \qquad \text{BC1:}\quad c(0, \tau) = 1 \qquad \text{BC2:}\quad c(1, \tau) = 0 \]


in which k = KL²/D_A is the only dimensionless parameter appearing in the problem. This dimensionless parameter is known as the Thiele number or the Thiele modulus in the chemical engineering literature (Rawlings and Ekerdt, 2012, p. 363). It indicates the ratio of the reaction rate to the diffusion rate.
(b) Take the Laplace transform of your model (also the boundary conditions). Solve the resulting differential equation and boundary conditions to show that
\[ \bar{c}(z, s) = \frac{\sinh\!\left( \sqrt{s + k}\,(1 - z) \right)}{s \sinh\sqrt{s + k}} \]
(c) Apply the final-value theorem to c̄(z, s) to find the steady-state solution c_s(z).

(d) Take the limit of this solution as k → 0 for the zero-reaction case. Does your solution satisfy the diffusion equation?

(e) Sketch the solution c_s(z) for a range of k values and show the effect of reaction on the steady-state concentration profile.

(f) Let p(s) = sinh(√(s + k)(1 − z)) and q(s) = s sinh√(s + k), and find the zeros s_n of q(s). Also find the value of q'(s_n) at the zeros of q(s). The following formulas may be helpful: cosh(iu) = cos u, sinh(iu) = i sin u.

(g) Invert the transform and find c(z, τ). Check that the solution satisfies the PDE and boundary conditions.

Solution
(a) Inserting the defined dimensionless variables in the PDE gives
\[ \frac{\bar{c}_A D_A}{L^2} \frac{\partial c}{\partial \tau} = \frac{D_A \bar{c}_A}{L^2} \frac{\partial^2 c}{\partial z^2} - K \bar{c}_A c \]
and rearranging gives
\[ \frac{\partial c}{\partial \tau} = \frac{\partial^2 c}{\partial z^2} - \frac{K L^2}{D_A} c = \frac{\partial^2 c}{\partial z^2} - kc \]

Inserting the dimensionless variables in the boundary and initial conditions and simplifying these expressions gives
\[ c(0, \tau) = 1, \qquad c(1, \tau) = 0, \qquad c(z, 0) = 0, \qquad 0 \le z \le 1, \quad \tau \ge 0 \]

(b) Taking the Laplace transform of the PDE and BCs gives
\[ \frac{d^2 \bar{c}}{dz^2} = (s + k)\,\bar{c}, \qquad \bar{c}(0, s) = \frac{1}{s}, \qquad \bar{c}(1, s) = 0 \]
The solution of the ODE can be written
\[ \bar{c}(z, s) = a \cosh\!\left( \sqrt{s + k}\,(1 - z) \right) + b \sinh\!\left( \sqrt{s + k}\,(1 - z) \right) \]
and we use the two BCs to find the constants a and b. The condition at z = 1 gives a = 0, and the condition at z = 0 gives
\[ b \sinh\sqrt{s + k} = \frac{1}{s} \]
so we have
\[ b = \frac{1}{s \sinh\sqrt{s + k}} \]
which gives for the Laplace transform of the solution
\[ \bar{c}(z, s) = \frac{\sinh\!\left( \sqrt{s + k}\,(1 - z) \right)}{s \sinh\sqrt{s + k}} \]


(c) Applying the final-value theorem gives
\[ c_s(z) = \lim_{s \to 0} s\,\bar{c}(z, s) = \lim_{s \to 0} \frac{\sinh\!\left(\sqrt{s + k}\,(1 - z)\right)}{\sinh\sqrt{s + k}} = \frac{\sinh\!\left(\sqrt{k}\,(1 - z)\right)}{\sinh\sqrt{k}} \]

(d) Using the fact that sinh x ≈ x for small x gives
\[ \lim_{k \to 0} c_s(z) = 1 - z \]
Yes, the solution satisfies the steady-state diffusion equation and boundary conditions
\[ \frac{d^2 c_s(z)}{dz^2} = 0, \qquad c_s(0) = 1, \qquad c_s(1) = 0 \]

(e) The concentration profiles c_s(z) versus z for a variety of rate constants k are given in Figure 3.10. We see that a large reaction rate constant prevents species A from diffusing very far into the membrane.

(f) Since the zeros of sin u are u = ±nπ, n = 0, 1, 2, ..., the zeros of sinh u are u = ±nπi, n = 0, 1, 2, ....⁸ The zeros of sinh√(s + k) are given by s_n = −(n²π² + k), and for these roots we have that √(s_n + k) = nπi, in which we choose the positive square root. Therefore the zeros of the denominator q(s) are given by
\[ s = \{0, \; -(n^2\pi^2 + k)\}, \qquad n = 1, 2, \ldots \]
These are simple zeros, so the inversion formula in (3.89) is applicable. Differentiating q(s) and evaluating q'(s) at the zeros
⁸See Exercise 3.48 for a proof that these are the only zeros of sin u for u ∈ ℂ.


Figure 3.10: Concentration versus membrane penetration distance for different reaction rate constants (curves for k = 2, 10, 30, 100).

gives
\[ q'(0) = \sinh\sqrt{k} \]
\[ q'\!\left( -(n^2\pi^2 + k) \right) = \frac{(-1)^{n+1}(n^2\pi^2 + k)}{2n\pi i} \]
Evaluating p(s) at the zeros gives
\[ p(0) = \sinh\!\left( \sqrt{k}\,(1 - z) \right) \]
\[ p\!\left( -(n^2\pi^2 + k) \right) = i \sin\!\left( n\pi(1 - z) \right) \]

(g) Putting these terms together in (3.89) gives
\[ c(z, \tau) = \frac{\sinh\!\left(\sqrt{k}\,(1 - z)\right)}{\sinh\sqrt{k}} + \sum_{n=0}^{\infty} (-1)^n\, \frac{2\pi n \sin\!\left(n\pi(1 - z)\right)}{n^2\pi^2 + k}\, e^{-(n^2\pi^2 + k)\tau} \]


Noticing the n = 0 term vanishes, we can rewrite the solution as
\[ c(z, \tau) = \frac{\sinh\!\left(\sqrt{k}\,(1 - z)\right)}{\sinh\sqrt{k}} - 2\pi \sum_{n=1}^{\infty} (-1)^{n+1}\, \frac{n \sin\!\left(n\pi(1 - z)\right)}{n^2\pi^2 + k}\, e^{-(n^2\pi^2 + k)\tau} \]
Compare also to entry 34 in Table A.1.
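The series solution can be checked numerically: at τ = 0 the truncated series should return (nearly) zero concentration, and for large τ it should approach the steady-state profile from part (c). The value k = 10, the truncation level, and the grid below are illustrative choices:

```python
# Numerical check of the series solution from Example 3.14.
import numpy as np

k = 10.0
z = np.linspace(0.05, 0.95, 19)
n = np.arange(1, 2001)[:, None]          # truncate at 2000 terms

def c(z, tau):
    steady = np.sinh(np.sqrt(k) * (1 - z)) / np.sinh(np.sqrt(k))
    lam = n**2 * np.pi**2 + k
    series = ((-1.0)**(n + 1) * n * np.sin(n * np.pi * (1 - z)) / lam
              * np.exp(-lam * tau))
    return steady - 2 * np.pi * np.sum(series, axis=0)

steady = np.sinh(np.sqrt(k) * (1 - z)) / np.sinh(np.sqrt(k))
err0 = np.max(np.abs(c(z, 0.0)))         # initial condition: near zero
err_ss = np.max(np.abs(c(z, 1.0) - steady))  # long time: steady state
print(err0, err_ss)
```

Convergence at τ = 0 is slow (the Fourier coefficients decay like 1/n because of the concentration jump at z = 0), while for τ > 0 the exponential factors make the series converge extremely fast.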

Example 3.15: Solving the wave equation
Revisit the wave equation u_tt = c²u_xx on x ∈ [0, 1] for a string with fixed ends, u(0, t) = u(1, t) = 0, and the plucked string initial condition, u(x, 0) = u0(x), u_t(x, 0) = 0. Solve this equation using the Laplace transform. Compare the solution to D'Alembert's solution. Which form do you prefer and why?

Solution

First we define τ = ct to remove the velocity c and simplify our work. The problem is now
\[ u_{\tau\tau} = u_{xx}, \qquad x \in (0, 1), \quad \tau \ge 0 \]
\[ u(x, 0) = u_0(x), \qquad u_\tau(x, 0) = 0 \]
\[ u(0, \tau) = 0, \qquad u(1, \tau) = 0 \]

Taking the Laplace transform with respect to the time variable gives
\[ \bar{u}_{xx}(x, s) - s^2 \bar{u}(x, s) = -s\, u_0(x) \]
with transformed boundary conditions ū(0, s) = ū(1, s) = 0. We obtain a second-order nonhomogeneous differential equation for the transform. We already have solved this problem in Chapter 2, and obtained the Green's function. The solution is therefore
\[ \bar{u}(x, s) = \int_0^1 G(x, \xi; s)\, u_0(\xi)\, d\xi \]
in which
\[ G(x, \xi; s) = \begin{cases} \dfrac{\sinh(s\xi)\sinh(s(1 - x))}{\sinh s}, & \xi < x \\[6pt] \dfrac{\sinh(sx)\sinh(s(1 - \xi))}{\sinh s}, & \xi > x \end{cases} \]

Notice that, as we expect, G(x, ξ; s) is symmetric in (x, ξ) because the second-order boundary-value problem is self-adjoint. Next we require a Laplace inverse for the following form
\[ \bar{f}(s) = \frac{\sinh(as)\sinh(bs)}{\sinh s} \]
Notice that sinh s has simple zeros at s_n = nπi with n = ±1, ±2, .... We use the formula given in (3.89) to obtain
\[ p(s_n) = \sinh(n\pi a i)\sinh(n\pi b i) = -\sin(n\pi a)\sin(n\pi b) \]
\[ q'(s_n) = \cosh(n\pi i) = (-1)^n \]
Therefore the inverse is
\[ f(\tau) = \sum_{n=-\infty,\, n \neq 0}^{\infty} (-1)^{n+1} \sin(n\pi a)\sin(n\pi b)\, e^{i n\pi\tau} \]
Substituting e^{inπτ} = cos(nπτ) + i sin(nπτ) and combining terms gives
\[ f(\tau) = 2 \sum_{n=1}^{\infty} (-1)^{n+1} \sin(n\pi a)\sin(n\pi b)\cos(n\pi\tau) \]

Notice that the function is now real valued, as it must be. Using this result to invert the Green's function gives
\[ G(x, \xi, \tau) = \begin{cases} 2\sum_{n=1}^{\infty} (-1)^{n+1} \sin(n\pi\xi)\sin(n\pi(1 - x))\cos(n\pi\tau), & \xi < x \\[4pt] 2\sum_{n=1}^{\infty} (-1)^{n+1} \sin(n\pi x)\sin(n\pi(1 - \xi))\cos(n\pi\tau), & \xi > x \end{cases} \]
But noticing that (−1)^{n+1} sin(nπ(1 − ξ)) = sin(nπξ) reduces this to
\[ G(x, \xi, \tau) = 2\sum_{n=1}^{\infty} \sin(n\pi x)\sin(n\pi\xi)\cos(n\pi\tau) \]

Substituting this into the solution gives
\[ u(x, \tau) = 2\sum_{n=1}^{\infty} \sin(n\pi x)\cos(n\pi\tau) \int_0^1 \sin(n\pi\xi)\, u_0(\xi)\, d\xi \]
Defining the Fourier coefficients representing the initial condition
\[ a_n = 2\int_0^1 \sin(n\pi\xi)\, u_0(\xi)\, d\xi \]

we have finally
\[ u(x, \tau) = \sum_{n=1}^{\infty} a_n \sin(n\pi x)\cos(n\pi\tau) \tag{3.91} \]
Returning to the original time variable with the substitution τ = ct gives
\[ u(x, t) = \sum_{n=1}^{\infty} a_n \sin(n\pi x)\cos(n\pi c t) \]

Notice that this solution does not resemble D'Alembert's solution.⁹ The Laplace transform has provided a Fourier series representation of the solution to the wave equation. It is easy to see that the solution satisfies the wave equation. Taking two x derivatives gives u_xx = −(nπ)²u term by term; similarly, taking two τ derivatives gives u_ττ = −(nπ)²u, so u_tt = c²u_xx and the solution satisfies the wave equation. The zero boundary conditions are satisfied because all the sine terms vanish at x = 0, 1. The initial condition is satisfied because of the Fourier series representation of u0(x). We see immediately that the solution is periodic (in time) with period T = 2/c since all the cosine terms have this period. The Fourier series solution is also convenient if we wish to analyze the frequency content of the solution, which is often a quantity of interest when modeling sound propagation.
D'Alembert's solution, on the other hand, provides the nice structural insight that the solution splits into two waves traveling in opposite directions. But then we also require the additional insight from the method of images to enforce zero boundary conditions and extend the solution to the (x, t) values where x − ct < 0 or x + ct > 1, for which u0(x − ct) or u0(x + ct) is not defined.
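The equivalence of the two solution forms can be verified numerically. The initial shape below, a sum of two sine modes, is an illustrative choice for which the odd periodic extension needed by D'Alembert's formula is just u0 itself:

```python
# Fourier-series solution (3.91) vs. d'Alembert's solution for a string
# with fixed ends. With u0 a sum of sine modes, both forms are exact.
import numpy as np

c = 1.0
u0 = lambda x: np.sin(np.pi * x) + 0.5 * np.sin(3 * np.pi * x)

def u_fourier(x, t):
    # series (3.91): only the n = 1 and n = 3 coefficients are nonzero
    return (np.sin(np.pi * x) * np.cos(np.pi * c * t)
            + 0.5 * np.sin(3 * np.pi * x) * np.cos(3 * np.pi * c * t))

def u_dalembert(x, t):
    # the odd periodic extension of this u0 is u0 itself
    return 0.5 * (u0(x - c * t) + u0(x + c * t))

x = np.linspace(0.0, 1.0, 101)
t = 0.37                                  # arbitrary comparison time
diff = np.max(np.abs(u_fourier(x, t) - u_dalembert(x, t)))
print(diff)                               # round-off level
```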

3.4 Numerical Solution of Initial-Boundary-Value Problems: Discretization and Numerical Stability

Chapter 2 introduced numerical methods for solving initial-value and boundary-value problems. These approaches will be combined here to solve initial-boundary-value problems
\[ \frac{\partial u}{\partial t} + Lu = f(x), \qquad Bu(S, t) = h, \qquad u(x, 0) = u_0(x) \]
⁹If we knew enough about the problem to propose a solution of this form, we could arrive at this answer more quickly. The value of the Laplace transform here is that it is prescriptive. You do not have to know (or guess) the structure of the solution to apply the method.


where L is a differential operator that contains all the spatial derivatives, S is the boundary of the domain, and B is an operator that describes the boundary conditions. To treat this problem, the spatial dependence will be discretized using the approaches of Chapter 2 to yield a set of ordinary differential equations in the form of a normal initial-value problem. Then the time-integration approaches also introduced in Chapter 2 can be used. This approach is sometimes called the METHOD OF LINES. We will see that a central issue in this approach is the numerical stability of time integration, which is now closely coupled to the spatial discretization (Press, Teukolsky, Vetterling, and Flannery, 1986; Strang, 1992).
Any of the methods introduced in Section 2.9 can be used for spatial discretization. In the weighted residual formulation for one spatial dimension, we look for an approximate solution u_N(x, t) as a truncated (discretized) series of basis (trial) functions φ_j(x); the difference now is that we allow the coefficients in the series to depend on time. That is
\[ u_N(x, t) = \sum_{j=1}^{N} c_j(t)\, \phi_j(x) \]
Note the similarity of this expression to those arising in the separation of variables technique. We assume for the moment that the basis functions satisfy the boundary conditions and define the residual or error by
\[ R = \frac{\partial u_N}{\partial t} + L u_N - f(x) \]
The residual is now forced to be orthogonal to the set of N test functions ψ_i; that is, (R, ψ_i) = 0, i = 1, 2, ..., N. In the Galerkin method the test functions equal the trial functions, so this condition becomes
\[ \sum_j \left[ (\phi_i, \phi_j)\,\frac{dc_j}{dt} + (\phi_i, L\phi_j)\, c_j \right] = (f, \phi_i) \]
If we let M_ij = (φ_i, φ_j), A_ij = (φ_i, Lφ_j), and b_i = (f, φ_i), we can write the weighted residual conditions as
\[ \frac{dc}{dt} = -M^{-1} A\, c + M^{-1} b \]
This is a set of linear ODEs (an initial-value problem) for the vector of coefficients c in the series for u_N. We have reduced a partial differential equation to a system of ordinary differential equations.
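A minimal method-of-lines sketch along these lines, for u_t = u_xx on (0, 1) with u(0, t) = u(1, t) = 0: the polynomial trial functions, the explicit Euler time integration, and all parameter values are illustrative choices; the stiffness matrix is assembled as (φ_i', φ_j'), which equals (φ_i, Lφ_j) for L = −d²/dx² after integration by parts:

```python
# Galerkin method-of-lines sketch for u_t = u_xx, u(0,t) = u(1,t) = 0,
# with trial functions phi_j = x^j (1 - x) that satisfy the BCs.
import numpy as np

x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
inner = lambda f, g: dx * (np.sum(f * g) - 0.5 * (f * g)[0] - 0.5 * (f * g)[-1])

phi = np.array([x**j * (1 - x) for j in range(1, 5)])          # trial functions
dphi = np.array([j * x**(j - 1) - (j + 1) * x**j for j in range(1, 5)])

M = np.array([[inner(pi, pj) for pj in phi] for pi in phi])    # mass matrix
A = np.array([[inner(di, dj) for dj in dphi] for di in dphi])  # stiffness

u0 = np.sin(np.pi * x)
cvec = np.linalg.solve(M, np.array([inner(p, u0) for p in phi]))  # project IC

MA = np.linalg.solve(M, A)        # M^{-1} A, precomputed
dt, T = 1e-5, 0.1
for _ in range(int(T / dt)):      # explicit Euler on dc/dt = -M^{-1} A c
    cvec = cvec - dt * (MA @ cvec)

uN = phi.T @ cvec
exact = np.exp(-np.pi**2 * T) * np.sin(np.pi * x)
err = np.max(np.abs(uN - exact))
print(err)                        # small; four modes are adequate here
```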

If the Galerkin tau method is used, then there are only N − N_bc ordinary differential equations, where N_bc is the number of boundary conditions. The boundary conditions add N_bc algebraic equations. Typically, these can be explicitly solved for the last N_bc values of c and the formulas substituted into the ODEs.
A similar result arises if we use the collocation approach. Now we replace the spatial derivative operators in L by their matrix approximations, the collocation differentiation operators. This yields, in the interior of the domain,
\[ \frac{du(x_i)}{dt} + L_N u(x_i) = f(x_i) \quad \text{for } x_i \text{ in the interior} \]
with the boundary conditions imposed at the collocation points on the boundaries. Here L_N is the matrix operator obtained by inserting the collocation differentiation operators.
In both Galerkin and collocation approaches, the PDE has been reduced to a system of ODEs. In principle, we know how to solve these. In practice, though, there are numerical stability considerations that arise because the matrices derive from the approximation of derivative operators.

3.4.1 Numerical Stability Analysis for the Diffusion Equation

To get an initial idea of the stability issues we face when numerically solving PDEs, we look at the diffusion equation in one dimension,
\[ u_t = D u_{xx} \]
in an unbounded domain. Taking the Fourier transform of this equation gives dû(k)/dt = −k²D û(k), for all real values of k. This is a system of linear ODEs with eigenvalues λ = −Dk². If we want spatial resolution of wavelengths as short as 2π/k_max, an explicit Euler scheme would require Δt < 2/|λ_max| = 2/(Dk²_max) to ensure stability. Defining ℓ_min = 2π/k_max as the smallest wavelength resolved, we can rewrite this stability limit as
\[ \frac{\Delta t\, D}{\ell_{\min}^2} < (2\pi^2)^{-1} \]
This result shows that, to within a numerical constant, the time step for explicit Euler must be shorter than the time scale for diffusion over a distance ℓ_min.
A similar result holds when finite element or finite difference methods are applied. For simplicity, we will consider a finite difference approximation to the diffusion equation, using the central difference formula
\[ \frac{du_j}{dt} = D\, \frac{u_{j-1} - 2u_j + u_{j+1}}{h^2} \]
where h is the spacing between mesh points x_j and x_{j+1}. Recall from Chapter 2 that the finite element discretization using hat functions leads to an identical form for the second derivative. The forward Euler approximation to this ODE is

\[ u_j^{(n+1)} = u_j^{(n)} + \frac{D\Delta t}{h^2}\left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) \]
Following an approach initially developed by von Neumann, we will seek a spatially periodic solution to this equation: u_j^{(n)} = Gⁿ e^{ikx_j} = Gⁿ e^{ikjh}, where k is arbitrary. (The full solution is a superposition over all k.) In an unbounded or periodic domain, this yields an exact solution; for the discretized problem in a bounded domain it works very well when kL ≫ 1, where L is the domain size. Substituting into the equation above gives

\[ u_j^{(n+1)} = \left[ 1 - \frac{2D\Delta t}{h^2}\left( 1 - \cos kh \right) \right] u_j^{(n)} \]
Here G = 1 − (2DΔt/h²)(1 − cos kh) is the growth factor, which for numerical stability must satisfy |G| ≤ 1. When k = 0, G = 1, which makes physical sense because k = 0 corresponds to a constant function, which does not decay by diffusion (there are no gradients). As k increases, G decreases, taking on its most negative value when kh = π. To maintain stability at this value of k requires that
\[ \frac{2D\Delta t}{h^2} \le 1 \tag{3.92} \]

Indeed, one common indication of numerical instability in a solution is the observation of "sawtooth" patterns with a length scale close to h.
Equation (3.92) is the key result of numerical stability theory for parabolic differential equations and is sometimes called the diffusive Courant-Friedrichs-Lewy (CFL) condition.¹⁰ It mirrors the result we found above using Fourier transforms: to within a constant, the time step Δt must be smaller than the time h²/D required for diffusion over one mesh spacing. This is a very severe restriction on the time step if high spatial resolution is required, as in problems with boundary layers.
¹⁰The true CFL condition was derived for convection problems and is given in the following section.
This severe stability restriction means that for problems where diffusion is important (Peclet number not high), implicit integration techniques are almost always used. The second-order Adams-Moulton method (AM2) is popular. For the finite difference approach used here, AM2 becomes
\[ u_j^{(n+1)} = u_j^{(n)} + \frac{D\Delta t}{2h^2}\left[ \left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) + \left( u_{j-1}^{(n+1)} - 2u_j^{(n+1)} + u_{j+1}^{(n+1)} \right) \right] \]
This is called the CRANK-NICHOLSON method. The linear system that must be solved at each time step is tridiagonal, so it can be factored quickly.
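The contrast between the explicit and implicit schemes can be demonstrated directly; the grid, time step, and initial data below are illustrative choices, with the time step deliberately violating (3.92):

```python
# Forward Euler blows up when D*dt/h**2 > 1/2 for the discretized
# diffusion equation, while Crank-Nicholson stays bounded at the same dt.
import numpy as np

D, N = 1.0, 50
h = 1.0 / N
x = np.linspace(0.0, 1.0, N + 1)
rng = np.random.default_rng(0)
u0 = np.sin(np.pi * x) + 0.01 * rng.random(N + 1)   # noisy initial data

# second-difference matrix for interior points, u = 0 at both ends
L = (np.diag(-2.0 * np.ones(N - 1)) + np.diag(np.ones(N - 2), 1)
     + np.diag(np.ones(N - 2), -1)) / h**2

dt = 0.6 * h**2 / D                  # violates D*dt/h**2 <= 1/2
I = np.eye(N - 1)
B = np.linalg.solve(I - 0.5 * D * dt * L, I + 0.5 * D * dt * L)

ue = u0[1:-1].copy()
uc = u0[1:-1].copy()
for _ in range(500):
    ue = ue + D * dt * (L @ ue)      # forward Euler
    uc = B @ uc                      # Crank-Nicholson

e_max, c_max = np.max(np.abs(ue)), np.max(np.abs(uc))
print(e_max, c_max)                  # Euler enormous, CN bounded
```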

3.4.2 Numerical Stability Analysis for the Convection Equation

We just considered diffusion, so it makes sense now to look at convection. The transient convection equation in one dimension (also called the FIRST-ORDER WAVE EQUATION) is
\[ u_t + v u_x = 0 \tag{3.93} \]
where v is a constant velocity. The Fourier transform of this is dû/dt = −ikv û. Now the eigenvalue is purely imaginary. Recall that imaginary eigenvalues pose a problem for many time-integration schemes; many, including forward Euler and RK2, are never stable for problems with imaginary eigenvalues.
Using the central difference formula
\[ \frac{\partial u}{\partial x} \approx \frac{u(x_{j+1}) - u(x_{j-1})}{2h} \]
Equation (3.93) becomes
\[ \frac{du_j}{dt} = -\frac{v}{2h}\left( u_{j+1} - u_{j-1} \right) \]
The same right-hand side arises in the finite element approximation with hat functions. The forward Euler approximation is
\[ u_j^{(n+1)} = u_j^{(n)} - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_{j-1}^{(n)} \right) \tag{3.94} \]

which is sometimes called the forward-time center-space (FTCS) discretization. It is first-order accurate in time and second-order accurate in space. To analyze stability, we again seek a solution u_j^{(n)} = Gⁿ e^{ikx_j}, which gives
\[ G = 1 - i\,\frac{v\Delta t}{h}\sin kh \]
The magnitude of G is greater than one whenever sin kh ≠ 0, so the FTCS discretization is unconditionally unstable for the convection equation, as we might have guessed from the Fourier analysis above, which revealed that the eigenvalues of the convection operator are purely imaginary.
This suggests that we should use a different approximation for the spatial derivative. In particular, when computing the solution at the next time step, we should only use information from upstream: after all, in the physical problem, convection carries the value of u downstream, so the solution at a point should ideally only be determined by values upstream of it. Applying this idea, we replace the central difference above by a one-sided or "upwind" difference. For v > 0 the upwind (backward) difference gives
\[ u_j^{(n+1)} = u_j^{(n)} - \frac{v\Delta t}{h}\left( u_j^{(n)} - u_{j-1}^{(n)} \right) \]
This gives the growth factor
\[ G = 1 - \frac{v\Delta t}{h}\left( 1 - e^{-ikh} \right) \]
The stability condition |G| ≤ 1 will hold if
\[ \frac{v\Delta t}{h} \le 1 \]
Defining C = vΔt/h as the COURANT NUMBER, the stability condition becomes
\[ C \le C_{\max} \tag{3.95} \]
where in this case C_max = 1. This is the COURANT-FRIEDRICHS-LEWY CONDITION, often simply called the COURANT CONDITION. Physically, it tells us that the time step must be smaller than the time it takes for convection at speed v over one mesh unit h. By replacing the central difference, which has second-order accuracy in space, with an upwind difference, we have lost an order in spatial accuracy but have gained stability. And anyway, the method is still first order in time. One complication of this method is that for problems where the velocity can change sign, it is necessary to take care that the appropriate upwind difference is used. If downwind differencing is used, the approximation is always unstable.
Stability also can be gained without use of upwind differences. The LAX-FRIEDRICHS method is a simple modification to the FTCS discretization where the present value at point x_j is replaced by the average of the values at points j + 1 and j − 1
\[ u_j^{(n+1)} = \frac{1}{2}\left( u_{j+1}^{(n)} + u_{j-1}^{(n)} \right) - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_{j-1}^{(n)} \right) \tag{3.96} \]

By applying the average, this change effectively introduces a small amount of smoothing or "numerical diffusion" into the time-integration process. This can be seen explicitly by rewriting (3.96) so that it has the form of (3.94) with an additional remainder term that indicates the difference between the two methods
\[ u_j^{(n+1)} = u_j^{(n)} - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_{j-1}^{(n)} \right) + \frac{1}{2}\left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) \tag{3.97} \]

The remainder term has very nearly the form of a central difference approximation to the second-derivative operator, and in fact this expression is precisely the FTCS approximation to a convection-diffusion equation with artificial or numerical diffusivity h²/(2Δt)
\[ u_t + v u_x = \frac{h^2}{2\Delta t}\, u_{xx} \]
This diffusion term is enough to stabilize the method: using the von Neumann analysis, the stability criterion is found to be very similar to what we found for the upwind scheme but is now insensitive to the sign of v
\[ |C| = \frac{|v|\Delta t}{h} \le 1 \]
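The Courant condition for these first-order schemes can be demonstrated with the upwind discretization on a periodic domain; the pulse, grid, and step counts below are illustrative choices:

```python
# First-order upwind scheme for u_t + v u_x = 0 on a periodic domain:
# stable for Courant number C <= 1, unstable for C > 1.
import numpy as np

v, N = 1.0, 200
h = 1.0 / N
x = np.arange(N) * h
rng = np.random.default_rng(1)
u0 = np.exp(-100 * (x - 0.5)**2) + 1e-6 * rng.random(N)  # pulse + tiny noise

def upwind(u, C, steps):
    for _ in range(steps):
        u = u - C * (u - np.roll(u, 1))   # backward difference for v > 0
    return u

stable_max = np.max(np.abs(upwind(u0, 0.9, 400)))
unstable_max = np.max(np.abs(upwind(u0, 1.1, 400)))
print(stable_max)     # stays order one
print(unstable_max)   # grows enormously
```

The small noise added to the initial condition seeds the short-wavelength modes whose growth factor exceeds one when C > 1; the resulting instability appears as the sawtooth pattern mentioned earlier.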
All the methods developed so far for the convection equation are first order in time, so even if the stability condition is satisfied, the solution may not be very accurate. The LAX-WENDROFF scheme builds on the Lax-Friedrichs scheme to yield second-order accuracy. Let u_{j±1/2}^{(n+1/2)} be intermediate values at the midpoint t + Δt/2 of the time step and "half-mesh points" x_j ± h/2. Lax-Friedrichs, (3.96), is used to generate these intermediate values
\[ u_{j+1/2}^{(n+1/2)} = \frac{1}{2}\left( u_{j+1}^{(n)} + u_j^{(n)} \right) - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_j^{(n)} \right) \]
and similarly for u_{j−1/2}^{(n+1/2)}.

This solution is used in a modified FTCS step to generate the solution at time level n + 1
\[ u_j^{(n+1)} = u_j^{(n)} - \frac{v\Delta t}{h}\left( u_{j+1/2}^{(n+1/2)} - u_{j-1/2}^{(n+1/2)} \right) \]
Eliminating the intermediate values, this can be rewritten in the more familiar form
\[ u_j^{(n+1)} = u_j^{(n)} - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_{j-1}^{(n)} \right) + \frac{(v\Delta t)^2}{2h^2}\left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) \]
This is almost identical to (3.97); the difference is that now the artificial diffusivity has the value v²Δt/2. The stability condition is again |C| ≤ 1, and since the method is now second order in time, the time step can be set very close to the stability limit and still yield enough accuracy for many purposes. Lax-Wendroff and related methods are thus widely used.
In the exact solution, pure convection does not change the amplitude of an initial condition; convection only carries the initial condition downstream. In all of the methods described here, some amplitude damping occurs (|G| < 1) except precisely when k = 0 or C = 1. We care most about this damping when kh is small, corresponding to length scales that are large compared to the grid size; i.e., |G| should be very close to unity for all length scales of interest. (If we care about length scales close to h, then we have made h too big; h should always be chosen to be much smaller than the length scales over which the true solution varies.) Taylor-expanding |G|² around kh = 0 yields
\[ |G|^2 = 1 - (1 - C^2)(kh)^2 \]
and
\[ |G|^2 = 1 - \frac{C^2(1 - C^2)(kh)^4}{4} \]
for Lax-Friedrichs and Lax-Wendroff, respectively. The latter is substantially better, since the deviation from |G|² = 1 scales as (kh)⁴ rather than (kh)².
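These expansions follow from the exact growth factors of the two schemes and can be checked numerically; the Courant number and wavenumbers below are illustrative choices:

```python
# Growth factors for Lax-Friedrichs and Lax-Wendroff, compared with the
# closed-form damping expressions from the von Neumann analysis.
import numpy as np

C = 0.5
kh = np.array([0.2, 0.1, 0.05])

G_lf = np.cos(kh) - 1j * C * np.sin(kh)                    # Lax-Friedrichs
G_lw = 1 - 1j * C * np.sin(kh) - C**2 * (1 - np.cos(kh))   # Lax-Wendroff

d_lf = 1 - np.abs(G_lf)**2     # equals (1 - C^2) sin^2(kh) ~ (1-C^2)(kh)^2
d_lw = 1 - np.abs(G_lw)**2     # equals C^2 (1 - C^2)(1 - cos kh)^2 ~ (kh)^4
print(d_lf)
print(d_lw)
```

For small kh, d_lf shrinks by a factor of 4 when kh is halved, while d_lw shrinks by a factor of 16, which is the (kh)² versus (kh)⁴ scaling quoted above.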

3.4.3 Operator Splitting for Convection-Diffusion Problems

The cases above represent the low and high Peclet number limits of the general convection-diffusion equation
\[ u_t + v u_x = D u_{xx} \]

A simple explicit scheme for this equation would use central differences for the diffusion term and Lax-Wendroff for the convection
\[ u_j^{(n+1)} = u_j^{(n)} + \frac{D\Delta t}{h^2}\left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_{j-1}^{(n)} \right) + \frac{(v\Delta t)^2}{2h^2}\left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) \]

Unless the Peclet number is very large, the diffusive term controls the stability, because it leads to a growth factor that scales as Δt/h² rather than Δt/h from the convective term. We could use an implicit method on the whole problem, but this entails solution of a large non-self-adjoint matrix problem at every time step (at least if the problem is nonlinear). It would be preferable to use an implicit method only on the diffusive piece, which is self-adjoint. A popular solution is called OPERATOR SPLITTING; an explicit method is used for the convective terms and an implicit method for the diffusive ones. For example, Lax-Wendroff can be used for the convective terms and Crank-Nicholson for the diffusive. This is often executed in two steps:

1. The convective terms are applied, to give an intermediate solution u*
\[ u_j^* = u_j^{(n)} - \frac{v\Delta t}{2h}\left( u_{j+1}^{(n)} - u_{j-1}^{(n)} \right) + \frac{(v\Delta t)^2}{2h^2}\left( u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)} \right) \]

2. Crank-Nicholson is applied, using the intermediate values instead of the values at step n
\[ u_j^{(n+1)} = u_j^* + \frac{D\Delta t}{2h^2}\left( u_{j-1}^* - 2u_j^* + u_{j+1}^* \right) + \frac{D\Delta t}{2h^2}\left( u_{j-1}^{(n+1)} - 2u_j^{(n+1)} + u_{j+1}^{(n+1)} \right) \]

In methods like this, because the diffusion terms are evaluated implicitly, the stability limit is set by a Courant condition on the convective terms. In fact, one might also get away with an unstable (e.g., FTCS) method for the convection term, relying on the implicit treatment of the diffusion term to stabilize the overall result. There is not generally a good reason to do this.
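The two-step scheme above can be sketched on a periodic domain (an illustrative simplification that replaces the boundary conditions of a real problem) and compared with the exact solution for a single Fourier mode:

```python
# Operator splitting: explicit Lax-Wendroff for convection followed by an
# implicit Crank-Nicholson step for diffusion, on a periodic domain. The
# time step satisfies the Courant condition but badly violates the
# explicit diffusive limit dt <= h^2/(2D).
import numpy as np

v, D, N = 1.0, 0.05, 200
h = 1.0 / N
x = np.arange(N) * h
C = 0.8
dt = C * h / v
ratio = dt / (h**2 / (2 * D))
print(ratio)                              # >> 1: diffusion must be implicit

# periodic second-difference matrix and Crank-Nicholson update matrix
L2 = -2.0 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)
L2[0, -1] = L2[-1, 0] = 1.0
L2 /= h**2
I = np.eye(N)
B = np.linalg.solve(I - 0.5 * D * dt * L2, I + 0.5 * D * dt * L2)

u = np.sin(2 * np.pi * x)
for _ in range(round(1.0 / dt)):
    up1, um1 = np.roll(u, -1), np.roll(u, 1)
    ustar = u - 0.5 * C * (up1 - um1) + 0.5 * C**2 * (um1 - 2 * u + up1)
    u = B @ ustar                         # implicit diffusion step

exact = np.exp(-D * (2 * np.pi)**2) * np.sin(2 * np.pi * (x - v * 1.0))
err = np.max(np.abs(u - exact))
print(err)                                # modest discretization error
```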

3.5 Exercises

Exercise 3.1: Gradient formula from gradient definition
Consider a cubic volume with one corner at the origin and the opposite corner at (x, y, z) = (Δx, Δy, Δz). In this case the integral definition of the gradient is
\[ \operatorname{grad} c = \lim_{V \to 0} \frac{1}{V} \int_S c\, \mathbf{n}\, dS \]
Because we are going to shrink the volume to zero, we can make use of the truncated Taylor expansion
\[ c(x, y, z) \approx c(0, 0, 0) + \frac{\partial c}{\partial x}\, x + \frac{\partial c}{\partial y}\, y + \frac{\partial c}{\partial z}\, z \]
where the derivatives are evaluated at the origin. Combine these to derive the formula; the same arguments hold for the other two terms, so
\[ \operatorname{grad} c = \mathbf{e}_x \frac{\partial c}{\partial x} + \mathbf{e}_y \frac{\partial c}{\partial y} + \mathbf{e}_z \frac{\partial c}{\partial z} = \nabla c \]

Exercise 3.2: Derivatives of unit vectors in polar (cylindrical) coordinates
By taking limits in polar coordinates, derive the formulas
\[ \frac{\partial \mathbf{e}_r}{\partial r} = 0, \qquad \frac{\partial \mathbf{e}_\theta}{\partial r} = 0, \qquad \frac{\partial \mathbf{e}_r}{\partial \theta} = \mathbf{e}_\theta, \qquad \frac{\partial \mathbf{e}_\theta}{\partial \theta} = -\mathbf{e}_r \]
for the derivatives of the unit vectors. Do not refer to Cartesian coordinates in your derivation.

Exercise 3.3: Divergence of the flux in polar coordinates
Derive an expression for the divergence of a flux in polar coordinates, ∇·q, in which q is an arbitrary vector. Do not use Cartesian coordinates in your derivation.
Hint: the answer is
\[ \nabla \cdot \mathbf{q} = \frac{1}{r}\frac{\partial}{\partial r}(r q_r) + \frac{1}{r}\frac{\partial q_\theta}{\partial \theta} \]

Exercise 3.4: Gradient and Laplacian in spherical coordinates
Repeat Example 3.1 and find expressions for ∇ and ∇² for the spherical coordinates shown in Figure 3.3. Do not refer to Cartesian coordinates in your derivation. Then derive the result using the h and g formulas provided in the text. Which method do you prefer and why?
Hint: the answers are in Table 3.1.

Exercise 3.5: Fundamental identities in vector calculus
Using Cartesian tensor notation, derive the following identities (here u, v, and w are vectors).
(b)
(c) ∇·(uu) = u·∇u + u(∇·u)

Exercise 3.6: Cross-product identities
(a) Verify that ε_{ijk} ε_{klm} = δ_{il} δ_{jm} − δ_{im} δ_{jl}. Now use this to derive the following results.
(b) ∇ × (∇ × u) = ∇(∇·u) − ∇·∇u
(d) (u × v) × w = (u·w)v − (v·w)u
(e) (u × v) × w = (vu − uv)·w
(f) v × (∇ × v) = ∇(½‖v‖²) − (v·∇)v

Exercise 3.7: A special case of Leibniz's rule

DeriveLeibniz's rule for the special case where the volume V is a cube whosesize
is constant but is moving with velocity q. In other words, explicitly show that the
contribution from the motion of V becomes Js mq ndA.

Exercise 3.8: Adjoint of curl


Find the adjoint of the curl operator with Dirichlet boundary conditions.

Exercise 3.9: Volume as surface integral

(a) If A is a constant vector and r = ‖x‖, then show using Cartesian tensor notation that ... and ...

(b) Show that ...

(c) Use this result and the divergence theorem to derive a formula for the total volume T = ∫_V dV of a region V in terms of an integral over the surface S of the volume.
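A standard answer to part (c) is the identity T = (1/3) ∮_S x·n dS (stated here as an assumption, since the intermediate results in (a) and (b) are not reproduced above). A quick sanity check on a sphere of radius R, where x·n = R at every surface point:

```python
import math

# On a sphere of radius R centered at the origin, x·n = R everywhere on the
# surface, so (1/3) * ∮ x·n dS = (1/3) * R * (4*pi*R**2), which should equal
# the volume (4/3)*pi*R**3.
R = 2.0
surface_integral = R * 4.0 * math.pi * R**2
volume_from_surface = surface_integral / 3.0
volume_exact = 4.0 / 3.0 * math.pi * R**3
print(volume_from_surface, volume_exact)
```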

Exercise 3.10: Curl theorem

Use the divergence theorem and results of vector algebra to show that

∫_V ∇ × v dV = ∫_S n × v dS

Exercise 3.11: Poisson equation in a no-flux domain

Consider the Poisson equation

∇²u = f

in a volume T with (no-flux) boundary condition n·∇u = 0 on the boundary S of T; n is the outward unit normal vector on the boundary.


(a) Use the divergence theorem to show that a necessary condition for the existence of a solution is ∫_T f dV = 0.

(b) If f = ∇·v for some vector v, what must v satisfy for the condition of (a) to hold?

Exercise 3.12: Helmholtz decomposition

Under rather general conditions it is possible to write a vector field q(x) as q = ∇φ + ∇ × v. Using the results of Problem 3.5, find independent equations for φ and v in terms of q.

Exercise 3.13: The Stokes equations for viscous flow

The Stokes equations for the velocity u and pressure p in a viscous flow driven by a ...

These equations can be written in matrix-vector form as ... If u = 0 on the boundary S of the flow domain V, show that the Stokes operator A satisfies

(Au, v) = (u, Av)

where the inner product between (u, p) and (v, q) is given by

∫_V u·v dV + ∫_V p q dV

Exercise 3.14: Differentiating functions of a matrix and matrix determinant

Derive the following two differentiation formulas.

(a) Use the polynomial expansion of a matrix function to show that

(d/dt) tr f(A) = tr( g(A) dA/dt )

in which g(·) = df(·)/d(·) is the usual derivative of the scalar function f(·). For the special case of f(A) = ln A for A nonsingular, we obtain

(d/dt) tr ln A = tr( A⁻¹ dA/dt )

(b) For nonsingular A, differentiate (1.35) with respect to scalar t and use the result of the previous part to show that

(d/dt) det A = det(A) tr( A⁻¹ dA/dt )

Exercise 3.15: Euler expansion formula

Let x represent the reference coordinates and u the position of a point in a deformable continuum. The expansion of the volume element in the two sets of coordinates is given by

dV_u = det(∂u/∂x) dV_x    (3.99)

in which det(∂u/∂x) is the determinant of the Jacobian matrix of the transformation. Assuming the Jacobian is nonsingular, use the matrix differentiation formula of the previous exercise to establish

(d/dt) det(∂u/∂x) = det(∂u/∂x) ∇·v

which is known as the Euler expansion (or dilation) formula.

Exercise 3.16: Temperature profile in tube flow

Read Example 12.2-2 in (Bird et al., 2002, p. 384). Check the following points.

(a) Substitute Y from Equation 12.2-21 into 12.2-23. Then exchange the order of integration, and show that the inner integral can be performed. Then, make a change of variable to obtain an integral of the form

∫ t^{−1/3} e^{−t} dt

which is equivalent to 12.2-24.

(b) Evaluate the derivatives ...

(c) Verify that the temperature profile in (a) satisfies the differential equation in Equation 12.2-13. Use the chain rule and the results from (b).

(d) What is the numerical value of Γ(2/3)?
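For part (d), the value can be checked directly with the standard library (a quick numerical aid, not part of the exercise itself):

```python
import math

# Gamma function value needed in part (d) of Exercise 3.16.
print(math.gamma(2.0 / 3.0))  # ≈ 1.35412
```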

Exercise 3.17: The error function and some useful integrals

The error function is defined by

erf(z) = (2/√π) ∫₀^z e^{−t²} dt

Note that

∫₀^∞ e^{−t²} dt = √π/2

The complementary error function is defined by

erfc(z) = 1 − erf(z)

so that

erfc(z) = (2/√π) ∫_z^∞ e^{−t²} dt


(a) Sketch the error function and the complementary error function.

(b) Consider the function

f(x) = ∫₀^∞ e^{−t²} cos(2tx) dt

Differentiate f(x) and then integrate by parts to show that f satisfies the differential equation

df/dx + 2x f(x) = 0

What is the initial condition for this ODE?

(c) Solve the ODE and show that

f(x) = (√π/2) e^{−x²}

(d) Let t = au and x = b/(2a) and show that

∫₀^∞ e^{−a²u²} cos(bu) du = (√π/(2a)) exp(−b²/(4a²))

(e) Integrate the previous equation with respect to b on the interval [0, m]. Change the order of integration and show finally that

∫₀^∞ e^{−a²u²} (sin(mu)/u) du = (π/2) erf(m/(2a))
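The closed form in part (c), f(x) = (√π/2)e^{−x²} for f(x) = ∫₀^∞ e^{−t²} cos(2tx) dt, can be spot-checked numerically; this is a sketch using a simple midpoint rule, not part of the exercise itself:

```python
import math

def f(x, tmax=10.0, n=100000):
    # Midpoint-rule approximation of f(x) = ∫_0^∞ exp(-t²) cos(2 t x) dt,
    # truncated at t = tmax (the tail beyond 10 is of order e^{-100}).
    h = tmax / n
    return h * sum(math.exp(-((i + 0.5) * h) ** 2) * math.cos(2.0 * (i + 0.5) * h * x)
                   for i in range(n))

for x in (0.0, 0.5, 1.0):
    exact = 0.5 * math.sqrt(math.pi) * math.exp(-x * x)
    print(f"x = {x}: numeric = {f(x):.8f}, exact = {exact:.8f}")
```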

Exercise 3.18: Other useful integrals

Differentiate the following function with respect to x

e^{2ab} erf(ax + b/x) + e^{−2ab} erf(ax − b/x)

and derive the indefinite integral (Abramowitz and Stegun, 1970, p. 304)

∫ e^{−a²x² − b²/x²} dx = (√π/(4a)) [ e^{2ab} erf(ax + b/x) + e^{−2ab} erf(ax − b/x) ] + const,  a ≠ 0

Use the indefinite integral to derive the definite integral

∫_h^∞ e^{−a²x² − b²/x²} dx = (√π/(4a)) [ e^{−2ab} erfc(ah − b/h) + e^{2ab} erfc(ah + b/h) ]    (3.100)

From this result, show that

∫₀^∞ e^{−a²x² − b²/x²} dx = (√π/(2a)) e^{−2ab}    (3.101)

This integral arises in transport problems in semi-infinite domains (see Exercises 3.19 and 3.23).
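A numerical spot check of the definite integral ∫₀^∞ exp(−a²x² − b²/x²) dx = (√π/(2a)) e^{−2ab}, here for the sample values a = b = 1 (midpoint rule; the integrand vanishes rapidly at both ends of the range):

```python
import math

# Midpoint-rule check of ∫_0^∞ exp(-a²x² - b²/x²) dx = (√π/(2a)) e^{-2ab}
# for a = b = 1; the integrand decays like e^{-1/x²} near 0 and e^{-x²} at ∞.
a, b = 1.0, 1.0
n, xmax = 100000, 10.0
h = xmax / n
numeric = h * sum(math.exp(-(a * (i + 0.5) * h) ** 2 - (b / ((i + 0.5) * h)) ** 2)
                  for i in range(n))
exact = math.sqrt(math.pi) / (2.0 * a) * math.exp(-2.0 * a * b)
print(numeric, exact)
```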


Exercise 3.19: Some useful Laplace transforms

The following Laplace transform pairs are useful for solving transient diffusion problems.

f̄(s)                        f(t)
(1/s) e^{−k√s}, k > 0        erfc( k/(2√t) )
e^{−k√s}                     (k/(2√π t^{3/2})) e^{−k²/(4t)}
(1/√s) e^{−k√s}              (1/√(πt)) e^{−k²/(4t)}

(a) Establish the first entry by taking the Laplace transform of the function

f(t) = erfc( k/(2√t) )

Use the definition of the Laplace transform, switch the order of integration, and use Equation 3.101.

(b) Establish the second entry by differentiating the first f(t) with respect to t.

(c) Establish the third entry by differentiating the second f̄(s) with respect to s.
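The first pair can be spot-checked by numerical quadrature of the defining integral ∫₀^∞ e^{−st} erfc(k/(2√t)) dt against e^{−k√s}/s (a sketch, with sample values s = k = 1):

```python
import math

# Midpoint-rule check that ∫_0^∞ e^{-s t} erfc(k/(2√t)) dt = e^{-k√s}/s
# for the sample values s = 1, k = 1 (truncating the integral at t = 40).
s, k = 1.0, 1.0
n, tmax = 200000, 40.0
h = tmax / n
numeric = h * sum(math.exp(-s * (i + 0.5) * h)
                  * math.erfc(k / (2.0 * math.sqrt((i + 0.5) * h)))
                  for i in range(n))
exact = math.exp(-k * math.sqrt(s)) / s
print(numeric, exact)
```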

Exercise 3.20: A transform pair for reaction-diffusion problems

The following Laplace transform pair is useful in solving problems with simultaneous diffusion and first-order reaction (Carslaw and Jaeger, 1959, p. 496)

f̄(s) = e^{−k√s}/(s − α)

f(t) = (1/2) e^{αt} [ e^{−k√α} erfc( k/(2√t) − √(αt) ) + e^{k√α} erfc( k/(2√t) + √(αt) ) ]

Derive this result by using the convolution theorem and the last entry in the table in Exercise 3.19. You will also require the integral (3.100).

Exercise 3.21: Integral representations of K₀

The following integral representation of K₀ proves useful in applying Laplace transforms to solve the diffusion equation

K₀(x) = (1/2) ∫₀^∞ (1/t) e^{−t − x²/(4t)} dt    (3.102)

The following argument provides a derivation.

(a) Denote the integral by

f₀(x) = (1/2) ∫₀^∞ (1/t) e^{−t − x²/(4t)} dt

Differentiate with respect to x and show

(1/x) (d/dx)( x df₀/dx ) = ...

(b) Next use integration by parts to show

...

Substitute this result into the equation and show that f₀ satisfies the modified Bessel equation. Therefore f₀ is of the form

f₀(x) = a₁ I₀(x) + a₂ K₀(x)

with some constants a₁, a₂.

(c) Given the integral defining f₀, what value does f₀ approach for large x?

(d) Next use l'Hôpital's rule to show that

lim_{x→0} f₀(x)/(−ln(x)) = 1

It is known that K₀(x) ≈ −ln(x) as x → 0 (see (Abramowitz and Stegun, 1970, p. 375)), so we conclude that a₁ = 0, a₂ = 1, and f₀(x) = K₀(x).

Exercise 3.22: More useful Laplace transforms

Use the integral representations of the modified Bessel function K₀ derived in Exercise 3.21 to derive the following Laplace transform pairs.

f̄(s)                       f(t)
K₀(k√s), k > 0              (1/(2t)) e^{−k²/(4t)}
(1/√s) K₁(k√s), k > 0       (1/k) e^{−k²/(4t)}

These transforms are also useful in solving transient heat-conduction and diffusion equations (see Exercise 5.9).

Exercise3.23: Time-dependent heating of a semi-infinite slab


Considera slab of infinite thickness, density p, heat capacity Cp, and thermalconduc-

tivityk with a surface at x = 0. The boundary conditionsare

(a) Definethe following scaled variables

VectorCalculusand Partial Differential

332

energy equation
Show that the

reduces to
x 2

with boundary

Equations

conditions

T>0

9(0, T) = 1

parameters in the problem, but there is also no


Notice that there are no
natural
problem.
length scale for this
transform of the PDEand show that
(b) Take the Laplace

e-xv
What assumptions did you make?
using Exercise 3.19 to obtain
(c) Take the inverse transform

(x, T) = erfc
Plot 6(x, T) as a function of x on 0

10 for T = [0.01, 0.1, 1, 10, 100, 1000].

(d) Show that the proposed solution satisfies the PDE and BCS.
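Part (d) can also be spot-checked numerically: the solution θ(x, τ) = erfc(x/(2√τ)) should satisfy θ_τ = θ_xx, which central finite differences confirm to several digits (a sketch, not a proof):

```python
import math

def theta(x, tau):
    # Solution from part (c): theta(x, tau) = erfc(x / (2 sqrt(tau)))
    return math.erfc(x / (2.0 * math.sqrt(tau)))

# Central-difference check of theta_tau = theta_xx at a sample interior point.
x, tau, h = 1.0, 0.5, 1e-3
d_tau = (theta(x, tau + h) - theta(x, tau - h)) / (2.0 * h)
d_xx = (theta(x + h, tau) - 2.0 * theta(x, tau) + theta(x - h, tau)) / h**2
print(d_tau, d_xx)
```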

Exercise 3.24: Partial fraction expansion

We often teach inversion of Laplace transforms by so-called partial fraction expansion. For example, given

f̄(s) = 1/((s − a)(s − b)(s − c))

Note that a ≠ b ≠ c ensures a, b, and c are simple zeros of the denominator polynomial. The function f̄(s) is first written as a summation of simpler fractions

f̄(s) = A/(s − a) + B/(s − b) + C/(s − c)    (3.103)

and the coefficients A, B, and C are determined. Then the inverse is simply

f(t) = A e^{at} + B e^{bt} + C e^{ct}

(a) Determine A, B, and C in the partial expansion approach and determine f(t).

(b) Apply (3.89) with P(s) = 1 and Q(s) = (s − a)(s − b)(s − c), and find f(t) using (3.89). Which method do you prefer and why? Notice that (3.89) can be applied when the denominator Q(s) is more general than a polynomial as shown in Example 3.14.
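For part (a), the cover-up (residue) rule gives A = 1/((a − b)(a − c)) and cyclic analogs; the expansion can be checked by recombining the fractions at any non-pole point (the sample values a, b, c below are arbitrary):

```python
# Partial fraction coefficients for f(s) = 1/((s-a)(s-b)(s-c)) with simple
# poles, by the cover-up (residue) rule; checked by recombining the fractions.
a, b, c = 1.0, 2.0, 3.0          # arbitrary distinct sample values
A = 1.0 / ((a - b) * (a - c))
B = 1.0 / ((b - a) * (b - c))
C = 1.0 / ((c - a) * (c - b))

s = 0.37                          # any point that is not a pole
lhs = 1.0 / ((s - a) * (s - b) * (s - c))
rhs = A / (s - a) + B / (s - b) + C / (s - c)
print(lhs, rhs)
```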

Exercise 3.25: Transient heat conduction in a finite slab

We have a one-dimensional slab with ends located at x = L and x = −L, at uniform temperature T₀. Just after t = 0, the two ends are immediately raised to temperature T₁ and held at this temperature. We wish to find the transient solution.

(a) Write the PDE and (three) boundary conditions for this situation, i.e., conditions at x = L, x = −L, and t = 0. How many parameters appear in this problem?

(b) Choose nondimensional temperature, spatial position, and time variables as follows

θ = (T − T₀)/(T₁ − T₀),   z = x/L,   τ = tk/(ρĈ_P L²)

Express the PDE and BCs in these nondimensional variables. How many parameters appear in this problem?

(c) Take the Laplace transform of the PDE, apply the boundary conditions, and show

θ̄(z, s) = cosh(√s z)/(s cosh √s)

(d) For what s values in the complex plane is θ̄(z, s) singular?

(e) Invert the transform and find θ(z, τ).

Hint: the answer is

θ(z, τ) = 1 − Σ_{n=0}^∞ [2(−1)ⁿ/((n + 1/2)π)] cos((n + 1/2)πz) e^{−(n+1/2)²π²τ}    (3.104)

(f) Show that θ(z, τ) satisfies the PDE and boundary conditions at z = ±1. Does the solution satisfy the initial condition? How would you check this?

(g) What is the steady state, θ_s(z), i.e., take the limit of θ(z, τ) as τ → ∞.
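The series (3.104) can be evaluated directly to check the boundary and initial conditions in part (f); a sketch (the partial sum converges rapidly for τ > 0):

```python
import math

def theta(z, tau, nterms=2000):
    # Partial sum of (3.104):
    # theta = 1 - sum_n 2(-1)^n/((n+1/2)pi) cos((n+1/2)pi z) e^{-(n+1/2)^2 pi^2 tau}
    s = 0.0
    for n in range(nterms):
        lam = (n + 0.5) * math.pi
        s += 2.0 * (-1) ** n / lam * math.cos(lam * z) * math.exp(-(lam ** 2) * tau)
    return 1.0 - s

print(theta(1.0, 0.5))    # boundary z = 1: equals 1 (every cosine term vanishes)
print(theta(0.0, 1e-4))   # center just after t = 0: still essentially 0
```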

Exercise 3.26: Heat conduction in a cylinder and a sphere

Let's change the body in Exercise 3.25 from a slab to a cylinder and a sphere and see what happens. Again assume the body is initially at uniform temperature T₀. Just after t = 0, the outer boundary at r = R is immediately raised to temperature T₁ and held at this temperature. We wish to find the transient solution T(r, t) for these problems.

(a) Write the PDE and (three) boundary conditions for the cylindrical body, i.e., conditions at r = R, r = 0, and t = 0. How many parameters appear in this problem?

(b) Choose nondimensional temperature, radial position, and time variables as follows

θ = (T − T₀)/(T₁ − T₀),   ξ = r/R,   τ = tk/(ρĈ_P R²)

Express the PDE and BCs in these nondimensional variables. How many parameters appear in this problem?

Figure 3.11: Transient heating of slab, cylinder, and sphere from (3.104), (3.105), and (3.106). Dimensionless temperature θ(ξ, τ) versus ξ at τ = 10⁻⁴, 10⁻³, 10⁻², 0.1, 0.5.

(c) Take the Laplace transform of the PDE, apply the boundary conditions, and find θ̄(ξ, s) for the cylinder. You do not need to invert this transform.

(d) Write the PDE and (three) boundary conditions for the spherical body, i.e., conditions at r = R, r = 0, and t = 0. How many parameters appear in this problem?

(e) Choose the same nondimensional temperature, radial position, and time variables as follows

θ = (T − T₀)/(T₁ − T₀),   ξ = r/R,   τ = tk/(ρĈ_P R²)

Express the PDE and BCs in these nondimensional variables. How many parameters appear in this problem?

(f) Take the Laplace transform of the PDE, apply the boundary conditions, and find θ̄(ξ, s) for the sphere. You do not need to invert this transform.

Exercise 3.27: Transient solutions for slab, cylinder, and sphere

We wish to plot and compare the temperature profile θ(ξ, τ) versus ξ at different τ for the slab, cylinder, and sphere geometries.

(a) The transform for the cylinder is given by

θ̄(ξ, s) = I₀(√s ξ)/(s I₀(√s))

Find the zeros of the denominator. Hint: you may want to use the Bessel function relations I₀(r) = J₀(ir) and I₁(r) = J₁(ir)/i (Abramowitz and Stegun, 1970, p. 375, 9.6.3).

(b) Use (3.89) and show that the inverse is given by

θ(ξ, τ) = 1 − Σ_{n=1}^∞ [2/(b_n J₁(b_n))] J₀(b_n ξ) e^{−b_n²τ}    (3.105)

in which b_n are the zeros of J₀.

(c) The transform for the sphere is

θ̄(ξ, s) = sinh(√s ξ)/(ξ s sinh √s)

Find the zeros of the denominator ξ s sinh √s. Note that the denominator has a double zero at s = 0 because both s and sinh √s vanish at s = 0.

(d) Because of the double zero, we cannot use the inversion formula (3.89), which assumes simple zeros. But notice the following fact. If the Laplace transforms f̄(s) and ḡ(s) satisfy

ḡ(s) = f̄(s)/s

then their inverse transforms satisfy

g(t) = ∫₀^t f(t′) dt′

Therefore define

f̄(s) = s θ̄(ξ, s) = sinh(√s ξ)/(ξ sinh √s)

Use (3.89) to invert this transform and show

f(τ) = Σ_{n=1}^∞ (−1)^{n+1} (2nπ/ξ) sin(nπξ) e^{−n²π²τ}

(e) Perform the time integral and show that

θ(ξ, τ) = Σ_{n=1}^∞ (−1)^{n+1} (2/(nπξ)) sin(nπξ) (1 − e^{−n²π²τ})

Notice that the following series is the Fourier sine series of the linear function (Selby, 1973, p. 480)

Σ_{n=1}^∞ (−1)^{n+1} (2/(nπ)) sin(nπξ) = ξ

so we have

θ(ξ, τ) = 1 − Σ_{n=1}^∞ (−1)^{n+1} (2/(nπξ)) sin(nπξ) e^{−n²π²τ}    (3.106)


(f) Make plots of the temperature profile θ(ξ, τ) versus ξ at several τ for the slab, cylinder, and sphere geometries. Your results should resemble Figure 3.11. Which geometry heats up the quickest? The slowest? Give a physical explanation for these results.
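The comparison in part (f) can also be made without plotting by evaluating the slab and sphere series at the same dimensionless position and time; a minimal sketch (the cylinder case needs Bessel functions and is omitted here):

```python
import math

def theta_slab(z, tau, nterms=500):
    # (3.104): slab with boundaries at z = ±1
    s = 0.0
    for n in range(nterms):
        lam = (n + 0.5) * math.pi
        s += 2.0 * (-1) ** n / lam * math.cos(lam * z) * math.exp(-(lam ** 2) * tau)
    return 1.0 - s

def theta_sphere(xi, tau, nterms=500):
    # (3.106): sphere with boundary at xi = 1
    s = 0.0
    for n in range(1, nterms + 1):
        s += ((-1) ** (n + 1)) * 2.0 / (n * math.pi * xi) \
             * math.sin(n * math.pi * xi) * math.exp(-((n * math.pi) ** 2) * tau)
    return 1.0 - s

# At the same dimensionless position and time, the sphere is hotter, i.e., it
# heats up faster than the slab (more boundary per unit volume):
print(theta_slab(0.5, 0.1), theta_sphere(0.5, 0.1))
```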

Exercise 3.28: Transient diffusion in a sphere by separation of variables

Solve the transient diffusion problem in the spherical geometry described in Exercises 3.26 and 3.27 by separation of variables, using the information in Example 3.8.

Exercise 3.29: Fourier series

Find the Fourier coefficients for the function f(x) = 1 on the interval x ∈ [−π/2, π/2] using the odd cosine terms {cos x, cos 3x, cos 5x, ...}

f(x) = Σ_{n=0}^∞ a_n cos(2n + 1)x

Use this result to check the initial condition of (3.104) in Exercise 3.25.

Hint: first establish the orthogonality property ∫_{−π/2}^{π/2} cos nx cos mx dx = (π/2)δ_{nm} for odd n, m = 1, 3, 5, .... Then obtain the a_n by taking inner products as discussed in Section 2.4.1.

Exercise 3.30: Plancherel's formula

Plancherel's formula states that

∫_{−∞}^∞ |f(x)|² dx = (1/(2π)) ∫_{−∞}^∞ |f̂(k)|² dk

Begin with the expression on the left and from it derive the expression on the right. In general, both f(x) and f̂(k) can be complex.
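The formula can be spot-checked for a Gaussian using the known transform pair F{e^{−x²}} = √π e^{−k²/4} (with the e^{−ikx} transform convention assumed here); both sides should equal √(π/2):

```python
import math

def midpoint(g, lo, hi, n=100000):
    # Simple midpoint-rule quadrature.
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

# Left side: ∫ |f|² dx for f(x) = exp(-x²).
left = midpoint(lambda x: math.exp(-x * x) ** 2, -10.0, 10.0)
# Right side: (1/2π) ∫ |fhat|² dk with fhat(k) = √π exp(-k²/4).
right = midpoint(lambda k: (math.sqrt(math.pi) * math.exp(-k * k / 4.0)) ** 2,
                 -20.0, 20.0) / (2.0 * math.pi)
print(left, right)   # both ≈ √(π/2) ≈ 1.2533
```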

Exercise 3.31: Green's function for a fourth-order problem

(a) Use the Fourier transform technique to solve the ordinary differential equation

d⁴G/dx⁴ − 2 d²G/dx² + G = δ(x)

Use a computer algebra program or a math handbook to perform the integral that is required to find G(x).

(b) The function G from the previous problem is the Green's function for the ordinary differential operator L = d⁴/dx⁴ − 2 d²/dx² + 1. Use this Green's function to solve Lu = f(x), u(x) → 0 as x → ±∞, where f(x) = 1 if 0 < x < 1 and 0 elsewhere. Use numerical integration to approximate the solution for |x| < 10.

Exercise 3.32: A square with one heated wall

Solve ∇²u = 0 in a unit square domain 0 < x < 1, 0 < y < 1, with boundary conditions u = 0 on x = 0, x = 1, and y = 0, and u = 1 on y = 1.


Exercise 3.33: Separation of variables for the wave equation

Use separation of variables to solve

u_tt = c² u_xx

with the following boundary and initial conditions

u(x, 0) = { x, x ≤ 1/2;  1 − x, x > 1/2 },   u(0, t) = u(1, t) = 0

Show that your solution can be put in the D'Alembert form u = F₁(x − ct) + F₂(x + ct).

Exercise 3.34: Separation of variables for a partially heated sphere

Use separation of variables to solve for the steady-state temperature distribution in a sphere whose bottom half surface temperature is kept at T = 0, and whose top half is at T = 1. Use the transformation X = cos(φ) to convert the equation in the polar angle direction to Legendre's equation. Note that the eigenvalues of Legendre's equation are n(n + 1) for positive integers n. The corresponding eigenfunction is the Legendre polynomial P_n(X). Explicitly find the first four terms in the expansion.

Laplace's equation in spherical coordinates (r, θ, φ) is

(1/r²) ∂/∂r( r² ∂T/∂r ) + (1/(r² sin φ)) ∂/∂φ( sin φ ∂T/∂φ ) + (1/(r² sin²φ)) ∂²T/∂θ² = 0

Exercise 3.35: The Helmholtz equation

Consider the wave equation

u_tt = c² ∇²u

in the domain y > 0, −∞ < x < ∞, with boundary condition u(x, y = 0, t) = f(x)e^{iωt}. This equation governs sound emanating from a vibrating wall.

1. By assuming a solution of the form u(x, y, t) = v(x, y)e^{iωt}, show that the equation can be reduced to

−ω²v − c² ∇²v = 0

with boundary condition v(x, 0) = f(x). This is the HELMHOLTZ EQUATION.

2. Find the Green's function G = G∞ + G_B for this operator with the appropriate boundary conditions, using the fact that the Bessel function Y₀(r) ~ (2/π) ln r as r → 0.

3. Use the Green's function to solve for u(x, y, t).

Exercise 3.36: Transient diffusion via Green's function and similarity solution approaches

In Example 3.12 we found that the transient diffusion problem

G_t = DG_xx + δ(x − ξ)δ(t − τ)

has solution

G(x, t; ξ, τ) = (1/√(4πD(t − τ))) e^{−(x−ξ)²/(4D(t−τ))}

We have changed notation here to emphasize that this solution is the Green's function for transient diffusion with delta function source term f(x, t) = δ(x − ξ)δ(t − τ).

(a) Use this result, along with a result analogous to (3.71), to find the solution of the initial-value problem

u_t = Du_xx + f(x, t)

in the domain −∞ < x < ∞ with initial condition u(x, 0) = u₀(x).

(b) Now consider the case f = 0 in the domain x > 0 with boundary condition u(0, t) = 0 and initial condition u(x > 0, 0) = 1. Use an image or symmetry argument to convert this into a problem in the unbounded domain, where you can apply (3.72). The information in Exercise 3.17 may be useful. This solution is found by Laplace transforms in Exercise 3.23.

(c) Solve this problem again by the method of SIMILARITY SOLUTION. That is, observe that the only length scale in the problem is √(Dt); define the combination η = x/(2√(Dt)) (the factor of 2 is arbitrary but convenient), and seek a solution of the form

u(x, t) = u(η)

Substitution of this form into the governing equation and application of the chain rule leads to an ordinary differential equation.

Exercise 3.37: Schrödinger equation in a circular domain

The wavefunction for the "quantum corral," an arrangement of atoms on a surface designed to localize electrons, is governed by the Schrödinger equation

i ∂Ψ/∂t = −∇²Ψ

Use separation of variables to find the general (bounded) axisymmetric solution to this problem in a circular domain with Ψ = 0 at r = 1. Hint: if you assume exponential growth or decay in time, the spatial dependence will be determined by the so-called modified Bessel equation. Use the properties of solutions to this equation to show that there are no nontrivial solutions that are exponentially growing or decaying in time, thus concluding that the time dependence must be oscillatory.

Exercise 3.38: Temperature profile with wavy boundary temperature

Solve the steady-state heat-conduction problem

u_xx + u_yy = 0

in the half-plane −∞ < x < ∞, 0 < y < ∞, with boundary conditions

u(x, 0) = A + B cos αx = A + (B/2)(e^{iαx} + e^{−iαx})

and u(x, y) bounded as y → ∞. Use the Fourier transform in the x direction. How far does the wavy temperature variation imposed at the boundary penetrate out into the material?

Exercise 3.39: Domain perturbation analysis of diffusion in a wavy-walled slab

Solve ∇²T = 0 in the wavy-walled domain shown in Figure 3.12. The top surface is at y = 1, the left and right boundaries are x = 0 and x = L, respectively, and the bottom surface is y = ε cos 2πx/L, where ε ≪ 1. Find the solution to O(ε) using domain perturbation.

Figure 3.12: Wavy-walled domain.
Exercise 3.40: Fourier transform for solving heat conduction in a strip

Solve the steady-state heat-conduction problem

u_xx + u_yy = 0

in the infinite strip −∞ < x < ∞, 0 < y < 1, with boundary conditions u(x, 0) = u₀(x), u(x, 1) = u₁(x). Use the Fourier transform in the x direction to get an ordinary differential equation and boundary conditions for û(k, y).

Exercise3.41: Separation of variables and Laplace's equation for a


wedge

Useseparation of variables to solve Laplace's equation in the wedge 0 < e < a, 0 <
r <
boundary conditions u(r, 0) = 0, u(r,

1,with

= 50,u(l, O)= 0.

Exercise 3.42: Laplace's equation in a wedge

Again consider Laplace's equation in a wedge, but now fix the wedge angle at α = π/4. Use the method of images to find the Green's function for this domain: where should the images be, and what should their signs be? A well-drawn picture showing the positions and signs of the images is sufficient. The first two images are shown. They don't completely solve the problem because each messes up the boundary condition on the side of the wedge further from it.

+ position of point source


Exercise 3.43: D'Alembert form of the wave equation

(a) By looking for solutions of the form u(x, t) = U(x − at), where a is to be determined, show that the general solution of the WAVE EQUATION

∂²u/∂t² = c² ∂²u/∂x²

is

u(x, t) = f(x − ct) + g(x + ct)

where f and g are arbitrary.

(b) Use this solution to find the solution with initial condition u(x, 0) = w(x), ∂u/∂t(x, 0) = 0 in an unbounded domain. Pick a shape for w(x) and sketch the solution u(x, t).

Exercise 3.44: Heat equation in a semi-infinite domain

The solution to the heat equation

u_t = Du_xx

subject to the initial condition u(x, 0) = u₀(x) is

u(x, t) = (1/√(4πDt)) ∫_{−∞}^∞ e^{−(x−ξ)²/(4Dt)} u₀(ξ) dξ

Use this solution and an argument based on images to find the analogous solution for the same problem, but in the domain x > 0, with boundary condition u(0, t) = 0 and with initial condition u(x > 0, t = 0) = u₀(x).

Exercise 3.45: Vibrating beam

The transverse vibrations of an elastic beam satisfy the equation

u_tt + Ku_xxxx = 0

where K > 0. Use separation of variables to find u(x, t) subject to initial conditions u(x, 0) = u₀(x), u_t(x, 0) = 0 and boundary conditions u(0, t) = u(L, t) = 0, u_xx(0, t) = u_xx(L, t) = 0. These conditions correspond to a beam given an initial deformation and held fixed at its ends x = 0 and x = L. Hint: the equation λ⁴ = c has solutions λ = ±c^{1/4} and λ = ±ic^{1/4}, where c^{1/4} is the real positive fourth root of c.

Exercise 3.46: Convection and reaction with a point source

Use the Fourier transform and its properties to find a solution that vanishes at x → ±∞ for the ordinary differential equation

du/dx = au + δ(x)

where a > 0. Recall that F⁻¹{1/(k² + a²)} = (1/(2a)) e^{−a|x|}.

Exercise 3.47: Green's function for Laplacian operator

(a) Find the free space Green's function G∞ for the Laplacian operator in three dimensions. It is spherically symmetric.

(b) Show that if u is a solution to Laplace's equation, then so is v·∇u, for any constant vector v.

(c) Show that E_ij ∂²u/(∂x_i ∂x_j) is also a solution, for any constant tensor E.

Exercise 3.48: Zeros of sine, cosine, and exponential in the complex plane

We extend the definition of the exponential to complex argument z ∈ C as follows

e^{x+iy} = e^x (cos y + i sin y)

in which we take the usual definitions of real-valued e^x, cos y, sin y for x, y ∈ R. We extend cosine and sine to complex arguments in terms of the exponential

cos z = (e^{iz} + e^{−iz})/2,   sin z = (e^{iz} − e^{−iz})/(2i)

Given these definitions, find all the zeros of the following functions in the complex plane.

(a) The function sin z.
Hint: using the definition of sine, convert the zeros of sin z to solutions of the equation e^{2iz} = 1. Substitute z = x + iy and find all solutions x, y ∈ R. Notice that all the zeros in the complex plane are only the usual ones on the real axis.

(b) The function cos z. (Answer: only the usual ones on the real axis.)

(c) The function e^z. (Answer: no zeros in C.)
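A quick numerical illustration of part (a) using Python's complex math module: since |sin(x + iy)|² = sin²x + sinh²y, the function vanishes only on the real axis (the sample points below are arbitrary):

```python
import cmath
import math

# sin z has its only zeros at z = n*pi on the real axis:
print(abs(cmath.sin(cmath.pi)), abs(cmath.sin(-3 * cmath.pi)))   # both ≈ 0

# Off the real axis, |sin(x + iy)| = sqrt(sin(x)**2 + sinh(y)**2) > 0:
z = cmath.pi + 1j
print(abs(cmath.sin(z)), math.sinh(1.0))   # equal: sinh(1) ≈ 1.1752
```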

Exercise 3.49: A Laplace transform inverse

The Laplace inverse for the following transform has been used in solving the wave equation

f̄(s) = sinh(as) sinh(bs)/sinh(s),   a, b ∈ R

Find f(t), and note that your solution should be real valued, i.e., the imaginary number i should not appear anywhere in your final expression for f(t).

Exercise 3.50: Wave equation with struck string

Revisit Example 3.15 and the wave equation u_ττ = u_xx on x ∈ [0, 1] for a string with fixed ends u(0, τ) = u(1, τ) = 0, but replace the plucked string initial condition, u(x, 0) = u₀(x), with the struck string initial condition, u(x, 0) = 0, u_τ(x, 0) = v(x). Here there is zero initial deformation, but a nonzero initial velocity.

(a) Solve this equation using the Laplace transform.

(b) Note that this initial condition requires an inverse Laplace transform for

sinh(as) sinh(bs)/(s sinh s)

Show that this inverse is given by

Σ_{n=1}^∞ (−1)^{n+1} (2/(nπ)) sin(nπa) sin(nπb) sin(nπτ)    (3.107)

(c) Denote the Fourier coefficients for the initial velocity v(x) as

v(x) = Σ_{n=1}^∞ b_n sin(nπx)

(d) Next consider the mixed initial condition u(x, 0) = u₀(x) and u_τ(x, 0) = v(x). Let a_n denote the Fourier coefficients of u₀(x) as in Example 3.15. Show that the solution for the mixed initial condition is

u(x, τ) = Σ_{n=1}^∞ sin(nπx) [ a_n cos(nπτ) + (b_n/(nπ)) sin(nπτ) ]

Exercise 3.51: Wave equation with triangle wave initial condition

The wave equation is useful to describe propagation of sound and vibration of strings and membranes. Consider the wave equation u_tt = c²u_zz on z ∈ [0, L] for a string with fixed ends u(0, t) = u(L, t) = 0, and the "plucked" string initial condition, i.e., fixed arbitrary position u(z, 0) = u₀(z), and zero velocity at t = 0, u_t(z, 0) = 0. The constant c is known as the wave speed.

(a) First rescale time and position as τ = (c/L)t, x = z/L, to remove the parameters c and L and simplify your work. Show that the rescaled problem is

u_ττ = u_xx

u(x, 0) = u₀(x),   u_τ(x, 0) = 0,   x ∈ (0, 1)

(b) Consider the solution (3.91) given in Example 3.15. Establish that the solution u(x, τ) satisfies the wave equation, both boundary conditions, and the initial condition. Establish that the solution u(x, τ) is periodic in time. What is the period?

(c) Consider the string's initial condition to be the triangle function depicted in Figure 2.32 with a = 0.1. Given the Fourier coefficients for this triangle function from Exercise 2.10, plot your solution at the following times on separate plots:

1. τ = ...
2. τ = 0.45, 0.48, 0.49, 0.495, 0.4975, 0.50
3. τ = 0.50, ...
4. τ = 0.90, 0.95, 1.00, 1.05, 1.10
5. τ = 1.90, ...

Provide a physical description (comprehensible to the general public) of what is happening as the wave equation evolves forward in time. In particular, explain what the initial condition does just after τ = 0. Explain what happens when waves arrive at the boundaries x = 0, 1?


Exercise 3.52: Numerical solution of the heat equation

(a) Write (and run) a program to use Chebyshev collocation to solve the heat equation

u_t = u_xx

with boundary conditions u(0, t) = 0, u(1, t) = 1. Use the AM2 method and compare your solutions for different values of N at a number of different times. Approximately how long does it take for the temperature at x = 0.9 to reach 0.5?

(b) How many terms in the exact solution are needed to find the time at which u(0.9) = 0.5 (to five percent precision)?
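A quick cross-check of part (a) can be done with a simple explicit finite-difference scheme instead of the Chebyshev/AM2 method the exercise specifies; this is a sketch, and the initial condition u(x, 0) = 0 is an assumption, since it is not stated above:

```python
# Explicit FTCS solve of u_t = u_xx with u(0,t) = 0, u(1,t) = 1 and the
# assumed initial condition u(x,0) = 0, marching until u(0.9, t) reaches 0.5.
n = 100                       # grid spacing h = 1/n
h = 1.0 / n
dt = 0.4 * h * h              # FTCS stability requires dt <= h²/2
u = [0.0] * (n + 1)
u[n] = 1.0                    # boundary value at x = 1
t = 0.0
while u[90] < 0.5:            # index 90 corresponds to x = 0.9
    u = ([0.0]
         + [u[i] + dt / h**2 * (u[i + 1] - 2.0 * u[i] + u[i - 1])
            for i in range(1, n)]
         + [1.0])
    t += dt
print(f"u(0.9, t) first reaches 0.5 near t = {t:.4f}")
```

For short times the semi-infinite solution u(0.9, t) ≈ erfc(0.1/(2√t)) applies, which puts the crossing near t ≈ 0.011, consistent with the march above.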

Exercise 3.53: Propagation of a reaction front

Using the Chebyshev collocation technique for spatial discretization and the Adams-Moulton technique for time integration, write an Octave or MATLAB program to solve the transient reaction-diffusion problem

∂T/∂t = ∂²T/∂x² + ...

using ... = 0.05. Perform simulations for a long enough time that the solution reaches a steady state, and perform convergence checks to verify that your spatial and temporal discretizations are adequate (i.e., that the solution does not change much when the resolution is increased).
resolutionis increased).

Exercise 3.54: Stability of the Lax-Wendroff method

Use von Neumann stability analysis to find the growth factor and the (Courant) stability condition for the Lax-Wendroff method, (3.98).
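For reference, the growth factor of the standard Lax-Wendroff scheme for the model advection equation u_t + a u_x = 0 is G(θ) = 1 − ν²(1 − cos θ) − iν sin θ, with Courant number ν = aΔt/Δx (this assumes the standard scheme, which may differ in detail from (3.98)). A quick numerical scan confirms |G| ≤ 1 exactly when ν ≤ 1:

```python
import math

def growth(nu, theta):
    # Von Neumann growth factor of standard Lax-Wendroff for u_t + a u_x = 0;
    # |G|² works out to 1 - nu²(1 - nu²)(1 - cos θ)².
    return complex(1.0 - nu**2 * (1.0 - math.cos(theta)), -nu * math.sin(theta))

thetas = [k * math.pi / 100.0 for k in range(201)]
g_stable = max(abs(growth(0.9, th)) for th in thetas)
g_unstable = max(abs(growth(1.1, th)) for th in thetas)
print(g_stable, g_unstable)   # <= 1 for nu = 0.9, > 1 for nu = 1.1
```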

Bibliography

M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. National Bureau of Standards, Washington, D.C., 1970.

R. Aris. Vectors, Tensors, and the Basic Equations of Fluid Mechanics. Dover Publications Inc., New York, 1962.

R. B. Bird, R. C. Armstrong, and O. Hassager. Dynamics of Polymeric Liquids, Vol. 1, Fluid Dynamics. Wiley, New York, second edition, 1987.

R. B. Bird, W. E. Stewart, and E. N. Lightfoot. Transport Phenomena. John Wiley & Sons, New York, second edition, 2002.

H. D. Block. Tensor Analysis. Charles E. Merrill Books, Inc., Columbus, Ohio, 1978.

C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang. Spectral Methods. Fundamentals in Single Domains. Springer-Verlag, Berlin, 2006.

H. S. Carslaw and J. C. Jaeger. Conduction of Heat in Solids. Oxford University Press, Oxford, second edition, 1959.

R. Courant. Methods of Mathematical Physics. Volume II. Partial Differential Equations. Wiley, New York, 1962.

W. M. Deen. Analysis of Transport Phenomena. Topics in Chemical Engineering. Oxford University Press, Inc., New York, second edition, 2011.

M. D. Greenberg. Foundations of Applied Mathematics. Prentice-Hall, New Jersey, 1978.

O. Heaviside. Electromagnetic Theory, volume II. The Electrician Printing and Publishing Company, London, 1899.

N. Levinson and R. M. Redheffer. Complex Variables. Holden Day, Oakland, CA, 1970.

E. Merzbacher. Quantum Mechanics. John Wiley and Sons, New York, second edition, 1970.

J. Ockendon, S. Howison, A. Lacey, and A. Movchan. Applied Partial Differential Equations, Revised Edition. Cambridge University Press, Cambridge, 2003.

T. A. Osswald and J. P. Hernandez-Ortiz. Polymer Processing: Modeling and Simulation. Hanser, Munich, 2006.

C. Pozrikidis. Introduction to Theoretical and Computational Fluid Dynamics. Oxford University Press, New York, 1997.

W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 1992.

A. Prosperetti. Advanced Mathematics for Applications. Cambridge University Press, Cambridge, 2011.

J. B. Rawlings and J. G. Ekerdt. Chemical Reactor Analysis and Design Fundamentals. Nob Hill Publishing, Madison, WI, second edition, 2012.

M. Renardy and R. C. Rogers. An Introduction to Partial Differential Equations. Springer-Verlag, New York, 1992.

S. M. Selby. CRC Standard Mathematical Tables. CRC Press, twenty-first edition, 1973.

J. G. Simmonds. A Brief on Tensor Analysis. Springer, New York, second edition, 1994.

I. Stakgold. Green's Functions and Boundary Value Problems. John Wiley & Sons, New York, second edition, 1998.

G. Strang. Introduction to Applied Mathematics. Wellesley-Cambridge Press, Wellesley, MA, 1986.

M. S. Vallarta. Heaviside's proof of his expansion theorem. Trans. A. I. E. E., pages 429-434, February 1926.

R. Winter. Quantum Physics. Wadsworth, Belmont, CA, 1979.

Probability, Random Variables, and Estimation

4.1 Introduction and the Axioms of Probability

For those engineers familiar with only deterministic models, we now make a big transition to random or stochastic models in the final two chapters of the text. Why? The motivation for including stochastic models is simple: they have proven highly useful in many fields of science and engineering. Moreover, even basic scientific literacy demands reasonable familiarity with stochastic methods. Students who have been exposed to primarily deterministic descriptions of physical processes sometimes initially regard stochastic methods as mysterious, vague, and difficult. We hope to change this perception, remove any mystery, and perhaps even make these methods easy to understand and enjoyable to use. To achieve this goal, we must maintain a clear separation between the physical process, the stochastic model we choose to represent it, and the mathematical reasoning we use to make deductions about the stochastic model. Ignoring this separation and calling upon physical intuition in place of mathematical deduction invariably creates the confusion and mystery that we are trying to avoid.

Probability is the branch of mathematics that provides the inference engine that allows us to derive correct consequences from our starting assumptions. The starting assumptions are stated in terms of undefinable notions, such as outcomes and events. This should not cause any alarm, because this is the same pattern in all fields of mathematics, such as geometry, where the undefinable notions are point, line, plane, and so forth. Since human intuition about geometry is quite strong, however, the undefinable starting notions of geometry are taken in stride without much thought. Exposure to games of chance may provide the same human intuition about probability's undefinable starting terms.

We start with the set or space of possible outcomes, which we denote by Ω. Let A and B be events, which are subsets of Ω. We use the empty set ∅ to denote an impossible event. Let A ∪ B denote the event "either A or B," and let A ∩ B denote the event "both A and B." The close analogy with the set operations of union and intersection is intentional and helpful. To each event A ⊆ Ω, we can assign a probability to that event, denoted Pr(A). The three axioms of probability can then be stated as follows.

I. (Nonnegativity) Pr(A) ≥ 0 for all A ⊆ Ω

II. (Normalization) Pr(Ω) = 1

III. (Finite additivity) Pr(A ∪ B) = Pr(A) + Pr(B) for all A, B ⊆ Ω satisfying A ∩ B = ∅
These three axioms, due to Kolmogorov (1933), are the source from which all probabilistic deductions follow. It may seem surprising at first that these three axioms are sufficient. In fact, we'll see soon that we do require a modified third axiom to handle infinitely many sets. First we state a few immediate consequences of these axioms. Exercise 4.1 provides several more. When A ∩ B = ∅, we say that events A and B are mutually exclusive, or pairwise disjoint. We use the symbol A \ B to denote the events in set A that are not events in set B, or, equivalently, the events in set A with the events in B removed. The set Ā is then defined to be Ω \ A, i.e., Ā is the set of all events that are not events in A. We say that two events A and B are independent if Pr(A ∩ B) = Pr(A) Pr(B).

Some of the important immediate consequences of the axioms are the following

Pr(∅) = 0
Pr(A) + Pr(Ā) = 1
Pr(A) ≤ 1
If B ⊆ A, then Pr(B) ≤ Pr(A)
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)

Proof. To establish the first result, note that A ∪ ∅ = A and A ∩ ∅ = ∅ for all A ⊆ Ω, and apply the third axiom to obtain Pr(A ∪ ∅) = Pr(A) = Pr(A) + Pr(∅). Rearranging this last equality gives the first result.


To establish the second result, note from the definition of Ā that A ∪ Ā = Ω and A ∩ Ā = ∅. Applying the third axiom and then the second axiom gives the second result. Using the first axiom, this second result then gives the third result.¹ To obtain the fourth result, note that if B ⊆ A, then A can be expressed as A = B ∪ (A ∩ B̄), with B ∩ (A ∩ B̄) = ∅. Applying the third axiom gives Pr(A) = Pr(B) + Pr(A ∩ B̄), and applying the first axiom then gives Pr(A) ≥ Pr(B). To obtain the fifth result, we express both A ∪ B and B as the union of mutually exclusive events: A ∪ B = A ∪ (Ā ∩ B) with A ∩ (Ā ∩ B) = ∅, and B = (A ∩ B) ∪ (Ā ∩ B) with (A ∩ B) ∩ (Ā ∩ B) = ∅. Applying the third axiom to both gives

Pr(A ∪ B) = Pr(A) + Pr(Ā ∩ B)
Pr(B) = Pr(A ∩ B) + Pr(Ā ∩ B)

Solving the second equation for Pr(Ā ∩ B) and substituting into the first gives the fifth result, which is known as the addition law of probability. Also note that, due to the first result, the probability of the intersection of two mutually exclusive events is zero.
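The addition law just derived can also be checked by simulation. The following is a minimal sketch (not from the text) using Python's standard library; the events A ("even number shows") and B ("number greater than 3 shows") on the die-roll space are illustrative choices.

```python
import random

# Monte Carlo check of the addition law
#   Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
# on the die-roll sample space Ω = {1, ..., 6}.
random.seed(0)
n = 100_000
rolls = [random.randint(1, 6) for _ in range(n)]

A = {2, 4, 6}   # even number shows (illustrative event)
B = {4, 5, 6}   # number greater than 3 shows (illustrative event)

def pr(event):
    # empirical probability of an event (a subset of Ω)
    return sum(r in event for r in rolls) / n

lhs = pr(A | B)
rhs = pr(A) + pr(B) - pr(A & B)
print(lhs, rhs)
```

Because the same samples are counted on both sides, the two estimates agree to floating-point precision, while each individual probability converges to its exact value as n grows.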

4.2 Random Variables and the Probability Density Function

Next we introduce the concept of an experiment and a random variable. An experiment is the set of all outcomes Ω, the subsets F ⊆ Ω that are the events of interest, and the probabilities assigned to these events. A random variable is a function that assigns a number to the possible outcomes of the experiment, X(ω), ω ∈ Ω. For an experiment with a finite number of outcomes, such as rolling a die, the situation is simple. We can enumerate all outcomes to obtain Ω = {1, 2, 3, 4, 5, 6}, and the events, F, can be taken as all subsets of Ω. The set F obviously contains the six different possible outcomes of the die roll, {1}, {2}, {3}, {4}, {5}, {6} ∈ F. But the random variable is a different idea. We may choose to assign the integers 1, 2, ..., 6 to the different events. But we may choose instead to assign the value 1 to the events corresponding to an even number showing on the die, and 0 to the events corresponding to an odd number showing on the die. In the first case we have the simple assignment X(ω) = ω, ω = 1, ..., 6,

¹Notice that we have used all three axioms to reach this point.


and in the second case, we have the assignment X(ω) = 1 for ω even and X(ω) = 0 for ω odd. The experiment is the same in both cases, but we have chosen different random variables to reflect potentially different goals in our modeling of the physical process that led to this random experiment.

The situation becomes considerably more complex when we have an experiment with uncountably many outcomes, which is the case when we require real-valued random variables. For example, if we measure the temperature in a reactor, and want to model the reactor as a random process, the random variable of interest X(ω) assigns a (positive, real) value to each outcome ω ∈ Ω. If we let Ω = R, for example, it's not immediately clear what we should allow for the subsets F. If we allow only the individual points on the real number line, we do not obtain a rich enough set of events to be useful, i.e., the probability of achieving exactly some real-valued temperature T is zero for all T ∈ R. The events corresponding to infinite sets of points, e.g., a ≤ T ≤ b with a < b ∈ R, are the ones that have nonzero probability. If we try to allow all subsets of the real number line, however, we obtain a set that is so large that we cannot satisfy the axioms of probability. Probabilists have found a satisfactory resolution to this issue in which the events are chosen as all intervals [a, b], for all a, b ∈ R, and all countable intersections and unions of all such intervals. Moreover, we modify the third axiom of probability to cover additivity of countably infinitely many sets

III′. (Countable additivity) Let Aᵢ ⊆ Ω, i = 1, 2, 3, ..., be a countable set of mutually exclusive events. Then Pr(A₁ ∪ A₂ ∪ ···) = Pr(A₁) + Pr(A₂) + ···

We can then assign probabilities to these events, satisfying the axioms. The random variable X(ω) is then a mapping from ω ∈ Ω to R, and we have well-defined probabilities for the events {ω : X(ω) ≤ x} for all x ∈ R. At this point we have all the foundational elements that we require to develop the stochastic methods of most use in science and engineering. The interested reader may wish to consult Papoulis (1984, pp. 22–27) and Thomasian (1969, pp. 320–322) for further discussion of these issues.


We define the DISTRIBUTION FUNCTION of the random variable, F_ξ(x) = Pr(ξ ≤ x), so that F_ξ(x) is the probability that the random variable takes a value less than or equal to x. The function F_ξ is a nonnegative, nondecreasing function and, due to the axioms of probability, has the following properties

F_ξ(x₁) ≤ F_ξ(x₂) if x₁ ≤ x₂
lim_{x→−∞} F_ξ(x) = 0    lim_{x→∞} F_ξ(x) = 1

We next define the PROBABILITY DENSITY FUNCTION, denoted p_ξ(x), such that

F_ξ(x) = ∫_{−∞}^{x} p_ξ(x′) dx′,  −∞ < x < ∞   (4.1)

We can allow discontinuous F_ξ if we are willing to accept generalized functions (delta functions and the like) for p_ξ. Also, we can define the density function for discrete as well as continuous random variables if we allow delta functions. Alternatively, we can replace the integral in (4.1) with a sum over a discrete density function. The random variable may be a coin toss or a dice game, which takes on values from a discrete set, contrasted to a temperature or concentration measurement, which takes on values from a continuous set. The density function has the following properties

p_ξ(x) ≥ 0    ∫_{−∞}^{∞} p_ξ(x) dx = 1

and the interpretation in terms of probability

Pr(x₁ ≤ ξ ≤ x₂) = ∫_{x₁}^{x₂} p_ξ(x) dx

The MEAN or EXPECTATION of a random variable ξ is defined as

E(ξ) = ∫_{−∞}^{∞} x p_ξ(x) dx   (4.2)

The MOMENTS of a random variable are defined by

E(ξⁿ) = ∫_{−∞}^{∞} xⁿ p_ξ(x) dx


and the mean is the first moment. Moments of ξ about the mean are defined by

E((ξ − E(ξ))ⁿ) = ∫_{−∞}^{∞} (x − E(ξ))ⁿ p_ξ(x) dx

The VARIANCE is defined as the second moment about the mean

var(ξ) = E((ξ − E(ξ))²) = E(ξ²) − 2E(ξ)E(ξ) + E²(ξ) = E(ξ²) − E²(ξ)

The standard deviation is the square root of the variance

σ(ξ) = √(var(ξ))

Normal distribution. The normal or Gaussian distribution is ubiquitous in applications. It is characterized by its mean, m, and variance, σ², and is given by

p_ξ(x) = (1/√(2πσ²)) exp(−(1/2)(x − m)²/σ²)   (4.3)

We proceed to check that the mean of this distribution is indeed m and the variance is σ² as claimed, and that the density is normalized so that its integral is one. We require the definite integral formulas

∫_{−∞}^{∞} e^{−x²} dx = √π   (4.4)
∫_{−∞}^{∞} x e^{−x²} dx = 0   (4.5)
∫_{−∞}^{∞} x² e^{−x²} dx = √π/2   (4.6)

The first formula may also be familiar from the error function in transport phenomena

erf(x) = (2/√π) ∫₀ˣ e^{−u²} du    erf(∞) = 1

The second integral follows because the function e^{−x²} is even and the function x is odd. The third formula may also be familiar from the gamma function, defined by (Abramowitz and Stegun, 1970, pp. 255–260)

Γ(n) = ∫₀^∞ t^{n−1} e^{−t} dt    Γ(n) = (n − 1)! for integer n

Changing the variable of integration using t = x² gives

∫_{−∞}^{∞} x² e^{−x²} dx = 2 ∫₀^∞ x² e^{−x²} dx = ∫₀^∞ t^{1/2} e^{−t} dt = Γ(3/2) = √π/2
We calculate the integral of the normal density as follows

∫_{−∞}^{∞} p_ξ(x) dx = (1/√(2πσ²)) ∫_{−∞}^{∞} exp(−(1/2)(x − m)²/σ²) dx

Define the change of variable

u = (1/√2)((x − m)/σ)

which gives

∫_{−∞}^{∞} p_ξ(x) dx = (1/√π) ∫_{−∞}^{∞} e^{−u²} du = 1

from (4.4), and the proposed normal density does have unit area. Computing the mean gives

E(ξ) = (1/√(2πσ²)) ∫_{−∞}^{∞} x exp(−(1/2)(x − m)²/σ²) dx

Using the same change of variables as before yields

E(ξ) = (1/√π) ∫_{−∞}^{∞} (√2 σu + m) e^{−u²} du

The first term in the integral is zero from (4.5), and the second term produces E(ξ) = m, as claimed. Finally, the definition of the variance of ξ gives

var(ξ) = (1/√(2πσ²)) ∫_{−∞}^{∞} (x − m)² exp(−(1/2)(x − m)²/σ²) dx

Changing the variable of integration as before gives

var(ξ) = (2σ²/√π) ∫_{−∞}^{∞} u² e^{−u²} du

and from (4.6), var(ξ) = σ².
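The three facts just derived — unit area, mean m, and variance σ² — can be verified numerically. A minimal sketch (not from the text), using a simple rectangle-rule quadrature over a range wide enough that the truncated tails are negligible; the values m = 1 and σ = 0.7 are illustrative choices.

```python
import numpy as np

# Numerical check that the N(m, sigma^2) density of (4.3) has
# unit area, mean m, and variance sigma^2.
m, sigma = 1.0, 0.7
x = np.linspace(m - 10 * sigma, m + 10 * sigma, 20001)
dx = x[1] - x[0]
p = np.exp(-0.5 * (x - m) ** 2 / sigma**2) / np.sqrt(2 * np.pi * sigma**2)

area = np.sum(p) * dx                    # should be 1
mean = np.sum(x * p) * dx                # should be m
var = np.sum((x - mean) ** 2 * p) * dx   # should be sigma^2
print(area, mean, var)
```

Because the Gaussian decays so quickly, even this crude quadrature reproduces all three moments to many digits.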

Shorthand notation for the random variable ξ having a normal distribution with mean m and variance σ² is

ξ ~ N(m, σ²)

In order to collect a more useful set of integration facts for manipulating normal distributions, we can derive the following integrals by changing the variable of integration in (4.4)–(4.6). For x, a ∈ R, a > 0

∫_{−∞}^{∞} e^{−(1/2)x²/a} dx = √(2πa)
∫_{−∞}^{∞} x e^{−(1/2)x²/a} dx = 0
∫_{−∞}^{∞} x² e^{−(1/2)x²/a} dx = √(2π) a^{3/2}

Figure 4.1 shows the normal distribution with a mean of one and variances of 1/2, 1, and 2. Notice that a large variance implies that the random variable is likely to take on large values. As the variance shrinks to zero, the probability density becomes a delta function and the random variable approaches a deterministic value.
Characteristic function. It is often convenient to handle the algebra of density functions, particularly normal densities, by using a close relative of the Fourier transform of the density function rather than the density itself. The transform, which we denote as φ_ξ(t), is known as the characteristic function in the probability and statistics literature. It is defined by

φ_ξ(t) = E(e^{itξ})

where we again assume that any random variable of interest has a density p_ξ(x). Note the sign convention with a positive sign chosen on the imaginary unit i. Hence, under this convention, the conjugate of the characteristic function, φ̄_ξ(t), is the Fourier transform of the density. The characteristic function has a one-to-one correspondence with the density function, which can be seen from the inverse transform formula

p_ξ(x) = (1/(2π)) ∫_{−∞}^{∞} e^{−itx} φ_ξ(t) dt


Figure 4.1: Normal distribution, with probability density p_ξ(x) = (1/√(2πσ²)) exp(−(x − m)²/(2σ²)). Mean is one and variances are 1/2, 1, and 2.

Again note the sign difference from the usual inverse Fourier transform. Note that multiplying a random variable by a constant, η = aξ, gives

φ_η(t) = E(e^{itaξ}) = φ_ξ(at)   (4.7)

Adding two independent random variables, η = ξ₁ + ξ₂, gives

φ_η(t) = E(e^{it(ξ₁+ξ₂)}) = ∫_{−∞}^{∞} e^{itx₁} p_{ξ₁}(x₁) dx₁ ∫_{−∞}^{∞} e^{itx₂} p_{ξ₂}(x₂) dx₂ = φ_{ξ₁}(t) φ_{ξ₂}(t)   (4.8)

We next compute the characteristic function of the normal distribution.


Example 4.1: Characteristic function of the normal density
Show that the characteristic function of the normal density is

φ_ξ(t) = exp(itm − (1/2)t²σ²)

Solution
The definition of the characteristic function and the normal density give

φ_ξ(t) = (1/√(2πσ²)) ∫_{−∞}^{∞} e^{itx} exp(−(1/2)(x − m)²/σ²) dx

Changing the variable of integration to z = x − m gives

φ_ξ(t) = e^{itm} (1/√(2πσ²)) ∫_{−∞}^{∞} e^{itz} e^{−(1/2)z²/σ²} dz
       = e^{itm} (2/√(2πσ²)) ∫₀^∞ e^{−(1/2)z²/σ²} cos(tz) dz
       = e^{itm} e^{−t²σ²/2}

in which we used the definite integral

∫₀^∞ e^{−a²x²} cos(bx) dx = (√π/(2a)) e^{−b²/(4a²)}

Exercise 4.49 discusses how to derive this definite integral. Note also that the integral with the sin(tz) term vanished because sine is an odd function.
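The result of Example 4.1 can be checked numerically by evaluating E(e^{itξ}) as a quadrature over the density and comparing with exp(itm − t²σ²/2). A sketch (not from the text); the values m = 0.5, σ = 1.3, and t = 0.8 are illustrative.

```python
import numpy as np

# Numerical check of Example 4.1: the characteristic function of
# N(m, sigma^2) is exp(i t m - t^2 sigma^2 / 2).
m, sigma = 0.5, 1.3
x = np.linspace(m - 12 * sigma, m + 12 * sigma, 40001)
dx = x[1] - x[0]
p = np.exp(-0.5 * (x - m) ** 2 / sigma**2) / np.sqrt(2 * np.pi * sigma**2)

t = 0.8
phi_numeric = np.sum(np.exp(1j * t * x) * p) * dx   # E(e^{i t xi})
phi_exact = np.exp(1j * t * m - 0.5 * t**2 * sigma**2)
print(phi_numeric, phi_exact)
```

The complex quadrature agrees with the closed form essentially to quadrature precision, confirming both the real (cosine) and the vanishing odd (sine) contributions discussed above.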

4.3 Multivariate Density Functions

In applications we usually do not have a single random variable but a collection of them. We group these variables together in a vector and let the random variable now take on values in Rⁿ. Proceeding analogously to the single variable case, the JOINT DISTRIBUTION FUNCTION F_ξ(x) is defined so that

F_ξ(x) = Pr(ξ ≤ x)

in which the vector inequality is defined to be the n corresponding scalar inequalities for the components. Note that F_ξ(x) remains a scalar-valued function taking values in the interval [0, 1].

As in the single variable case, we define the JOINT DENSITY FUNCTION p_ξ(x) such that

F_ξ(x) = ∫_{−∞}^{x₁} ··· ∫_{−∞}^{xₙ} p_ξ(x′) dx′₁ ··· dx′ₙ

or, provided the derivatives exist,

p_ξ(x) = ∂ⁿF_ξ(x)/(∂x₁ ∂x₂ ··· ∂xₙ)   (4.9)

As in the scalar case, the probability that the n-dimensional random variable takes on values between a and b is given by

Pr(a ≤ ξ ≤ b) = ∫_{a₁}^{b₁} ··· ∫_{aₙ}^{bₙ} p_ξ(x) dx₁ ··· dxₙ

Mean and covariance. The mean of the vector-valued random variable ξ is simply the vector-valued integral

E(ξ) = ∫ x p_ξ(x) dx   (4.10)

Writing out this integral in terms of its components we have

E(ξᵢ) = ∫∫···∫ xᵢ p_ξ(x) dx₁ dx₂ ··· dxₙ    i = 1, ..., n

The covariance of two scalar random variables ξ, η is defined as

cov(ξ, η) = E((ξ − E(ξ))(η − E(η)))

The covariance matrix, C, of the vector-valued random variable ξ with components ξᵢ, i = 1, ..., n, is defined as

Cᵢⱼ = cov(ξᵢ, ξⱼ)

C = [ var(ξ₁)      cov(ξ₁, ξ₂)  ···  cov(ξ₁, ξₙ)
      cov(ξ₂, ξ₁)  var(ξ₂)           ⋮
      ⋮                              ⋮
      cov(ξₙ, ξ₁)  cov(ξₙ, ξ₂)  ···  var(ξₙ)   ]

Probability,Random Variables, and

358

EstithQti0h

Again, writing out the integrals in terms of the components gives

Cᵢⱼ = ∫···∫ (xᵢ − E(ξᵢ))(xⱼ − E(ξⱼ)) p_ξ(x) dx₁ ··· dxₙ   (4.11)

Notice that Cᵢⱼ = Cⱼᵢ, so C is symmetric and has positive elements on the diagonal. We often express this definition of the variance with the matrix formula

C = E((ξ − E(ξ))(ξ − E(ξ))ᵀ)

Notice that the vector outer product x xᵀ appears here, which is an n × n matrix, rather than the usual inner or dot product xᵀx, which is a scalar.
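The outer-product form of the covariance matrix is easy to exercise on sampled data. The following sketch (not from the text) draws correlated 2-vectors by linearly transforming independent unit normals through an illustrative matrix A, then averages the outer products of the deviations; the result should approach A Aᵀ, and is symmetric by construction.

```python
import numpy as np

# Estimate the covariance matrix C of (4.11) from samples using
# the outer-product form C = E[(xi - E xi)(xi - E xi)^T].
rng = np.random.default_rng(0)
n_samples = 200_000

A = np.array([[1.0, 0.0], [0.8, 0.5]])            # illustrative mixing matrix
samples = rng.standard_normal((n_samples, 2)) @ A.T

mean = samples.mean(axis=0)
dev = samples - mean
# average of the n x n outer products dev dev^T over all samples
C = (dev[:, :, None] * dev[:, None, :]).mean(axis=0)
print(C)           # approaches A @ A.T
```

Note that the outer product of each sample's deviation is a 2 × 2 matrix, while the inner product dev·dev would be a scalar — exactly the distinction made in the text.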

Marginal density functions. We often are interested in only some subset of the random variables in a problem. Consider two vectors of random variables, ξ ∈ Rⁿ and η ∈ Rᵐ. We can consider the joint distribution of both of these random variables, p_{ξ,η}(x, y), or we may only be interested in the ξ variables, in which case we can integrate out the m η variables to obtain the marginal density of ξ

p_ξ(x) = ∫···∫ p_{ξ,η}(x, y) dy₁ ··· dyₘ

Analogously, to produce the marginal density of η we use

p_η(y) = ∫···∫ p_{ξ,η}(x, y) dx₁ ··· dxₙ

4.3.1 Multivariate normal density

We define the multivariate normal density of the random variable ξ ∈ Rⁿ as

p_ξ(x) = (1/((2π)^{n/2}(det P)^{1/2})) exp(−(1/2)(x − m)ᵀ P⁻¹ (x − m))   (4.12)

in which m ∈ Rⁿ is the mean and P ∈ Rⁿˣⁿ is a real, symmetric, positive definite matrix. We show subsequently that P is the covariance matrix of ξ. The notation det P denotes the determinant of P. The multivariate normal density is well defined only for P > 0. The singular, or degenerate, case P ≥ 0 is discussed subsequently. Shorthand notation for the random variable ξ having a normal distribution with mean m and covariance P is

ξ ~ N(m, P)

We also find it convenient to define the notation n(x, m, P), so that we can write compactly, for the normal with mean m and covariance P,

n(x, m, P) = (1/((2π)^{n/2}(det P)^{1/2})) exp(−(1/2)(x − m)ᵀ P⁻¹ (x − m))   (4.13)

Note that the matrix P⁻¹ is real and symmetric. Figure 4.2 displays contours of the normal density for

P = [ 3.5  2.5
      2.5  4.0 ]

As displayed in Figure 4.2, lines of constant probability in the multivariate normal are lines of constant

(x − m)ᵀ P⁻¹ (x − m)

To understand the geometry of lines of constant probability (ellipses in two dimensions, ellipsoids or hyperellipsoids in three or more dimensions) we examine the eigenvalues and eigenvectors of a positive definite matrix A as shown in Figure 4.3. Each eigenvector of A points along one of the axes of the ellipse. The eigenvalues show us how stretched the ellipse is in each eigenvector direction. If we want to put simple bounds on the ellipse, then we draw a box around it as shown in Figure 4.3. Notice that the box contains much more area than the corresponding ellipse and we have lost the correlation between the elements of x. This loss of information means we can put different tangent ellipses of quite different shapes inside the same box. The size of the bounding box is given by

length of ith side = 2√(b Ãᵢᵢ)

in which Ãᵢᵢ = (i, i) element of A⁻¹. See Exercise 4.15 for a derivation of the size of the bounding box.


Figure 4.2: Multivariate normal for n = 2. The contour lines show ellipses containing 95, 75, and 50 percent probability.

Figure 4.3: The geometry of the quadratic form xᵀAx = b; the figure labels the eigenvectors, Avᵢ = λᵢvᵢ, and the tangent-box dimensions √(bÃ₁₁) and √(bÃ₂₂).

Figure 4.3 displays these results: the eigenvectors are aligned with the ellipse axes and the eigenvalues scale the lengths. The lengths of the sides of the box that is tangent to the ellipse are proportional to the square roots of the diagonal elements of A⁻¹.
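The bounding-box result can be confirmed numerically: parametrize the boundary of the ellipse xᵀAx = b through the eigendecomposition of A, and compare the largest |xᵢ| attained on the boundary with √(b(A⁻¹)ᵢᵢ). A sketch (not from the text), using the matrix shown for Figure 4.2 and b = 1 as illustrative values.

```python
import numpy as np

# Check that for the ellipse x^T A x = b, the tangent box half-width
# along coordinate i is sqrt(b * (A^{-1})_{ii}).
A = np.array([[3.5, 2.5], [2.5, 4.0]])
b = 1.0

lam, Q = np.linalg.eigh(A)                 # A = Q diag(lam) Q^T
theta = np.linspace(0.0, 2 * np.pi, 100001)
u = np.vstack([np.cos(theta), np.sin(theta)])       # unit circle
# map u to the ellipse boundary: x = sqrt(b) Q Lambda^{-1/2} u
x = np.sqrt(b) * Q @ (u / np.sqrt(lam)[:, None])

half_width = np.abs(x).max(axis=1)                  # max |x_i| on boundary
predicted = np.sqrt(b * np.diag(np.linalg.inv(A)))
print(half_width, predicted)
```

The mapping satisfies xᵀAx = b exactly (substitute and use QᵀQ = I), so the agreement of the two vectors verifies the half-width formula, and doubling it gives the side lengths quoted above.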
Example 4.2: The mean and covariance of the multivariate normal
Consider the random variable ξ, which is distributed multivariate normally as in (4.12)

p_ξ(x) = (1/((2π)^{n/2}(det P)^{1/2})) exp(−(1/2)(x − m)ᵀ P⁻¹ (x − m))

1. Establish the following facts of integration. For z ∈ Rⁿ, with A ∈ Rⁿˣⁿ, A > 0

∫ exp(−(1/2) zᵀA⁻¹z) dz = (2π)^{n/2}(det A)^{1/2}   (scalar)   (4.14)
∫ z exp(−(1/2) zᵀA⁻¹z) dz = 0   (n-vector)   (4.15)
∫ z zᵀ exp(−(1/2) zᵀA⁻¹z) dz = (2π)^{n/2}(det A)^{1/2} A   (n × n-matrix)   (4.16)

2. Show that the first and second integrals, and the definition of mean, (4.10), lead to

E(ξ) = m

Show that the second and third integrals, and the definition of covariance, (4.11), lead to

cov(ξ) = P

So we have established that vector m and matrix P in (4.12) are the mean and covariance, respectively, of the normally distributed random variable ξ.
Solution

1. To compute the integrals, we first note that because A is real and symmetric, there exists a factorization

A = QΛQᵀ    A⁻¹ = QΛ⁻¹Qᵀ

in which Λ is a diagonal matrix containing the eigenvalues of A and Q is real and orthogonal. To establish the first integral, use the variable transformation z = Qx and change the variable of integration in (4.14)

∫ exp(−(1/2) zᵀA⁻¹z) dz = ∫ exp(−(1/2) xᵀΛ⁻¹x) |det Q| dx = ∏ᵢ ∫ e^{−(1/2)xᵢ²/λᵢ} dxᵢ

in which |det Q| = 1 because QQᵀ = I, which makes det(QQᵀ) = (det Q)² = 1 so det Q = ±1. Performing the integrals gives

∏ᵢ₌₁ⁿ √(2πλᵢ) = (2π)^{n/2}(det A)^{1/2}

and we have established the first result.


To establish the second integral, use the variable transformation z = Qx to obtain

∫ z exp(−(1/2) zᵀA⁻¹z) dz = Q ∫ x exp(−(1/2) xᵀΛ⁻¹x) dx

Notice that the ith element of this vector equation is of the form

∫ xᵢ exp(−(1/2) xᵀΛ⁻¹x) dx = (∫ xᵢ e^{−(1/2)xᵢ²/λᵢ} dxᵢ) ∏_{k≠i} ∫ e^{−(1/2)xₖ²/λₖ} dxₖ = 0

This integral vanishes because of the first term in the product. Since the integral vanishes for each element i, the vector of integrals is therefore zero.


To establish the third integral, we again use the variable transformation z = Qx and change the variable of integration in (4.16)

∫ z zᵀ exp(−(1/2) zᵀA⁻¹z) dz = Q [∫ x xᵀ exp(−(1/2) xᵀΛ⁻¹x) dx] Qᵀ = QVQᵀ   (4.17)

in which, again, |det Q| = 1, and the V matrix is defined to be the integral on the right-hand side. Examining the components of V, we note that if i ≠ j then the integral is of the form

Vᵢⱼ = (∫ xᵢ e^{−(1/2)xᵢ²/λᵢ} dxᵢ)(∫ xⱼ e^{−(1/2)xⱼ²/λⱼ} dxⱼ) ∏_{k≠i,j} ∫ e^{−(1/2)xₖ²/λₖ} dxₖ

The off-diagonal integrals vanish because of the odd functions in the integrands for the xᵢ and xⱼ integrals. The diagonal terms, on the other hand, contain even integrands and they do not vanish

Vᵢᵢ = (∫ xᵢ² e^{−(1/2)xᵢ²/λᵢ} dxᵢ) ∏_{k≠i} ∫ e^{−(1/2)xₖ²/λₖ} dxₖ

Evaluating these integrals gives

Vᵢᵢ = (2π)^{n/2}(det Λ)^{1/2} λᵢ

so that V = (2π)^{n/2}(det Λ)^{1/2} Λ. Substituting this result into (4.17) gives

∫ z zᵀ exp(−(1/2) zᵀA⁻¹z) dz = QVQᵀ = (2π)^{n/2}(det A)^{1/2} QΛQᵀ = (2π)^{n/2}(det A)^{1/2} A


and we have established the integral result of interest.

2. Using the probability density of the multivariate normal and the definition of the mean give

E(ξ) = ∫ x p_ξ(x) dx = (1/((2π)^{n/2}(det P)^{1/2})) ∫ x exp(−(1/2)(x − m)ᵀ P⁻¹ (x − m)) dx

Changing the variable of integration to z = x − m gives

E(ξ) = (1/((2π)^{n/2}(det P)^{1/2})) ∫ (m + z) exp(−(1/2) zᵀP⁻¹z) dz = m

in which the integral with m produces unity by (4.14) and the integral involving z vanishes because the integrand is odd.

Next, using the probability density of the multivariate normal, the definition of the covariance, and changing the variable of integration give

cov(ξ) = ∫ (x − E(ξ))(x − E(ξ))ᵀ p_ξ(x) dx
       = (1/((2π)^{n/2}(det P)^{1/2})) ∫ z zᵀ exp(−(1/2) zᵀP⁻¹z) dz
       = (1/((2π)^{n/2}(det P)^{1/2})) (2π)^{n/2}(det P)^{1/2} P = P
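The conclusion of Example 4.2 — that m and P in (4.12) really are the mean and covariance — can also be seen by sampling. A sketch (not from the text); the particular m and P below are illustrative choices.

```python
import numpy as np

# Sampling check of Example 4.2: for xi ~ N(m, P), the sample mean
# and sample covariance converge to m and P.
rng = np.random.default_rng(1)
m = np.array([1.0, -2.0])
P = np.array([[2.0, 0.6], [0.6, 1.0]])

samples = rng.multivariate_normal(m, P, size=500_000)
sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples.T)
print(sample_mean)   # approaches m
print(sample_cov)    # approaches P
```

The sampling errors shrink like 1/√n, so with half a million samples both estimates agree with m and P to roughly two decimal places.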

Characteristic function of a multivariate density. The characteristic function of an n-dimensional random variable ξ is defined as

φ_ξ(t) = E(e^{itᵀξ}) = ∫ e^{itᵀx} p_ξ(x) dx

in which t is now an n-dimensional variable. The inverse transform is

p_ξ(x) = (1/(2π)ⁿ) ∫ e^{−itᵀx} φ_ξ(t) dt
Note that if one has the characteristic function of the entire random variable vector available, one can easily compute the characteristic function of any marginal distribution. We simply set to zero the components of the t vector for any variables we wish to integrate over. To see this, consider the joint density p_{ξ,η}(x, y) and its characteristic function φ(t_x, t_y)

φ(t_x, t_y) = ∫∫ exp(i [t_xᵀ t_yᵀ] [x; y]) p_{ξ,η}(x, y) dx dy

If we are interested in the characteristic function of η's marginal, φ_η(t_y), we set t_x = 0 in the joint characteristic function to obtain it

φ(0, t_y) = ∫∫ exp(i [0ᵀ t_yᵀ] [x; y]) p_{ξ,η}(x, y) dx dy = ∫ e^{it_yᵀy} p_η(y) dy = φ_η(t_y)
Example 4.3: Characteristic function of the multivariate normal
Show that the characteristic function of the multivariate normal N(m, P) is given by

φ_ξ(t) = exp(itᵀm − (1/2) tᵀPt)

Solution
From the definition of the characteristic function we are required to evaluate the integral

φ_ξ(t) = (1/((2π)^{n/2}(det P)^{1/2})) ∫ e^{itᵀx} exp(−(1/2)(x − m)ᵀ P⁻¹ (x − m)) dx

Changing the variable of integration to z = x − m gives

φ_ξ(t) = e^{itᵀm} (1/((2π)^{n/2}(det P)^{1/2})) ∫ e^{itᵀz} e^{−(1/2) zᵀP⁻¹z} dz

Since P is positive definite, by Theorem 1.16 it can be factored as P = QΛQᵀ, so P⁻¹ = QΛ⁻¹Qᵀ, and changing the variable of integration to z = Qw in the integral gives, after noting that det Q = ±1 since Q is orthogonal, and denoting tᵀQ = vᵀ,

φ_ξ(t) = e^{itᵀm} (1/((2π)^{n/2}(det Λ)^{1/2})) ∏ⱼ₌₁ⁿ ∫ e^{ivⱼwⱼ} e^{−(1/2) wⱼ²/λⱼ} dwⱼ
       = e^{itᵀm} (1/((2π)^{n/2}(det Λ)^{1/2})) ∏ⱼ₌₁ⁿ √(2πλⱼ) exp(−(1/2) λⱼvⱼ²)
       = e^{itᵀm} exp(−(1/2) Σⱼ λⱼvⱼ²)

in which we used (4.95) to evaluate the integral. Noting that

Σⱼ λⱼvⱼ² = tᵀQΛQᵀt = tᵀPt   (4.18)

Substituting this result into the characteristic function gives

φ_ξ(t) = e^{itᵀm − (1/2)tᵀPt}

which is the desired result.

Example 4.4: Marginal normal density
Given that ξ and η are jointly, normally distributed with mean and covariance

[ξ; η] ~ N( [m_x; m_y], [P_x  P_xy; P_yx  P_y] )

show that the marginal density of ξ is normal with the following parameters

ξ ~ N(m_x, P_x)   (4.19)


Solution
Method 1. As a first approach to establish (4.19), we could directly integrate the y variables. Let x̄ = x − m_x and ȳ = y − m_y, let n_x and n_y be the dimensions of the ξ and η variables, respectively, and let n = n_x + n_y. Then the definition of the marginal density gives

p_ξ(x) = (1/((2π)^{n/2}(det P)^{1/2})) ∫ exp(−(1/2) [x̄ᵀ ȳᵀ] [P_x  P_xy; P_yx  P_y]⁻¹ [x̄; ȳ]) dȳ

To follow this approach, we also need to use the matrix inversion formula for the partitioned matrix. This is left as an exercise for the interested reader.

Method 2. In the second approach, we use the previously derived results of the characteristic function of the multivariate normal and its marginals. First, the characteristic function of the joint density is given by

φ(t_x, t_y) = exp( i [t_xᵀ t_yᵀ] [m_x; m_y] − (1/2) [t_xᵀ t_yᵀ] [P_x  P_xy; P_yx  P_y] [t_x; t_y] )

Setting t_y = 0 to compute the characteristic function of ξ's marginal gives

φ_ξ(t_x) = exp( i t_xᵀ m_x − (1/2) t_xᵀ P_x t_x )

But notice that this last expression is the characteristic function of a normal with mean m_x and covariance P_x, so inverting this result back to the densities gives

p_ξ(x) = (1/((2π)^{n_x/2}(det P_x)^{1/2})) e^{−(1/2)(x − m_x)ᵀ P_x⁻¹ (x − m_x)}

Summarizing, since we have already performed the required integrals to derive the characteristic function of the normal, the second approach saves significant time and algebraic manipulation. It pays off to do the required integrals one time, "store" them in the characteristic function, and then reuse them whenever possible, such as here when deriving marginals.

4.3.2 Functions of random variables

In many applications we need to know how the density of a random variable is related to the density of a function of that random variable. Let f : Rⁿ → Rⁿ be a mapping of the random variable ξ into the random variable η, η = f(ξ), and assume that the inverse mapping also exists, ξ = f⁻¹(η). Given the density of ξ, p_ξ(x), we wish to compute the density of η, p_η(y), induced by the function f. Let X denote an arbitrary region of the field of the random variable ξ, and define the set Y as the transform of this set under the function f

Y = {y | y = f(x), x ∈ X}

Then we seek a function p_η(y) such that

∫_X p_ξ(x) dx = ∫_Y p_η(y) dy   (4.20)

for every admissible set X. Using the rules of calculus for transforming a variable of integration we can write²

∫_X p_ξ(x) dx = ∫_Y p_ξ(f⁻¹(y)) |det(∂f⁻¹(y)/∂y)| dy   (4.21)

in which |det(∂f⁻¹(y)/∂y)| is the absolute value of the determinant of the Jacobian matrix of the transformation from η to ξ. Subtracting (4.21) from (4.20) gives

∫_Y [ p_η(y) − p_ξ(f⁻¹(y)) |det(∂f⁻¹(y)/∂y)| ] dy = 0   (4.22)

Because (4.22) must be true for any set Y, we conclude (a proof by contradiction is immediate)³

p_η(y) = p_ξ(f⁻¹(y)) |det(∂f⁻¹(y)/∂y)|   (4.23)

²See Appendix A for various notations for derivatives with respect to vectors.
³Some care should be exercised if one has generalized functions in mind for the probability density.


Example 4.5: Nonlinear transformation
Consider the scalar random variable ξ, normally distributed, ξ ~ N(m, σ²), and find the density of the random variable under the transformation η = ξ³.

Solution
The transformation is invertible and we have that ξ = f⁻¹(η) = η^{1/3}. Taking the derivative gives dξ/dη = (1/3)η^{−2/3}, and (4.23) gives

p_η(y) = (1/(3√(2πσ²))) |y|^{−2/3} exp(−(y^{1/3} − m)²/(2σ²))
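Because the transformation in Example 4.5 is invertible and monotone, an equivalent statement in terms of distributions is Pr(η ≤ c) = Pr(ξ ≤ c^{1/3}), which is easy to test by simulation. A sketch (not from the text) using the standard library; m = 0.5, σ = 1, and c = 2 are illustrative values (c > 0 keeps the real cube root simple).

```python
import math
import random

# Monte Carlo check of Example 4.5: if xi ~ N(m, sigma^2) and
# eta = xi^3, then Pr(eta <= c) = Phi((c^{1/3} - m) / sigma).
random.seed(2)
m, sigma = 0.5, 1.0
n = 200_000
etas = [random.gauss(m, sigma) ** 3 for _ in range(n)]

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

c = 2.0
empirical = sum(e <= c for e in etas) / n
exact = Phi((c ** (1.0 / 3.0) - m) / sigma)
print(empirical, exact)
```

Differentiating this distribution-function identity with respect to c recovers exactly the density formula derived above.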

Noninvertible transformations. Given n random variables ξ having density p_ξ, and the random variables η with components η_k defined by the noninvertible transformation η = f(ξ), η_k = f_k(ξ), we wish to find p_η in terms of p_ξ. Consider the region generated in Rⁿ by the vector inequality f(x) ≤ c. Call this region X(c), which is by definition

X(c) = {x | f(x) ≤ c}

Note that X is not necessarily simply connected. The (cumulative) probability distribution (not density) for η then satisfies

F_η(c) = ∫_{X(c)} p_ξ(x) dx   (4.24)

If the density p_η is of interest, it can be obtained by differentiating F_η.

Example 4.6: Maximum of two random variables
Given two independent random variables, ξ₁ and ξ₂, and the new random variable η defined by the noninvertible, nonlinear transformation η = max(ξ₁, ξ₂), show that η's density is given by

p_η(y) = p_{ξ₁}(y) ∫_{−∞}^{y} p_{ξ₂}(x) dx + p_{ξ₂}(y) ∫_{−∞}^{y} p_{ξ₁}(x) dx


Figure 4.4: The region X(c) for y = max(x₁, x₂) ≤ c.

Solution
The region X(c) generated by the inequality y ≤ c is sketched in Figure 4.4. Applying (4.24) then gives

F_η(c) = ∫_{−∞}^{c} ∫_{−∞}^{c} p_ξ(x₁, x₂) dx₁ dx₂ = F_{ξ₁}(c) F_{ξ₂}(c)

which has a clear physical interpretation. It says the probability that the maximum of two independent random variables is less than some value is equal to the probability that both random variables are less than that value. To obtain the density, we differentiate

p_η(y) = p_{ξ₁}(y) F_{ξ₂}(y) + F_{ξ₁}(y) p_{ξ₂}(y)
       = p_{ξ₁}(y) ∫_{−∞}^{y} p_{ξ₂}(x) dx + p_{ξ₂}(y) ∫_{−∞}^{y} p_{ξ₁}(x) dx
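The distribution-function identity F_η(c) = F_{ξ₁}(c) F_{ξ₂}(c) in Example 4.6 is simple to check by simulation. A sketch (not from the text) using uniform random variables, which keep the distribution functions trivial: F(c) = c on [0, 1], so F_η(c) = c².

```python
import random

# Monte Carlo check of Example 4.6: for independent xi_1, xi_2,
# Pr(max(xi_1, xi_2) <= c) = F_{xi_1}(c) * F_{xi_2}(c).
random.seed(3)
n = 200_000
maxima = [max(random.random(), random.random()) for _ in range(n)]

c = 0.7
empirical = sum(x <= c for x in maxima) / n
exact = c * c        # F_eta(c) = c^2 for two independent Uniform(0, 1)
print(empirical, exact)
```

Any other pair of independent distributions would do as well; only the product form of the distribution function is being exercised.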

4.3.3 Statistical Independence and Correlation

From the definition of independence, two events A and B are independent if Pr(A ∩ B) = Pr(A) Pr(B). We translate this definition into an equivalent statement about probability distributions as follows. Given random variables ξ and η, let event A be ξ ≤ x and event B be η ≤ y. By the definitions of joint and marginal distribution, these events have probabilities

Pr(A) = F_ξ(x)    Pr(B) = F_η(y)    Pr(A ∩ B) = F_{ξ,η}(x, y)

We say that the two random variables ξ and η are STATISTICALLY INDEPENDENT, or simply independent, if this relation holds for all x, y

F_{ξ,η}(x, y) = F_ξ(x) F_η(y)    all x, y   (4.25)

See Exercise 4.2 for the proof that an equivalent condition for statistical independence can be stated in terms of the probability densities instead of the distributions

p_{ξ,η}(x, y) = p_ξ(x) p_η(y)    all x, y

provided that the densities are defined. We say two random variables, ξ and η, are UNCORRELATED if

cov(ξ, η) = 0   (4.26)

Example 4.7: Independent implies uncorrelated
Prove that if ξ and η are statistically independent, then they are uncorrelated.

Solution
The definition of covariance and statistical independence gives

cov(ξ, η) = E((ξ − E(ξ))(η − E(η))) = E(ξ − E(ξ)) E(η − E(η)) = 0

Example 4.8: Does uncorrelated imply independent?
Let ξ and η be jointly distributed random variables with probability density function

p_{ξ,η}(x, y) = (1/4)[1 + xy(x² − y²)]  for |x| < 1, |y| < 1, and 0 otherwise


Figure 4.5: A joint density function for the two uncorrelated random variables in Example 4.8.

(a) Compute the marginals p_ξ(x) and p_η(y). Are ξ and η independent?

(b) Compute cov(ξ, η). Are ξ and η uncorrelated?

(c) What is the relationship between independent and uncorrelated? Are your results on this example consistent with this relationship? Why or why not?

Solution
The joint density is shown in Figure 4.5.

(a) Direct integration of the joint density produces the marginals

p_ξ(x) = 1/2,  |x| < 1
p_η(y) = 1/2,  |y| < 1

so p_{ξ,η}(x, y) ≠ p_ξ(x) p_η(y), and ξ and η are not independent.

(b) Performing the double integral for the expectation of the product term gives

E(ξη) = (1/4) ∫_{−1}^{1} ∫_{−1}^{1} xy [1 + xy(x² − y²)] dx dy = 0

and the covariance of ξ and η is therefore

cov(ξ, η) = E(ξη) − E(ξ)E(η) = 0

and ξ and η are uncorrelated.

(c) We know that independent implies uncorrelated. This example does not contradict that relationship. This example shows uncorrelated does not imply independent, in general, but see the next example for normals.
examplefor normals.
Example 4.9: Independent and uncorrelated are equivalent for normals
If two random variables ξ and η are jointly normally distributed,

[ξ; η] ~ N( [m_x; m_y], [P_x  P_xy; P_yx  P_y] )

prove ξ and η are statistically independent if and only if ξ and η are uncorrelated, or, equivalently, P is block diagonal.

Solution
We have shown already that independent implies uncorrelated for any density, so we now show that, for normals, uncorrelated implies independent. Given cov(ξ, η) = 0, we have P_xy = 0 and P_yx = 0, so

det P = det P_x det P_y

and the density can be written, with x̄ = x − m_x and ȳ = y − m_y,

p_{ξ,η}(x, y) = (1/((2π)^{(n_x+n_y)/2}(det P_x det P_y)^{1/2})) exp(−(1/2)[ x̄ᵀ P_x⁻¹ x̄ + ȳᵀ P_y⁻¹ ȳ ])   (4.27)

For any joint normal, we know that the marginals are simply

p_ξ(x) = (1/((2π)^{n_x/2}(det P_x)^{1/2})) exp(−(1/2) x̄ᵀ P_x⁻¹ x̄)
p_η(y) = (1/((2π)^{n_y/2}(det P_y)^{1/2})) exp(−(1/2) ȳᵀ P_y⁻¹ ȳ)

Forming the product and combining terms gives

p_ξ(x) p_η(y) = (1/((2π)^{(n_x+n_y)/2}(det P_x det P_y)^{1/2})) exp(−(1/2)[ x̄ᵀ P_x⁻¹ x̄ + ȳᵀ P_y⁻¹ ȳ ])

Comparing this equation to (4.27), and using the inverse of a block-diagonal matrix, we have shown that ξ and η are statistically independent.

4.4 Sampling

Let scalar random variable ξ have density p_ξ with mean m and variance P, and consider n independent samples of ξ, denoted x₁, x₂, ..., xₙ. By independent samples, we mean that the joint density of the samples is the product of the marginals, which all are identical and equal to p_ξ

p_{x₁,···,xₙ}(z₁, ..., zₙ) = p_{x₁}(z₁) ··· p_{xₙ}(zₙ) = p_ξ(z₁) ··· p_ξ(zₙ)


4.4.1 Linear Transformation

We establish the following facts about linear transformations of random variables. Consider the random variable ξ ∈ Rⁿ with density p_ξ, and the linear transformation η = Aξ. The mean and variance of the random variable η are

E(η) = A E(ξ)    var(η) = A var(ξ) Aᵀ   (4.28)

We establish these formulas as follows. Using the definition of expectation, we have that

E(η) = E(Aξ) = A E(ξ)

Using the definition of variance, we have that

var(η) = var(Aξ) = E((Aξ − E(Aξ))(Aξ − E(Aξ))ᵀ) = A E((ξ − E(ξ))(ξ − E(ξ))ᵀ) Aᵀ = A var(ξ) Aᵀ

With normals, we often wish to check if the variance is positive definite after a linear transformation. Let P ∈ Rⁿˣⁿ be positive definite and A be an arbitrary matrix. The following result is often useful: P > 0 and A's rows linearly independent imply APAᵀ > 0. See also statement 5 in Section 1.4.4.
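Both facts — the transformation rule (4.28) and the loss of positive definiteness when A's rows are dependent — can be exercised numerically. A sketch (not from the text); the matrices m, P, and A below are illustrative choices, and the rank-deficient map is the one used in the singular-normal discussion that follows.

```python
import numpy as np

# Check of (4.28): for eta = A xi, E(eta) = A E(xi) and
# var(eta) = A var(xi) A^T.
rng = np.random.default_rng(4)
m = np.array([1.0, 2.0])
P = np.array([[1.5, 0.3], [0.3, 0.8]])
xi = rng.multivariate_normal(m, P, size=400_000)

A = np.array([[1.0, 1.0], [2.0, 0.0]])   # full-rank illustrative map
eta = xi @ A.T
print(eta.mean(axis=0), A @ m)           # agree
print(np.cov(eta.T), A @ P @ A.T)        # agree

# rank-deficient map eta = (xi, xi) for scalar xi with P_x = 1:
A_def = np.array([[1.0], [1.0]])
P_y = A_def @ np.array([[1.0]]) @ A_def.T    # = [[1, 1], [1, 1]]
print(np.linalg.eigvalsh(P_y))               # one eigenvalue is zero
```

The zero eigenvalue of P_y confirms that A P Aᵀ is only positive semidefinite when the rows of A are linearly dependent.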

Singular or degenerate normal distributions. It is often convenient to extend the definition of the normal distribution to admit positive semidefinite covariance matrices. The distribution with a semidefinite covariance is known as a singular or degenerate normal distribution (Anderson, 2003, p. 30). Figure 4.6 shows a nearly singular normal distribution.

To see how the singular normal arises, let the scalar random variable ξ be distributed normally with zero mean and positive definite covariance,


Figure 4.6: A nearly singular normal density in two dimensions, p(x) = exp(−(1/2)(27.2x₁² + ··· + 73.8x₂²)).

ξ ~ N(0, P_x), and consider the simple linear transformation

η = Aξ    A = [1; 1]

in which we have created two identical copies of ξ for the two components η₁ and η₂ of η. Now consider the density of η. If we try to use the standard formulas for transformation of a normal, we would have P_y = A P_x Aᵀ, and P_y is singular since its rows are linearly dependent. Therefore one of the eigenvalues of P_y is zero, and P_y is positive semidefinite and not positive definite. Obviously we cannot use (4.12) for the density in this case because the inverse of P_y does not exist. To handle these cases, we first provide an interpretation that remains valid when the covariance matrix is singular and semidefinite.

Definition 4.10 (Density of a singular normal). A singular joint normal density of the random variables ξ₁ ∈ Rⁿ¹, ξ₂ ∈ Rⁿ², is denoted

[ξ₁; ξ₂] ~ N( [m₁; m₂], [Λ₁  0; 0  0] )

with Λ₁ > 0. The density is defined by

p_ξ(x₁, x₂) = (1/((2π)^{n₁/2}(det Λ₁)^{1/2})) exp(−(1/2)(x₁ − m₁)ᵀ Λ₁⁻¹ (x₁ − m₁)) δ(x₂ − m₂)   (4.29)

In this limit, the "random" variable ξ₂ becomes deterministic and equal to its mean m₂. For the case n₁ = 0, we have the completely degenerate case p_ξ(x₂) = δ(x₂ − m₂), which describes the completely deterministic case in which ξ = m₂, and there is no random component. Notice that by performing the required integrals, the two marginal densities of (4.29) are found to be

p_{ξ₁}(x₁) = (1/((2π)^{n₁/2}(det Λ₁)^{1/2})) exp(−(1/2)(x₁ − m₁)ᵀ Λ₁⁻¹ (x₁ − m₁))
p_{ξ₂}(x₂) = δ(x₂ − m₂)

Example 4.11: Computing a singular density

Consider again the motivating example with the unit normal scalar random variable ξ ~ N(0, 1), P_x = 1, and the linear transformation

    A = [1  1]ᵀ

Use Definition 4.10 to express the density p_η for this case, and draw a figure showing the appearance of p_η.

Solution

We first compute the eigenvalue decomposition of the semidefinite covariance P_y and obtain

    P_y = QΛQᵀ,   Q = (1/√2) [1 1; 1 -1],   Λ = [2 0; 0 0]

Figure 4.7: The singular normal p_η(y) resulting from y = Ax with rank-deficient A.

Next we define the invertible variable transformation

    ζ = Qᵀη

and we can write the covariance of ζ, P_z, as

    P_z = QᵀP_yQ = Λ = [2 0; 0 0]

which is in the form of Definition 4.10. Using that definition gives the density for ζ

    p_ζ(z1, z2) = (1/(2√π)) exp(-(1/4)z1²) δ(z2)

Finally, transforming back to the variable η using

    z1 = (1/√2)(y1 + y2),   z2 = (1/√2)(y1 - y2)

and noting δ(ax) = (1/a)δ(x) gives

    p_η(y) = (1/√(2π)) exp(-(1/8)(y1 + y2)²) δ(y1 - y2)

To draw a sketch, first we note that p_η(y1, y2) = 0 for y1 ≠ y2 because of the delta function. So we have a singular normal defined in the plane, and the density is nonzero only on the line y1 = y2. Therefore take a zero-mean normal density defined on the y1 axis, rotate it by 45° so that it lies along the line y1 = y2, and that is the resulting joint density for η, as shown in Figure 4.7.
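The construction in this example is easy to check numerically. The following sketch (assuming NumPy is available; it is not part of the text) samples η = Aξ and confirms that P_y is singular and that all the probability mass lies on the line y1 = y2:

```python
import numpy as np

rng = np.random.default_rng(0)

# eta = A xi with A = [1 1]^T creates two identical copies of xi.
A = np.array([[1.0], [1.0]])
xi = rng.standard_normal((1, 10000))        # xi ~ N(0, 1)
eta = A @ xi                                # shape (2, 10000)

# Py = A Px A^T is singular: eigenvalues are 0 and 2.
Py = A @ np.array([[1.0]]) @ A.T
evals = np.linalg.eigvalsh(Py)              # ascending order

# All samples satisfy y1 = y2, so the density lives on that line.
on_line = bool(np.allclose(eta[0], eta[1]))
```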

The expanded definition of the normal distribution enables us to generalize the important result that the linear transformation of a normal is normal, so that it holds for any linear transformation, including rank-deficient transformations such as the A matrix given above in which the rows are not independent. We state this result as the following theorem and defer the proof to Exercise 4.24.
Theorem 4.12 (Normal distributions under linear transformation). Consider a normally distributed random variable ξ ∈ R^n with semidefinite covariance P_x ≥ 0, and an arbitrary linear transformation A ∈ R^(m×n) with transformed random variable η ∈ R^m, η = Aξ. Then η is also normally distributed, with P_y = AP_xAᵀ ≥ 0.

4.4.2 Sample Mean, Sample Variance, and Standard Error

Usually in applications we do not obtain nearly enough samples to obtain convergence to the entire density, and we settle for convergence to a few low-order moments of the distribution, such as the mean and variance. The SAMPLE MEAN is defined as

    X̄_n = (1/n) Σ_{i=1}^n X_i

and we expect this quantity to converge to ξ's mean as the number of samples increases. Indeed if we take expectations

    E(X̄_n) = (1/n) Σ_{i=1}^n E(X_i) = m

which means that the sample mean is an unbiased estimate of the mean of random variable ξ for all values of n. An estimator's BIAS is defined to be the difference between the expectation of the estimator and the true value, and an estimator is termed UNBIASED if the bias is zero.
Next, toward defining an appropriate sample variance, we consider the sum of squares of the samples' differences from the sample mean, S_n = Σ_{i=1}^n (X_i - X̄_n)², which can be rearranged as follows

    S_n = Σ_{i=1}^n ((X_i - m) - (X̄_n - m))²
        = Σ_{i=1}^n (X_i - m)² - 2(X̄_n - m) Σ_{i=1}^n (X_i - m) + n(X̄_n - m)²
        = Σ_{i=1}^n (X_i - m)² - 2n(X̄_n - m)² + n(X̄_n - m)²
        = Σ_{i=1}^n (X_i - m)² - n(X̄_n - m)²

Taking the expectation gives

    E(S_n) = Σ_{i=1}^n var(X_i) - n var(X̄_n)

We know var(X_i) = P for all i = 1,..., n, and to compute the variance of X̄_n, it is convenient to first determine the variance of vector X, obtained by stacking the samples together in a column vector

    X = [X_1  X_2  ···  X_n]ᵀ

Since the X_i are mutually independent, we have that cov(X_i, X_j) = P δ_ij, i, j = 1,..., n, or in matrix form

    var(X) = diag(P, P, ..., P)

Using X̄_n = AX with A = (1/n)[I  I  ···  I] and the second part of (4.28) gives

    var(X̄_n) = A var(X) Aᵀ = (1/n) P

Substituting these into the equation for the expectation of S_n gives

    E(S_n) = nP - P = (n - 1)P

So here we notice an interesting outcome; if we want to obtain an unbiased estimate of the variance, we should define the SAMPLE VARIANCE as

s_n² = S_n/(n - 1) to obtain

    E(s_n²) = P

This explains the somewhat mysterious definition of the sample variance: division of the sum of squares by n - 1 instead of n, which one might have anticipated. We show later that division by n gives the maximum-likelihood estimate of the variance, which is also a good estimate because it converges to P as n → ∞. Although the maximum-likelihood estimate is not an unbiased estimate for finite n, the bias decreases to zero as n → ∞.
The STANDARD ERROR of an estimator is the standard deviation of the estimator's sampling distribution. For example, in the scalar case, if we consider the sample mean above to be an estimator of the mean, we have worked out that the variance of the sample mean is var(X̄_n) = (1/n)σ², and therefore the STANDARD ERROR OF THE MEAN is

    SE(X̄_n) = σ/√n

When the standard deviation of the random variable being sampled is also unknown, people sometimes replace σ in the previous expression with an estimate of it, such as the square root of the sample variance, s_n. We then have

    SE(X̄_n) ≈ s_n/√n

This quantity does provide a rough measure of the uncertainty in X̄_n due to the finite sample size. But if we want to say something precise about the uncertainty in the sample mean as an estimate of ξ's mean, we must calculate a true confidence interval for that estimate. We show how to calculate confidence intervals in the discussion of maximum-likelihood estimation in Section 4.7.
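These bias and standard-error results are easy to confirm by simulation; a minimal sketch (assuming NumPy; the sample sizes are arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 10, 50000
# 50000 replicate experiments, each with n = 10 samples; true mean 2, variance 9.
x = rng.normal(loc=2.0, scale=3.0, size=(trials, n))

xbar = x.mean(axis=1)                      # sample mean of each experiment
S = ((x - xbar[:, None])**2).sum(axis=1)   # sum of squares about the sample mean

var_unbiased = (S / (n - 1)).mean()        # dividing by n-1: expectation 9 (unbiased)
var_mle = (S / n).mean()                   # dividing by n: expectation 9(n-1)/n = 8.1
se = xbar.std()                            # standard error of the mean: 3/sqrt(10)
```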

4.5 Central Limit Theorems


Central limit theorems are concerned with the following remarkable observation: if we have a set of n independent random variables X_i, i = 1, 2,..., n, then, under fairly general conditions, the density p_Y of their sum

    Y = X_1 + X_2 + ··· + X_n

is approximately normal. It is perhaps best to illustrate this observation with an example.

Example 4.13: Sum of 10 uniformly distributed random variables

Consider 10 uniformly and independently distributed random variables x_1,..., x_10. Consider a new random variable y, which is the sum of the 10 x random variables

    y = x_1 + x_2 + ··· + x_10

What is y's mean and variance? Draw samples of the 10 x_i random variables, and compute samples of y. Plot frequency distributions of x and y. Even though the 10 x random variables are uniformly distributed, and their probability distribution looks nothing like a normal distribution, discuss how well y is approximated by a normal.
Solution

The x random variables are distributed as x ~ U(0, 1), which means

    p_x(x) = 1 for 0 ≤ x ≤ 1,   p_x(x) = 0 otherwise    (4.30)

Computing the mean and variance gives

    E(x) = ∫₀¹ x dx = 1/2
    var(x) = ∫₀¹ (x - (1/2))² dx = 1/12

If we stack the x variables in a vector

    x = [x_1  x_2  ···  x_10]ᵀ

we can write the y random variable as the linear transformation of x

    y = Ax,   A = [1  1  ···  1]

so we have that y's mean and variance are given by

    E(y) = 10 · (1/2) = 5,   var(y) = A var(x) Aᵀ = 10/12 = 5/6

If the central limit theorem is in force with only 10 random variables, y should be distributed approximately as N(5, 5/6). Histograms of 10,000 samples of x and y are shown in Figures 4.8 and 4.9. It is clear that even 10 uniformly distributed x random variables produce nearly a normal distribution for their sum y.
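The computation in this example can be reproduced in a few lines; a sketch (assuming NumPy) that checks the first two moments of y against the values 5 and 5/6:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=(10000, 10))   # 10,000 samples of 10 U(0,1) variables
y = x.sum(axis=1)                             # 10,000 samples of the sum y

mean_y = y.mean()      # should be near 10 * 1/2 = 5
var_y = y.var()        # should be near 10 * 1/12 = 5/6
```

A histogram of y at this point reproduces the bell shape of Figure 4.9.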

4.5.1 Identically distributed random variables

Consider n independent random variables X_i, i = 1,..., n, having identical distribution with mean μ and variance σ². We are interested in the distribution of the sum S_n = X_1 + X_2 + ··· + X_n as n becomes large. Since the X_i are independent, the mean and variance of S_n are given by

    E(S_n) = Σ_{i=1}^n E(X_i) = nμ
    var(S_n) = Σ_{i=1}^n var(X_i) = nσ²

Since we want to take the limit as n → ∞, we first rescale the sum to keep the mean and variance finite. Given the formulas for shifting the mean and variance, we choose Z_n = (S_n - nμ)/(σ√n) and obtain

    E(Z_n) = 0,   var(Z_n) = (1/(nσ²)) var(S_n) = 1

Theorem 4.14 (De Moivre-Laplace central limit theorem). Let X_i, i = 1, 2,..., n be independent and identically distributed with mean μ and variance σ². Then Z_n tends to the standard normal N(0, 1) as n → ∞.


Proof. In keeping with Laplace's approach to the problem, we shall use characteristic functions to establish this result. We shall find useful the following bound on the error in the Taylor series approximation of the exponential with a purely imaginary argument

    |e^(ix) - Σ_{m=0}^n (ix)^m/m!| ≤ min( |x|^(n+1)/(n+1)!, 2|x|^n/n! )    (4.31)

Figure 4.8: Histogram of 10,000 samples of uniformly distributed x.

Figure 4.9: Histogram of 10,000 samples of y = Σ_{i=1}^{10} x_i.

This bound is simple to establish (see Exercise 4.53). We will use it with n = 2

    e^(ix) = 1 + ix - x²/2 + O(|x|³)    (4.32)

in which O(|x|³) denotes that the size of the error term in (4.31) is bounded by some constant times |x|³. We first consider variables with zero mean and unit variance, Y_i = (X_i - μ)/σ. Writing the series expansion for Y_i's characteristic function and using (4.32) gives

    φ_Yi(t) = ∫ e^(itx) p_Y(x) dx
            = ∫ (1 + itx - (1/2)t²x² + O(|tx|³)) p_Y(x) dx
            = 1 + it E(Y_i) - (1/2)t² E(Y_i²) + O(|t|³)
            = 1 - (1/2)t² + O(|t|³)

Notice that here we have assumed E(|Y_i|³) is finite, so that it can be absorbed into the O(|t|³) term. Next, since Z_n = (1/√n) Σ_i Y_i, we have from (4.7) and (4.8) that

    φ_Zn(t) = (φ_Yi(t/√n))ⁿ = (1 - (1/2)t²/n + O(|t/√n|³))ⁿ

In taking the limit as n → ∞, the last term is negligible and can be dropped to obtain

    lim_{n→∞} φ_Zn(t) = lim_{n→∞} (1 - (1/2)t²/n)ⁿ

Using the calculus result that lim_{x→0} (1 + ax)^(1/x) = e^a with n = 1/x gives

    lim_{n→∞} φ_Zn(t) = e^(-(1/2)t²)

The final step, which unfortunately requires the most effort, is to show that if the characteristic function converges, then the random variable also converges (in distribution). Assuming this is true, we then have

    lim_{n→∞} Z_n ~ N(0, 1)

and the result is established.

We have assumed here that the absolute moment E(|Y_i|³) is finite, and we have not justified the claim that convergence of the characteristic function implies convergence in distribution (Durrett, 2010, pp. 114-116). But the argument does nicely illustrate why characteristic functions prove so useful. In the next section we pursue a much more general approach that is not based on the characteristic function, so we content ourselves to leave this proof here.
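The convergence φ_Zn(t) → e^(-t²/2) can also be observed numerically for a decidedly non-normal distribution; a sketch (assuming NumPy; the sample sizes are arbitrary) using standardized exponential variables:

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials, t = 400, 20000, 1.0

# Standardized exponential variables: zero mean, unit variance, skewed density.
Y = rng.exponential(1.0, size=(trials, n)) - 1.0
Zn = Y.sum(axis=1) / np.sqrt(n)

# Monte Carlo estimate of the characteristic function phi_Zn(t) = E[exp(i t Zn)].
phi = np.exp(1j * t * Zn).mean()
target = np.exp(-t**2 / 2)       # characteristic function of N(0, 1)
```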

4.5.2 Random variables with different distributions

The central limit theorem of de Moivre and Laplace is already a spectacular mathematical result. But as it stands, it is not a compelling reason to assume that unmodeled noise in a physical system would be well represented by a normal distribution. After all, how would we deduce that some unmodeled random effect in a physical system is the result of many different independent random causes, all of which have identical distributions? But the central limit theorem runs deeper. We next remove the assumption that the X_i are identically distributed. This version of the central limit theorem was developed by Lindeberg (1922). We consider the following conditions on the X_i variables.
Weconsider the following conditions on the Xi variables.

Assumption 4.15 (Lindeberg conditions). Consider independent random variables X_i, i = 1, 2,..., n satisfying E(X_i) = 0 and var(X_i) = σ_i², and let s_n² = Σ_{i=1}^n σ_i². The following two conditions hold as n → ∞

(a) s_n → ∞

(b) For every ε > 0,  (1/s_n²) Σ_{k=1}^n E(X_k²; |X_k| > ε s_n) → 0

The notation E(X_k²; |X_k| > ε s_n) is shorthand for taking expectations of the truncated random variable

    E(X²; |X| > a) = ∫_{|x| > a} x² p_X(x) dx

Notice that the definition implies that E(X²; |X| > a) + E(X²; |X| ≤ a) = var(X). Many sufficient conditions for the central limit theorem have been proposed over the years, but all were superseded by the Lindeberg conditions, which were also shown to be necessary (Feller, 1935; Lévy, 1935).

We have the following theorem.

Theorem 4.16 (Lindeberg-Feller central limit theorem). Consider independent random variables X_i satisfying Assumption 4.15, and the normalized sum Z_n = S_n/s_n, with F_Zn denoting its distribution function and F the standard normal distribution function. Then

    lim_{n→∞} sup_x |F_Zn(x) - F(x)| = 0

The proof of this theorem is given in Section 4.9.

4.5.3 Multidimensional central limit theorems

The central limit theorem (CLT) can be extended to vector-valued random variables, X_i ∈ R^d. Consider first independent, identically distributed (IID) random variables X_i, i = 1, 2,..., n with E(X_i) = μ and var(X_i) = Σ. We assume that Σ > 0 is positive definite. We have the following result.

Theorem 4.17 (Multivariate CLT IID). Let vector-valued random variables X_i, i = 1, 2,..., n be independent and identically distributed with E(X_i) = μ and var(X_i) = Σ. The normalized sum Z_n = (1/√n) Σ_{i=1}^n (X_i - μ) converges in distribution to the normal N(0, Σ).
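A quick numerical check of the IID vector version, with uniformly distributed components (a sketch assuming NumPy; the dimensions and sample counts are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
trials, n, d = 20000, 100, 2

# IID random vectors with independent U(0,1) components: mu = 0.5, Sigma = I/12.
X = rng.uniform(0.0, 1.0, size=(trials, n, d))
Zn = (X - 0.5).sum(axis=1) / np.sqrt(n)

cov = np.cov(Zn.T)     # should approach Sigma = I/12
```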

Again, the IID version is a special case of a more general version that assumes a generalization of the Lindeberg condition.

Theorem 4.18 (Multivariate CLT Lindeberg-Feller). Consider independent vector-valued random variables X_i, i = 1, 2,..., n with E(X_i) = μ_i and var(X_i) = Σ_i > 0, and satisfying the following conditions

(a) (1/n) Σ_{i=1}^n Σ_i → Σ > 0 as n → ∞

(b) For every ε > 0,  (1/n) Σ_{i=1}^n E(||X_i||²; ||X_i|| > ε√n) → 0

Then the sum Z_n = (1/√n) Σ_{i=1}^n (X_i - μ_i) converges in distribution to the normal N(0, Σ).

See van der Vaart (1998, pp. 20-21) for further discussion of this case. Theorem 4.18 is the mathematical basis for the common physical assumption that noise in process measurements is often well modeled by a zero-mean normal distribution. The variance σ² can often be determined by examining samples of the measurement, which is an important part of the process modeling task that is often overlooked.

Finally, the history of the term "central limit theorem" is also interesting. Apparently coined by Pólya in 1920 (in German, zentraler Grenzwertsatz), the "central" refers to the center of the distribution, where the distribution converges quickly as n increases, compared to the tails of the distribution, where the convergence is much slower (Le Cam, 1986). Le Cam's article is highly recommended reading for anyone interested in the fascinating history of the central limit theorem.

4.6 Conditional Density Function and Bayes's Theorem

Let ξ and η be jointly distributed random variables with density p_{ξ,η}(x, y). We seek the conditional probability density function of ξ given that a specific value y of η has been observed. We define the conditional density as

    p_{ξ|η}(x|y) = p_{ξ,η}(x, y) / p_η(y)

We explore this definition with a simple example. Consider a roll of a single die in which η takes on values E or O to denote whether the outcome is even or odd, and ξ takes on the integer value of the die. The 12 values of the joint density function are simply computed

    p_{ξ,η}(1, O) = 1/6   p_{ξ,η}(2, E) = 1/6   p_{ξ,η}(3, O) = 1/6
    p_{ξ,η}(4, E) = 1/6   p_{ξ,η}(5, O) = 1/6   p_{ξ,η}(6, E) = 1/6    (4.33)

and the remaining six values, for the impossible value-parity pairs, are zero. The marginal densities are then easily computed; summing across the rows of (4.33) we have for ξ

    p_ξ(x) = 1/6,   x = 1,..., 6

Similarly for η, summing down the columns of (4.33) gives

    p_η(E) = 1/2,   p_η(O) = 1/2

These are both in accordance with our intuition on the rolling of the die: equal probability for each value 1 to 6, and equal probability for an even or an odd outcome.

The conditional density is a different concept. The conditional density p_{ξ|η}(x|y) tells us the density of x given that η = y has been observed. So consider the value of this function p_{ξ|η}(1|O), which tells us the probability that the die shows a 1 given that we know that it is odd. We expect that the additional information on the die being odd causes us to revise our probability that it is 1 from 1/6 to 1/3. Applying the defining formula for conditional density indeed gives

    p_{ξ|η}(1|O) = p_{ξ,η}(1, O) / p_η(O) = (1/6)/(1/2) = 1/3

Next consider the reverse question, the probability that we have an odd given that we observe a 1. The definition of conditional density gives

    p_{η|ξ}(O|1) = p_{ξ,η}(1, O) / p_ξ(1) = (1/6)/(1/6) = 1

i.e., we are sure the die is odd if it shows a 1. Notice that the arguments to the conditional density do not commute as they do in the joint density.
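The die calculations above can be organized in a few lines of code; a sketch (not from the text) that builds the joint density of (ξ, η) as a table and applies the defining formulas:

```python
from fractions import Fraction

sixth = Fraction(1, 6)
# Joint density over the 12 (value, parity) pairs; impossible pairs get zero.
joint = {(x, par): (sixth if (par == 'E') == (x % 2 == 0) else Fraction(0))
         for x in range(1, 7) for par in ('E', 'O')}

p_eta_O = sum(p for (x, par), p in joint.items() if par == 'O')  # marginal: 1/2
p_xi_1 = sum(p for (x, par), p in joint.items() if x == 1)       # marginal: 1/6

p_1_given_O = joint[(1, 'O')] / p_eta_O   # p(xi = 1 | eta = O) = 1/3
p_O_given_1 = joint[(1, 'O')] / p_xi_1    # p(eta = O | xi = 1) = 1
```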
This fact leads to a famous result. Consider the definition of conditional density, which can be expressed as

    p_{ξ,η}(x, y) = p_{ξ|η}(x|y) p_η(y)   or   p_{η,ξ}(y, x) = p_{η|ξ}(y|x) p_ξ(x)

Because p_{ξ,η}(x, y) = p_{η,ξ}(y, x), we can equate the right-hand sides and deduce

    p_{ξ|η}(x|y) = p_{η|ξ}(y|x) p_ξ(x) / p_η(y)    (4.34)

which is known as Bayes's theorem (Bayes, 1763). Notice that this result comes in handy whenever we wish to switch the variable that is known in the conditional density, which we will see is a key step in state estimation problems.

Example 4.19: Conditional normal density

Show that if ξ and η are jointly normally distributed as

    (ξ, η) ~ N( [m_x  m_y]ᵀ, [P_x P_xy; P_yx P_y] )

then the conditional density of ξ given η is also normal

    p_{ξ|η}(x|y) = n(x, m, P)

in which the mean and covariance are

    m = m_x + P_xy P_y⁻¹ (y - m_y)    (4.35)
    P = P_x - P_xy P_y⁻¹ P_yx    (4.36)

Solution

The definition of conditional density gives

    p_{ξ|η}(x|y) = p_{ξ,η}(x, y) / p_η(y)

Because (ξ, η) is jointly normal, we know from Example 4.4 that

    p_η(y) = n(y, m_y, P_y)

and therefore

    p_{ξ|η}(x|y) = p_{ξ,η}(x, y) / n(y, m_y, P_y)

Substituting in the definition of the normal density from (4.13) gives

    p_{ξ|η}(x|y) = (1/(2π)^(n_x/2)) ( det P_y / det [P_x P_xy; P_yx P_y] )^(1/2) exp(-(1/2)a)    (4.37)

in which the argument of the exponent is

    a = [x - m_x; y - m_y]ᵀ [P_x P_xy; P_yx P_y]⁻¹ [x - m_x; y - m_y] - (y - m_y)ᵀ P_y⁻¹ (y - m_y)    (4.38)

If we use P = P_x - P_xy P_y⁻¹ P_yx as defined in (4.36), then we can use the partitioned matrix inversion formula to express the matrix inverse in (4.38) as

    [P_x P_xy; P_yx P_y]⁻¹ = [ P⁻¹,  -P⁻¹ P_xy P_y⁻¹;  -P_y⁻¹ P_yx P⁻¹,  P_y⁻¹ + P_y⁻¹ P_yx P⁻¹ P_xy P_y⁻¹ ]

Substituting this expression into (4.38) and multiplying out terms yields

    a = (x - m_x)ᵀ P⁻¹ (x - m_x) - 2(x - m_x)ᵀ P⁻¹ P_xy P_y⁻¹ (y - m_y)
        + (y - m_y)ᵀ P_y⁻¹ P_yx P⁻¹ P_xy P_y⁻¹ (y - m_y)

which is the expansion of the following quadratic term

    a = [(x - m_x) - P_xy P_y⁻¹ (y - m_y)]ᵀ P⁻¹ [(x - m_x) - P_xy P_y⁻¹ (y - m_y)]

in which we use the fact that P_xy = P_yxᵀ. Substituting (4.35) into this expression yields

    a = (x - m)ᵀ P⁻¹ (x - m)

Finally, noting that for the partitioned matrix

    det [P_x P_xy; P_yx P_y] = det P_y det P

and substituting the two previous equations into (4.37) yields

    n( (x, y), (m_x, m_y), [P_x P_xy; P_yx P_y] ) / n(y, m_y, P_y) = n(x, m, P)    (4.39)

or p_{ξ|η}(x|y) = n(x, m, P), which is the desired result.
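The result can be spot-checked numerically by comparing n(x, m, P) against the ratio p_{ξ,η}(x, y)/p_η(y) at arbitrary points; a sketch assuming NumPy and SciPy, with an illustrative (not from the text) choice of means and covariances:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

# Illustrative scalar blocks; any positive definite joint covariance works.
mx, my = np.array([1.0]), np.array([2.0])
Px, Pxy, Py = np.array([[2.0]]), np.array([[0.5]]), np.array([[1.0]])
Pjoint = np.block([[Px, Pxy], [Pxy.T, Py]])

y = np.array([2.8])
m = mx + Pxy @ np.linalg.inv(Py) @ (y - my)   # conditional mean (4.35)
P = Px - Pxy @ np.linalg.inv(Py) @ Pxy.T      # conditional covariance (4.36)

x = np.array([0.3])
lhs = mvn.pdf(x, mean=m, cov=P)               # n(x, m, P)
rhs = (mvn.pdf(np.concatenate([x, y]), mean=np.concatenate([mx, my]), cov=Pjoint)
       / mvn.pdf(y, mean=my, cov=Py))         # joint density over marginal
```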

Example 4.20: More normal conditional densities

Let the joint conditional density of random variables (A, B) given C be a normal distribution with the following mean and variance

    p_{A,B|C}(a, b|c) = n((a, b), m, P),   m = [m_a  m_b]ᵀ,   P = [P_a P_ab; P_ba P_b]    (4.40)

Show that the conditional density of A given B and C is also normal

    p_{A|B,C}(a|b, c) = n(a, m̄, P̄)

with mean and variance given by

    m̄ = m_a + P_ab P_b⁻¹ (b - m_b),   P̄ = P_a - P_ab P_b⁻¹ P_ba

Solution

From the definition of joint density we have that

    p_{A|B,C}(a|b, c) = p_{A,B,C}(a, b, c) / p_{B,C}(b, c)

Multiplying the top and bottom of the fraction by p_C(c) yields

    p_{A|B,C}(a|b, c) = ( p_{A,B,C}(a, b, c)/p_C(c) ) / ( p_{B,C}(b, c)/p_C(c) )

or

    p_{A|B,C}(a|b, c) = p_{A,B|C}(a, b|c) / p_{B|C}(b|c)

Substituting the distribution given in (4.40) and using the result in Example 4.4 to integrate over a to obtain the marginal p_{B|C}(b|c) = ∫ p_{A,B|C}(a, b|c) da yields

    p_{A|B,C}(a|b, c) = n( (a, b), m, P ) / n(b, m_b, P_b)

Now using (4.39) and (4.36) gives

    m̄ = m_a + P_ab P_b⁻¹ (b - m_b),   P̄ = P_a - P_ab P_b⁻¹ P_ba

and the result

    p_{A|B,C}(a|b, c) = n(a, m̄, P̄)

is established.

4.7 Maximum-Likelihood Estimation

We now turn to one of the most basic problems in modeling: how to determine model parameters from experimental measurements. Finding methods to solve parameter estimation problems has had a significant impact on the development of mathematics and statistics.

Consider a linear model that we wish to use to explain some response variable y. A linear model means simply that y = θᵀx, in which θ is a set of parameters that we wish to determine from measurements of y for given values of the environmental variables x. We often intend to use the identified model to optimize over the x variables to find the conditions that maximize the response y. This approach may save considerable time and expense compared to the alternative of trial-and-error experimental adjustment of the x variables.
In addition to finding the "best" parameter estimate, we would also like to quantify our uncertainty in the estimate. Modeling the data as a random variable with some fixed probability density is one of the key methods that we can use to solve this problem. The uncertainty in measurement leads to uncertainty in the estimate, and stipulating the structure of the measurement uncertainty allows us to find (exactly in some cases) the uncertainty in the estimate. Because of the central limit theorem, our first choice for modeling uncertainty in measurement is the normal distribution. We then have the model

    y = θᵀx + e    (4.42)

in which e is assumed normal and zero mean. The effect of nonzero mean is assumed to be included in θ as additional parameters to be estimated.

The six canonical linear estimation problems. We next look at the six versions of this problem that result from assuming (i) y is a scalar or vector, (ii) θ is a vector or matrix, and (iii) whether we know the measurement error variance, or if it has to be estimated from the data. The variable x will be a vector throughout. The goal in each problem is the same: find the optimal parameter estimate by maximizing the probability of the data, and quantify the estimate's uncertainty, for example, by determining confidence intervals. The first five estimation problems have analytical, closed-form solutions. Number six requires iterative, numerical solution for both the optimal parameter estimate and the measurement error covariance estimate.

4.7.1 Scalar Measurement y, Known Measurement Variance

Consider n samples of the model (4.42)

    y_i = x_iᵀθ + e_i,   e_i ~ N(0, σ²),   i = 1,..., n

in which x_i ∈ R^(n_p) is the n_p-vector of environmental conditions for the ith sample. Assuming an independent, identically distributed normal distribution for the measurement errors, the probability density of the set of samples is

    p(y|θ, σ) = (1/((2π)^(n/2) σⁿ)) exp( -(1/(2σ²)) Σ_{i=1}^n (y_i - x_iᵀθ)² )

Taking the logarithm gives

    ln p(y|θ, σ) = -( (n/2) ln 2π + n ln σ + (1/(2σ²)) Σ_{i=1}^n (y_i - x_iᵀθ)² )

This equation is easier to express if we first stack the measurements and environmental conditions in a vector and matrix

    y = [y_1  y_2  ···  y_n]ᵀ,   X = [x_1ᵀ; x_2ᵀ; ···; x_nᵀ]

giving

    ln p(y|θ, σ) = -( (n/2) ln 2π + n ln σ + (1/(2σ²)) (y - Xθ)ᵀ(y - Xθ) )

We define the log of the likelihood as a function of the parameters θ and σ with the data y regarded as fixed values

    L(θ, σ) = -( (n/2) ln 2π + n ln σ + (1/(2σ²)) (y - Xθ)ᵀ(y - Xθ) )    (4.43)

Because we assume that we know the measurement error variance σ², the only unknown in this first estimation problem is θ. Therefore, to find the maximum-likelihood estimate, we maximize L(θ, σ) by differentiating with respect to θ and setting the result to zero

    ∂L/∂θ = (1/σ²) Xᵀ(y - Xθ)
    0 = Xᵀ(y - Xθ̂)    (4.44)

Solving for the estimate gives

    θ̂ = (XᵀX)⁻¹ Xᵀ y    (4.45)

We delay a discussion of what to do when X does not have full column rank until Section 4.8. But we know from Chapter 1 that in that case the optimal estimate is not unique, and we have a linear subspace of estimates that are all optimal. This situation is depicted in Figure 1.6(b).

Probability density of parameters and parameter confidence intervals. The next item of interest is the probability density of the parameter estimates. Let θ_0 be the parameter generating the measurements, so the model is y = Xθ_0 + e. Then we have

    θ̂ = (XᵀX)⁻¹ Xᵀ y,   e ~ N(0, σ² I_n)

Using the result on linear transformation of a normal, we have

    θ̂ ~ N(θ_0, σ² (XᵀX)⁻¹)    (4.46)

As shown in Exercise 4.21, for a random variable ξ distributed as a multivariate normal with mean m and covariance P, the probability α that ξ takes on a value x inside the ellipse

    (x - m)ᵀ P⁻¹ (x - m) ≤ b

is given by

    α = γ(n_p/2, b/2) / Γ(n_p/2)

in which the complete and incomplete gamma functions are defined by (Abramowitz and Stegun, 1970, pp. 255-260)

    Γ(n_p) = ∫₀^∞ t^(n_p - 1) e^(-t) dt = (n_p - 1)!
    γ(n_p, x) = ∫₀^x t^(n_p - 1) e^(-t) dt

The function χ²(n_p, α) inverts this relationship, so we have b = χ²(n_p, α). Substituting this into the equation defining the ellipse gives

    (x - m)ᵀ P⁻¹ (x - m) ≤ χ²(n_p, α)

Finally, substituting in the values of the mean and covariance gives the following α-level elliptical confidence region for the maximum-likelihood estimate

    (θ̂ - θ_0)ᵀ (XᵀX/σ²) (θ̂ - θ_0) ≤ χ²(n_p, α)    (4.47)

parameter vector, the elliptical region is


For a large-dimensional these cases we may wish to approximate

the
bersometo present. In
box
bounding
that
smallest
the
contains
regionwith
the
confidence
4.15, this box is given by

ellipse. As shown in

Exercise

|-oo/

1/2

(x 2(np,

limits with the following


whichis commonlyreported as plus/minus
notation
in which

= (x2(np,

1/2

Note that the parameter uncertainty interval does not depend on the measurement samples when we know the measurement error variance. We can compute c before we do the experiment, based solely on the chosen x_i. Only θ̂ depends on the experiment. And if we do an increasing number of experiments, XᵀX = Σ_{i=1}^n x_i x_iᵀ increases linearly with the number of samples n, so the confidence interval c decreases as n^(-1/2). So one method to reduce uncertainty in parameter estimates is to replicate experiments.
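The estimate (4.45) and the χ²-based bounding box can be computed directly; a sketch assuming NumPy and SciPy, with simulated data and arbitrary "true" parameters (the specific numbers are illustrative, not from the text):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
n, npar, sigma = 50, 2, 0.5
theta0 = np.array([1.0, -2.0])               # "true" parameters for the simulation

X = rng.uniform(-1.0, 1.0, size=(n, npar))   # chosen environmental conditions
y = X @ theta0 + sigma * rng.standard_normal(n)

theta = np.linalg.solve(X.T @ X, X.T @ y)    # maximum-likelihood estimate (4.45)

# 95% bounding-box half-widths; sigma is known, so c depends only on X.
b = chi2.ppf(0.95, npar)                     # chi-squared value, about 5.99
c = np.sqrt(b) * sigma * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))
```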

Marginal parameter estimates. Another way to condense the multivariate density is to compute its marginals. Since θ̂ is distributed as a normal in (4.46), we compute marginals as in (4.19), giving

    θ̂_i ~ N(θ_{0,i}, σ² ((XᵀX)⁻¹)_ii)
We then compute α-level confidence intervals on each of the n_p univariate normals, giving θ̂ = θ_0 ± c̄, in which

    c̄_i = (χ²(1, α))^(1/2) σ ( ((XᵀX)⁻¹)_ii )^(1/2)

Note that the c and c̄ formulas are different. The first is the bounding box for the true multivariate α-level confidence region; the second is simply a collection of the α-level confidence intervals for all the marginals of the multivariate estimate. Let's call this latter region the "marginal box" to distinguish it from the bounding box. Students often ask, "Since it is difficult to present a high-dimensional ellipse, which of these two plus/minus results should be reported as the confidence interval in a research presentation?" This question has no satisfactory answer. The important point is to know and communicate what you are reporting. The bounding box certainly contains more than probability level α since it contains the true confidence region in its interior. The marginal box does not have this property. The interpretation of the marginal box is the same as the interpretation of any marginal density. If you obtained many samples of the parameter estimates from many datasets, the ith interval of the marginal box would contain an α-level fraction of all the different samples of the ith parameter estimate. No statement about the probability of the jointly distributed parameter estimate follows from this characterization. We include the following example to help clarify these distinctions.
confidence region, bounding box, and marginal
Example 4.21: The confidence region, bounding box, and marginal box

Assume that the two-dimensional random variable ξ is distributed as N(m, P).

(a) Plot the multivariate density.

(b) Compute and plot the two marginal densities, and their 95% confidence intervals.

(c) Compute the bounding box and the marginal box, and plot them along with the joint density and its 95% confidence ellipse.

Figure 4.10: The multivariate normal density (top right). The two marginal densities and marginal 95% confidence regions (shaded) (top left and bottom right). The joint elliptical 95% confidence region (shaded), bounding box (outer), and the marginal box (inner) (bottom left).
(d) Take 1000 independent samples of ξ, and determine the number inside the ellipse, the bounding box, and the marginal box. Approximately what confidence levels can you assign to the bounding box and marginal box?

Solution

(a) The multivariate density is shown in the top right of Figure 4.10. The 95% confidence ellipse is given by

    (x - m)ᵀ P⁻¹ (x - m) ≤ χ²(2, 0.95) = 5.99

This ellipse is shown in the bottom left of Figure 4.10.

(b) The two marginals are

    ξ_1 ~ N(m_1, P_11),   ξ_2 ~ N(m_2, P_22)

The marginal densities of ξ_1 and ξ_2 are shown in the bottom right and top left of Figure 4.10, respectively. The 95% confidence intervals for the marginals follow from χ²(1, 0.95) = 3.84, giving

    x_1 ∈ [-1.77, 3.77],   x_2 ∈ [0.614, 3.39]

These intervals are shown as the shaded regions in the bottom right and top left of Figure 4.10.

(c) The ellipse's bounding box follows from χ²(2, 0.95) = 5.99, giving

    x_1 ∈ [-2.46, 4.46],   x_2 ∈ [0.269, 3.73]

The ellipse, bounding box, and marginal box are shown in the bottom left of Figure 4.10.
(d) Generating 1000 samples of ξ and counting the fraction of samples within each of the three regions gives

    ellipse: 0.956,   bounding box: 0.981,   marginal box: 0.920
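The counting in part (d) is easily reproduced for any choice of m and P; a sketch assuming NumPy and SciPy, with an illustrative covariance (not necessarily the one used in the figure):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
m = np.array([1.0, 2.0])
P = np.array([[2.0, 0.6], [0.6, 0.5]])       # illustrative positive definite choice
Pinv = np.linalg.inv(P)

x = rng.multivariate_normal(m, P, size=100000)
d = x - m
sd = np.sqrt(np.diag(P))

b2 = chi2.ppf(0.95, 2)                       # joint level, about 5.99
b1 = chi2.ppf(0.95, 1)                       # marginal level, about 3.84

in_ellipse = (np.einsum('ij,jk,ik->i', d, Pinv, d) <= b2).mean()
in_bound = np.all(np.abs(d) <= np.sqrt(b2) * sd, axis=1).mean()
in_marg = np.all(np.abs(d) <= np.sqrt(b1) * sd, axis=1).mean()
```

As in the example, the ellipse fraction is close to 0.95, the bounding box exceeds it, and the marginal box falls short.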

4.7.2 Scalar Measurement y, Unknown Measurement Variance


We now consider the measurement error variance σ² to be unknown. We have the same model as in the previous section

    y_i = x_iᵀθ + e_i,   e_i ~ N(0, σ²)

When the measurement variance is unknown, we maximize the likelihood function given in (4.43) over both σ and θ, and estimate both quantities from the data. The derivative with respect to θ is the same as in (4.44), and differentiating (4.43) with respect to σ gives

    ∂L/∂θ = (1/σ²) Xᵀ(y - Xθ)
    ∂L/∂σ = -n/σ + (1/σ³) (y - Xθ)ᵀ(y - Xθ)

Equating the derivatives to zero and solving simultaneously gives

    θ̂ = (XᵀX)⁻¹ Xᵀ y    (4.48)
    σ̂² = (1/n) (y - Xθ̂)ᵀ(y - Xθ̂)    (4.49)

We see that the maximum-likelihood parameter estimate is unchanged from the known variance case, and the maximum-likelihood estimate of the variance is the mean of the square of the residual over the samples. Notice that the maximum-likelihood estimate of variance is close to but not equal to the sample variance s² given by the formula (for n > n_p)

    s² = (1/(n - n_p)) (y - Xθ̂)ᵀ(y - Xθ̂)

so that σ̂² = ((n - n_p)/n) s². We show subsequently that the sample variance is an unbiased estimate of σ², so the maximum-likelihood estimate of σ² is biased. But this bias is small for a large number of samples compared to parameters, n >> n_p.
Given the same result for θ̂ as in the previous problem, the probability density of θ̂ is unchanged from the previous problem. We next determine the probability density of σ̂². For this it is convenient to first consider the singular value decomposition of the X matrix. We assume that this n × n_p matrix has independent columns so its rank is n_p. As discussed in Chapter 1, a real n × n_p matrix with independent columns can be written as the product of orthogonal n × n matrix U, orthogonal n_p × n_p matrix V, and diagonal n_p × n_p matrix Σ

    X = [U_1  U_2] [Σ; 0] Vᵀ

The following relationships result from the orthogonality

    U_1ᵀU_1 = I_{n_p},   U_1ᵀU_2 = 0,   U_1U_1ᵀ + U_2U_2ᵀ = I_n,   VᵀV = VVᵀ = I_{n_p}

Using the singular value decomposition (SVD) for X, we find by substitution

    (XᵀX)⁻¹Xᵀ = VΣ⁻¹U_1ᵀ
    X(XᵀX)⁻¹Xᵀ = U_1U_1ᵀ
    I_n - X(XᵀX)⁻¹Xᵀ = U_2U_2ᵀ

These relations allow us to express the estimate and residual in terms of the measurement errors as

    θ̂ - θ_0 = VΣ⁻¹U_1ᵀ e,   y - Xθ̂ = U_2U_2ᵀ e    (4.50)

Using these relations we can express the following quadratic terms as

    (θ̂ - θ_0)ᵀ (XᵀX) (θ̂ - θ_0) = eᵀU_1U_1ᵀe
    (y - Xθ̂)ᵀ (y - Xθ̂) = eᵀU_2U_2ᵀe

These relations provide an essential insight. The error e obviously affects both quadratic terms, but its effect in the sum of squares of the residual (the sample variance) is through U_2, and its effect in the parameter estimate's distance from the true value is through U_1. Because these two matrices are orthogonal to each other, the effect of the measurement error is independently distributed in these two quadratic terms. We make this statement precise subsequently. First it is helpful to establish that the following two random variables z_1, z_2 are statistically independent

    z_1 = U_1ᵀ e,   z_2 = U_2ᵀ e

Given that e ~ N(0, σ²I) and the result on the linear transformation of a normal, the pair is distributed as

    [z_1; z_2] ~ N( 0, σ² [I_{n_p} 0; 0 I_{n-n_p}] )

Since the pair is jointly normal and the covariance is diagonal, z_1 and z_2 are statistically independent. We also know that their quadratic products are distributed as chi-squared

    z_1ᵀz_1 / σ² ~ χ²_{n_p},   z_2ᵀz_2 / σ² ~ χ²_{n - n_p}

Exercise 4.33 discusses the chi-squared and chi densities, and also shows that the mean of χ²_n is n. From that fact we can deduce quickly the earlier claim that the sample variance is an unbiased estimate. Summarizing our results on the sample variance thus far

    s² = (1/(n - n_p)) (y - Xθ̂)ᵀ(y - Xθ̂) = (1/(n - n_p)) z_2ᵀz_2

Taking expectation gives

    E(s²) = (1/(n - n_p)) E(z_2ᵀz_2) = (σ²/(n - n_p)) (n - n_p) = σ²

and the result is established.

As shown in Exercise 4.3, if two random variables are statistically independent, then all functions of the two random variables are also statistically independent. Therefore we know that z_1ᵀz_1 and z_2ᵀz_2 are statistically independent. The ratio of two chi-squared, statistically independent random variables is defined as the F-distribution

    (z_1ᵀz_1 / n_p) / (z_2ᵀz_2 / (n - n_p)) ~ F(n_p, n - n_p)

The F-distribution F(n, m) can be shown to have density

    p_F(z) = ( Γ((n + m)/2) / (Γ(n/2) Γ(m/2)) ) (1/z) √( (zn)ⁿ mᵐ / (zn + m)^(n+m) )

Exercises 4.35 and 4.45 provide further discussion of the F-distribution. The definitions of z_1, z_2 in terms of θ̂ and s² give

    (θ̂ - θ_0)ᵀ (XᵀX) (θ̂ - θ_0) / σ² ~ χ²_{n_p}
    (n - n_p) s² / σ² ~ χ²_{n - n_p}

and therefore

    ( (θ̂ - θ_0)ᵀ (XᵀX) (θ̂ - θ_0) / n_p ) / s² ~ F(n_p, n - n_p)    (4.51)

This last distribution provides the basis for the confidence intervals on the parameter estimates. Summarizing our results so far, the densities for the parameter estimate and the measurement variance estimate are

    θ̂ ~ N(θ_0, σ² (XᵀX)⁻¹)    (4.52)
    (n - n_p) s² / σ² ~ χ²_{n - n_p}    (4.53)

Notice that these distributions are inadequate to construct confidence levels on the estimated parameter θ̂ because they both depend on the unknown measurement variance σ². One might be tempted to replace the unknown σ² in the normal density for θ̂ with the maximum-likelihood estimate and obtain the confidence intervals for θ̂ from that density. That idea is in the right spirit, but is not quite correct. We obtain the correct confidence region by considering the distribution in (4.51). Notice that the ratio of the two quadratic terms has divided out the common unknown term σ². We define the function F̄(n, m, α) to return the argument of the cumulative F-distribution that achieves probability value α

    ∫₀^(F̄(n,m,α)) p_F(z) dz = α    (4.54)

Then the ellipsoidal confidence intervals for the parameter estimates follow from (4.51)

    (θ̂ - θ_0)ᵀ (XᵀX) (θ̂ - θ_0) ≤ n_p s² F̄(n_p, n - n_p, α)

in which the sample variance s² appears in place of the unknown σ².

Given the normal density, we can obtain the bounding box intervals as was done in the previous section

    \theta_i = \hat{\theta}_i \pm \left(c\,\tilde{A}_{ii}\right)^{1/2}, \qquad \tilde{A} = (X^T X)^{-1}

in which c = n_p\, s^2\, F(n_p, n - n_p, \alpha). The significant difference between this and the previous case is that here the size c of the confidence interval also depends on the measurements. The statistical interpretation remains the same; given many replicated experiments, the center \hat{\theta} as well as the size of the generated confidence ellipse are random. But the true parameter \theta_0, which is not a random variable, lies within the generated confidence ellipse for the stated fraction of the replicated experiments. As in the previous case, the confidence interval c decreases as the number of samples n increases.
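These F-based confidence calculations are easy to carry out numerically. The sketch below is a minimal illustration, assuming NumPy and SciPy are available; the data, noise level, and variable names are hypothetical, not taken from the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_p = 50, 2                               # samples and parameters (synthetic)
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
theta0 = np.array([1.0, 2.0])                # hypothetical "true" parameters
y = X @ theta0 + 0.1 * rng.standard_normal(n)

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # least-squares estimate
resid = y - X @ theta_hat
s2 = resid @ resid / (n - n_p)                   # sample variance

alpha = 0.95
Fval = stats.f.ppf(alpha, n_p, n - n_p)          # F(n_p, n - n_p, alpha)
c = n_p * s2 * Fval                              # ellipse level as in (4.55)

# bounding box half-widths: (c * A_ii)^(1/2) with A = (X'X)^{-1}
A = np.linalg.inv(X.T @ X)
halfwidth = np.sqrt(c * np.diag(A))
print(theta_hat, halfwidth)
```

With 95% confidence each \theta_i lies in \hat{\theta}_i \pm halfwidth[i]; the ellipse (\theta - \hat{\theta})^T X^T X (\theta - \hat{\theta}) \leq c is the joint region.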

4.7.3 Vector of Measurements y, Different Parameters Corresponding to Different Measurements, Known Measurement Covariance R

We next consider the vector measurement case. This case arises frequently when identifying empirical linear models between a vector of input variables x and a vector of output or response variables y. We consider first the case in which each measurement type has its own vector of parameters

    y_i = \Theta x_i + e_i, \qquad e_i \sim N(0, R), \qquad i = 1, \ldots, n   (4.56)

The environmental variable x_i is assumed to have q components, x_i \in R^q, y_i, e_i \in R^p, and \Theta \in R^{p \times q}, and we assume q < n. In this model we have n_p = pq model parameters to estimate. Notice that this model is not restricted to only p independent versions of the model given by (4.42). The generalization allowed here comes from the covariance matrix R. To reduce this case to (4.42), we would add the further restriction that R = \sigma^2 I. We will see that allowing the different measurements

to be correlated does not prevent us from solving this estimation problem in closed form. Consider n samples, i = 1, \ldots, n, with the e_i independent across samples (although the components within each e_i are correlated). Given the deterministic variables \Theta and the n x_i, we have for the probability density of the measurements

    p(y_1, \ldots, y_n) = \frac{1}{(2\pi)^{np/2}(\det R)^{n/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{n}(y_i - \Theta x_i)^T R^{-1}(y_i - \Theta x_i)\right)

or, by taking the logarithm,

    \ln p = -\frac{1}{2}\left(np \ln 2\pi + n \ln \det R + \sum_{i=1}^{n}(y_i - \Theta x_i)^T R^{-1}(y_i - \Theta x_i)\right)

We again define the log-likelihood as a function of the parameters \Theta and R with the data y_i, i = 1, 2, \ldots, n, regarded as fixed values

    L(\Theta, R) = -\frac{1}{2}\left(np \ln 2\pi + n \ln \det R + \sum_{i=1}^{n}(y_i - \Theta x_i)^T R^{-1}(y_i - \Theta x_i)\right)

Since R is known, we take the derivative of L with respect to the matrix \Theta. It is perhaps easiest to perform this derivative using component notation. Rewriting the expression for L in components gives

    L = -\frac{1}{2}\left(np \ln 2\pi + n \ln \det R + (y_{ir} - \Theta_{rj} x_{ij})\, R^{-1}_{rs}\, (y_{is} - \Theta_{sl} x_{il})\right)   (4.57)

in which we use the Einstein summation convention for repeated indices. Taking the derivative of the scalar-valued function L with respect to \Theta_{mn} gives a matrix derivative. Performing the sums over the deltas, noting R is symmetric, and collecting terms gives

    \left(\frac{\partial L}{\partial \Theta}\right)_{mn} = R^{-1}_{ms}\,(y_{is} - \Theta_{sl} x_{il})\, x_{in}

If we convert this back to the vector/matrix notation of the problem statement we have

    \frac{\partial L}{\partial \Theta} = R^{-1} \sum_{i=1}^{n} (y_i - \Theta x_i)\, x_i^T

Setting this matrix to zero and solving gives the maximum-likelihood estimate for the parameters \Theta

    \hat{\Theta} = \left(\sum_i y_i x_i^T\right)\left(\sum_i x_i x_i^T\right)^{-1}

in which we assume that the matrix \sum_i x_i x_i^T has full rank. Again, we discuss what to do when this rank condition fails later in Section 4.8. Notice that the value of the measurement error covariance R is irrelevant in the estimation of \Theta in this problem also. It is often convenient to arrange the variables so that the summation is performed by matrix operations. Arranging the data vectors in the following matrices

    Y = [y_1\ y_2\ \cdots\ y_n], \qquad X = [x_1\ x_2\ \cdots\ x_n]

allows us to express the maximum-likelihood estimate as

    \hat{\Theta} = Y X^T (X X^T)^{-1}   (4.58)
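A quick numerical check of (4.58) on synthetic data (NumPy; all names and values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 3, 2, 500                     # outputs, inputs, samples (synthetic)
Theta_true = rng.standard_normal((p, q))
X = rng.standard_normal((q, n))         # columns are the x_i
E = 0.01 * rng.standard_normal((p, n))  # small measurement errors
Y = Theta_true @ X + E                  # columns are the y_i

# maximum-likelihood estimate (4.58): Theta_hat = Y X' (X X')^{-1}
Theta_hat = Y @ X.T @ np.linalg.inv(X @ X.T)
print(np.max(np.abs(Theta_hat - Theta_true)))
```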

Next we determine the probability density of the estimated parameter \hat{\Theta}. We denote the parameter value generating the data as \Theta_0, so the measurements are given by

    Y = \Theta_0 X + E, \qquad E = [e_1\ e_2\ \cdots\ e_n]

The estimate and its transpose are therefore

    \hat{\Theta} = \Theta_0 + E X^T (X X^T)^{-1}, \qquad \hat{\Theta}^T = \Theta_0^T + (X X^T)^{-1} X E^T

We find the transpose convenient because we now wish to stack the matrix \hat{\Theta}^T in a vector. Applying the vec operator to both sides of the transposed estimate equation gives

    \mathrm{vec}\,\hat{\Theta}^T = \mathrm{vec}\,\Theta_0^T + \left(I_p \otimes (X X^T)^{-1} X\right)\,\mathrm{vec}\,E^T

From the definition of E we see

    \mathrm{vec}\,E^T = \begin{bmatrix} e_{1,1} & e_{1,2} & \cdots & e_{1,n} & e_{2,1} & \cdots & e_{p,1} & e_{p,2} & \cdots & e_{p,n} \end{bmatrix}^T

in which e_{j,i} denotes the jth measurement in the ith sample.
of these normally distributed random variables,


Giventhis arrangement
the density
wehavefor
vecET
in which

RII

RIP

RII

RIP

RPI

Rpp
RPI

Rpp

Usingthe result on linear transformation of a normal, we have


vec T vec{
in which

N(O, S)

S = Re (XXT) 1

(4.59)

Probability,Random Variables, and


408
Using the

plify this

Kronecker

from Section 1.5.3, we can


product formulas

follows
covariance as

Sims

result for S, is the matrix analog of the vector


this
with
Equation(4.59),
result in (4.46).
the elliptical confidence region for vecT
density,
normal
Giventhe
4.7.1

in Section
can be found as

S -1 (vecT vec(T)

(vecT

x2(np,

a)

(4.60)
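The Kronecker-product simplification leading to (4.59) can be verified numerically; a small sketch (NumPy, with synthetic dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, n = 2, 3, 8
X = rng.standard_normal((q, n))
M = rng.standard_normal((p, p))
R = M @ M.T                                   # a positive definite covariance

A = np.linalg.inv(X @ X.T) @ X                # (XX')^{-1} X
G = np.kron(np.eye(p), A)                     # I_p (x) (XX')^{-1} X
S = G @ np.kron(R, np.eye(n)) @ G.T           # covariance of vec(Theta_hat')
S_kron = np.kron(R, np.linalg.inv(X @ X.T))   # claimed simplification (4.59)
print(np.allclose(S, S_kron))
```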

Interlude

Let's put the tools of orthogonality and Kronecker products to good use and prove a fundamental result in statistics, namely that the sample mean and sample variance from a normal distribution are statistically independent.

Theorem 4.22 (Mean and variance of samples from a normal). Let X_i \in R^p, i = 1, \ldots, n be n independent samples from N(\mu, \Sigma). Define the sample mean and the maximum-likelihood estimate of the variance as

    \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T

Then \bar{X} is distributed as N(\mu, (1/n)\Sigma) and independently of \hat{\Sigma}, and n\hat{\Sigma} is distributed as \sum_{i=1}^{n-1} Z_i Z_i^T in which the Z_i are distributed independently and identically as N(0, \Sigma).

Proof. Stack the n vectors next to each other in a matrix

    X = [X_1\ X_2\ \cdots\ X_n]

We next construct an orthogonal transformation of this matrix. Let l be 1/\sqrt{n} times an n-vector of ones so that X l = \sqrt{n}\,\bar{X}. Next consider the null space of l^T. From the fundamental theorem of linear algebra, that is an n-1 dimensional space. Collect an orthonormal basis for it in the columns of the matrix B_{n-1}. Then construct the following orthogonal matrix B

    B = [B_{n-1}\ \ l], \qquad B B^T = B^T B = I

Define the transformed random variables

    Z = X B

The samples X_i are distributed as X_i \sim N(\mu, \Sigma), or in more compact notation

    \mathrm{vec}\,X \sim N(1_n \otimes \mu,\ I_n \otimes \Sigma)

The transformation gives for Z

    \mathrm{vec}\,Z = (B^T \otimes I_p)\,\mathrm{vec}\,X \sim N\left((B^T 1_n) \otimes \mu,\ I_n \otimes \Sigma\right)

From the orthogonality relations we have B_{n-1}^T 1_n = 0 and l^T 1_n = \sqrt{n}, so

    \mathrm{vec}\,Z \sim N\left(\begin{bmatrix} 0 & \cdots & 0 & \sqrt{n}\,\mu^T \end{bmatrix}^T,\ I_n \otimes \Sigma\right)

that is, Z_1, \ldots, Z_{n-1} are distributed identically as N(0, \Sigma), and Z_n \sim N(\sqrt{n}\,\mu, \Sigma).

From the covariance we conclude that the variables Z_1, \ldots, Z_n are statistically independent. Computing \hat{\Sigma} gives

    n\hat{\Sigma} = \sum_i X_i X_i^T - n\bar{X}\bar{X}^T = X X^T - n\bar{X}\bar{X}^T = Z B^T B Z^T - Z_n Z_n^T = Z Z^T - Z_n Z_n^T = \sum_{i=1}^{n-1} Z_i Z_i^T

which establishes the stated distribution for \hat{\Sigma}. Since \hat{\Sigma} is a function of only Z_1, \ldots, Z_{n-1}, and \bar{X} is a function of only Z_n, \hat{\Sigma} and \bar{X} are independent. Since \bar{X} = Z_n/\sqrt{n}, we have that \bar{X} \sim N(\mu, (1/n)\Sigma), and the theorem is proved.

This result is established in many statistical texts using a dazzling variety of arguments, some bordering on the mystical. The proof given above is a compact expression of a standard method given by Anderson (2003, p. 77).
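Theorem 4.22 is also easy to check by simulation. The scalar sketch below (NumPy, synthetic) estimates the correlation between the sample mean and the maximum-likelihood variance over many replicated experiments; independence implies it should be near zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 5, 200_000
X = rng.standard_normal((reps, n))   # each row: n samples from N(0, 1)

xbar = X.mean(axis=1)                # sample means
sig2 = X.var(axis=1)                 # maximum-likelihood variances (1/n factor)

# independence implies zero correlation between the mean and the variance
corr = np.corrcoef(xbar, sig2)[0, 1]
print(corr)
```

For a nonnormal parent distribution (an exponential, say) the same experiment produces a clearly nonzero correlation; the independence is special to the normal.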

4.7.4 Vector of Measurements y, Different Parameters Corresponding to Different Measurements, Unknown Measurement Covariance R

When R is also unknown, we maximize L in (4.57) with respect to both \Theta and R. The \Theta derivative has been given previously. Differentiating (4.57) with respect to R is facilitated by using the following fact about the trace of a matrix product

    \mathrm{tr}(AB) = \mathrm{tr}(BA)

which follows immediately from the definition of trace and expressing the matrix product in components

    \mathrm{tr}(AB) = A_{ij} B_{ji} = B_{ji} A_{ij} = \mathrm{tr}(BA)

Applying this result twice on a product of three matrices gives

    \mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)

This result allows us to rewrite the following scalar term

    (y_i - \Theta x_i)^T R^{-1} (y_i - \Theta x_i) = \mathrm{tr}\left(R^{-1}(y_i - \Theta x_i)(y_i - \Theta x_i)^T\right)

Next we use the following fact in differentiating the trace of a function of a matrix

    \frac{d\,\mathrm{tr}(f(A))}{dA} = g(A^T), \qquad g(x) = \frac{df(x)}{dx}

in which g is the usual scalar derivative of the scalar function f. See Exercise 4.4 for a derivation of this fact. Applying this result and using the fact that R is symmetric gives

    \frac{d}{dR}\,\mathrm{tr}(R^{-1} C) = -R^{-2} C

The derivatives of the determinant and the log of the determinant are (see Exercise 4.5 for a derivation)

    \frac{d \det A}{dA} = \det A\,(A^{-T}), \qquad \frac{d \ln \det A}{dA} = (A^{-T})

The R derivative of (4.57) is therefore

    \frac{\partial L}{\partial R} = -\frac{n}{2} R^{-1} + \frac{1}{2} R^{-2} \sum_{i=1}^{n} (y_i - \Theta x_i)(y_i - \Theta x_i)^T

Setting this matrix equation to zero, using the estimate of \hat{\Theta}, and solving gives the maximum-likelihood estimates for this problem
    \hat{\Theta} = \left(\sum_i y_i x_i^T\right)\left(\sum_i x_i x_i^T\right)^{-1}, \qquad \hat{R} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{\Theta} x_i)(y_i - \hat{\Theta} x_i)^T

The estimate \hat{R} is an unbiased estimate of the measurement variance R. The distribution for n\hat{R} can be shown to be a Wishart distribution (see Exercise 4.51), which is a generalization of the \chi^2 distribution to the multivariate case (Wishart, 1928). The Wishart density can be shown to be (Anderson, 2003, pp. 252-255)

    p_W(W) = \frac{(\det W)^{(n-p-1)/2}\,\exp\left(-\frac{1}{2}\mathrm{tr}(R^{-1} W)\right)}{2^{np/2}\,(\det R)^{n/2}\,\Gamma_p(n/2)}   (4.61)

in which \Gamma_p is the multivariate gamma function defined by

    \Gamma_p(z) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\left(z + \frac{1-j}{2}\right)

Note that the argument of the probability density p_W(\cdot) is a positive definite matrix W. The probability is zero for W not positive definite.
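Both estimates can be computed in a few lines; the following sketch (NumPy, on a synthetic problem) recovers \Theta and R from simulated data.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q, n = 2, 3, 2000
Theta_true = rng.standard_normal((p, q))
R_true = np.array([[0.5, 0.2],
                   [0.2, 0.4]])                # synthetic measurement covariance
L = np.linalg.cholesky(R_true)

X = rng.standard_normal((q, n))
Y = Theta_true @ X + L @ rng.standard_normal((p, n))

Theta_hat = Y @ X.T @ np.linalg.inv(X @ X.T)   # same formula as the known-R case
Resid = Y - Theta_hat @ X
R_hat = Resid @ Resid.T / n                    # (1/n) sum of residual outer products
print(np.max(np.abs(R_hat - R_true)))
```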
4.7.5 Vector of Measurements y, Same Parameters for all Measurements, Known Measurement Covariance R

Next we consider the case in which the different measurement types are affected by the same set of parameters. The model is

    y_i = X_i \theta + e_i, \qquad e_i \sim N(0, R)

In this model, all of the different components of the measurement vector y_i \in R^p are affected by the same, single vector of parameters \theta \in R^{n_p}, and X_i is a p \times n_p matrix. Consider n \geq 1 samples, i = 1, \ldots, n, and, given the deterministic variables \theta and the n X_i, we have for the probability density of the measurements

    p(y_1, \ldots, y_n) = \frac{1}{(2\pi)^{np/2}(\det R)^{n/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{n}(y_i - X_i\theta)^T R^{-1}(y_i - X_i\theta)\right)   (4.62)

Taking the derivative of the log-likelihood with respect to \theta gives

    \frac{\partial L}{\partial \theta} = -\frac{1}{2}\sum_{i=1}^{n}\left(-2 X_i^T R^{-1} y_i + 2 X_i^T R^{-1} X_i \theta\right)

Setting this equation to zero and solving for \theta gives the maximum-likelihood estimate

    \hat{\theta} = \left(\sum_{i=1}^{n} X_i^T R^{-1} X_i\right)^{-1} \sum_{i=1}^{n} X_i^T R^{-1} y_i   (4.63)

In this problem, it can make sense to estimate \theta with a single sample (n = 1) if we choose the number of measurements p significantly larger than the number of parameters n_p. For a single sample, the parameter estimate formula is

    \hat{\theta} = (X^T R^{-1} X)^{-1} X^T R^{-1} y   (4.64)

which is the solution of a weighted least-squares problem using R^{-1} as the weight. Compare this expression to (4.45).
Notice also that this is the first estimation problem for which the maximum-likelihood estimate of the parameter \theta depends on the covariance of the measurement error R. We see next that this dependence prevents us from solving the final estimation problem in closed form.

We next calculate the probability density of the estimate. We denote the parameter value generating the data as \theta_0, so the measurements are given by

    y_i = X_i \theta_0 + e_i

and substituting this result into the estimate equation gives

    \hat{\theta} = \theta_0 + \left(\sum_i X_i^T R^{-1} X_i\right)^{-1} \begin{bmatrix} X_1^T R^{-1} & \cdots & X_n^T R^{-1} \end{bmatrix} \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix}

Using the result on linear transformation of a normal, we have

    \hat{\theta} - \theta_0 \sim N(0, S)   (4.65)
in which

    S = \left(\sum_i X_i^T R^{-1} X_i\right)^{-1}\left(\sum_i X_i^T R^{-1} R\, R^{-1} X_i\right)\left(\sum_i X_i^T R^{-1} X_i\right)^{-1} = \left(\sum_{i=1}^{n} X_i^T R^{-1} X_i\right)^{-1}

Given the normal density, we can compute the elliptical confidence region as in Section 4.7.1

    (\hat{\theta} - \theta_0)^T S^{-1} (\hat{\theta} - \theta_0) \leq \chi^2(n_p, \alpha)   (4.66)

The bounding box intervals follow as in Section 4.7.1. Notice that whenever the variance of the measurement errors is known, the maximum-likelihood estimate is normally distributed and the elliptical confidence intervals are given by \chi^2(n_p, \alpha).
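The single-sample weighted least-squares estimate (4.64) and its covariance S are straightforward to evaluate; a hedged sketch with synthetic values (names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
p, n_p = 40, 3                       # many measurements, few parameters
X = rng.standard_normal((p, n_p))
theta0 = np.array([1.0, -2.0, 0.5])  # hypothetical true parameters
R = np.diag(rng.uniform(0.01, 1.0, p))           # known measurement covariance
y = X @ theta0 + np.sqrt(np.diag(R)) * rng.standard_normal(p)

Rinv = np.linalg.inv(R)
# weighted least squares (4.64): theta_hat = (X' R^{-1} X)^{-1} X' R^{-1} y
theta_hat = np.linalg.solve(X.T @ Rinv @ X, X.T @ Rinv @ y)
S = np.linalg.inv(X.T @ Rinv @ X)    # covariance of the estimate, as in (4.65)
print(theta_hat, np.sqrt(np.diag(S)))
```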

4.7.6 Vector of Measurements y, Same Parameters for all Measurements, Unknown Measurement Covariance R

The final case is the one that arises most often in mechanistic modeling of chemical and biological experiments. To determine the unknown R, we maximize L(\theta, R) over R in addition to \theta. Using the results of Section 4.7.4 we can take the derivative of (4.62) with respect to R giving

    \frac{\partial L}{\partial R} = -\frac{n}{2} R^{-1} + \frac{1}{2} R^{-2} \sum_{i=1}^{n} (y_i - X_i\theta)(y_i - X_i\theta)^T

Setting this result to zero and using the result of the previous section gives the following set of necessary conditions for the maximum-likelihood estimates

    \hat{\theta} = \left(\sum_i X_i^T \hat{R}^{-1} X_i\right)^{-1} \sum_i X_i^T \hat{R}^{-1} y_i   (4.67)

    \hat{R} = \frac{1}{n}\sum_{i=1}^{n} (y_i - X_i\hat{\theta})(y_i - X_i\hat{\theta})^T   (4.68)

These conditions are coupled and generally cannot be solved in closed form. A natural approach is iteration, starting with an initial guess for the measurement error covariance such as R_0 = I. One then estimates \hat{\theta} and updates the iterate \hat{R} by solving a sequence of standard estimation problems. But there is no guarantee that this procedure converges. One may find that a crude initial guess like R_0 = I lies outside the region of convergence.
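One version of this iteration, alternating (4.67) and (4.68) from R_0 = I, is sketched below (NumPy; the problem data are synthetic and, as the text warns, convergence is not guaranteed in general).

```python
import numpy as np

rng = np.random.default_rng(6)
p, n_p, n = 3, 2, 400
theta0 = np.array([0.7, -1.2])
R_true = np.diag([0.2, 0.5, 1.0])
Xs = rng.standard_normal((n, p, n_p))            # the X_i matrices
ys = np.einsum('ipk,k->ip', Xs, theta0)
ys = ys + np.sqrt(np.diag(R_true)) * rng.standard_normal((n, p))

R = np.eye(p)                                    # initial guess R_0 = I
for _ in range(20):                              # alternate (4.67) and (4.68)
    Rinv = np.linalg.inv(R)
    A = sum(Xs[i].T @ Rinv @ Xs[i] for i in range(n))
    b = sum(Xs[i].T @ Rinv @ ys[i] for i in range(n))
    theta = np.linalg.solve(A, b)
    resid = ys - np.einsum('ipk,k->ip', Xs, theta)
    R = resid.T @ resid / n
print(theta, np.diag(R))
```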
Maximum-Likelihood and Bayesian Estimation

With this background in maximum-likelihood estimation, we would like to compare the approach to another class of popular methods known as Bayesian estimation. As we saw in the previous sections, in the maximum-likelihood approach, we maximize the probability of the observed data over the parameters

    \hat{\theta}_{MLE} = \arg\max_{\theta}\ p(y; \theta)   (4.69)

Here we write p(y; \theta) to emphasize that \theta is an unknown parameter, not a random variable. In the MLE approach, \hat{\theta} is the random variable, not \theta, and we assess the confidence intervals for \hat{\theta}. In Bayesian estimation, on the other hand, \theta itself is modeled as a random variable. The information that we have about \theta before the experiment is denoted by p(\theta). In the experiment, we imagine drawing a value of \theta as well as the measurement errors to create the data y_i = x_i^T \theta + e_i, i = 1, \ldots, n. With the measured y available, we then maximize p(\theta|y) over \theta to obtain the estimate

    \hat{\theta}_{BE} = \arg\max_{\theta}\ p(\theta|y)

The conditional density p(\theta|y) is known as the posterior density, i.e., the density for \theta after the experiment, and the density p(\theta) is known as the prior, i.e., the density before the experiment. In Bayesian estimation, we assess how much the measurement of y has changed our knowledge about \theta. From Bayes's theorem we can express the posterior as

    p(\theta|y) = \frac{p(y|\theta)\,p(\theta)}{p(y)}

Notice that p(y|\theta) is exactly the same functional form as p(y; \theta) in the MLE approach. Since the denominator does not depend on \theta, in Bayesian estimation we estimate \theta by the following equivalent maximization

    \hat{\theta}_{BE} = \arg\max_{\theta}\ p(y|\theta)\,p(\theta)   (4.70)

The only difference in the estimators (4.69) and (4.70) is the presence of the prior p(\theta) in the Bayesian approach. In the absence of knowledge about \theta, we often assume that p(\theta) is a uniform distribution. This is called the noninformative prior. Since p(\theta) does not depend on \theta with the noninformative prior, the MLE and BE estimates are identical in this case.

The posterior density of Bayesian estimation is a useful way to summarize the state of knowledge about the parameter \theta given the available experiments. Since one has available the posterior density, confidence levels on the random variable \theta are determined directly from p(\theta|y). Box and Tiao (1973) provide further discussion of Bayesian estimation. In Chapter 5 when we address the problem of state estimation, we use the Bayesian approach.
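A scalar conjugate example makes the prior-to-posterior update concrete; the sketch below (NumPy, synthetic numbers) compares the MLE with the Bayesian estimate under a normal prior.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2 = 0.5 ** 2                        # known measurement variance
n = 10
y = 2.0 + 0.5 * rng.standard_normal(n)   # data generated with theta = 2

theta_mle = y.mean()                     # maximizes p(y; theta)

# normal prior p(theta) = N(m0, v0); the posterior is also normal
m0, v0 = 0.0, 1.0
v_post = 1.0 / (1.0 / v0 + n / sigma2)
m_post = v_post * (m0 / v0 + y.sum() / sigma2)   # posterior mean (= MAP here)
print(theta_mle, m_post, v_post)
```

The posterior mean sits between the prior mean m0 and the MLE, and as v0 grows (the noninformative limit) it approaches the MLE, consistent with the discussion above.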

4.8 PCA and PLS regression


Principal components analysis (PCA) and projection onto latent structures (also known as partial least squares) (PLS) are two methods used to develop empirical linear models between a vector of predictor or environmental variables x, and a vector of responses y. This is the same linear model discussed in Sections 4.7.3 and 4.7.4, so we can view these methods as alternatives to the maximum-likelihood estimation approach presented in those sections. The focus of these methods is on determining estimates of the linear model that can handle situations with possible collinearities in the x variables, and missing or erroneous information, such as unknown error structure. Collinearities in the data can make the maximum-likelihood estimator highly sensitive to outliers and nonnormal errors. Because the measurement error structure is regarded as unknown or at least unreliable, robustness of the estimated model to unmodeled effects is the goal, rather than statistical optimality as in the maximum-likelihood methods.

As in Section 4.7.3, let p-vector y and q-vector x be related by the linear model y = \Theta x + e, and we wish to determine the parameter matrix \Theta \in R^{p \times q} given data on y and x. We use x_i, y_i, i = 1, 2, \ldots, n to denote the available samples. We assume n > q (often n \gg q) so

that we have more equations than unknowns, which is necessary for a well-conditioned estimation problem. It is customary to define the data matrices

    Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_n^T \end{bmatrix} \in R^{n \times p}, \qquad X = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix} \in R^{n \times q}

and the model is Y = X\Theta^T + E. In order to use a more standard notation for the linear model, we let B = \Theta^T \in R^{q \times p}, and we have

    Y = X B + E

We wish to estimate parameters B from measurements X and Y without knowledge of the statistical structure of E. Given what we already know about least squares from Chapter 1, a natural approach would be to minimize some measure of the size of the residual matrix E over all choices of B. If we choose the sum of the squares of all the elements of matrix E as our measure, we have (the square of) the so-called Frobenius norm of the matrix

    \|E\|_F^2 = \sum_{i,j} E_{ij}^2
So our first candidate for estimating matrix B is

    \min_B \|Y - XB\|_F^2

It is not difficult to show that the solution to this problem is the following

    \hat{B}_{LS} = (X^T X)^{-1} X^T Y = X^{\dagger} Y

with the usual pseudoinverse that we have seen in the standard vector least-squares problem in Chapter 1. Notice that by taking the transpose, this is also the maximum-likelihood estimate given in (4.58) for the case in which the measurement error in y is assumed normally distributed with covariance R, whether the covariance is known, or unknown and must be estimated from the data.

Also, we already know that X^T X has an inverse if and only if the columns of X are linearly independent; see Proposition 1.19. Since we may not have control over the experimental conditions, we often must contend with datasets in which X has dependent or nearly dependent columns, i.e., we have near collinearity in the columns of X. In such cases, the maximum-likelihood estimate \hat{B}_{LS} is unreliable and sensitive to small changes in the data or small errors in the assumed model structure.

But we also have a clear idea what to do about this issue given our background with the singular value decomposition (SVD). We first replace X with its (real) SVD X = U\Sigma V^T, and since X has more rows than columns (n > q), we obtain

    X = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \tilde{\Sigma} \\ 0 \end{bmatrix} V^T, \qquad \tilde{\Sigma} = \mathrm{diag}(\sigma_1, \ldots, \sigma_q)

in which U_1 contains the first q columns of U, and U_2 contains the remaining n - q columns. Multiplying the partitioned matrices gives

    X = U_1 \tilde{\Sigma} V^T

Next, to handle the case in which \tilde{\Sigma} has several small singular values corresponding to a matrix X with columns that are nearly collinear, we approximate X by setting any small singular values to zero. Assume we have \ell large singular values, and q - \ell small singular values that are nearly zero. In this case, the rank of X may be q, but with small perturbations to the data in matrix X, it can easily drop to rank \ell. We have that

    X = U_{\ell} \Sigma_{\ell} V_{\ell}^T + U_{q-\ell} \Sigma_{q-\ell} V_{q-\ell}^T \approx U_{\ell} \Sigma_{\ell} V_{\ell}^T   (4.71)

Using this lower-rank SVD in place of X then gives the following more robust least-squares estimate

    \hat{B}_{SVD} = V_{\ell} \Sigma_{\ell}^{-1} U_{\ell}^T Y   (4.72)

The ill-conditioning caused by inverting \tilde{\Sigma} with all q singular values is overcome by inverting only the largest \ell singular values. Thus the SVD estimate is less sensitive to errors in the data than the least-squares or maximum-likelihood estimate. Realize also that only the maximum-likelihood estimate is unbiased. By suppressing the small singular values, we introduce a small bias in \hat{B}_{SVD}, but greatly reduce the variance in the estimate.
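A sketch of the truncated-SVD estimate (4.72) on synthetic, nearly collinear data (NumPy; all names and values are hypothetical):

```python
import numpy as np

def svd_regress(X, Y, ell):
    """Truncated-SVD estimate (4.72): invert only the ell largest singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:ell].T @ np.diag(1.0 / s[:ell]) @ U[:, :ell].T @ Y

rng = np.random.default_rng(8)
n, q, p = 100, 4, 2
X = rng.standard_normal((n, q))
X[:, 3] = X[:, 2] + 1e-6 * rng.standard_normal(n)   # nearly collinear columns
B_true = rng.standard_normal((q, p))
Y = X @ B_true + 0.1 * rng.standard_normal((n, p))

B3 = svd_regress(X, Y, 3)   # drop the near-zero singular value
B4 = svd_regress(X, Y, 4)   # keep all: ordinary least squares
print(np.linalg.norm(Y - X @ B3), np.linalg.norm(Y - X @ B4))
```

B4 fits slightly better but typically has much larger coefficients along the nearly collinear direction; B3 trades a small bias for a large variance reduction.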

PCR. In principal components analysis, the data matrix is factored as X = T P^T, in which the matrix T is known as the scores and the matrix P is known as the loadings; only the first \ell principal components are retained as the columns of T and P, respectively. The principal component regression (PCR) estimate is then the least-squares estimate built from these retained components

    \hat{B}_{PCR} = P_{\ell} (T_{\ell}^T T_{\ell})^{-1} T_{\ell}^T Y

so the correspondence with the SVD approach is as follows. The scores in PCR are the product of the singular values and the left singular vectors, T_{\ell} = U_{\ell} \Sigma_{\ell}. The loadings are the right singular vectors, P_{\ell} = V_{\ell}. Substituting these relationships into the formula for \hat{B}_{PCR} shows that

    \hat{B}_{PCR} = \hat{B}_{SVD}

and the two approaches are equivalent. So one advantage of learning the SVD as part of linear algebra is that you have also learned PCR.
PLSR. A potential drawback of the PCR approach is that only the predictor variables are evaluated. The principal components are selected to maximize the information about matrix X. But there is no guarantee that these components can represent the responses Y. To improve the predictive capability of the model, the PLS regression (PLSR) adds a very interesting wrinkle. In this approach, one does not start with the SVD of X but with the SVD of X^T Y, which includes information about both X and Y and the correlation between them. Note that X^T Y \in R^{q \times p}, which is a small matrix regardless of the number of samples, n. So computing the SVD of a matrix of the dimension of X^T Y, which is done repeatedly in PLSR, is a fast computation. The components, called latent variables, are obtained recursively as follows (Mevik and Wehrens, 2007). The first left and right singular vectors, u_1 and v_1, are used to obtain the scores t_1 and w_1, respectively, via

    t_1 = X u_1 = E_1 u_1, \qquad w_1 = Y v_1 = F_1 v_1

in which the matrices E_1 and F_1 are initialized as X and Y, respectively. The X scores are then usually normalized, t_1 \leftarrow t_1 / \sqrt{t_1^T t_1}. We now define the two loadings, p_1 and q_1, using the same score t_1

    p_1 = E_1^T t_1, \qquad q_1 = F_1^T t_1

Next the data matrices are deflated by subtracting the information in the current latent variable via

    E_{i+1} = E_i - t_i p_i^T, \qquad F_{i+1} = F_i - t_i q_i^T

The next iterate starts with the SVD of E_{i+1}^T F_{i+1} in place of X^T Y, and the process is repeated. As in PCR, the number of latent variables \ell \leq q is chosen as the number of iterations of the algorithm. The left singular vectors u_i, the scores t_i, and the loadings p_i and q_i for i = 1, 2, \ldots, \ell are stored as the columns of the four matrices U, T, P, and Q. We do not require the right singular vectors v_i. Finally we compute the R matrix from

    R = U (P^T U)^{-1}

which gives the scores from the data, T = XR, and we use the low-rank approximation X \approx T P^T. The PLS solution is then the least-squares solution of Y = TC over C, giving

    \hat{B}_{PLS} = R (T^T T)^{-1} T^T Y = R Q^T
Cross validation. In both PCR and PLS we need to decide how many principal components or latent variables to retain in the model. The most widely accepted method to make this decision is known as cross validation. In cross validation the dataset is divided into two or more sets; one set is used for fitting the parameters, and the other set is used to evaluate the predictive power of the model using the remaining data that have not been used in the fitting process. The validation error is defined as E_v = Y_v - X_v \hat{B}_{\ell}, in which \hat{B}_{\ell} is the estimated model parameter matrix using the fitting dataset (X, Y) and the chosen number of principal components or latent variables, \ell. To determine the best value of \ell to use for estimating B, one finds the \ell that minimizes \|E_v\|_F^2. This value of \ell is large enough that the model fits the data accurately, but not so large that the model has been fit to the noise in the data. We demonstrate the cross validation technique with the following example.
Example 4.23: Comparing PCR and PLSR

Consider a dataset with five predictor variables, x \in R^5, to model a vector of two responses, y \in R^2. The dataset has 200 samples. The data are available in file pca-pls-data.dat on the website www.che.wisc.edu/~jbraw/principles.

[Figure 4.11: The sum of squares fitting error (top) and validation error (bottom) for PCR versus the number of principal components \ell; cross validation indicates that four principal components are best.]

We would like to estimate the coefficient B in the model Y = XB. Compare the results using PCR and PLSR for the regression. Show the prediction error in Y for the number of principal components or latent variables ranging from one to five (full least squares). Which regression method provides the best fit with the smallest number of principal components/latent variables?
Solution

First we divide the 200 samples into two sets, and use the first 100 samples for estimating the parameter matrix B, and the second 100 samples for cross validation. For principal component analysis, we compute the SVD of the 100 x 5 matrix X. The five singular values are

    \tilde{\Sigma} = \mathrm{diag}(15.1,\ 3.26,\ 2.72,\ 2.67,\ 0.0226)

[Figure 4.12: The sum of squares validation error for PCR and PLSR versus the number of principal components/latent variables \ell; note that only two latent variables are required versus four principal components.]

We see that X has four large singular values and one near zero, indicating that the rank of X is nearly four. Next we estimate \hat{B}_{PCR} using (4.72) for \ell = 1, 2, 3, 4, 5 and calculate the sum of squares of the fitting error, \|Y - X\hat{B}_{PCR}\|_F^2. The results are shown in the top of Figure 4.11. It is not surprising that the fitting error decreases with increasing number of principal components. As we see, the fitting error contains little information about how many principal components to use. After estimating the parameters, we then compute the output responses for the validation data and compute \|Y_v - X_v\hat{B}_{PCR}\|_F^2, in which X_v, Y_v are the predictor and response variables in the validation dataset. This validation error is plotted in the bottom of Figure 4.11. Here we see that we should use four principal components in the model, in agreement with the SVD analysis of X. Using the unreliable smallest singular value in the regression causes a large error when trying to predict response data that have not been used in the fitting process.

[Figure 4.13: Predicted versus measured outputs for the validation dataset. Top: PCR using four principal components. Bottom: PLSR using two latent variables. Left: first output.]

Next we implement the PLS regression algorithm as described above for \ell = 1, 2, 3, 4, 5 latent variables. The validation error is shown in Figure 4.12 along with the validation error of PCR. Notice that only two latent variables are required to obtain the same error as four principal components. This reduction in model order is the primary benefit of the PLSR approach. By evaluating the SVD of X^T Y instead of only X, we obtain the latent variables that can explain the responses Y, not just the variables with independent information in X, which is what PCR provides.

[Figure 4.14: Effect of undermodeling. Top: PCR using three principal components. Bottom: PLSR using one latent variable.]

Next, in Figure 4.13 we present the predicted responses versus the measured responses for the validation dataset. A perfect prediction would be a straight line with a slope of 45 degrees. Note that these data were not used in the fitting process, so this plot displays the predictive capability of the model. We see that the PLS model with two latent variables has roughly the same predictive capability as the PCR model

with four principal components. Finally, in Figure 4.14 we make the same comparison if we use only three principal components and one latent variable. Notice that we obtain significantly worse predictions of the validation dataset, indicating that we have undermodeled the data by choosing too few variables for the regression.

By now there is an extensive literature including many books and research monographs on model regression with PCR and PLSR. Many researchers have documented the usefulness and robustness of these techniques to identify linear empirical models in numerous applications. The understanding of PCR is reasonably complete, since it is based on the SVD of the single matrix, X. By contrast, the understanding of PLSR is not as complete. PLS was introduced by H. Wold in the 1960s in the field of econometrics (Wold, 1966). The use of PLS in the fields of analytical chemistry and chemometrics was pioneered by S. Wold, Martens, and Kowalski. The tutorial by Geladi and Kowalski (1986) and historical reviews by S. Wold (2001) and Martens (2001) summarize the approach and the many software tools available, which make it easy for the user to try out these approaches. As demonstrated in the example, starting with the SVD of X^T Y rather than X is useful for finding the smallest number of latent variables that have the most predictive capability. But research has not yet provided a complete analysis of the PLSR method, and we do not know, for example, in what sense PLSR is an optimal estimator, or whether there might be, as yet undiscovered, better alternative methods. Adding to the complexity, several different PLSR algorithms have been developed. The appearance of many different algorithms has in turn generated some confusion and controversy. To clarify matters, connections between the properties of several different algorithms have been established. But until some of the optimality properties of PLSR are uncovered, research on the PLSR approach will likely continue. In any field, a valuable technique that also defies easy explanation is a prime target for further research.

4.9 Appendix: Proof of the Central Limit Theorem

In this appendix we provide a complete proof of Theorem 4.16. We follow the basic approach outlined in the stimulating papers by Le Cam (1986) and Pollard (1986). Moreover, in this version of the central limit theorem we not only establish convergence to the normal distribution as n \to \infty, but also develop an approach that leads to bounds valid for finite n on the distance of the sum's distribution from the normal distribution. This version of the central limit theorem and, more importantly, the techniques used to establish it are wide ranging and worth knowing for researchers making extensive use of random variables. As you will see, the proof is elementary, by which we mean that none of the steps require any advanced techniques that are not already familiar to the reader. But the proof is rather long. Note also that this material can be skipped without affecting the understanding of any other section in the text.
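Before diving into the proof, the theorem's content is easy to see numerically: standardized sums of, for example, uniform random variables rapidly approach the standard normal. A small Monte Carlo sketch (NumPy; the parameters are arbitrary):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(10)
n, reps = 50, 100_000
U = rng.uniform(0.0, 1.0, (reps, n))   # iid uniforms: mean 1/2, variance 1/12
S = (U.sum(axis=1) - n / 2.0) / np.sqrt(n / 12.0)   # standardized sums

# compare the empirical cdf with the standard normal cdf at a few points
Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
for x in (-1.0, 0.0, 1.0):
    print(x, (S <= x).mean(), Phi(x))
```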

Proof. We start by considering sums of two collections of independent random variables, S_n = X_1 + \cdots + X_n and T_n = Y_1 + \cdots + Y_n, in which E(X_k) = E(Y_k) = 0 and var(X_k) = var(Y_k) = \sigma_k^2. The zero mean assumption is not restrictive. If the original variables have nonzero mean, consider instead the shifted variables \tilde{X}_k = X_k - \mu_k. Next define R_k as follows

    R_k = \sum_{j<k} X_j + \sum_{j>k} Y_j

so that

    R_n + X_n = X_1 + \cdots + X_n = S_n, \qquad R_1 + Y_1 = Y_1 + \cdots + Y_n = T_n

Notice from this definition that R_k and X_k as well as R_k and Y_k are also independent for k = 1, 2, \ldots, n. We see shortly why the R_k variables are useful.

We also require an approximation theorem; the form we choose here is motivated by a nice, unpublished note of F.W. Scholz (2011).

Theorem 4.24 (Taylor's theorem with bound on remainder). Let f be a bounded function on R with three continuous, bounded derivatives. Consider the second-order Taylor series with remainder

    f(x + h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + r(x, h)

The remainder satisfies the following bound for all h \in R

    \sup_{x \in R} |r(x, h)| \leq K_f \min(h^2, |h|^3)   (4.73)

The term |h|^3 is expected from the standard Taylor expansion, but including the term h^2 gives a better bound for large h, which we shall find useful subsequently. Exercise 4.54 discusses how to prove this theorem, which is not difficult.

4.9 Appendix

Proof of the

central

so we assume that f has


three

Limit
Theorem

continuous,
bounded
derivatives

$$f(X_k + R_k) = f(R_k) + X_k f'(R_k) + \frac{X_k^2}{2} f''(R_k) + r(R_k, X_k)$$

Performing a similar expansion for $f(Y_k + R_k)$, taking expectations, and subtracting gives

$$\mathcal{E}\big(f(X_k + R_k)\big) - \mathcal{E}\big(f(Y_k + R_k)\big) = \big(\mathcal{E}(X_k) - \mathcal{E}(Y_k)\big)\,\mathcal{E}\big(f'(R_k)\big) + \frac{1}{2}\big(\mathcal{E}(X_k^2) - \mathcal{E}(Y_k^2)\big)\,\mathcal{E}\big(f''(R_k)\big) + \mathcal{E}\big(r(R_k, X_k) - r(R_k, Y_k)\big)$$

where we have used the fact that $\mathcal{E}(AB) = \mathcal{E}(A)\,\mathcal{E}(B)$ for $A$ and $B$ independent random variables. Noting that the first two terms cancel, and using (4.73),

$$\big|\mathcal{E}\big(f(X_k + R_k) - f(Y_k + R_k)\big)\big| \le K_f\, \mathcal{E}\big(g(X_k) + g(Y_k)\big) \tag{4.74}$$

where we used the fact$^4$ that $|\mathcal{E}(f(X))| \le \mathcal{E}(|f(X)|)$, and defined $g(X) = \min(X^2, |X|^3)$ to compress the notation.

Next comes the reason for introducing the $R_k$ variables. Notice that differencing the sum of $f(R_k + X_k)$ and $f(R_k + Y_k)$ leaves only two terms

$$\sum_{k=1}^{n} \big[f(R_k + X_k) - f(R_k + Y_k)\big] = f(R_n + X_n) - f(R_1 + Y_1) = f(S_n) - f(T_n)$$

Taking expectations and then absolute values and using (4.74) then gives

$$\big|\mathcal{E}\big(f(S_n)\big) - \mathcal{E}\big(f(T_n)\big)\big| \le K_f \sum_{k=1}^{n} \mathcal{E}\big(g(X_k) + g(Y_k)\big) \tag{4.75}$$

Establishing this inequality is the first major step.

But we wish to bound the distance between the two cumulative distributions $F_{S_n}$ and $F_{T_n}$, so we next choose an appropriate function $f(\cdot)$ to achieve this goal. Consider the step function $f_1(w; x)$ depicted in Figure 4.15, in which $w$ is the argument to the function and $x$ is considered a fixed parameter. Using $f_1(w; x)$ we have immediately

$$\mathcal{E}\big(f_1(S_n)\big) = F_{S_n}(x)$$

The function $f_1(\cdot)$ is known as an indicator function, because it indicates when the random variable $S_n$ satisfies $S_n \le x$. So this is
$^4$Since $f(x) \le |f(x)|$ for all $x$, multiply by the density $p_X(x)$ and integrate.

Figure 4.15: The indicator (step) function $f_1(w; x)$ and its smooth approximation, $f(w; x)$. A piecewise fifth-order polynomial gives continuous derivatives up to third order; see Exercise 4.57 for details.

the kind of function we seek, but, of course, $f_1$ does not have even a bounded first derivative, let alone three bounded derivatives as required in our development. So we first smooth out this function as depicted in Figure 4.15. Exercise 4.57 gives an example of a piecewise polynomial function $f$ with the required smoothness. Moreover, there exists an $L_0 > 0$ such that $K_f = 20 L^{-3}$ is a valid upper bound in (4.73) for every $L$ satisfying $0 < L \le L_0$; see (4.97). We will require this bound shortly.

Computing $\mathcal{E}(f(S_n))$ gives

$$\mathcal{E}\big(f(S_n)\big) = \int p_{S_n}(w)\, f(w; x)\, dw = \int_{-\infty}^{x} p_{S_n}(w)\, dw + \int_{x}^{x+L} p_{S_n}(w)\, f(w; x)\, dw = F_{S_n}(x) + \int_{x}^{x+L} p_{S_n}(w)\, f(w; x)\, dw$$

and, subtracting the analogous expression for $T_n$ and rearranging, gives

$$F_{S_n}(x) - F_{T_n}(x) = \Big(\mathcal{E}\big(f(S_n)\big) - \mathcal{E}\big(f(T_n)\big)\Big) + \int_{x}^{x+L} \big(p_{T_n}(w) - p_{S_n}(w)\big)\, f(w; x)\, dw$$

Since $0 \le f(w; x) \le 1$, the second term is bounded by $b_n L$ with $b_n = \max_w p_{T_n}(w)$, and we choose the $Y_k$ so that we have control over the constant $b_n$. Taking absolute values and using (4.75) and (4.73) then gives the following bound

$$\big|F_{S_n}(x) - F_{T_n}(x)\big| \le K_f \sum_{k=1}^{n} \mathcal{E}\big(g(X_k) + g(Y_k)\big) + b_n L \tag{4.76}$$

Establishing this inequality is the second major step. Note that making $L$ small makes $K_f$ large, so making the sum on the right-hand side small will require a judicious choice of $L$.

We choose the $Y_k$ to be $N(0, \sigma_k^2)$, and the scaled sum $T_n/s_n = (1/s_n)\sum_{k=1}^{n} Y_k$ (with $s_n^2 = \sum_{k=1}^{n} \sigma_k^2$) then has zero mean and unit variance for all $n$, i.e., it is a standard normal, with distribution function denoted $F_Z$. This gives for (4.76) the value $b_n = \max_w p_Z(w) = 1/\sqrt{2\pi}$, independent of $n$ for this choice of $Y_k$. The variable $Z_n = S_n/s_n = (1/s_n)\sum_{k=1}^{n} X_k$ is also a sum of independent, scaled $X_k$, and also has zero mean and unit variance for all $n$. Applying (4.76) to these variables gives

$$\sup_x \big|F_{Z_n}(x) - F_Z(x)\big| \le K_f \sum_{k=1}^{n} \mathcal{E}\big(g(X_k/s_n) + g(Y_k/s_n)\big) + b_n L \tag{4.77}$$

To evaluate the sum on the right-hand side, we split the interval of integration as discussed before: for $|X_k| \le \epsilon s_n$ we use $g(X_k/s_n) \le |X_k/s_n|^3 \le \epsilon X_k^2/s_n^2$, and for $|X_k| > \epsilon s_n$ we use $g(X_k/s_n) \le X_k^2/s_n^2$. Taking the sum over $k$, the first contribution is bounded by $\epsilon$ independent of $n$, and the Lindeberg condition implies that for large enough $n$ the second contribution is also smaller than $\epsilon$. So as $n \to \infty$ we have that

$$\sum_{k=1}^{n} \mathcal{E}\big(g(X_k/s_n)\big) \le 2\epsilon$$

We also can show that the normally distributed $Y_k$ variables satisfy the Lindeberg conditions if the $X_k$ do. See Exercise 4.56 for the steps. So we have that as $n \to \infty$

$$\sum_{k=1}^{n} \mathcal{E}\big(g(X_k/s_n) + g(Y_k/s_n)\big) \le 4\epsilon \tag{4.78}$$

Next we choose $L$, and therefore $K_f$, as follows

$$L = (4\epsilon)^{1/4}$$

To use the bound in (4.97), we require $L \le L_0$. Therefore setting $\delta = L_0^4/4 > 0$, we have from the previous inequality that for every $\epsilon < \delta$,

$$K_f = 20 L^{-3} = 20\, (4\epsilon)^{-3/4}$$

Substituting these values for $K_f$ and $L$ into (4.77) and using (4.78) gives

$$\sup_x \big|F_{Z_n}(x) - F_Z(x)\big| \le c\, \epsilon^{1/4}$$

with $c = (1/2)\, 5^{-3/4} + 1/\sqrt{\pi} \approx 0.71$. Since this bound holds for all $\epsilon > 0$, we have established that

$$\lim_{n \to \infty}\, \sup_x \big|F_{Z_n}(x) - F_Z(x)\big| = 0$$

and the proof is complete.
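As a quick numerical illustration of the theorem just proved (not part of the proof), the following sketch estimates the sup-distance between the CDF of the scaled sum $Z_n = S_n/s_n$ and the standard normal CDF, and shows it shrinking as $n$ grows. The choice of skewed, zero-mean summands $X_k = \text{Exp}(1) - 1$ is an arbitrary example, not taken from the text.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def normal_cdf(z):
    # Standard normal CDF via Phi(z) = (1 + erf(z/sqrt(2)))/2
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def sup_distance(n, trials=20000):
    # Zero-mean, unit-variance, skewed summands: X_k = Exp(1) - 1, so s_n = sqrt(n)
    x = rng.exponential(1.0, size=(trials, n)) - 1.0
    z = np.sort(x.sum(axis=1) / sqrt(n))           # samples of Z_n = S_n / s_n
    ecdf = np.arange(1, trials + 1) / trials       # empirical CDF at sorted samples
    phi = np.vectorize(normal_cdf)(z)
    return float(np.max(np.abs(ecdf - phi)))

d_small, d_large = sup_distance(2), sup_distance(64)
print(d_small, d_large)
```

The distance for $n = 64$ comes out much smaller than for $n = 2$, consistent with the limit statement above.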

4.10 Exercises

Exercise 4.1: Consequences of the axioms of probability

(a) If $B \subseteq A$, show that $\Pr(A \setminus B) = \Pr(A) - \Pr(B)$.

(b) From the definition, the events $A$ and $B$ are independent if $\Pr(A \cap B) = \Pr(A)\Pr(B)$. If $A$ and $B$ are independent, show that $A$ and $\bar{B}$ are independent.

Exercise 4.2: Statistical independence

Show that two random variables $\xi$ and $\eta$ with densities are statistically independent if and only if

$$p_{\xi,\eta}(x, y) = p_{\xi}(x)\, p_{\eta}(y) \qquad \text{all } x, y \tag{4.79}$$
Exercise 4.3: Statistical independence of functions of random variables

Consider statistically independent random variables $\xi \in \mathbb{R}^m$ and $\eta \in \mathbb{R}^n$. Define random variables $\phi \in \mathbb{R}^p$ and $\psi \in \mathbb{R}^q$ as $\phi = f(\xi)$ and $\psi = g(\eta)$. Show that $\phi$ and $\psi$ are statistically independent for all functions $f(\cdot)$ and $g(\cdot)$. Summarizing, statistical independence of random variables $(\xi, \eta)$ implies statistical independence of random variables $(f(\xi), g(\eta))$ for all $f(\cdot)$ and $g(\cdot)$.

Note that $f(\cdot)$ and $g(\cdot)$ are not required to be invertible.

Exercise 4.4: Trace of a matrix function

Derive the following formula for differentiating the trace of a function of a square matrix

$$\frac{d\, \text{tr}(f(A))}{dA} = g(A^T) \qquad g(x) = \frac{df(x)}{dx} \tag{4.80}$$

in which $g$ is the usual scalar derivative of the scalar function $f$.

Exercise 4.5: Derivatives of determinants

For $A \in \mathbb{R}^{n \times n}$ nonsingular, derive the following formulas

$$\frac{d \det A}{dA} = (A^{-1})^T \det A \qquad \frac{d \ln \det A}{dA} = (A^{-1})^T$$
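A finite-difference check can build confidence in the second identity before attempting the derivation. This sketch (not part of the exercise) compares the numerical gradient of $\ln \det A$ with $(A^{-1})^T$ on an arbitrary well-conditioned matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + 5.0 * np.eye(n)   # keep A safely nonsingular

def logdet(M):
    # slogdet returns ln|det M|, which has the same gradient (A^-1)^T
    _, val = np.linalg.slogdet(M)
    return val

h = 1e-6
G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = h
        G[i, j] = (logdet(A + E) - logdet(A - E)) / (2 * h)   # central difference

err = float(np.max(np.abs(G - np.linalg.inv(A).T)))
print(err)
```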

Exercise 4.6: Transposing the maximum-likelihood problem statement

Consider again the estimation problem for the model given in (4.56), but this time express it in transposed form

$$y_i^T = x_i^T \Theta + e_i^T \tag{4.81}$$

(a) Derive the maximum-likelihood estimate for this case. Show all steps in the derivation. Arrange the data in matrices

$$Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_n^T \end{bmatrix} \qquad X = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix}$$

and show the maximum-likelihood estimate can be expressed as

$$\widehat{\Theta} = (X^T X)^{-1} X^T Y$$

Expressing the model this way gives an estimate formula that is analogous to what other problem?

(b) Find the resulting probability density for the estimate and give the analogous result corresponding to (4.59).

(c) Which form of the model do you prefer and why?


Exercise 4.7: Joint and marginal densities, discrete-valued random variables

Calculate the joint density, both marginal densities, the means, and the covariance of the two random variables $\xi$ and $\eta$.

(a) We throw two dice; $\xi$ and $\eta$ are the values on each die.

(b) We throw two dice; $\xi$ is the value on one die and $\eta$ is the sum of the two values.

Exercise 4.8: Probability density of the inverse function

Consider a scalar random variable $\xi \in \mathbb{R}$ and let the random variable $\eta$ be defined by the inverse function $\eta = 1/\xi$.

(a) If $\xi$ is distributed uniformly on $[a, 1]$, $0 < a < 1$, what is the density of $\eta$?

(b) Is $\eta$'s density well defined if we allow $a = 0$? Explain your answer.

Exercise 4.9: Expectation as a linear operator

(a) Consider the random variable $x$ to be defined as a linear combination of the random variables $a$ and $b$, $x = a + b$. Show

$$\mathcal{E}(x) = \mathcal{E}(a) + \mathcal{E}(b)$$

Do $a$ and $b$ need to be statistically independent for this statement to be true?

(b) Next consider the random variable $x$ to be defined as a scalar multiple of the random variable $a$, $x = \alpha a$, in which $\alpha$ is a scalar. Show

$$\mathcal{E}(x) = \alpha\, \mathcal{E}(a)$$

(c) What can you conclude about $\mathcal{E}(x)$ if $x$ is given by the linear combination

$$x = \sum_i \alpha_i v_i$$

in which $v_i$ are random variables and $\alpha_i$ are scalars?

Exercise 4.10: Calculating mean and variance from data

We are sampling a real-valued, scalar random variable $x(k) \in \mathbb{R}$ at time $k$. Assume the random variable comes from a distribution with mean $\bar{x}$ and variance $P$, and the samples at different times are statistically independent.

A colleague has suggested the following formulas for estimating the mean and variance from $N$ samples

$$\widehat{m}_N = \frac{1}{N} \sum_{k=1}^{N} x(k) \qquad \widehat{P}_N = \frac{1}{N} \sum_{k=1}^{N} \big(x(k) - \widehat{m}_N\big)^2$$

(a) Prove the estimate of the mean is unbiased for all $N$, i.e., show

$$\mathcal{E}(\widehat{m}_N) = \bar{x} \qquad \text{all } N$$

(b) Prove the estimate of the variance is not unbiased for any $N$, i.e., show

$$\mathcal{E}(\widehat{P}_N) \ne P \qquad \text{any } N$$

(c) Using the result above, provide an improved formula for the variance estimate that is unbiased for all $N$. How large does $N$ have to be before these two estimates of $P$ are within 1%?
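A Monte Carlo sketch can preview the bias in Exercise 4.10(b): averaging the $1/N$ variance estimate over many replicate data sets approaches $(N-1)P/N$, while dividing by $N-1$ removes the bias. The sampling distribution here (normal, mean 1, variance 4) is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, reps, P_true = 5, 200000, 4.0

x = rng.normal(1.0, np.sqrt(P_true), size=(reps, N))
m_hat = x.mean(axis=1)
resid2 = (x - m_hat[:, None]) ** 2
P_biased = resid2.sum(axis=1) / N          # the colleague's 1/N formula
P_unbiased = resid2.sum(axis=1) / (N - 1)  # Bessel-corrected formula

# Averages approach (N-1)/N * P = 3.2 and P = 4.0, respectively
print(P_biased.mean(), P_unbiased.mean())
```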

Exercise 4.11: The sum of throwing two dice

Using (4.23), what is the probability density for the sum of throwing two dice? On what number do you want to place your bet? How often do you expect to win if you bet on this outcome?

Make the standard assumptions: the probability density for each die is uniform over the integer values from one to six, and the outcome of each die is independent of the other die.

Exercise 4.12: The product of throwing two dice

Using (4.23), what is the probability density for the product of throwing two dice? On what number do you want to place your bet? How often do you expect to win if you bet on this outcome?

Make the standard assumptions: the probability density for each die is uniform over the integer values from one to six, and the outcome of each die is independent of the other die.
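Both densities in Exercises 4.11 and 4.12 can be checked by brute-force enumeration of the 36 equally likely outcomes, which is a useful sanity check on the answer obtained from (4.23).

```python
from collections import Counter
from fractions import Fraction

outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
p_sum = Counter(i + j for i, j in outcomes)    # counts out of 36
p_prod = Counter(i * j for i, j in outcomes)

best_sum = max(p_sum, key=p_sum.get)           # most probable sum: 7
print(best_sum, Fraction(p_sum[best_sum], 36))

# For the product there is a tie: 6 and 12 each occur 4 ways out of 36
print(p_prod[6], p_prod[12])
```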

Exercise 4.13: Expected sum of squares

Given that random variable $x$ has mean $m$ and covariance $P$, show that the expected sum of squares is given by the formula (Selby, 1973, p. 138)

$$\mathcal{E}(x^T Q x) = m^T Q m + \text{tr}(QP)$$

Recall that the trace of a square matrix $A$, written $\text{tr}(A)$, is defined to be the sum of the diagonal elements

$$\text{tr}(A) = \sum_i A_{ii}$$
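The trace formula above is easy to check by simulation before proving it. The sketch below uses arbitrary example values for $m$, $P$, and $Q$ and compares the sample mean of $x^T Q x$ against $m^T Q m + \text{tr}(QP)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
m = np.array([1.0, -2.0, 0.5])
L = rng.standard_normal((n, n))
P = L @ L.T + n * np.eye(n)        # an arbitrary positive definite covariance
Q = np.diag([2.0, 1.0, 3.0])       # any square Q works

x = rng.multivariate_normal(m, P, size=400000)
lhs = float(np.mean(np.einsum("ij,jk,ik->i", x, Q, x)))   # sample mean of x^T Q x
rhs = float(m @ Q @ m + np.trace(Q @ P))
print(lhs, rhs)
```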

Exercise 4.14: Normal distribution

Given a normal distribution with scalar parameters $m$ and $\sigma^2$

$$p_{\xi}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(x - m)^2}{\sigma^2}\right)$$

By direct calculation, show that

$$\mathcal{E}(\xi) = m \qquad \text{var}(\xi) = \sigma^2 \tag{4.82}$$

$$m = \arg\max_x\, p_{\xi}(x)$$

in which $\arg$ returns the solution to the optimization problem.

Exercise 4.15: The size of an ellipse's bounding box

Here we derive the size of the bounding box depicted in Figure 4.3. Consider a real, positive definite, symmetric matrix $A \in \mathbb{R}^{n \times n}$ and a real vector $x \in \mathbb{R}^n$. The set of $x$ for which the scalar $x^T A x$ is constant are $n$-dimensional ellipsoids. Find the length of the bounding box for the ellipsoid

$$x^T A x = b$$

Hint: consider the equivalent optimization problem to minimize the value of $x^T A x$ such that the $i$th component of $x$ is given by $x_i = c$. This problem defines the ellipsoid that is tangent to the plane $x_i = c$, and can be used to answer the original question.

Exercise 4.16: Conditional densities are positive definite

We showed in Example 4.19 that if $\xi$ and $\eta$ are jointly normally distributed as

$$\begin{bmatrix} \xi \\ \eta \end{bmatrix} \sim N\left( \begin{bmatrix} m_x \\ m_y \end{bmatrix}, \begin{bmatrix} P_x & P_{xy} \\ P_{yx} & P_y \end{bmatrix} \right)$$

then the conditional density of $\xi$ given $\eta$ is also normal

$$(\xi \mid \eta) \sim N(m_{x|y}, P_{x|y})$$

in which the conditional mean is

$$m_{x|y} = m_x + P_{xy} P_y^{-1} (y - m_y)$$

and the conditional covariance is

$$P_{x|y} = P_x - P_{xy} P_y^{-1} P_{yx}$$

Given that the joint density is well defined, prove the marginal densities and the conditional densities are also well defined, i.e., given $P > 0$, prove $P_x > 0$, $P_y > 0$, $P_{x|y} > 0$, and $P_{y|x} > 0$.

Exercise 4.17: Fourier transform of the multivariate normal density

Show the Fourier transform of the multivariate normal density given in (4.12) is

$$\varphi(u) = \exp\left(i u^T m - \frac{1}{2} u^T P u\right)$$

Exercise 4.18: The difference of two exponentially distributed random variables

The random variables $T_1$ and $T_2$ are statistically independent and identically distributed with the exponential density

$$p_T(t) = \lambda e^{-\lambda t} \qquad t \ge 0$$

Define the new random variable $y$ to be the difference $y = T_1 - T_2$. We wish to calculate $y$'s probability density $p_y$.

(a) First introduce a new random variable $z = T_2$ and define the transformation from $(T_1, T_2)$ to $(y, z)$. Find the inverse transformation from $(y, z)$ to $(T_1, T_2)$. What is the determinant of the Jacobian of the inverse transformation?

(b) What is the joint density $p_{T_1,T_2}(t_1, t_2)$? Sketch the region in $(y, z)$ that corresponds to the region of nonzero probability of $(T_1, T_2)$.

(c) Apply the formula given in (4.23) to obtain the transformed joint density $p_{y,z}$.

(d) Integrate over $z$ in this joint density to obtain $p_y$.

(e) Generate 1000 samples of $T_1$ and $T_2$, calculate $y$, and plot $y$'s histogram. Does your histogram of the $y$ samples agree with your result from (d)? Explain why or why not.
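For part (e), the histogram comparison can be automated as sketched below. Working through parts (a)-(d) with rate $\lambda = 1$ gives the Laplace density $p_y(y) = \frac{1}{2}e^{-|y|}$; that result is assumed here as the reference curve rather than derived.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100000                                   # more samples than the 1000 requested,
y = rng.exponential(1.0, n) - rng.exponential(1.0, n)   # to reduce histogram noise

edges = np.linspace(-4, 4, 41)               # 0.2-wide bins on [-4, 4]
hist, _ = np.histogram(y, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
laplace = 0.5 * np.exp(-np.abs(centers))     # assumed answer from part (d)

max_err = float(np.max(np.abs(hist - laplace)))
print(max_err)
```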

Exercise 4.19: Surface area and volume of a sphere in n dimensions

In three-dimensional space, $n = 3$, the surface area and volume of the sphere are given by

$$S_3(r) = 4\pi r^2 \qquad V_3(r) = \frac{4}{3}\pi r^3$$

You are also familiar with the formulas for $n = 2$, in which case "surface area" is the circumference of the circle and "volume" is the area of the circle

$$S_2(r) = 2\pi r \qquad V_2(r) = \pi r^2$$

If we define $s_n$ and $v_n$ as the constants such that

$$S_n(r) = s_n r^{n-1} \qquad V_n(r) = v_n r^n$$

we have

$$s_2 = 2\pi \qquad v_2 = \pi \qquad s_3 = 4\pi \qquad v_3 = \frac{4}{3}\pi$$

We seek the generalization of these results to the $n$-dimensional case. Compute formulas for $s_n$ and $v_n$ and show

$$v_n = \frac{\pi^{n/2}}{\Gamma(n/2 + 1)} \qquad s_n = n\, v_n$$

Exercise 4.20: Surface area and volume of an ellipsoid in n dimensions

The results for the surface area and volume of a sphere in $n$ dimensions can be extended to obtain the surface area and volume of an ellipse (ellipsoid, hyperellipsoid) in $n$ dimensions. Let $x$ be an $n$-vector. The surface of an ellipse is defined by the equation

$$x^T A x = R^2$$

in which $A \in \mathbb{R}^{n \times n}$ is a symmetric, positive definite matrix and $R^2$ is the square of the ellipse "radius." Let the interior of the ellipse be denoted by the set

$$\mathcal{E} = \{x \mid x^T A x \le R^2\}$$

We wish to compute the volume of the ellipse, which is defined by the following integral

$$V_n^e(R) = \int_{\mathcal{E}} dx$$

The surface area, $S_n^e(R)$, is defined to have the following relationship with the volume

$$S_n^e(r) = \frac{\partial V_n^e(r)}{\partial r} \qquad V_n^e(R) = \int_0^R S_n^e(r)\, dr$$

(a) Derive formulas for $s_n^e$ and $v_n^e$ such that

$$S_n^e(R) = s_n^e R^{n-1} \qquad V_n^e(R) = v_n^e R^n$$

and show $v_n^e = v_n (\det A)^{-1/2}$ for the ellipse.

(b) Show that your result subsumes the formula for the volume of the 3-dimensional ellipse given by

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1 \qquad V = \frac{4}{3}\pi abc$$

Exercise 4.21: Definite integrals of the multivariate normal and χ²

(a) Derive the following $n$-dimensional integral over an elliptical region

$$\int_{E_b} e^{-x^T A x}\, dx = \frac{\pi^{n/2}}{(\det A)^{1/2}} \frac{\gamma(n/2, b)}{\Gamma(n/2)} \qquad E_b = \{x \mid x^T A x \le b\}$$

(b) Let $\xi$ be distributed as a multivariate normal with mean $m$ and covariance $P$, and let $\alpha$ denote the total probability that $\xi$ takes on a value $x$ inside the ellipse $(x - m)^T P^{-1} (x - m) \le b$. Use the integral in the previous part to show

$$\alpha = \frac{\gamma(n/2, b/2)}{\Gamma(n/2)} \tag{4.84}$$

(c) The $\chi^2(n, \alpha)$ function is defined to invert this relationship and give the size of the ellipse that contains total probability $\alpha$

$$\chi^2(n, \alpha) = b \tag{4.85}$$

Plot $\gamma(n/2, x/2)/\Gamma(n/2)$ and $\chi^2(n, x)$ versus $x$ for various $n$ (try $n = 1, 4$), and display the inverse relationship given by (4.84) and (4.85).

Exercise 4.22: Normal distributions under linear transformations

Given the normal random variable $\xi \in \mathbb{R}^n$ with $\xi \sim N(m, P)$, consider the random variable $\eta \in \mathbb{R}^n$ obtained by the linear transformation

$$\eta = A\xi$$

in which $A$ is a nonsingular matrix. Using the result on transforming probability densities, show that $\eta \sim N(Am, APA^T)$. This result establishes that (nonsingular) linear transformations of normal random variables are normal.
Exercise 4.23: Normals with singular covariance

Consider the random variable $\xi \in \mathbb{R}^n$ with an arbitrary positive semidefinite covariance matrix $P_x$ with rank $r < n$. Start with the definition of a singular normal

$$p_{\xi}(x) = \frac{1}{(2\pi)^{r/2} (\det \Lambda_1)^{1/2}} \exp\left[-\frac{1}{2}(x - m_x)^T Q_1 \Lambda_1^{-1} Q_1^T (x - m_x)\right] \delta\big(Q_2^T (x - m_x)\big)$$

in which matrices $\Lambda_1 \in \mathbb{R}^{r \times r}$ and orthonormal $Q \in \mathbb{R}^{n \times n}$ are obtained from the eigenvalue decomposition of $P_x$

$$P_x = Q \Lambda Q^T = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} \Lambda_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix}$$

and $\Lambda_1 > 0 \in \mathbb{R}^{r \times r}$, $Q_1 \in \mathbb{R}^{n \times r}$, $Q_2 \in \mathbb{R}^{n \times (n-r)}$. On what set of $x$ is the density nonzero?

Exercise 4.24: Linear transformation and singular normals

Prove Theorem 4.12, which generalizes the result of Exercise 4.22 to establish that any linear transformation of a normal is normal. For this statement to hold, we must expand the meaning of normal to include the singular case.

Exercise 4.25: Useful identities in least-squares estimation

Establish the following two useful results using the matrix inversion formula

$$(A^{-1} + C^T B^{-1} C)^{-1} = A - A C^T (B + C A C^T)^{-1} C A$$

$$(A^{-1} + C^T B^{-1} C)^{-1} C^T B^{-1} = A C^T (B + C A C^T)^{-1} \tag{4.86}$$
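Before proving the identities in (4.86), a quick numerical check on random matrices of compatible sizes can catch transcription errors. The sketch below tests both identities with arbitrary positive definite $A$, $B$ and a random rectangular $C$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 4, 3
M1 = rng.standard_normal((n, n))
A = M1 @ M1.T + n * np.eye(n)          # positive definite, n x n
M2 = rng.standard_normal((p, p))
B = M2 @ M2.T + p * np.eye(p)          # positive definite, p x p
C = rng.standard_normal((p, n))

Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
lhs = np.linalg.inv(Ai + C.T @ Bi @ C)
rhs1 = A - A @ C.T @ np.linalg.inv(B + C @ A @ C.T) @ C @ A
rhs2 = A @ C.T @ np.linalg.inv(B + C @ A @ C.T)

err1 = float(np.max(np.abs(lhs - rhs1)))          # first identity
err2 = float(np.max(np.abs(lhs @ C.T @ Bi - rhs2)))  # second identity
print(err1, err2)
```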

Exercise 4.26: Least-squares parameter estimation and Bayesian estimation

Consider a model linear in the parameters

$$y = X\theta + e \tag{4.87}$$

in which $y \in \mathbb{R}^p$ is a vector of measurements, $\theta \in \mathbb{R}^m$ is a vector of parameters, $X \in \mathbb{R}^{p \times m}$ is a matrix of known constants, and $e \in \mathbb{R}^p$ is a random variable modeling the measurement error. The standard parameter estimation problem is to find the best estimate of $\theta$ given the measurements $y$ corrupted with measurement error $e$, which we assume is distributed as

$$e \sim N(0, R)$$

(a) Consider the case in which the measurement errors are independently and identically distributed with variance $\sigma^2$, $R = \sigma^2 I$. The least-squares problem and solution are

$$\min_{\theta}\, \|y - X\theta\|^2 \qquad \widehat{\theta} = (X^T X)^{-1} X^T y$$

Consider the measurements to be sampled from (4.87) with true parameter value $\theta_0$. Show that, using the least-squares formula, the parameter estimate is distributed as

$$\widehat{\theta} \sim N\big(\theta_0, \sigma^2 (X^T X)^{-1}\big)$$

(b) Now consider again the model of (4.87) and a Bayesian estimation problem. Assume a prior distribution for the random variable $\theta$. Compute the conditional density of $\theta$ given measurement $y$, show this density is normal, and find its mean and covariance

$$p_{\theta|y}(\theta \mid y) \sim N(m, P)$$

Show that Bayesian estimation and least-squares estimation give the same result in the limit of a noninformative prior. In other words, if the covariance of the prior is large compared to the covariance of the measurement error, show

$$m \to (X^T X)^{-1} X^T y$$

(c) What (weighted) least-squares minimization problem is solved for the general measurement error covariance

$$e \sim N(0, R)$$

Derive the least-squares estimate formula for this case.

(d) Again consider the measurements to be sampled from (4.87) with true parameter value $\theta_0$. Show that the weighted least-squares formula gives parameter estimates that are distributed as

$$\widehat{\theta} \sim N(\theta_0, P)$$

and find $P$ for this case.

(e) Show again that Bayesian estimation and least-squares estimation give the same result in the limit of a noninformative prior.

Exercise 4.27: Least-squares and minimum-variance estimation

Consider again the model linear in the parameters and the least-squares estimator from Exercise 4.26

$$\widehat{\theta} = (X^T R^{-1} X)^{-1} X^T R^{-1} y$$

Show that the covariance of the least-squares estimator is the smallest covariance of all linear, unbiased estimators.

Exercise 4.28: Two stages are not better than one

We often can decompose an estimation problem into two stages. Consider the case in which we wish to estimate $x$ from measurements of $z$, but we have the model between $x$ and an intermediate variable, $y$, and the model between $y$ and $z$

$$y = Ax + e_1 \qquad \text{cov}(e_1) = Q_1$$

$$z = By + e_2 \qquad \text{cov}(e_2) = Q_2$$

(a) Write down the optimal least-squares problem to solve for $\widehat{y}$ given the $z$ measurements and the second model. Given $\widehat{y}$, write down the optimal least-squares problem for $\widehat{x}$ in terms of $\widehat{y}$ and the first model. Combine these two results together and write the resulting estimate of $\widehat{x}$ given measurements of $z$. Call this the two-stage estimate of $x$.

(b) Combine the two models together into a single model and show the relationship

$$z = BAx + e_3 \qquad \text{cov}(e_3) = Q_3$$

Express $Q_3$ in terms of $Q_1$, $Q_2$ and the models $A$, $B$. What is the optimal least-squares estimate of $\widehat{x}$ given measurements of $z$ and the one-stage model? Call this the one-stage estimate of $x$.

(c) Are the one-stage and two-stage estimates of $x$ the same? If yes, prove it. If no, provide a counterexample. Do you have to make any assumptions about the models $A$, $B$?

Exercise 4.29: Let's make a deal!

Consider the following contest of the American television game show of the 1960s, Let's Make a Deal. In the show's grand finale, a contestant is presented with three doors. Behind one of the doors is a valuable prize such as an all-expenses-paid vacation to Hawaii or a new car. Behind the other two doors are goats and donkeys. The contestant selects a door, say door number one. The game show host, Monty Hall, then says, "Before I show you what is behind your door, let's reveal what is behind door number three!" Monty always chooses a door that has one of the booby prizes behind it. As the goat or donkey is revealed, the audience howls with laughter. Then Monty asks innocently, "Before I show you what is behind your door, I will allow you one chance to change your mind. Do you want to change doors?" While the contestant considers this option, the audience starts screaming out things like, "Stay with your door! No, switch, switch!" Finally the contestant chooses again, and then Monty shows them what is behind their chosen door.

Let's analyze this contest to see how to maximize the chance of winning. Define $p_{ijy}$ to be the probability that you chose door $i$, the prize is behind door $j$, and Monty showed you door $y$ (named after the data!) after your initial guess. Then you would want to

$$\max_j\; p_{j|iy} \tag{4.88}$$

for your optimal choice after Monty shows you a door.

(a) Calculate this conditional density and give the probability that the prize is behind door $i$, your original choice, and door $j \ne i$.

(b) You need to specify a model of Monty's behavior. Please state the one that is appropriate to Let's Make a Deal.

(c) For what other model of Monty's behavior is the answer that it does not matter if you switch doors? Why is this a poor model for the game show?
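A simulation is a useful companion to the conditional-density calculation in part (a). The sketch below assumes the standard model of Monty's behavior (he always opens a door that is neither the contestant's pick nor the prize); under that model, switching should win about 2/3 of the time.

```python
import random

random.seed(6)

def play(switch_doors, trials=100000):
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        pick = random.randrange(3)
        # Monty opens a door that is neither the pick nor the prize.
        # (His tie-break when pick == prize does not affect the overall win rate.)
        opened = next(d for d in (0, 1, 2) if d != pick and d != prize)
        if switch_doors:
            pick = next(d for d in (0, 1, 2) if d != pick and d != opened)
        wins += (pick == prize)
    return wins / trials

p_stay, p_switch = play(False), play(True)
print(p_stay, p_switch)   # roughly 1/3 and 2/3
```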

Exercise 4.30: A nonlinear transformation and conditional density

Consider the following relationship between the random variables $y$, $x$, and $w$

$$y = f(x) + w$$

The author of a famous textbook wants us to believe that

$$p_{y|x}(y \mid x) = p_w\big(y - f(x)\big)$$

Derive this result and state what additional assumptions on the random variables $x$ and $w$ are required for this result to be correct.

Exercise 4.31: Least squares and confidence intervals

A common model for the temperature dependence of the reaction rate is the Arrhenius model. In this model the reaction rate (rate constant, $k$) is given by

$$k = k_0 \exp(-E/T) \tag{4.89}$$

in which the parameter $k_0$ is the preexponential factor, $E$ is the activation energy, scaled by the gas constant, and $T$ is the temperature in Kelvin. We wish to estimate $k_0$ and $E$ from measurements of the reaction rate (rate constant), $k$, at different temperatures, $T$. In order to use linear least squares we first take the logarithm of (4.89) to obtain

$$\ln(k) = \ln(k_0) - E/T$$

Assume you have made measurements of the rate constant at 10 temperatures evenly distributed between 300 and 500 K. Model the measurement process as the true value plus measurement error $e$, which is distributed normally with zero mean and 0.001 variance

$$\ln(k) = \ln(k_0) - E/T + e \qquad e \sim N(0, 0.001)$$

Choose true values of the parameters to be

$$\ln(k_0) = 1 \qquad E = 100$$

(a) Generate a set of experimental data for this problem. Estimate the parameters from these data using least squares. Plot the data and the model fit using both $(T, k)$ and $(1/T, \ln k)$ as the $(x, y)$ axes.

(b) Calculate the 95% confidence intervals for your parameter estimates. What are the coordinates of the semimajor axes of the ellipse corresponding to the 95% confidence interval?

(c) What are the coordinates of the corners of the box corresponding to the 95% confidence interval?

(d) Plot your result by showing the parameter estimate, ellipse, and box. Are the parameter estimates highly correlated? Why or why not?
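The data generation and estimation steps in part (a) can be sketched as follows, using the exact values specified in the exercise. Note the wide scatter in the recovered $E$: the two regressors ($1$ and $-1/T$) are nearly collinear over 300 to 500 K, which is exactly the strong parameter correlation that part (d) asks about.

```python
import numpy as np

rng = np.random.default_rng(7)
lnk0_true, E_true = 1.0, 100.0
T = np.linspace(300.0, 500.0, 10)
y = lnk0_true - E_true / T + rng.normal(0.0, np.sqrt(0.001), T.size)

# Linear model: ln k = [1, -1/T] [ln k0, E]^T
X = np.column_stack([np.ones_like(T), -1.0 / T])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
lnk0_hat, E_hat = theta
print(lnk0_hat, E_hat)
```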

Exercise 4.32: A fourth moment of the normal distribution

You have established the following matrix integral result involving the second moment

$$\int_{-\infty}^{\infty} x x^T \exp\left(-\frac{1}{2} x^T P^{-1} x\right) dx = (2\pi)^{n/2} (\det P)^{1/2}\, P$$

Establish the following matrix result involving a fourth moment

$$\int_{-\infty}^{\infty} x x^T x x^T \exp\left(-\frac{1}{2} x^T P^{-1} x\right) dx = (2\pi)^{n/2} (\det P)^{1/2} \big[2PP + P\,\text{tr}(P)\big]$$

First you may want to establish the following result for scalar $x$

$$\int_{-\infty}^{\infty} x^p \exp\left(-\frac{1}{2}\frac{x^2}{\sigma^2}\right) dx = \begin{cases} 0 & p \text{ odd} \\ \sqrt{2\pi}\, \sigma^{p+1} (p-1)!! & p \text{ even} \end{cases}$$

Exercise 4.33: The χ² and χ densities

Let $X_i$, $i = 1, 2, \ldots, n$, be statistically independent, normally distributed random variables with zero mean and unit variance. Consider the random variable $Y$ to be the sum

$$Y = \sum_{i=1}^{n} X_i^2$$

(a) Find $Y$'s probability density. This density is known as the χ² density with $n$ degrees of freedom, and we say $Y \sim \chi_n^2$. Show that the mean of this density is $n$.

(b) Repeat for the random variable

$$Z = \left(\sum_{i=1}^{n} X_i^2\right)^{1/2}$$

This density is known as the χ density with $n$ degrees of freedom, and we say $Z \sim \chi_n$.

Exercise 4.34: The t-distribution

Assume that the random variables $X$ and $Y$ are statistically independent, and $X$ is distributed as a normal with zero mean and unit variance and $Y$ is distributed as χ² with $n$ degrees of freedom. Show that the density of the random variable $t$ defined as

$$t = \frac{X}{\sqrt{Y/n}}$$

is given by

$$p_t(z; n) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{z^2}{n}\right)^{-\frac{n+1}{2}} \tag{4.90}$$

This distribution is known as Student's t-distribution (density) after its discoverer, the chemist W.S. Gosset (Gosset, 1908), writing under the name Student.

Exercise 4.35: The F-distribution

Given random variables $X$ and $Y$ are independently distributed as χ² with $n$ and $m$ degrees of freedom, respectively, define the random variable $F$ as the ratio

$$F = \frac{X/n}{Y/m}$$

Show that $F$'s probability density is

$$p_F(z; n, m) = \frac{1}{z\, B(n/2, m/2)} \sqrt{\frac{(zn)^n m^m}{(zn + m)^{n+m}}} \qquad z \ge 0$$

in which $B$ is the complete Beta function (Abramowitz and Stegun, 1970, p. 258) defined by

$$B(n, m) = \frac{\Gamma(n)\,\Gamma(m)}{\Gamma(n + m)}$$

This density is known as the F-distribution (density).

Exercise 4.36: Relation between t- and F-distributions

Given the random variable $F$ is distributed as the $p_F(z; 1, m)$ distribution with parameters $n = 1$ and $m$, consider the transformation

$$T = \sqrt{F}$$

Show that the random variable $T$ is distributed as a t-distribution with parameter $m$

$$p_T(z) = p_t(z; m)$$

Exercise 4.37: Independence and conditional density

Consider two random variables $A$, $B$ with joint density $p_{A,B}(a, b)$, and well-defined marginals $p_A(a)$ and $p_B(b)$ and conditional $p_{A|B}(a \mid b)$. Show that $A$ and $B$ are statistically independent if and only if the conditional density of $A$ given $B$ does not depend on $b$

$$p_{A|B}(a \mid b) \ne f(b)$$

Exercise 4.38: Independent estimates of parameter and variance

(a) Show that $\widehat{\theta}$ and $\widehat{\sigma}^2$ given in (4.48) and (4.49) are statistically independent.

(b) Are the random variables $\widehat{\theta}$ and $y - X\widehat{\theta}$ statistically independent as well? Explain why or why not.

Exercise 4.39: Many samples of the vector least-squares problem

We showed for the model

$$y = X\theta + e \qquad e \sim N(0, R)$$

that the maximum-likelihood estimate is given by (4.64)

$$\widehat{\theta} = (X^T R^{-1} X)^{-1} X^T R^{-1} y$$

Use this result to solve the $n$-sample problem given by the following model

$$y_i = X\theta + e_i \qquad e_i \sim N(0, R) \qquad i = 1, \ldots, n$$

Stack the samples in an enlarged vector $Y$, and define the corresponding error vector $E$

$$Y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \qquad E = \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix}$$

(a) What is the corresponding covariance matrix $\bar{R}$ for the new measurement error vector $E$?

(b) What is the corresponding formula for $\widehat{\theta}$ in terms of $Y$ for this problem?

(c) What is the probability density for this $\widehat{\theta}$?

(d) Does this result agree with (4.65)? Discuss why or why not.

Exercise 4.40: Vector and matrix least-squares problems

A colleague has an old but good piece of software that solves the traditional vector least-squares problem with constraints on the parameters

$$y = A\theta + e \qquad e \sim N(0, R)$$

in which $y$, $\theta$, $e$ are vectors and $A$, $R$ are matrices. If the constraints are not active, the code produces the well-known solution

$$\widehat{\theta} = (A^T R^{-1} A)^{-1} A^T R^{-1} y \tag{4.91}$$

You would like to use this code to solve your matrix model problem

$$y_i = \Theta x_i + e_i$$

in which $y_i$, $x_i$, $e_i$ are vectors, $\Theta$ is a matrix, $i$ is the sample number, $i = 1, \ldots, n$, and you have $n$ statistically independent samples. Your colleague suggests you stack your problem into a vector and find the solution with the existing code. So you arrange your measurements as

$$Y = \begin{bmatrix} y_1 & \cdots & y_n \end{bmatrix} \qquad X = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} \qquad E = \begin{bmatrix} e_1 & \cdots & e_n \end{bmatrix}$$

and your model becomes the matrix equation

$$Y = \Theta X + E \tag{4.92}$$

Figure 4.16: Typical strain versus time data from a molecular dynamics simulation from data file rohit.dat on the website www.che.wisc.edu/~jbraw/principles.

You looked up the answer to your estimation problem when the constraints are not active and find the formula

$$\widehat{\Theta} = Y X^T (X X^T)^{-1} \tag{4.93}$$

You do not see how this answer can come from your colleague's code because the answer in (4.91) obviously depends on $R$ but your answer above clearly does not depend on $R$. Let's get to the bottom of this apparent contradiction, and see if we can use vector least-squares codes to solve matrix least-squares problems.

(a) What vector equation do you obtain if you apply the vec operator to both sides of the matrix model equation, (4.92)?

(b) What is the covariance of the vector $\text{vec}\,E$ appearing in your answer above?

(c) Apply (4.91) to your result in (a) and obtain the estimate $\text{vec}\,\widehat{\Theta}$.

(d) Apply the vec operator to the matrix solution, (4.93), and obtain another expression for $\text{vec}\,\widehat{\Theta}$.

(e) Compare your two results for $\text{vec}\,\widehat{\Theta}$. Are they identical or different? Explain any differences. Does the parameter estimate depend on $R$? Explain why or why not.
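The punchline of parts (a)-(e) can be previewed numerically: stacking columns with the vec operator, using the Kronecker identity $\text{vec}(\Theta X) = (X^T \otimes I)\,\text{vec}\,\Theta$ and error covariance $I \otimes R$, the weighted solution reproduces (4.93) exactly, with $R$ dropping out. The dimensions and $R$ below are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(8)
m, q, n = 2, 3, 12
Theta = rng.standard_normal((m, q))
X = rng.standard_normal((q, n))
M = rng.standard_normal((m, m))
R = M @ M.T + m * np.eye(m)
E = rng.multivariate_normal(np.zeros(m), R, size=n).T   # columns e_i ~ N(0, R)
Y = Theta @ X + E

vec = lambda A: A.flatten(order="F")       # column stacking
Avec = np.kron(X.T, np.eye(m))             # vec(Theta X) = (X^T kron I) vec(Theta)
Ri = np.linalg.inv(np.kron(np.eye(n), R))  # cov(vec E) = I kron R
theta_vec = np.linalg.solve(Avec.T @ Ri @ Avec, Avec.T @ Ri @ vec(Y))
Theta_mat = Y @ X.T @ np.linalg.inv(X @ X.T)   # the matrix formula (4.93)

err = float(np.max(np.abs(theta_vec - vec(Theta_mat))))
print(err)   # agreement to machine precision
```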

Exercise 4.41: Estimating a material's storage and loss moduli from molecular simulation

Consider the following strain response model⁵

$$\sigma_{xy}(\omega t) = G_1 \sin \omega t + G_2 \cos \omega t$$

Figure 4.17: Plot of y versus x from data file errvbls.dat on the website www.che.wisc.edu/~jbraw/principles.
in which $\sigma_{xy}$ is the strain, $G_1$ is the storage modulus, and $G_2$ is the loss modulus ($G_1$ and $G_2$ are positive scalars). We wish to estimate $G_1$ and $G_2$ from measurements of $\sigma_{xy}$. The strain "measurement" in this case actually comes from a molecular dynamics simulation. The simulation computes a noisy realization of the strain for the given material of interest. A representative simulation data set is provided in Figure 4.16. These data are given in file rohit.dat on the website www.che.wisc.edu/~jbraw/principles so you can download them.

(a) Without knowing any details of the molecular dynamics simulation, suggest a reasonable least-squares estimation procedure for $G_1$ and $G_2$.

Find the optimal estimates and 95% confidence intervals for your recommended estimation procedure.

Plot your best-fit model as a smooth time function along with the data.

Are the confidence intervals approximate or exact in this case? Why?

(b) Examining the data shown in Figure 4.16, suggest an improved estimation procedure. What traditional least-squares assumption is violated by these data? How would you implement your improved procedure if you had access to the molecular dynamics simulation so you could generate as many replicate "measurements" as you would like at almost no cost?
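Since the model is linear in $(G_1, G_2)$, one reasonable procedure for part (a) is ordinary least squares on a $[\sin \omega t, \cos \omega t]$ basis. The sketch below uses synthetic data in place of rohit.dat; the values of $G_1$, $G_2$, $\omega$, and the noise level are made-up stand-ins, not taken from the exercise.

```python
import numpy as np

rng = np.random.default_rng(9)
G1_true, G2_true, w = 3.0, 1.5, 2.0          # hypothetical values for illustration
t = np.linspace(0.0, 20.0, 400)
y = G1_true * np.sin(w * t) + G2_true * np.cos(w * t) + rng.normal(0.0, 0.5, t.size)

# Regression matrix: each row is [sin(wt_i), cos(wt_i)]
X = np.column_stack([np.sin(w * t), np.cos(w * t)])
(G1_hat, G2_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
print(G1_hat, G2_hat)
```

With real simulation data, the same two-column regression applies; only the data vector changes.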

Exercise 4.42: Who has the error?

You are fitting some $n$ laboratory measurements to a linear model

$$y_i = m x_i + b + e_{yi} \qquad i = 1, 2, \ldots, n$$

You have been told that the $x$ variable is known with high accuracy and the $y$ variable has measurement error $e_y$ distributed as

$$e_{yi} \sim N(0, \sigma^2)$$

The data are shown in Figure 4.17 and are given in file errvbls.dat on the website www.che.wisc.edu/~jbraw/principles.

(a) Given these assumptions, find the best estimate of the slope and intercept, and also the plus/minus bounds and the 95% confidence ellipse for the parameter estimates.

(b) Plot the data and the line of best fit to these data.

(c) Due to some confusion in the lab, you are told later that actually $y$ is known with high accuracy and the $x$ variable has measurement error $e_x$ distributed as

$$e_{xi} \sim N(0, \sigma^2)$$

Transform the model so that it is linear in a transformed parameter vector $\phi$

$$x_i = f(y_i, \phi_1, \phi_2) + e_{xi}$$

What are $f$ and $\phi$ for the transformed model?

(d) Given these assumptions, find the best estimate for this model. Add this line of best fit to the plot of the data and the line of best fit from the previous model. Clearly label which line corresponds to which model.

(e) Compute the 95% confidence ellipse and plus/minus bounds for $\phi$.

(f) Can you tell from the estimates and the fitted lines which of these two proposed models is more appropriate for these data? Discuss why or why not.

Exercise 4.43: Independence of transformed normals

Consider $n$ independent samples of a scalar, zero-mean normal random variable with variance $\sigma^2$ arranged in a vector $e = \begin{bmatrix} e_1 & \cdots & e_n \end{bmatrix}^T$ so that $e \sim N(0, \sigma^2 I_n)$. Consider random variables $x$ and $y$ to be linear transformations of $e$, $x = Ae$ and $y = Be$.

(a) Provide necessary and sufficient conditions for matrices $A$ and $B$ so that $x$ and $y$ are independent.

(b) Given that the conditions on $A$ and $B$ are satisfied, what can you conclude about $x$ and $y$ if $e$ has variance $\sigma^2 I_n$ but is not necessarily normally distributed?

Exercise 4.44: The multivariate t-distribution

Assume that the random variables $X \in \mathbb{R}^p$ and $Y$ are statistically independent, with $X \sim N(0, P)$ and $Y \sim \chi_n^2$.

(a) Show that the density of the random variable $t$ defined as

$$t = m + \frac{X}{\sqrt{Y/n}}$$

with $m \in \mathbb{R}^p$ a constant, is given by

$$p_t(z; n, P, m) = \frac{\Gamma\left(\frac{n+p}{2}\right)}{\Gamma\left(\frac{n}{2}\right)(n\pi)^{p/2}(\det P)^{1/2}} \left(1 + \frac{(z - m)^T P^{-1} (z - m)}{n}\right)^{-\frac{n+p}{2}}$$

Exercise 4.45: Integrals of the multivariate t-distribution and the F-statistic

Given the random variable $t$ is distributed as the multivariate t-distribution defined in Exercise 4.44, centered at $m \in \mathbb{R}^p$, show that the value of $b$ that gives probability $\alpha$ in the multivariate t-distribution

$$\Pr\big((t - m)^T P^{-1} (t - m) \le b\big) = \alpha$$

is

$$b = p\, F(n, p, \alpha)$$

in which $F(n, p, \alpha)$ is defined in (4.54).

Exercise 4.46: Confidence interval for unknown variance

Consider again $\widehat{\theta}$ and $\widehat{\sigma}^2$ from (4.48) and (4.49) and define the new random variable $Z$ as the ratio

$$Z = \frac{\widehat{\theta} - \theta_0}{\widehat{\sigma}}$$

in which $\widehat{\theta}$ and $\widehat{\sigma}^2$ are statistically independent as shown in Exercise 4.38.

(a) Show $Z$ is distributed as a multivariate t-distribution as defined in Exercise 4.44.

(b) Show that lines of constant probability of the multivariate t-distribution are ellipses in $\theta$ as in the normal distribution.

(c) Define an $\alpha$-level confidence interval using the multivariate t-distribution in place of the normal distribution and show that

$$(\theta_0 - \widehat{\theta})^T X^T X (\theta_0 - \widehat{\theta}) \le p\, \widehat{\sigma}^2\, F(n_p, p, \alpha)$$

in agreement with (4.55).

Exercise 4.47: Adding two uniformly distributed random variables

Given two independent, uniformly distributed random variables, X ~ U[0, 1] and Y ~ U[4, 5], find the density for Z = X + Y. Note that the transformation from (X, Y) to Z is not an invertible transformation.
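Before working the problem analytically, the answer is easy to anticipate with a quick Monte Carlo check; the sketch below (Python with NumPy; the sample size and seed are our own arbitrary choices) compares a histogram of Z = X + Y with the triangular density that the convolution of the two uniform densities produces on [4, 6].

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 1.0, n)   # X ~ U[0, 1]
y = rng.uniform(4.0, 5.0, n)   # Y ~ U[4, 5]
z = x + y                      # Z = X + Y, supported on [4, 6]

# Convolution of the two uniform densities: triangular density
# p_Z(z) = z - 4 on [4, 5] and 6 - z on [5, 6].
def p_z(zz):
    return np.where(zz < 5.0, zz - 4.0, 6.0 - zz)

# Compare a histogram estimate of the density with the analytical result.
hist, edges = np.histogram(z, bins=50, range=(4.0, 6.0), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
max_err = np.max(np.abs(hist - p_z(mid)))
print(f"mean = {z.mean():.3f}, max density error = {max_err:.3f}")
```

With 200,000 samples the histogram and the triangular density agree to a few percent in every bin.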

Exercise 4.48: Product of two unit variance normals

Let X and Y be independent scalar random variables distributed identically as N(0, 1). Find and plot the density for Z = XY. Is p_Z(z) well defined for all z? If not, explain why not.
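A Monte Carlo sketch (Python/NumPy; sample size and seed are arbitrary choices of ours) gives a useful hint for the last part of this exercise: the histogram estimate of the density near z = 0 keeps growing as the bin width shrinks, suggesting a singularity there.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
z = rng.standard_normal(n) * rng.standard_normal(n)  # Z = XY

# Histogram-based density estimates at z = 0 for shrinking bin widths.
# If p_Z(0) were finite, these estimates would level off; instead they
# keep increasing as the bin narrows.
widths = (0.2, 0.02, 0.002)
dens = [np.mean(np.abs(z) < w / 2) / w for w in widths]
for w, d in zip(widths, dens):
    print(f"bin width {w}: density estimate near 0 = {d:.2f}")
```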

Exercise 4.49: A useful integral in Fourier transforms of normals

Derive the definite integral used in taking the Fourier transform of the normal density

∫_{-∞}^{∞} e^{-a²x²} cos(bx) dx = (√π/a) e^{-b²/(4a²)},   a ≠ 0

Hint: first consider the exponential version of the integral on (-∞, ∞). We wish to show that

∫_{-∞}^{∞} e^{-a²x²} e^{ibx} dx = (√π/a) e^{-b²/(4a²)}    (4.95)

which gives the integral of interest as well as a second result

∫_{-∞}^{∞} e^{-a²x²} sin(bx) dx = 0,   a ≠ 0

To proceed, complete the square on the argument of the exponential and show that

-a²x² + ibx = -a²(x − ib/(2a²))² − b²/(4a²)

Then perform the integral by noticing that integrating the normal distribution gives

∫_{-∞}^{∞} e^{-a²(x − m′)²} dx = √π/a

even when m' = im is complex valued instead of real valued. This last statement can
be establishedby a simple contour integration in the complex plane and noting that
the exponentialfunction is an entire function, i.e., has no singularities in the complex
plane.
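The claimed value of (4.95) can be checked numerically before deriving it. The sketch below (Python/NumPy) uses a simple trapezoidal sum on a truncated interval; the truncation point and the (a, b) pairs are arbitrary choices.

```python
import numpy as np

# Compare a trapezoidal approximation of the integral of
# exp(-a^2 x^2) cos(bx) over the real line with the closed form
# (sqrt(pi)/a) exp(-b^2/(4 a^2)). Truncating at |x| = 30 is harmless
# because the integrand is negligible there for these values of a.
x = np.linspace(-30.0, 30.0, 400_001)
dx = x[1] - x[0]
errs = []
for a, b in [(1.0, 0.0), (1.0, 2.0), (0.5, 3.0)]:
    f = np.exp(-a**2 * x**2) * np.cos(b * x)
    val = dx * (f.sum() - 0.5 * (f[0] + f[-1]))   # trapezoidal rule
    exact = np.sqrt(np.pi) / a * np.exp(-b**2 / (4 * a**2))
    errs.append(abs(val - exact))
    print(a, b, val, exact)
```

Because the integrand is smooth and decays rapidly, the trapezoidal rule is extremely accurate here.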

Exercise 4.50: Orthogonal transformation of normal samples

Let vectors X₁, X₂, ..., Xₙ ∈ Rᵖ be n independent samples of a normally distributed random variable with possibly different means but identical variance, Xᵢ ~ N(mᵢ, R). Consider the transformation

Yᵢ = Σ_{j=1}^{n} cᵢⱼ Xⱼ

in which the matrix C = [cᵢⱼ] is orthogonal. Show that the Yᵢ are independently distributed as Yᵢ ~ N(m̄ᵢ, R), in which m̄ᵢ = Σ_{j=1}^{n} cᵢⱼ mⱼ, and deduce the relationship between the matrices X = [X₁ ⋯ Xₙ], Y = [Y₁ ⋯ Yₙ], and C.
Exercise 4.51: Estimated variance and the Wishart distribution

Let e₁, e₂, ..., eₙ ∈ Rᵖ be n independent samples of a normally distributed random variable with zero mean and identical variance, eᵢ ~ N(0, R). Define the matrix

S = Σ_{i=1}^{n} eᵢeᵢᵀ

The distribution for random matrix S is known as the Wishart distribution, in which p is the number of components (R is a p × p matrix) and integer n is known as the number of degrees of freedom. Sometimes the fact that S has p components is also indicated using the notation S ~ Wₚ(R, n). Consider the estimation problem of Section 4.7.4 written in the form

Y = ΘX + E

(a) Show that EEᵀ ~ Wₚ(R, n).

(b) Define Ê = Y − Θ̂X and show that nR̂ = ÊÊᵀ.

(c) Show that ÊÊᵀ ~ W(R, n − q), and therefore that nR̂ ~ W(R, n − q).

Hint: take the SVD of the q × n matrix X for q < n. Define Z = ÊV, which can be partitioned as [Z₁ Z₂] = Ê[V₁ V₂], and show that ÊÊᵀ = Z₂Z₂ᵀ. Work out the distribution of Z₂Z₂ᵀ from the definition of the Wishart distribution and the result of Exercise 4.50.

Exercise 4.52: Singular normal distribution as a delta sequence

Two generalized functions f(·) and g(·) are defined to be equal (in the sense of distributions) if they produce the same integral for all test functions φ

∫ f(x)φ(x) dx = ∫ g(x)φ(x) dx

The space of test functions is defined to be the set of all smooth (nongeneralized) functions that vanish outside of a compact set C = [−c, c] for some c > 0. Show that the zero-mean normal density

n(x, σ) = (1/(σ√(2π))) e^{-(1/2)(x/σ)²}

is equal to the delta function δ(x) in the limit σ → 0.
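The limit can be verified numerically with any smooth, compactly supported test function; the bump function and grid below are our own choices (Python/NumPy sketch).

```python
import numpy as np

def phi(x):
    """Smooth test function with compact support on [-1, 1] (a bump)."""
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

x = np.linspace(-2.0, 2.0, 400_001)
dx = x[1] - x[0]
phi0 = np.exp(-1.0)                     # phi(0)
errs = []
for sigma in (0.5, 0.1, 0.02):
    n_sig = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    val = np.sum(n_sig * phi(x)) * dx   # integral of n(x, sigma) phi(x)
    errs.append(abs(val - phi0))
    print(sigma, val)
```

As σ shrinks, the integral approaches φ(0), which is the defining action of δ(x).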


Exercise 4.53: Error bound for the Taylor series of the exponential used in establishing the central limit theorem

Derive the bound (4.31) used in establishing the central limit theorem for sums of identically distributed random variables

| e^{ix} − Σ_{m=0}^{n} (ix)^m/m! | ≤ |x|^{n+1}/(n+1)!

Hint: expand e^{ix} in a Taylor series with remainder term at x = 0 and take magnitudes. Note that for this particular function, the usual bound on the remainder term turns out to be an equality.

Exercise 4.54: Error bound for the remainder term in Taylor series

Derive the bound (4.73) for a second-order Taylor series of a bounded function f having three continuous, bounded derivatives

r(x, h) = f(x + h) − f(x) − f′(x)h − (1/2)f″(x)h²

sup_{x∈R} |r(x, h)| ≤ K_f min(h², |h|³)

Show that the following K_f is valid for any b > 0

K_f = max( M_f^{(2)}/b, M_f^{(3)}/6 )    (4.96)

with M_f^{(i)} = sup_{x∈R} |f^{(i)}(x)|.

Hints: first expand f(x + h) about f(x) to second order using the standard Taylor theorem with remainder. This gives the |h|³ bound. For the second-order bound, first take absolute values of the definition of r(x, h) and use the triangle inequality. Choose a constant b > 0 and consider two cases: |h| ≤ b and |h| > b. Develop second-order bounds for both cases and then combine them to obtain a second-order bound for all h. Finally, combine the second-order and third-order bounds by taking the smaller.

Exercise 4.55: Lindeberg conditions


Show that the following are special cases of the Lindeberg conditions given in Assumption 4.15.

(a) The de Moivre-Laplace central limit theorem assumption that the Xᵢ are independent and identically distributed with mean zero and variance σ².

(b) The Lyapunov central limit theorem assumption that there exists δ > 0 such that as n → ∞

(1/sₙ^{2+δ}) Σ_{k=1}^{n} E( |X_k|^{2+δ} ) → 0

Note that the Lyapunov assumption implies only part (b) of Assumption 4.15.

(c) The bounded random variable assumption, i.e., there exists B > 0 such that |Xᵢ| ≤ B.

Therefore, by proving Theorem 4.16, we have also proved the de Moivre-Laplace and the Lyapunov versions of the central limit theorem. We have also shown that the central limit theorem holds for bounded random variables, provided that sₙ → ∞.

Figure 4.18: Smooth approximation to a unit step function, H(z − 1).

Exercise 4.56: Normal random variables satisfy Lindeberg conditions

Let Xᵢ, i = 1, 2, ..., n be independent random variables with mean zero and variance σᵢ², and let Yᵢ, i = 1, 2, ..., n be independent normals with mean zero and the same variances σᵢ². Show that if the Xᵢ satisfy the Lindeberg conditions listed in Assumption 4.15, then so do the Yᵢ. Hint: using the Xᵢ variables, show that for n sufficiently large and any ε > 0, σᵢ ≤ ε sₙ for all i. This result shows that no single random variable can account for a significant fraction of the sum's variance as n becomes large. Next evaluate the Lindeberg condition for the Yᵢ variables, and use the fact that (maxᵢ σᵢ)/sₙ → 0.

Exercise 4.57: Smoothing a step (indicator) function

Here we construct a suitably smooth indicator function as shown in Figure 4.15. To simplify the presentation, first consider the setup in Figure 4.18. We seek a monotone function f(z) with three continuous derivatives that increases from zero at z = 0 to one at z = 2. We shall then rescale the z-axis to make this function as sharp as we please.

(a) Divide the interval in half and consider a fifth-order polynomial on z ∈ [0, 1]

p(z) = a₀ + a₁z + a₂z² + a₃z³ + a₄z⁴ + a₅z⁵

To have p(z) and its first three derivatives vanish at z = 0, we require a₀ = a₁ = a₂ = a₃ = 0. We will reflect this function about the y = 1/2 and z = 1 lines to provide the matching function q(z) on z ∈ [1, 2], or, in equations, q(z) = −p(2 − z) + 1. Note that the symmetry implies p^{(i)}(1) = q^{(i)}(1) for odd i, so that all odd derivatives are automatically continuous at z = 1, and the even derivatives are negatives of each other at z = 1. So we require that the even derivatives at z = 1 are zero. We therefore have two conditions, p(1) = 1/2 and p″(1) = 0, to find the remaining two coefficients

p(1) = a₄ + a₅ = 1/2
p″(1) = 12a₄ + 20a₅ = 0

Solve these equations and show that a₄ = 5/4 and a₅ = −3/4.

(b) The candidate function f(z) is therefore

f(z) = (5/4)z⁴ − (3/4)z⁵,   0 ≤ z ≤ 1
f(z) = −p(2 − z) + 1,   1 < z ≤ 2

Plot this function and its first three derivatives, and check that they are continuous at z = 1. The derivative bounds are given by

M_f^{(2)} = 20/9,   M_f^{(3)} = 15

Also check these values on your plots.

(c) Now rescale. Let w = (1 − z/2)L + x, so that f(z) = f(2(1 − (w − x)/L)) = f(w). The function f(w) now has the required properties of Figure 4.15. Show that the derivative bounds are scaled by

M_{f(w)}^{(i)} = (2/L)^i M_{f(z)}^{(i)}

(d) Show finally that because of this scaling with L, there exists L₀ > 0 such that the bound in (4.96) is given by

K_f = 20/L³    (4.97)

for every L satisfying 0 < L < L₀.

For an even smoother, seventh-order polynomial, with a smaller third derivative, see Thomasian (1969, p. 486).
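Parts (a) and (b) are easy to check numerically. The sketch below (Python/NumPy) solves the two linear conditions for a₄ and a₅ and evaluates the derivative bounds of p on [0, 1], which by the reflection symmetry also bound f on [1, 2].

```python
import numpy as np

# Conditions p(1) = a4 + a5 = 1/2 and p''(1) = 12 a4 + 20 a5 = 0 for the
# two remaining coefficients of p(z) = a4 z^4 + a5 z^5.
A = np.array([[1.0, 1.0], [12.0, 20.0]])
a4, a5 = np.linalg.solve(A, np.array([0.5, 0.0]))
print(a4, a5)          # expect 5/4 and -3/4

# Derivative bounds of p on [0, 1]:
# p''(z) = 12 a4 z^2 + 20 a5 z^3, p'''(z) = 24 a4 z + 60 a5 z^2.
z = np.linspace(0.0, 1.0, 300_001)
m2 = np.max(np.abs(12 * a4 * z**2 + 20 * a5 * z**3))
m3 = np.max(np.abs(24 * a4 * z + 60 * a5 * z**2))
print(m2, m3)          # expect 20/9 and 15
```

The maximum of |p″| occurs at z = 2/3, and the maximum of |p‴| occurs at the endpoint z = 1.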

Exercise 4.58: Properties of PLSR algorithm

Given the PLSR algorithm described in Section 4.8, show the following properties.

(a) T = XR
(b) TᵀT = I_q
(c) Q minimizes ‖Y − TQᵀ‖²_F for given Y and T.

Exercise 4.59: Using PCR and PLSR

Write your own PCR and PLSR algorithm and apply it to the data given in Example 4.23. The data are available in file pca-pls-data.dat on the website www.che.wisc.edu/~jbraw/principles.

(a) Reproduce the results given in Example 4.23.

(b) Estimate parameter B using both PCR and PLSR using the number of principal components/latent variables equal to 1, 2, 3, 4, 5. Compare your estimates to the value that was used to generate the data.


Bibliography

M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. National Bureau of Standards.

T. W. Anderson. An Introduction to Multivariate Statistical Analysis. John Wiley & Sons.

T. Bayes. An essay towards solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc., 53:370-418, 1763. Reprinted in Biometrika, 35:293-315.

G. E. P. Box and G. C. Tiao. Bayesian Inference in Statistical Analysis. Addison-Wesley.

E. A. Cornish. The multivariate t-distribution associated with a set of normal sample deviates. Aust. J. Phys., 7:531-542, 1954.

C. W. Dunnett and M. Sobel. A bivariate generalization of Student's t-distribution with tables for certain special cases. Biometrika, 41:153-169, 1954.

R. Durrett. Probability: Theory and Examples. Cambridge University Press.

W. Feller. Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung. Math. Z., 1935.

P. Geladi and B. R. Kowalski. Partial least-squares regression: A tutorial. Anal. Chim. Acta, 185:1-17, 1986.

W. S. Gosset. The probable error of a mean. Biometrika, 6:1-25, 1908.

M. H. Kaspar and W. H. Ray. Partial least squares modelling as successive singular value decompositions. Comput. Chem. Eng., 17(10):985-989, 1993.

A. N. Kolmogorov. Foundations of Probability. Chelsea Publishing Company, New York, 1950. Translation of "Grundbegriffe der Wahrscheinlichkeitrechnung, Ergebnisse der Mathematik," 1933.

L. Le Cam. The central limit theorem around 1935. Statist. Sci., 1(1):78-96, 1986.

P. Lévy. Propriétés asymptotiques des sommes de variables indépendantes ou enchaînées. J. Math. Pures Appl., pages 347-402, 1935.

J. W. Lindeberg. Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Math. Z., 15:211-225, 1922.

J. F. MacGregor, T. F. Marlin, J. Kresta, and B. Skagerberg. Multivariate statistical methods in process analysis and control. In Y. Arkun and W. H. Ray, editors, Chemical Process Control-CPC IV. CACHE, 1991.

H. Martens. Reliable and relevant modelling of real world data: a personal account of the development of PLS regression. Chemom. Intell. Lab. Syst., 2001.

B.-H. Mevik and R. Wehrens. The pls package: Principal component and partial least squares regression in R. J. Stat. Softw., 18:1-24, 2007.

A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, Inc., second edition, 1984.

D. Pollard. Comment on: The central limit theorem around 1935. Statist. Sci., 1(1), 1986.

G. Pólya. Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung und das Momentproblem. Math. Z., 8:171-180, 1920.

S. J. Qin. Recursive PLS algorithms for adaptive data modeling. Comput. Chem. Eng., 1998.

S. M. Selby. CRC Standard Mathematical Tables. CRC Press, twenty-first edition, 1973.

Student. The probable error of a mean. Biometrika, 6:1-25, 1908.

A. J. Thomasian. The Structure of Probability Theory with Applications. McGraw-Hill, 1969.

A. W. van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998.

J. Wishart. The generalised product moment distribution in samples from a normal multivariate population. Biometrika, 20A:32-52, 1928.

H. Wold. Estimation of principal components and related models by iterative least squares. In P. R. Krishnaiah, editor, Multivariate Analysis, pages 391-420. Academic Press, 1966.

S. Wold. Personal memories of the early PLS development. Chemom. Intell. Lab. Syst., 58:83-84, 2001.

5 Stochastic Models and Processes

5.1 Introduction
We are by now expert in using (deterministic) differential and partial differential equations as models of chemical and biological systems. These equations capture equations of motion, conservation of mass and energy, and many of the fundamental principles useful in analysis and design of chemically reacting systems. Chapters 2 and 3 were mainly devoted to developing this program. The motivation for stochastic processes and differential equations is to incorporate into the model the random effects of the internal system (discrete molecules) and the external environment on the system of interest. In some applications at fine length scales, the random effects are mainly due to the internal random behavior of the molecules. But even in applications at large scales, the random effects of the external environment are often quite important to understand and interpret the (noisy) measurements coming from a system.
In this chapter, we illustrate the usefulness of random variables and random processes in the modeling and analysis of systems of interest to chemical and biological engineers. We find the basic probability and statistics that we covered in Chapter 4 indispensable tools in carrying out this program. We study three main examples: (i) the Wiener process as a model of diffusion in transport phenomena, (ii) the Poisson process as a model of chemical reactions and kinetics at the small scale, and (iii) the Kalman filter for reducing the effects of noise in process measurements, a fundamental task in systems engineering. By covering representative examples from transport phenomena, chemical kinetics, and systems engineering, we hope to both introduce random models and processes, as well as demonstrate their wide range of applicability in modern chemical and biological engineering.
5.2 Stochastic Processes for Continuous Random Variables

5.2.1 Discrete Time Stochastic Processes

Our target in this part of the chapter is an understanding of the structure and dynamics of continuous time stochastic processes: the stochastic analogs of deterministic differential equations. In building up to these, it is instructive to start with the conceptually simpler stochastic difference equation. Consider the following example

x(k + 1) = Ax(k) + ε(k)    (5.1)

in which k ∈ I≥0 is the sample number in discrete time, ε is a random variable, assumed to have some fixed and known probability density, and ε(0), ε(1), ε(2), ... are independent, identically distributed samples of ε. If we define a sampling interval Δt, then t = kΔt. Because of the influence of the random variable ε, the variable x is also a random variable. In general it can take any value, so we call it a continuous random variable, in contrast to the integer-valued or discrete random variables we encounter in Section 5.3.

We wish to study the statistical properties of the process x(k) due to the random disturbance ε. Because the process is linear, an explicit solution is available

x(k) = A^k x(0) + Σ_{j=0}^{k-1} A^{k-1-j} ε(j)    (5.2)

There is no difficulty expressing the solution to the stochastic difference equation; in fact we cannot determine by looking at the form of the solution if ε(k) is a random variable or simply a deterministic function of time. This is the perfect place to start because everything is well defined regardless of whether or not ε is a random variable. We build some simple intuition with stochastic difference equations and then proceed to continuous time systems. We shall also see that difference equations arise whenever we wish to numerically approximate the solution to stochastic differential equations, so some facility with the difference equations is highly useful.

The INTEGRATED WHITE-NOISE process provides a starting point for understanding many important aspects of stochastic processes. Consider a system with scalar x, A = 1, and zero initial condition, in which the noise is a scaled unit normal, ε(k) = Gw(k), with the w(k) independent and w(k) ~ N(0, 1)

x(k + 1) = x(k) + Gw(k),   x(0) = 0    (5.3)

We wish to find the probability density of x(k) versus time for this process. We have x(1) = x(0) + Gw(0) = Gw(0), so x(1) ~ N(0, G²). Since w(1) is independent of x(1), we have for k = 2

x(2) = x(1) + Gw(1)

and using Theorem 4.12 on the linear transformation of a normal, we have that

x(2) ~ N(0, 2G²)

Continuing this process gives

x(k) ~ N(0, kG²)

and we have that the variance of x(k) increases linearly with time while the mean remains zero for the integrated white-noise process. If we choose G = √Δt, then x(k) ~ N(0, kΔt) and the system satisfies

x(t) ~ N(0, t)

or equivalently its probability density p(x, t) satisfies

p(x, t) = (1/√(2πt)) exp(-x²/(2t))

Similarly, if we let G = √(2DΔt), where D is a constant, then

x(t) ~ N(0, 2Dt)

or

p(x, t) = (1/√(4πDt)) exp(-x²/(4Dt))

This is precisely (3.70) from Chapter 3, which describes the transient spread by diffusion of a delta-function initial condition. Thus we see

already the first sign of what turns out to be a deep and important connection between stochastic processes and diffusion. For diffusion processes, if the random variable x is the position of a particle, then the mean square displacement is given by

⟨x²(t)⟩ = 2Dt

and the mean square displacement increases linearly with time.

The analysis above can be extended to the case where the random term has nonzero mean: w ~ N(m, 1), which we can write as w = m + w̃, with w̃ defined as w above. Now

x(k + 1) = x(k) + Gm + Gw̃(k)

Defining v = Gm/Δt, this becomes

x(k + 1) = x(k) + vΔt + Gw̃(k)

Again, if we interpret x as a particle position, then the particle travels or "drifts" a distance vΔt in one time interval as well as diffusing. Letting G = √(2DΔt), the particle drifts with a velocity v, so its mean position changes linearly with time, while also diffusing.
Finally, we return to the case where ε is drawn from an arbitrary distribution rather than a normal. With A = 1 and x(0) = 0, (5.2) becomes

x(k) = Σ_{j=0}^{k-1} ε(j)

That is, the solution becomes a sum of independent identically distributed (IID) random variables. In Section 4.5 we learned the remarkable fact that sums of IID random variables converge to a normal distribution. Thus as k → ∞, x(k) becomes normally distributed even if the noise that drives it is not. So, for example, if we can only observe the process x(t) at time intervals that are infrequent compared to Δt, it will be virtually impossible to know whether the underlying noise was Gaussian or not; the resulting process x(k) will be. This result is one reason why, in the absence of further information, taking the noise in a system to be normally distributed is often a good approximation.
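This convergence to normality is easy to see in a simulation. The sketch below (Python/NumPy; path count, step count, and seed are arbitrary choices of ours) drives the integrated process with uniform rather than normal noise and compares the standardized result with standard normal quantiles.

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, k = 20_000, 400

# epsilon(j) uniform on [-1/2, 1/2]: zero mean, variance 1/12,
# and decidedly non-Gaussian.
eps = rng.uniform(-0.5, 0.5, (n_paths, k))
x_k = eps.sum(axis=1)            # x(k) for each realization

z = x_k / np.sqrt(k / 12.0)      # standardize; should look like N(0, 1)
q = np.quantile(z, [0.159, 0.5, 0.841])
print(z.mean(), z.var(), q)      # quantiles near (-1, 0, 1) if normal
```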

5.2.2 Wiener Process and Brownian Motion
We now wish to define the continuous time version of the discrete time integrated white noise or Brownian motion just presented. This process, denoted W(t), is known as a Wiener process in honor of the mathematician Norbert Wiener. The property that we retain in taking the limit as Δt → 0 is that W(t) is normally distributed with zero mean and linearly increasing variance, or

W(t) ~ N(0, t)

By analogy with the results above, a diffusion process x(t) with diffusivity D and x(0) = 0 would simply be

x(t) = √(2D) W(t)    (5.4)

Note that the linear increase in variance with time should hold for any starting time s, giving

W(t) − W(s) ~ N(0, t − s),   0 ≤ s ≤ t    (5.5)
The increment of the Wiener process is denoted

ΔW(t − s) = W(t) − W(s)

Considering distinct time instants tᵢ, with tᵢ > tᵢ₋₁, we define Δtᵢ = tᵢ − tᵢ₋₁ and ΔW(tᵢ) = W(tᵢ) − W(tᵢ₋₁). Increments involving nonoverlapping time intervals are independent. The Wiener increments have a number of important properties that follow from their definitions

⟨ΔW(tᵢ)⟩ = 0    (5.6)
⟨ΔW(tᵢ) ΔW(tⱼ)⟩ = Δtᵢ δᵢⱼ    (5.7)
⟨ΔW(tᵢ)ⁿ⟩ = 0 for n odd    (5.8)
⟨ΔW(tᵢ)^{2m}⟩ ∝ Δtᵢ^m for integer m    (5.9)

In Theorem 4.12 we saw that the distribution of a sum of normally distributed random variables is also normally distributed. A number of important results for Wiener processes follow from this fact. A Wiener process can be written as a sum of N Wiener increments for any N

W(t) − W(t₀) = Σ_{i=1}^{N} ΔW(tᵢ)    (5.10)

where t_N = t and the only restriction on the tᵢ is that tᵢ > tᵢ₋₁. Accordingly, a diffusion (Brownian motion) process can be written as a sum of Wiener increments multiplied by √(2D)

x(t) − x(t₀) = Σ_{i=1}^{N} √(2D) ΔW(tᵢ)    (5.11)

Furthermore, for separate Wiener processes W₁, W₂, and W₃

√(2D₁) ΔW₁ + √(2D₂) ΔW₂ = √(2(D₁ + D₂)) ΔW₃    (5.12)

In other words, the sum of two diffusion processes is equivalent to a different diffusion process whose diffusivity is the sum of the first two.
To visualize a trajectory of a Brownian motion process x(t), we can use (5.11), generating points x(t) at constant time intervals Δt. Observing that now ΔW ~ N(0, Δt), this is equivalent to evaluating the discrete time process

x((k + 1)Δt) = x(kΔt) + √(2DΔt) w(k),   x(0) = 0

with w(k) ~ N(0, 1) defined as above. Figure 5.1 shows a trajectory of this process for sample time Δt = 10⁻⁶ and diffusivity D = 5 × 10⁵. Notice that the roughness is quite apparent in the top row of Figure 5.1. But by looking at finer time scales, we can see the effect of the finite step size in the discrete time approximation. The continuous time Wiener process defined in (5.5) maintains its roughness at all time scales; Figure 5.2 shows how the path should appear between the samples if we chose the step size properly for this magnification. Unlike more familiar functions, the Wiener process is very irregular. Thus it is important to address its continuity and smoothness properties.
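This discrete time process is a one-line simulation. The following sketch (Python/NumPy; the path count and seed are our own choices, while Δt and D match Figure 5.1) also checks that the sample variance at the final time matches 2Dt.

```python
import numpy as np

rng = np.random.default_rng(3)
D, dt = 5.0e5, 1.0e-6            # diffusivity and sample time of Figure 5.1
n_paths, n_steps = 2000, 1000

# Each row is one sampled Brownian path built from independent
# increments sqrt(2 D dt) w(k), with w(k) ~ N(0, 1).
dW = np.sqrt(2 * D * dt) * rng.standard_normal((n_paths, n_steps))
x = np.cumsum(dW, axis=1)        # x(k dt), k = 1, ..., n_steps

t_final = n_steps * dt
var_final = x[:, -1].var()
print(var_final, 2 * D * t_final)   # sample variance vs. 2 D t
```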
The Wiener process is continuous. A crude argument for this statement is that |ΔW| ~ √Δt, which approaches zero as Δt → 0. A more refined one is presented in Exercise 5.4. On the other hand, because of the Δt^{1/2} behavior of ΔW, we arrive at the perhaps surprising fact that

Figure 5.1: A simulation of the Wiener process with fixed sample time Δt = 10⁻⁶ and D = 5 × 10⁵. The boxed region in each figure is expanded in the next plot to display a decreasing time scale of interest. The true Wiener process is rough at all time scales and therefore dW(t)/dt does not exist. The top row shows an adequate sampling rate to display the roughness of the Wiener process. The middle row shows the time scale of interest starting to become too small for the given sample time. The bottom row shows a time scale of interest much too small for the given sample time; one can see the samples and the straight lines drawn between them.

Figure 5.2: Sampling faster on the last plot in Figure 5.1; the sample time is decreased to Δt = 10⁻⁹ and the roughness is restored on this time scale. Thought question: how did we generate a random walk that passes exactly through the solid sample points taken from Figure 5.1? Hint: certainly not by trial and error! Such a process is called a Brownian bridge (Bhattacharya and Waymire, 2009).

the Wiener process is not differentiable¹

⟨|ΔW/Δt|⟩ = (1/Δt) ⟨|ΔW|⟩
          = (1/Δt) (1/√(2πΔt)) ∫_{-∞}^{∞} |x| exp(-x²/(2Δt)) dx
          = (1/Δt) √(2Δt/π)
          = √(2/(πΔt))

This diverges as Δt^{-1/2} as Δt → 0.

¹The results of Exercise 5.8 were applied in this derivation.
Now let us return for the moment to the discrete time integrated white-noise process, (5.3). Considering a sampling interval Δt, we can rewrite this process as

Δx = B ΔW    (5.13)

Under other circumstances we could divide by Δt and let it shrink to zero

dx/dt = B dW/dt

We have just found, however, that dW/dt does not exist. Nevertheless, we can define a differential of the Wiener process

dW(t) = W(t + dt) − W(t)

as the Wiener increment W(t + Δt) − W(t) when Δt becomes the infinitesimal dt. This is also known as the white-noise process. It is not continuous. Now we can write (5.13) in differential form

dx = B dW    (5.14)

This is the most elementary STOCHASTIC DIFFERENTIAL EQUATION. With initial condition x(0) = 0, its solution is (5.4).
5.2.3 Stochastic Differential Equations

Basic ideas

To motivate and introduce stochastic differential equations, consider first the deterministic differential equation

dx/dt = A(x, t)    (5.15)

When we wish to augment this model to include some random effects, one might try

dx/dt = A(x, t) + η(t)

in which η(t) is a random variable, often a normally distributed, zero-mean random variable, as discussed in Chapter 4.

We have already run into problems with this formulation. Even to model a "well-behaved" (e.g., continuous) stochastic process like diffusion, we have seen that the random term would have to take on the form

η(t) = dW/dt

which does not exist. Extending what we did above for Brownian motion, we thus consider differentials instead of derivatives and write a general stochastic differential equation (SDE) in the form

dx = A(x, t) dt + B(x, t) dW    (5.16)

Formally, we can integrate this to yield

x(t) = x(0) + ∫₀ᵗ A(x(t′), t′) dt′ + ∫₀ᵗ B(x(t′), t′) dW(t′)    (5.17)

The first integral is classical. The second would be as well if dW/dt existed, in which case we would just write that

∫₀ᵗ B(x(t′), t′) dW(t′) = ∫₀ᵗ B(x(t′), t′) (dW/dt′) dt′

This integral is nontrivial, and to understand it we need to understand a little bit about the calculus of stochastic processes.
Elementary Stochastic Calculus

Stochastic integrals of the form

S = ∫_{t₀}^{t} G(t′) dW(t′)

are more complex than conventional integrals because both G and dW can vary stochastically (think of the case G(t) = W(t)). Nevertheless, as with conventional integrals, we can divide the interval [t₀, t] into n subintervals t₀ ≤ t₁ ≤ t₂ ≤ ⋯ ≤ t_{n−1} ≤ t, and choose intermediate time points τᵢ such that tᵢ₋₁ ≤ τᵢ ≤ tᵢ. Now the integral S is approximated by the sum

Sₙ = Σ_{i=1}^{n} G(τᵢ) (W(tᵢ) − W(tᵢ₋₁))

In normal calculus this sum converges to the same value independent of the choice of the τᵢ; in stochastic calculus this is not the case. We will choose τᵢ = tᵢ₋₁, yielding the ITÔ STOCHASTIC INTEGRAL². Thus (5.16) is an Itô stochastic differential equation.

²Other choices are used in various situations; for example, the STRATONOVICH stochastic integral takes τᵢ = (tᵢ₋₁ + tᵢ)/2. Stochastic calculus is complex and technical; Gardiner (1990) provides a detailed discussion that is accessible to the nonmathematician.

The Itô stochastic integral corresponds to a stochastic "rectangle rule" with the function value chosen at the left side of the subinterval. One practical reason for this choice is that it is the one most straightforwardly applied in numerical solutions of stochastic differential equations. The EULER-MARUYAMA scheme generalizes the explicit Euler method to the stochastic case, using this rectangle rule approximation

x(t + Δt) = x(t) + A(x(t), t) Δt + B(x(t), t) ΔW(t + Δt)

where ΔW(t + Δt) ~ N(0, Δt). This is the standard method for finding trajectories of SDEs by simulation; it is not highly accurate, but higher-order schemes for SDEs are very complex to implement (Kloeden and Platen, 1992).

A more fundamental reason for working with the Itô integral is that, when applied to (5.17), it corresponds to a noise term that does not change the mean of x(t), because its expected value is zero

⟨ ∫_{t₀}^{t} G(t′) dW(t′) ⟩ = 0    (5.19)

This is easily seen by taking the expected value of the discrete sum and using the fact that for the Itô integral, G(τᵢ) and (W(tᵢ) − W(tᵢ₋₁)) are independent, so that

⟨G(τᵢ)(W(tᵢ) − W(tᵢ₋₁))⟩ = ⟨G(τᵢ)⟩⟨W(tᵢ) − W(tᵢ₋₁)⟩ = 0

because ⟨W(tᵢ) − W(tᵢ₋₁)⟩ = 0. This makes it clear why the choice of τᵢ matters: if τᵢ were tᵢ, then G(τᵢ) and (W(tᵢ) − W(tᵢ₋₁)) would not be independent and the mean would not necessarily be zero.

By considering integrals of the form

∫_{t₀}^{t} G(t′) (dW(t′))²

and using the Itô expression for Sₙ, one can show

∫_{t₀}^{t} G(t′) (dW(t′))² = ∫_{t₀}^{t} G(t′) dt′
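Returning to the Euler-Maruyama scheme introduced above, it takes only a few lines to implement. The sketch below (Python/NumPy) applies it to a linear SDE of our own choosing, dx = −ax dt + b dW, whose long-time variance b²/(2a) provides a convenient check; the coefficients, step size, and seed are arbitrary.

```python
import numpy as np

def euler_maruyama(A, B, x0, dt, n_steps, n_paths, rng):
    """Integrate dx = A(x,t) dt + B(x,t) dW with the Euler-Maruyama scheme."""
    x = np.full(n_paths, x0, dtype=float)
    t = 0.0
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)  # dW ~ N(0, dt)
        x = x + A(x, t) * dt + B(x, t) * dW
        t += dt
    return x

# Linear test problem (our choice): dx = -a x dt + b dW.
a, b = 2.0, 1.0
rng = np.random.default_rng(4)
x = euler_maruyama(lambda x, t: -a * x,
                   lambda x, t: b * np.ones_like(x),
                   x0=0.0, dt=1e-3, n_steps=5000, n_paths=20_000, rng=rng)
print(x.var(), b**2 / (2 * a))   # long-time variance should match b^2/(2a)
```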
The identity above, ∫_{t₀}^{t} G(t′)(dW(t′))² = ∫_{t₀}^{t} G(t′) dt′, tells us how to treat higher differentials involving dW and dt in general

dW² = dt,   dW^{2+n} = 0,   dW dt = 0,   dt² = 0    (5.20)

and so on. If dWᵢ and dWⱼ are different white-noise processes, e.g., corresponding to different components of a vector of such processes

dWᵢ dWⱼ = δᵢⱼ dt

Unlike in regular calculus, in working with differentials of W one must keep terms up to dW². To understand why, simply recall that dW is of order √dt, so dW² is of the same order as dt.

We can use the above observations about stochastic differentials to derive the ITÔ STOCHASTIC CHAIN RULE. Let F be a function of t and W(t). Then

dF(t, W(t)) = ( ∂F/∂t + (1/2) ∂²F/∂W² ) dt + (∂F/∂W) dW(t)

For example, if we let F = x(t, W(t)) = x(t₀) + A(t − t₀) + B(W(t) − W(t₀)), where A and B are constants, then application of the chain rule gives us back the constant coefficient SDE dx = A dt + B dW.

Now consider a function f(x(t)), where x(t) evolves according to (5.16). The differential of f can be written

df(x(t)) = f(x(t) + dx(t)) − f(x(t))
         = f′(x(t)) dx(t) + (1/2) f″(x(t)) (dx(t))²
         = f′(x(t)) (A dt + B dW) + (1/2) f″(x(t)) (A dt + B dW)²

Noting that dt² = 0 and dW² = dt, we have ITÔ'S FORMULA

df = ( A f′ + (1/2) B² f″ ) dt + B f′ dW    (5.21)
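As a quick sanity check on (5.21), take constant A and B, for which x(t) = x(0) + At + BW(t) exactly. Itô's formula with f(x) = x² then predicts ⟨x²(t)⟩ = (x₀ + At)² + B²t, which a direct sample average reproduces (Python/NumPy sketch; the parameter values and seed are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(5)
A, B, x0, t = 1.5, 0.8, 0.3, 2.0
n_paths = 200_000

# Exact solution for constant coefficients: x(t) = x0 + A t + B W(t).
W = np.sqrt(t) * rng.standard_normal(n_paths)   # W(t) ~ N(0, t)
x = x0 + A * t + B * W

second_moment = np.mean(x**2)
prediction = (x0 + A * t)**2 + B**2 * t         # from Ito's formula
print(second_moment, prediction)
```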

Example 5.1: Diffusion on a plane in Cartesian and polar coordinate systems

We can write two-dimensional Brownian motion in Cartesian coordinates as

dx = B dWx    (5.22)
dy = B dWy    (5.23)

where Wx and Wy are independent Wiener processes. How would we write the same process in polar coordinates? As a brief prelude, observe that for a particle starting at the origin

⟨r²⟩ = ⟨x²⟩ + ⟨y²⟩ = B²(⟨Wx²⟩ + ⟨Wy²⟩) = 2B²t

and with B² = 2D we have ⟨r²⟩ = 4Dt. This result easily extends to Brownian motion in any number d of dimensions.

Returning to the specific question at hand, consider the radial coordinate first and keep in mind that we may need to keep terms up to second order

dr = (∂r/∂x) dx + (∂r/∂y) dy + (1/2)(∂²r/∂x²) dx² + (∂²r/∂x∂y) dx dy + (1/2)(∂²r/∂y²) dy²

Here all the partials can be evaluated from the formula

r = √(x² + y²)

Now using the SDEs and noting that dx² = dy² = B² dt and dx dy = 0, we have that

dr = cos θ B dWx + sin θ B dWy + (B²/(2r)) dt

Now, using (5.12) we see that cos θ B dWx + sin θ B dWy is a diffusion process with variance B² dt. We will denote this process as B dWr, so

dr = (B²/(2r)) dt + B dWr    (5.24)

Consider a particle that starts at r = 0. Applying Itô's formula with f = r² and taking the expected value we find that

d⟨r²⟩ = 2B² dt

Letting B² = 2D we find that ⟨r²⟩ = 4Dt in two dimensions, as we should.

Now we turn to the equation for θ

dθ = (∂θ/∂x) dx + (∂θ/∂y) dy + (1/2)(∂²θ/∂x²) dx² + (∂²θ/∂x∂y) dx dy + (1/2)(∂²θ/∂y²) dy²

Evaluating the derivatives we find that there cannot be any drift term; by symmetry, positive and negative changes in θ must be equally likely. We obtain

dθ = (B/r²)(−y dWx + x dWy)

Using (5.12) again, we can replace −y dWx + x dWy with r dWθ, giving

dθ = (B/r) dWθ    (5.25)
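The drift term B²/(2r) in (5.24) may look surprising, since the Cartesian equations contain no drift at all, but it is easy to confirm by brute force: start many particles at r = 1, take a single Cartesian step, and average the change in r (Python/NumPy sketch; step size, particle count, and seed are arbitrary choices of ours).

```python
import numpy as np

rng = np.random.default_rng(6)
B, dt, n = 1.0, 1.0e-3, 2_000_000

# One Cartesian step dx = B dWx, dy = B dWy from the point (x, y) = (1, 0).
x = 1.0 + B * np.sqrt(dt) * rng.standard_normal(n)
y = B * np.sqrt(dt) * rng.standard_normal(n)

mean_dr = np.mean(np.sqrt(x**2 + y**2) - 1.0)
mean_dtheta = np.mean(np.arctan2(y, x))
print(mean_dr, B**2 / 2.0 * dt)   # mean change in r vs. B^2/(2r) dt at r = 1
print(mean_dtheta)                # no drift in theta, by symmetry
```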
Example 5.2: Average properties from sampling

Often we are interested in an "average" property of the model rather than a single realization of the stochastic equation. Consider again the random walk model of the diffusion process on the plane, (5.22) and (5.23). Simulate the process and compute an estimate of the mean square displacement versus time.

Solution

We approximate this process with the discrete time process

X(k + 1) = X(k) + VΔt + √(2DΔt) P    (5.26)

in which X = (x, y)ᵀ, k is the sample number, Δt is the sample time, and time is t = kΔt. The velocity of the particles is V = (vx, vy)ᵀ and the random two-vector P is drawn from the two-dimensional normal distribution with zero mean and covariance equal to a 2 × 2 identity matrix

P ~ N(0, I₂)

This choice provides uncorrelated steps in the x and y directions. In the ensuing discussion we choose Δt = 1 so k = t. We also take vx = vy = 0 here so there is no drift, only diffusion. A representative simulation of (5.26) is given in Figure 5.3.

We can approximate average properties by simulating many trajectories, or equivalently many independent particles, and then taking the average. Let Xᵢ(k) be the position of the ith particle at sample time k, which follows the evolution

Xᵢ(k + 1) = Xᵢ(k) + √(2DΔt) Pᵢ(k)    (5.27)
Figure 5.3: A representative trajectory of the discretely sampled Brownian motion; D = 2, V = 0, n = 500.

Figure 5.4: The mean square displacement versus time; D = 2, V = 0, n = 500.
Stochastic Modelsand Processes


470

The squared displacement of the ith particle is given by

ri²(k) = xi²(k) + yi²(k)    (5.28)

and the mean square displacement ⟨r²⟩(k) is given by the average over many particles

⟨r²⟩(k) = (1/n) Σ_{i=1}^{n} ri²(k),    n large

Figure 5.4 shows the mean square displacement for the random walk simulation. We use n = 500 particles, no drift, and diffusion coefficient D = 2 for this simulation. Notice that the mean square displacement grows linearly with time. The simulation agrees with Einstein's analysis of diffusion (Einstein, 1905), see also Gardiner (1990, pp. 3-5), as well as our analyses above

⟨r²⟩(k) = 4Dk    (5.29)
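The simulation and averaging in this example are easy to reproduce. The sketch below (our own bookkeeping, using the text's values D = 2, Δt = 1, n = 500) steps (5.27) for all particles at once and fits the slope of ⟨r²⟩ versus k, which (5.29) predicts to be 4D = 8.

```python
import numpy as np

# Discrete random walk (5.26)-(5.27): each of n particles takes steps
# sqrt(2*D*dt)*P with P ~ N(0, I). No drift, as in the text's example.
rng = np.random.default_rng(0)
D, dt, n, nsteps = 2.0, 1.0, 500, 200

X = np.zeros((n, 2))                 # all particles start at the origin
msd = np.zeros(nsteps + 1)           # msd[k] estimates <r^2>(k)
for k in range(1, nsteps + 1):
    X += np.sqrt(2 * D * dt) * rng.standard_normal((n, 2))
    msd[k] = np.mean(np.sum(X**2, axis=1))

# Least-squares slope of <r^2> versus k; (5.29) predicts 4*D = 8.
slope = np.polyfit(np.arange(nsteps + 1), msd, 1)[0]
```

With 500 particles the fitted slope fluctuates a few percent around 8, mirroring the linear growth seen in Figure 5.4.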

5.2.4 Fokker-Planck Equation

There are two ways to think about solving an SDE. We can find particular trajectories; this is what the Euler-Maruyama scheme above will do. We can also consider the evolution of the probability density p(x, t). In considering the integrated white-noise and Wiener processes, we observed the connection between the evolution of p(x, t) and the diffusion equation. The Wiener process is the solution to dx = dW. Because its trajectories satisfy x(t) − x(0) ~ N(0, t), the density p(x, t) for a trajectory starting at x = x0 is a solution to the transient diffusion equation

∂p/∂t = ½ ∂²p/∂x²,    p(x, 0) = δ(x − x0)    (5.30)

with D = ½. To generalize this result, consider the time evolution of the expected value of an arbitrary function f(x(t)), where x(t) evolves according to the Itô SDE (5.16). Using Itô's formula and the result ⟨∫ B f′ dW⟩ = 0, which is the infinitesimal version of (5.19)

d⟨f⟩ = ⟨(A f′ + ½ B² f″) dt + B f′ dW⟩ = ⟨A f′ + ½ B² f″⟩ dt

This can be rewritten as

d/dt ∫ f(x) p(x, t) dx = ⟨A f′ + ½ B² f″⟩

Rearranging and integrating by parts yields

∫ f(x) ∂p(x, t)/∂t dx = ∫ f(x) [ −∂/∂x (A(x, t) p(x, t)) + ½ ∂²/∂x² (B²(x, t) p(x, t)) ] dx

Finally, since f is arbitrary, this result can only hold in general if

∂p(x, t)/∂t = −∂/∂x (A(x, t) p(x, t)) + ½ ∂²/∂x² (B²(x, t) p(x, t))    (5.31)

This is the evolution equation for p(x, t), often called the FOKKER-PLANCK EQUATION (FPE). For a trajectory starting at x = x0, the initial condition for this equation is again p(x, 0) = δ(x − x0). The equation can be put into conservation form

∂p(x, t)/∂t = −∂/∂x { A(x, t) p(x, t) − ½ ∂/∂x (B²(x, t) p(x, t)) }    (5.32)

The term inside the curly brackets is the flux of probability density, and this equation bears obvious similarities to equations we are familiar with from transport phenomena. It shows us that trajectories of an Itô SDE have a drift coefficient A(x, t) and a diffusion coefficient D(x, t) = ½ B²(x, t). This is sometimes called the "short-time" diffusivity, because one can show using Itô's formula (Exercise 5.5) that for a particle at position x′ at time t′

½ d⟨(x − x′)²⟩/dt |_(t=t′) = D(x′, t′)    (5.33)

Similarly, the instantaneous drift velocity of the trajectory is (as in the deterministic case)

d⟨x − x′⟩/dt |_(t=t′) = A(x′, t′)    (5.34)

The probability density must integrate to unity

∫ p(x, t) dx = 1    (5.35)


The expression inside the curly brackets is analogous to that for the convection-diffusion transport equation with A = v, but there is an important difference: if it varies with position, the D that appears in the FPE is not equivalent to the (gradient) diffusivity in the transport equation. Exercise 5.2 explores these differences in further detail.

We also can generalize the analysis to an n-vector random process X, with components Xi, i = 1, 2, ..., n. The SDE and FPE for this case are

dXi = Ai(x, t) dt + Σ_j Bij(x, t) dWj    (5.36)

∂p/∂t = −Σ_i ∂/∂xi (Ai(x, t) p) + Σ_i Σ_j ∂²/(∂xi ∂xj) (Dij(x, t) p)    (5.37)

Here p is a function of all components xi and time, p = p(x1, ..., xn, t), and the elements of the diffusion coefficient matrix are

Dij = ½ Σ_k Bik Bjk

The derivation of (5.37) from (5.36) makes use of the MULTIDIMENSIONAL ITÔ FORMULA

df(x) = Σ_i Ai (∂f/∂xi) dt + ½ Σ_i Σ_j Σ_k Bik Bjk (∂²f/∂xi ∂xj) dt + Σ_i Σ_j Bij (∂f/∂xi) dWj    (5.38)

As in the scalar case, probability is conserved

∫ p(x1, ..., xn) dx1 dx2 ... dxn = 1    (5.39)

In vector/matrix notation the equations are written

dx = A(x, t) dt + B(x, t) dW    (5.40)

∂p/∂t = −∇·(A p) + ∇∇ : (D p)    (5.41)

with

D = ½ B Bᵀ    (5.42)

This result indicates that D is symmetric positive semidefinite. For numerical integration of multidimensional SDEs, the Euler-Maruyama scheme extends straightforwardly.
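That extension can be sketched in a few lines. The function below (names and parameter choices are our own, not from the text) advances (5.40) by x ← x + A(x, t)Δt + B(x, t)ΔW with ΔW a vector of independent N(0, Δt) increments, and is exercised on the planar diffusion of Example 5.2.

```python
import numpy as np

# Euler-Maruyama for the vector SDE dx = A(x,t) dt + B(x,t) dW (5.40).
# A returns an n-vector, B an n-by-m matrix; dW has m independent
# N(0, dt) components.
def euler_maruyama(A, B, x0, dt, nsteps, rng):
    x = np.asarray(x0, dtype=float).copy()
    path = [x.copy()]
    for k in range(nsteps):
        t = k * dt
        Bk = B(x, t)
        dW = np.sqrt(dt) * rng.standard_normal(Bk.shape[1])
        x = x + A(x, t) * dt + Bk @ dW
        path.append(x.copy())
    return np.array(path)

# Pure diffusion on the plane: A = 0 and B = sqrt(2D) I, which
# reproduces the random walk of Example 5.2.
D = 2.0
rng = np.random.default_rng(1)
path = euler_maruyama(lambda x, t: np.zeros(2),
                      lambda x, t: np.sqrt(2 * D) * np.eye(2),
                      x0=[0.0, 0.0], dt=1.0, nsteps=500, rng=rng)
```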

Example 5.3: Transport of many particles

A large number of particles, each obeying the equation

dx = v dt + √(2D) dW

are suspended in a fluid. How do we describe the evolution of the concentration of particles in the fluid?

Solution

The probability density for an individual particle evolves as

∂p/∂t = −v ∂p/∂x + D ∂²p/∂x²

For the many-particle system, we define an n-particle joint density function

p(x1, ..., xn, t) dx1 dx2 ... dxn = probability that particles 1 through n are located at x1 through xn, respectively, at time t

The concentration of particles at x, c(x, t), is then

c(x, t) = ∫ Σ_{j=1}^{n} δ(x − xj) p(x1, ..., xn, t) Π_{i=1}^{n} dxi    (5.44)

The jth term in the sum represents the probability that the jth particle is located at x at time t, and the sum over all particles gives the total concentration. If the particle motions are independent,

p(x1, ..., xn, t) = Π_{j=1}^{n} pj(xj, t)

Performing the integral in (5.44) gives

c(x, t) = Σ_{j=1}^{n} pj(x, t)

which indicates that the linear superposition of each particle's probability of being at location x produces the total concentration at x. If the particles are identical, pj(x, t) = p(x, t), j = 1, ..., n, this reduces to

c(x, t) = n p(x, t)

The evolution equation for c is therefore

∂c/∂t = −v ∂c/∂x + D ∂²c/∂x²

The conclusion is that the concentration profile created by many noninteracting, identical particles obeys the same evolution equation as the probability density of a single particle. Averaging the behavior of many particles does not "average away" the diffusion term in the evolution equation of the total concentration c(x, t). See Deen (1998, pp. 59-63).
Example 5.4: Fokker-Planck equations for diffusion on a plane

Example 5.1 introduced the stochastic differential equations for diffusion on a plane in Cartesian and polar coordinate representations. For the Cartesian representation, (5.22) and (5.23) have probability density satisfying

∂p/∂t = ½ (∂²p/∂x² + ∂²p/∂y²)

with normalization (conservation of probability)

∫∫ p(x, y) dx dy = 1

If we rewrite this equation in polar coordinates we get

∂p/∂t = ½ [ (1/r) ∂/∂r (r ∂p/∂r) + (1/r²) ∂²p/∂θ² ]    (5.45)

and

∫₀^{2π} ∫₀^∞ p(r, θ) r dr dθ = 1

Do we get the same result if we start with the polar coordinate form of the stochastic differential equations, (5.24) and (5.25)? Why or why not?

Solution

Equations (5.24) and (5.25) can be written as the system

d(r, θ)ᵀ = (1/(2r), 0)ᵀ dt + [1  0; 0  1/r] (dWr, dWθ)ᵀ

so that, with regard to (5.36) and (5.37), x1 = r, x2 = θ, and

A = (1/(2r), 0)ᵀ    B = [1  0; 0  1/r]    D = ½ B Bᵀ = [½  0; 0  1/(2r²)]

Inserting these expressions into (5.37) and denoting the probability density pp(r, θ),

∂pp/∂t = −∂/∂r (pp/(2r)) + ½ ∂²pp/∂r² + (1/(2r²)) ∂²pp/∂θ²    (5.46)

This is not the transient diffusion equation in polar coordinates.
We begin to understand this difference by writing the normalization condition, (5.39)

∫₀^{2π} ∫₀^∞ pp(r, θ) dr dθ = 1

This differs by a factor of r in the integrand from the conventional area integral in polar coordinates. The reason is simple: in going from the SDE to the FPE, we did not tell Itô's formula about the geometry of area elements on the plane, but only to take an SDE written with variables x1 = r, x2 = θ and write the corresponding FPE. There is no paradox here, only a message to be careful about coordinate transformations.

Finally, we wish to understand the relationship between p and pp. Motivated by the factor of r difference in the normalization conditions, we might guess that pp(r, θ) = crp(r, θ) where c is a constant. Indeed, making this substitution into (5.46), we recover the transient diffusion equation in polar coordinates, (5.45). For a process starting at the origin at t = 0, the normalized solutions (Exercise 5.9) are

p(r, t) = (1/(4πDt)) e^(−r²/(4Dt))    and    pp(r, θ, t) = (r/(4πDt)) e^(−r²/(4Dt))

5.3 Stochastic Kinetics

5.3.1 Introduction, and Length and Time Scales

Our next application of interest is reaction networks and chemical kinetics taking place at small numbers of molecules. First we start with a continuum kinetics example to define some useful nomenclature. Consider the following two-step series reaction, with rate constants k1 and k2

A → B → C

We define the species vector of concentrations c = (cA, cB, cC)ᵀ, and denote the stoichiometry for the reaction network with the stoichiometric matrix

ν = [ −1   1   0 ]
    [  0  −1   1 ]

We let νi, i = 1, 2, ..., nr denote the rows of the stoichiometric matrix written as column vectors

ν1 = (−1, 1, 0)ᵀ    ν2 = (0, −1, 1)ᵀ

We assume the reaction takes place in a well-mixed reactor and assume some rate law for the reaction kinetics, such as

r1 = k1 cA    r2 = k2 cB

As taught in every undergraduate chemical engineering curriculum, the material balance for the three species is then given by

dc/dt = νᵀ r(c) = Σ_{i=1}^{nr} νi ri(c)    (5.47)

The solution of this model with a pure reactant A initial conditionis


shown in Figure 5.5.
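The deterministic balance (5.47) for this network is easy to check numerically. The sketch below (our own, using the parameter values quoted in Figure 5.5: cA0 = 1, k1 = 2, k2 = 1) integrates dc/dt = νᵀr(c) with SciPy.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Series reaction A -> B -> C with rate laws r1 = k1*cA, r2 = k2*cB.
k1, k2 = 2.0, 1.0
nu = np.array([[-1.0, 1.0, 0.0],
               [0.0, -1.0, 1.0]])       # stoichiometric matrix

def rhs(t, c):
    r = np.array([k1 * c[0], k2 * c[1]])  # reaction rates r(c)
    return nu.T @ r                       # dc/dt = nu^T r(c) (5.47)

sol = solve_ivp(rhs, (0.0, 5.0), [1.0, 0.0, 0.0], rtol=1e-8, atol=1e-10)
cA, cB, cC = sol.y[:, -1]                 # concentrations at t = 5
```

Since the kinetics are first order, cA(t) = e^(−k1 t) exactly, which the integration reproduces to the requested tolerance.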
Next we consider reactions taking place at small concentrations. Instead of the common case in which we have on the order of Avogadro's
number of reacting molecules, assume we have only tens or hundreds
of molecules moving randomly in a constant-volume, well-mixed,reactor. At such low concentrations, the deterministic concentration assumption makes no sense, and we have to consider the random behavior of the molecules. But we still have to choose an appropriate length
and time scale of interest. Indeed, if we move down to the length scale

of the atoms, we can model the electron bonds deforming continuously in time from reactants through transition states to products. We

choose instead a larger time and length scale so that each reaction that

Figure 5.5: Two first-order reactions in series in a batch reactor, cA0 = 1, cB0 = cC0 = 0, k1 = 2, k2 = 1.

takes place can be regarded as a single instantaneous


eventcausinga
discrete change in the number of reactants and
products. At this scale,
we track the integer-valued numbers of reactant and
products, and we
treat the reaction events as random jump processes.
This length and time scale makes the discrete Poisson process the natural choice of description for stochastic kinetics.

53.2 Poisson Process


Just as the Wiener process W(t) is the simplestmathematicalprocess
appropriate for modeling diffusion, the Poisson process Y(t) is the sim-

plest mathematical process appropriate for modelingstochasticchemical kinetics. The POISSONPROCESS


is an integer-valued counting process. Time is modeled as a continuous variable,but the value of the
Poisson process is discrete. The Poisson process is characterizedby a
rate parameter, λ > 0, and for small time interval Δt, the probability of an event taking place in this time interval is proportional to λΔt. To start off, we assume that parameter λ is constant. The probability that an event does not take place in the interval [0, Δt] is therefore approximately 1 − λΔt. Let random variable T be the time of the first event of

the Poisson process starting from t = 0. We then have for small Δt

Pr(T > Δt) ≈ 1 − λΔt

Like the Wiener process, the Poisson process has independent increments, which means that the numbers of events in disjoint time intervals are independent. The independent increment assumption, coupled with the fact that λ does not change, implies that the probability that an event does not take place in two consecutive time intervals [0, 2Δt] is Pr(T > 2Δt) ≈ (1 − λΔt)². Continuing this argument to n intervals gives for t = nΔt

Pr(T > t) ≈ (1 − λΔt)^(t/Δt)

Taking the limit as Δt → 0 gives

Pr(T > t) = e^(−λt)

From the probability axioms and the definition of T's probability distribution, we then have

Pr(T ≤ t) = F_T(t) = 1 − e^(−λt)

Differentiating to obtain the density gives the exponential density

p_T(t) = λ e^(−λt)    (5.48)

chemical and bioThe exponentialdistributionshould be familiar to


logicalengineersbecause of the residence-time distribution of a wellmixedtank. The residence-timedistribution of the CSTRwith volume
V and volumetric flowrate Q satisfies (5.48) with
rate or inverse mean residence time, = V/Q.

being the dilution

Figure 5.6 shows a simulation of the unit Poisson process, i.e., the Poisson process with λ = 1. If we count many events, the sample path looks like the top of Figure 5.7, which resembles a "bumpy" line with slope equal to λ, unity in this case. The frequency count of the times to next event, T, are shown in the bottom of Figure 5.7, and we can clearly see the exponential distribution with this many events. Note that to generate a sample of the exponential distribution for the purposes of simulation, one can simply take the negative of the logarithm of a uniformly distributed variable on [0, 1]. Most computational languages provide functions to give pseudorandom numbers following a uniform distribution, so it is easy to produce samples from the exponential distribution as well. See Exercise 5.14 for further discussion.
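The inverse-CDF recipe just described can be checked in a few lines (sample size and seed below are arbitrary choices of ours): if u is uniform on [0, 1], then T = −ln(u)/λ satisfies Pr(T > t) = Pr(u < e^(−λt)) = e^(−λt).

```python
import numpy as np

# Sample the exponential density (5.48) by inverting the CDF.
rng = np.random.default_rng(2)
lam = 1.0                        # unit Poisson process intensity
u = rng.uniform(size=100_000)    # uniform samples on [0, 1)
T = -np.log(u) / lam             # exponential samples

mean_T = T.mean()                # should approach 1/lam = 1
var_T = T.var()                  # should approach 1/lam**2 = 1
```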

Figure 5.6: A sample path of the unit Poisson process.

Figure 5.7: A unit Poisson process with more events; sample path (top) and frequency distribution of event times T (bottom).

The time of the first event also characterizes the probability Pr(Y(t) = 0) for t ≥ 0. The probability that Y is still zero at time t is the same as the probability that the first event has occurred at some time greater than t, or Pr(Y(t) = 0) = Pr(T > t) = 1 − Pr(T ≤ t). Therefore we have the relationships

Pr(Y(t) = 0) = 1 − F_T(t) = e^(−λt)
We next generalize the discussion to find the probability density for the time of the second and subsequent events. Let random variable T2 denote the time of the second event. We wish to compute the joint density p_{T2,T}(t2, t). Because of the independent increments property

p_{T2,T}(t2, t) = p_T(t2 − t) p_T(t),    0 ≤ t ≤ t2

Integrating the joint density gives the marginal

p_{T2}(t2) = ∫₀^{t2} p_T(t2 − t) p_T(t) dt

or p_{T2}(t2) = λ² t2 e^(−λt2). We can then use induction to obtain the density of the time for the nth event, n ≥ 2. Assuming that T_{n−1} has density p_{T_{n−1}}(t) = λ^{n−1} t^{n−2} e^(−λt)/(n − 2)!, we have for Tn

p_{Tn}(tn) = ∫₀^{tn} λ e^(−λ(tn − t)) λ^{n−1} t^{n−2} e^(−λt)/(n − 2)! dt = λⁿ tn^{n−1} e^(−λtn)/(n − 1)!    (5.49)

From here we can work out Pr(Y(t) = n) for any n. For Y(t) to be n at time t, we must have time Tn ≤ t and time T_{n+1} > t, i.e., n events have occurred by time t but n + 1 have not. In terms of the joint density, we have

Pr(Y(t) = n) = ∫₀^t ∫_t^∞ p_{T_{n+1},T_n}(t′, s) dt′ ds

The independent increments property allows us to express the joint density as p_{T_{n+1},T_n}(t′, s) = p_T(t′ − s) p_{T_n}(s). Inserting this and (5.49) into the previous equation gives

Pr(Y(t) = n) = ∫₀^t ∫_t^∞ λ e^(−λ(t′ − s)) λⁿ s^{n−1} e^(−λs)/(n − 1)! dt′ ds = (λt)ⁿ e^(−λt)/n!    (5.50)

See Exercise 5.13 for an alternative derivation. The discrete density appearing on the right-hand side of (5.50), i.e., p(n) = e^(−a) aⁿ/n! with a = λt, is known as the Poisson density. Its mean and variance are equal to a = λt (see Exercise 5.12), so we have that the mean and variance of Y(t) both grow as λt, consistent with Figure 5.7.

Because λ and t appear only as the product λt, the Poisson process of intensity λ, now denoted Y_λ(t), can be expressed in terms of the unit Poisson process, denoted Y(t), with the relation

Y_λ(t) = Y(λt)

The justification is as follows. We have just shown Pr(Y_λ(t) = n) = (λt)ⁿ e^(−λt)/n!, and, for the unit Poisson process, Pr(Y(t) = n) = tⁿ e^(−t)/n!, which is equivalent on the substitution of λt for t. Because the increments are independent, we also have the property for all n ≥ 0

Pr(Y(t) − Y(s) = n) = Pr(Y(t − s) = n),    t ≥ s ≥ 0

which is similar to (5.5) for the Wiener process.
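These properties are easy to verify by simulation (the parameter values and trial count below are arbitrary choices of ours): generating event times as cumulative sums of exponential interevent times and counting events up to time t should give a count whose mean and variance both equal λt.

```python
import numpy as np

# Monte Carlo check of (5.50): the count Y_lam(t) has mean and
# variance both equal to lam*t.
rng = np.random.default_rng(3)
lam, t, ntrials = 2.0, 5.0, 20_000   # lam*t = 10

counts = np.empty(ntrials, dtype=int)
for i in range(ntrials):
    total, n = 0.0, 0
    while True:
        total += rng.exponential(1.0 / lam)   # next interevent time
        if total > t:
            break
        n += 1
    counts[i] = n

mean_count = counts.mean()   # should approach lam*t = 10
var_count = counts.var()     # should approach lam*t = 10
```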
Nonhomogeneous Poisson process. Next we consider the nonhomogeneous Poisson process in which the intensity λ(t) is time varying. We define the Poisson process for this more general case so that the probability of an event during time interval [t, t + Δt] is proportional to λ(t)Δt for Δt small. We can express the nonhomogeneous process also in terms of a unit Poisson process with the relation

Y_λ(t) = Y(∫₀^t λ(s) ds)

To see that the right-hand side has the required property, we compute the probability that an event occurs in the interval [t, t + Δt]. Let z(t) = ∫₀^t λ(s) ds. We have

Pr(t ≤ T ≤ t + Δt) = Pr(Y(z(t + Δt)) − Y(z(t)) > 0)
  = 1 − Pr(Y(z(t + Δt)) − Y(z(t)) = 0)
  = 1 − Pr(Y(∫_t^{t+Δt} λ(s) ds) = 0)
  = 1 − exp(−∫_t^{t+Δt} λ(s) ds)

For Δt small, we can approximate the integral as λ(t)Δt, giving

Pr(t ≤ T ≤ t + Δt) ≈ 1 − (1 − λ(t)Δt) = λ(t)Δt

and we have the stipulated probability.

Random time change representation of stochastic kinetics. With these results, we can now express the stochastic kinetics problem in terms of the Poisson process. Assume nr reactions take place between ns chemical species with stoichiometric matrix ν ∈ ℝ^(nr×ns), and denote its row vectors, written as columns, by νi, i = 1, 2, ..., nr. Let X(t) be an integer-valued random variable vector of the chemical species numbers, and let ri(X), i = 1, 2, ..., nr, be the kinetic rate expressions for the nr reactions. We assign to each reaction an independent Poisson process Yi with intensity ri. Note that this assignment gives nr nonhomogeneous Poisson processes because the species numbers change with time, i.e., ri = ri(X(t)). The Poisson processes then count the number of times that each reaction fires as a function of time. Thus the Poisson process provides the extents of the reactions versus time. From these extents, it is a simple matter to compute the species numbers from the stoichiometry. We have that

X(t) = X(0) + Σ_{i=1}^{nr} νi Yi(∫₀^t ri(X(s)) ds)    (5.51)

This is the celebrated random time change representation of stochastic kinetics due to Kurtz (1972).
Notice that this representation of the species numbers has X(t) appearing on both sides of the equation. This integral equation representation of the solution leads to many useful solution properties and simulation algorithms. We can express the analogous integral equation for the deterministic continuum mass balance given in (5.47)

c(t) = c(0) + Σ_{i=1}^{nr} νi ∫₀^t ri(c(s)) ds

5.3.3 Stochastic Simulation

The random time change representation suggests a natural simulation or sampling strategy for the species numbers X(t). We start with a known initial condition, X(0). We then select, based on each reaction, exponentially distributed proposed times for the next reactions, τi, i = 1, 2, ..., nr. These exponential distributions have intensities equal to the different reaction rates, ri. As mentioned previously, we obtain a sample of an exponential, F_{τi}(t) = 1 − e^(−ri t), by drawing a sample of a uniformly distributed RV on [0,1], ui, and rescaling the logarithm

τi = −(1/ri) ln ui

We then select the reaction with the smallest event time as the reaction to fire, giving

t1 = min_{i ∈ {1, ..., nr}} τi    i1 = arg min_{i ∈ {1, ..., nr}} τi

We then update the species numbers at the chosen reaction time with the stoichiometric coefficients of the reaction that fires

X(t1) = X(0) + ν_{i1}

This process is then repeated to provide a simulation over the time interval of interest. This simulation strategy is known as the FIRST REACTION METHOD (Gillespie, 1977). We summarize the first reaction method with the following algorithm.

Algorithm 5.5 (First reaction method).

Require: Stoichiometric matrix and reaction-rate expressions, νi, ri(X), i = 1, 2, ..., nr; initial species numbers, X0; stopping time T.

1: Initialize time t = 0, time index k = 1, and species numbers X(t) = X0.


Figure 5.8: Randomly choosing a reaction with appropriate probability. The interval is partitioned according to the relative sizes of the reaction rates. A uniform random number u is generated to determine the reaction. In this case, since ρ2 ≤ u < ρ3, m = 3 and the third reaction is selected.

2: Evaluate rates ri = ri(X(t)). If ri = 0 for all i, exit (system is at steady state).
3: Choose nr independent samples of a uniformly distributed RV on [0, 1], ui. Compute random times for each reaction, τi = −(1/ri) ln ui.
4: Select smallest time and corresponding reaction, τk = min_i τi, ik = arg min_i τi.
5: Update time and species numbers: tk = t + τk, X(tk) = X(t) + ν_{ik}.
6: Set t = tk, replace k ← k + 1. If t < T, go to Step 2. Else exit.
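A minimal sketch of Algorithm 5.5 follows, applied to the series reaction A → B → C of the previous section (the rate constants and helper names are illustrative choices of ours; Figure 5.9 starts from 100 A molecules).

```python
import numpy as np

# First reaction method for A -> B -> C with r1 = k1*nA, r2 = k2*nB.
rng = np.random.default_rng(4)
nu = np.array([[-1, 1, 0], [0, -1, 1]])      # stoichiometric matrix

def rates(X, k1=2.0, k2=1.0):
    return np.array([k1 * X[0], k2 * X[1]])

def first_reaction(X0, T, rng):
    t, X = 0.0, np.array(X0)
    times, states = [t], [X.copy()]
    while True:
        r = rates(X)
        if not r.any():                       # all rates zero: steady state
            break
        with np.errstate(divide="ignore"):    # tau_i = inf when r_i = 0
            tau = -np.log(rng.uniform(size=r.size)) / r
        i = int(np.argmin(tau))               # smallest event time fires
        t += tau[i]
        if t > T:
            break
        X = X + nu[i]
        times.append(t)
        states.append(X.copy())
    return np.array(times), np.array(states)

times, states = first_reaction([100, 0, 0], T=10.0, rng=rng)
```

Each pass through the loop is one iteration of Steps 2-6; the total molecule count is conserved by the stoichiometry.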

Gibson and Bruck (2000) show how to conserve random numbers in this approach by saving the nr − 1 random numbers that were not selected at the current iteration, and reusing them at the next iteration. With this modification, the method is termed the NEXT REACTION METHOD.

An alternative, and probably the most popular, simulation method was proposed also by Gillespie (1977, p. 2345). In this method, the reaction rates are added together to determine a total reaction rate r = Σ_{i=1}^{nr} ri. The time to the next reaction is distributed as p_T(t) = r e^(−rt). So sampling this density provides the time of the next reaction, which we denote T. To determine which reaction fires, the following cumulative sum is computed

ρi = Σ_{j=1}^{i} rj / r,    i = 1, ..., nr

Note that 0 = ρ0 ≤ ρ1 ≤ ρ2 ≤ ... ≤ ρ_nr = 1, so the set of ρi are a partition of [0, 1] as shown in Figure 5.8 for nr = 3 reactions. The length of each interval indicates the relative rate of each of the nr reactions. So to determine which reaction m fires, let u be a sample from the uniform distribution on [0, 1], and determine the interval m in which u falls

ρ_{m−1} ≤ u < ρ_m

Given the reaction that fires, m, and the time of the reaction T, we then update the species numbers in the standard way. This method is known as Gillespie's DIRECT METHOD or simply the STOCHASTIC SIMULATION ALGORITHM (SSA). We summarize this method in the following algorithm.

Algorithm 5.6 (Gillespie's direct method or SSA).

Require: Stoichiometric matrix and reaction-rate expressions, νi, ri(X), i = 1, 2, ..., nr; initial species numbers, X0; stopping time T.

1: Initialize time t = 0, time index k = 1, and species numbers X(t) = X0.
2: Evaluate rates ri = ri(X(t)) and total rate r = Σi ri. If r = 0, exit (system is at steady state).
3: Choose two independent samples, u1, u2, of a uniformly distributed RV on [0, 1]. Compute time of next reaction τ = −(1/r) ln u1.
4: Select which reaction fires, ik, as follows. Compute the cumulative sum, ρi = Σ_{j=1}^{i} rj/r for i ∈ [0, nr]. Note ρ0 = 0. Find index ik such that ρ_{ik−1} ≤ u2 < ρ_{ik}.
5: Update time and species numbers: tk = t + τ, X(tk) = X(t) + ν_{ik}.
6: Set t = tk, replace k ← k + 1. If t < T, go to Step 2. Else exit.
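Algorithm 5.6 can be sketched as follows for the same series reaction A → B → C (rate constants are illustrative choices of ours; the initial condition matches Figure 5.9).

```python
import numpy as np

# Gillespie's direct method (SSA) for A -> B -> C.
rng = np.random.default_rng(5)
nu = np.array([[-1, 1, 0], [0, -1, 1]])   # stoichiometric matrix

def gillespie_ssa(X0, T, k1, k2, rng):
    t, X = 0.0, np.array(X0)
    times, states = [t], [X.copy()]
    while True:
        r = np.array([k1 * X[0], k2 * X[1]])
        rtot = r.sum()
        if rtot == 0.0:                        # steady state: exit
            break
        t += -np.log(rng.uniform()) / rtot     # time of next reaction
        if t > T:
            break
        # pick reaction m with rho_{m-1} <= u < rho_m (Figure 5.8);
        # the min() guards against roundoff in the last cumulative sum
        rho = np.cumsum(r) / rtot
        m = min(int(np.searchsorted(rho, rng.uniform(), side="right")),
                len(r) - 1)
        X = X + nu[m]
        times.append(t)
        states.append(X.copy())
    return np.array(times), np.array(states)

times, states = gillespie_ssa([100, 0, 0], T=5.0, k1=2.0, k2=1.0, rng=rng)
```

Repeating the run with a different seed gives a different trajectory, which is exactly the behavior discussed around Figure 5.9.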
Figure 5.9 shows the results when starting with nA = 100 molecules. Notice the random aspect of the simulation gives a rough appearance to the number of molecules versus time, which is quite unlike the deterministic simulation presented in Figure 5.5. Because the number of molecules is an integer, the simulation is discontinuous with jumps at the reaction event times. But in spite of the roughness, we can already make out the classic behavior of the series reaction: loss of starting material A, appearance and then disappearance of the intermediate species B, slow increase in final product C. Note also that Figure 5.9 is only one simulation or sample of the random process. Unlike the deterministic models, if we repeat this simulation, we obtain a different sequence of random numbers and a different simulation. To compute accurate expected or average behavior of the system, we perform many of these random simulations and then compute the sample averages of quantities we wish to report.

Figure 5.9: Stochastic simulation of the first-order series reaction starting with 100 A molecules.

5.3.4 Master Equation of Chemical Kinetics


The simulations in the previous section allowus to envision many posSible simulation trajectories depending on the particular sequenceof
random numbers we have chosen. Some reflection leads us to consider instead modeling the evolution of the probability density of the
state. We shall see that we can either solve this evolution equation directly, or average over many randomly chosen simulation trajectories
to construct the probability density by brute force. Both approaches
have merit, but here we focus on expressing and solving the evolution
equation for the probability density.
Consider the reversible reaction

A + B ⇌ C    (5.52)

with forward rate constant k1 and reverse rate constant k−1,

taking place in a constant-volume, well-stirred reactor. Let p (a, b, c, t)

denote the probability density for the system to have a moleculesof

species A, b molecules of species B,and c molecules of species C at time

t. We seek an evolution equation governing p(a, b, c, t). The probability density evolvesdue to the chemicalreactions given in (5.52).
Consider the system state (a, b, c, t); if the forward event takes place,

the system moves from state (a, b, c, t) to (a − 1, b − 1, c + 1, t + dt). If the reverse reaction event takes place, the system moves from state (a, b, c, t) to (a + 1, b + 1, c − 1, t + dt). We have expressions for the rates of these two events required for the trajectory simulations of the previous sections

r1 = k1 ab    r−1 = k−1 c

But now these reaction events occurring at these rates change the probability density that the system is in state (a, b, c, t). This evolution equation for the probability density is known as the master equation of chemical kinetics. The master equation for this chemical system is

∂p(a, b, c, t)/∂t = k1(a + 1)(b + 1) p(a + 1, b + 1, c − 1, t) + k−1(c + 1) p(a − 1, b − 1, c + 1, t) − (k1 ab + k−1 c) p(a, b, c, t)    (5.53)

We see that the reaction rate for each event is multiplied by the probability density that the system is in that state.

Because we have a single reaction, we can simplify matters by defining ε to be the extent of the reaction. The numbers of molecules of each species are calculated from the initial numbers and reaction extent given the reaction stoichiometry

a = a0 − ε    b = b0 − ε    c = c0 + ε

We see that ε = 0 corresponds to the initial state of the system. Using the reaction extent, we define p(ε, t) to be the probability density that the system has reaction extent ε at time t. Converting (5.53) we obtain

∂p(ε, t)/∂t = k1(a0 − ε + 1)(b0 − ε + 1) p(ε − 1, t) + k−1(c0 + ε + 1) p(ε + 1, t) − [k1(a0 − ε)(b0 − ε) + k−1(c0 + ε)] p(ε, t)    (5.54)

The four terms in the master equation are depicted in Figure 5.10.

Given a0, b0, c0 we can calculate the range of possible extents. For simplicity, assume we start with only reactants A and B so c0 = 0. Then the minimum extent is ε = 0 because we cannot fire the reverse
reaction from this starting condition. If we fire the forward reaction until the limiting reactant is exhausted, we reach the maximum extent n = min(a0, b0), so the range of possible extents is 0 ≤ ε ≤ n. We now can write the set of equations stemming from the master equation and place them in matrix form. Written for each extent j, (5.54) gives

dpj/dt = αj pj−1 − βj pj + γj pj+1,    j = 0, 1, ..., n    (5.55)

in which pj(t) is shorthand for p(j, t), p−1 = pn+1 = 0, and αj, βj, γj are the following rate expressions evaluated at different extents of the reaction

αj = k1(a0 − j + 1)(b0 − j + 1)
βj = k1(a0 − j)(b0 − j) + k−1(c0 + j)
γj = k−1(c0 + j + 1)

We can also write this model as

dP/dt = AP    (5.56)

in which P is the column vector of probabilities for the different reaction extents and the tridiagonal A matrix contains all the model parameters.
The essential connection between the stochastic and deterministic approaches to the well-mixed chemical kinetics problem is that the stochastic model's probability density becomes arbitrarily sharp at the solution to the deterministic problem as the number of molecules increases. Figure 5.11 displays the solution to (5.55) starting with 20 A molecules, 100 B molecules and 0 C molecules. The extent of reaction is scaled by the initial number of A molecules. Notice that the probability density spreads out rapidly as time increases and there is significant uncertainty in the equilibrium state.

If we increase the starting number of molecules by a factor of 10, we obtain the results depicted in Figure 5.12. Notice the sharpening

Figure 5.10: Master equation for chemical reaction A + B ⇌ C. The probability density at state ε changes due to forward and reverse reaction events. The rate of change is proportional to the reaction rate times the probability density of being in that state.

of the probability density. We can see that the extent versus time traced out by the peak in probability density is approaching the mass action kinetics limit. You can imagine the sharpness in the density if we started out with on the order of Avogadro's number of molecules. As stressed earlier, however, if we are not operating near that limit, the random fluctuation may be an important physical behavior to include in the model. To describe this behavior, the stochastic approach is essential and the deterministic approach cannot be substituted.

The master equation, (5.56), is a simple linear, constant-coefficient differential equation, and the solution is

P(t) = e^(At) P0
master equation directly is its high dimenThechallengein solving the


different species values
sion.Thedimensionof P is the number of
have a single reaction, the
thatthe system can reach by reaction. If we
extentcanrange from zero, its initial value, to a value that exhausts
somelimitingspecies. Denote this limiting species's initial number by
no,thedimensionof the state vector P is then no. But if we have mul-

tiplereactions,we multiply nr by the limiting species corresponding


toallthecombinationsof reactions. The scalingis on the order of the
product
nonr. If we have 1000 initial molecules with 10 reactions, the
dimension
of the master equation P vector is already on the order of

Figure 5.11: Solution to master equation for A + B ⇌ C starting with 20 A molecules, 100 B molecules and 0 C molecules. Congratulations, you now understand what is displayed on the cover of the text.

10⁴. The A matrix already contains 10⁸ elements, although it would be quite sparse.

Thus solving the master equation becomes computationally intractable for problems of even modest size. The best we can hope for with these larger models is to sample the master equation with simulations. Even simulating enough trajectories to obtain reliable sample averages can be quite time consuming, which motivates research efforts to develop efficient simulation algorithms and sampling strategies.

Given this basic understanding, we now express the general master equation for nr reactions with the random variable X (species numbers) as the state of the system rather than the reaction extents. Given a system in state x, reaction i with stoichiometric vector νi can reach state x from only state x − νi, and can leave this state to reach state x + νi. We then have for the evolution of the probability density

dp_X(x, t)/dt = Σ_{i=1}^{nr} [ ri(x − νi) p_X(x − νi, t) − ri(x) p_X(x, t) ]    (5.57)

Figure 5.12: Solution to master equation for A + B ⇌ C starting with 200 A molecules, 1000 B molecules and 0 C molecules; k1 = 1/200.

with initial condition p_X(x, 0) = m(x). Equation (5.57) is the CHEMICAL MASTER EQUATION for a general reaction network. It is also known as the forward Kolmogorov equation in the mathematics literature.

Applying (5.57) to the previous example, we have x = (nA, nB, nC)ᵀ, ν1 = (−1, −1, 1)ᵀ, and rates

r1(x) = k1 nA nB    r−1(x) = k−1 nC

and master equation

dp_X(x, t)/dt = r1(x − ν1) p_X(x − ν1, t) + r−1(x + ν1) p_X(x + ν1, t) − (r1(x) + r−1(x)) p_X(x, t)

or, written to show the species numbers

dp_X(nA, nB, nC, t)/dt = k1(nA + 1)(nB + 1) p_X(nA + 1, nB + 1, nC − 1, t) + k−1(nC + 1) p_X(nA − 1, nB − 1, nC + 1, t) − (k1 nA nB + k−1 nC) p_X(nA, nB, nC, t)
nc
5.3.5 Microscopic, Mesoscopic, and Macroscopic Kinetic Models

Next we would like to explore how the discrete stochastic kinetic model of a microscopic system transforms into the deterministic kinetic model of a macroscopic system that is familiar to undergraduate chemical and biological engineers. Along the way, we derive a model for the regime bridging the microscopic and macroscopic levels, which is sometimes called the mesoscopic regime. Our goal is to start with the microscopic chemical master equation and take the limit as the system size becomes large. We use the system volume Ω for the size parameter. The procedure we follow is given by van Kampen (1992, pp. 244-263) and is known as the OMEGA EXPANSION. The essentials of the approach are perhaps best explained by taking a concrete (and nonlinear) example. Consider the bimolecular reaction

2A →(k) products
In the deterministic macroscopic description, we have a reaction-rate expression r = kc², in which c is the molar concentration of A, an intensive variable, and the rate constant k has units of l³/(mol t), so the rate has units of mol/(t l³), a rate of reaction per volume, which is also intensive. The mole balance for species A in a well-mixed system is the familiar

dc/dt = −2kc²    c(0) = c0    (5.58)

For these same kinetics, at the small scale, we have the microscopic chemical master equation

dp(n, t)/dt = (k/Ω)(n + 2)(n + 1) p(n + 2, t) − (k/Ω) n(n − 1) p(n, t)    (5.59)

in which n is the number of A molecules in the well-mixed system of volume Ω. Here n is a discrete (nonnegative, integer-valued) random

Figure 5.13: The equilibrium reaction extent's probability density for Reactions 5.52 at system volume Ω = 20 (top, var(ε) = 0.0114) and Ω = 200 (bottom, var(ε) = 0.00176). Notice the decrease in variance in the reaction extent as system volume increases.

variable. As Ω becomes large, we expect the concentration c = n/Ω to
be well described by the ODE (5.58). It is initially far from clear how we
take this limit to make this transition from a discrete-valued random
variable n to a continuous-valued deterministic variable c.
To motivate the appropriate analysis, we first look at solutions to
the master equation for increasing values of Ω. Figure 5.13 shows the
final equilibrium distributions of the scaled reaction extent, ξ, from
Figures 5.11 and 5.12. We have increased the system size from Ω = 20
in the top figure to Ω = 200 in the bottom figure. We also show the
variance in random variable ξ in the two simulations. Notice that for a
tenfold increase in Ω, the variance has decreased by almost this same
tenfold amount. From these solutions to the master equation we have
some idea what to expect. For a large system, the integer increments in
the number of molecules n become so fine that we can approximately
replace them by a continuous variable c. But we also see randomness

in the concentration, and although the (relative) magnitude of the con-
centration fluctuations decreases as the system size increases, it is
not zero. In fact, we see that the familiar normal distribution appears
to describe the probability distribution of the fluctuations, and the
variance of that distribution decreases with increasing system size.

Therefore we are led to hypothesize that we can approximate n as
a combination of the deterministic concentration c and a continuous
random variable ξ to capture the fluctuations. Based on our numerical
evidence, we postulate

n = cΩ + ξΩ^(1/2)        (5.60)

so that the variance in n/Ω scales with Ω⁻¹, i.e., var(n/Ω) = Ω⁻¹ var(ξ).
We are neglecting terms of order Ω⁰ and lower in the expansion of n
in (5.60). Thus we are expressing n/Ω as a perturbation solution in
increasing powers of small parameter Ω^(-1/2). The additional complica-
tion in this case compared to our previous perturbation examples in
Chapters 2 and 3 is that we are also changing from a discrete variable
n to continuous variables c and ξ.
The master equation describes the density of random variable n,
P(n, t), and we wish to deduce an evolution equation for the density
of random variable ξ, which we denote Π(ξ, t). We also expect the
analysis to show that the familiar differential equation (5.58) describes
the deterministic variable c. As a transformation of random variables,
we are considering the two densities to be related by

Π(ξ, t) = P(cΩ + ξΩ^(1/2), t)

in which we suppress the dependence of n on c. Consider c to be some
known function of time when expressing the transformation between
the two random variables n and ξ.

Given this transformation, the partial derivatives are related by
P_t = Π_t + Π_ξ ξ_t, and ξ_t is found by differentiating (5.60) holding n
constant, which yields

ξ_t = -ċΩ^(1/2)

in which ċ represents the time derivative of c(t). Substituting this into
the relation for the partial derivatives gives

P_t = Π_t - ċΩ^(1/2) Π_ξ

This is the first step. We have the left-hand side of the master equation
evaluated in terms of the new density Π. Next we work on the right-
hand side.


P(n, t) is simply the transformed Π(ξ, t), but we also require the
term P(n + 2, t). First we solve (5.60) for ξ so that we know what value
of ξ corresponds to n

ξ(n) = nΩ^(-1/2) - cΩ^(1/2)        (5.61)

and ξ(n + 2) = ξ(n) + 2Ω^(-1/2). Next we use a Taylor series to express
P(n + 2, t) in terms of Π(ξ, t), denoted simply as Π, and its ξ derivatives

P(n + 2, t) = Π + 2Ω^(-1/2) Π_ξ + (4Ω⁻¹/2!) Π_ξξ + (8Ω^(-3/2)/3!) Π_ξξξ + ···        (5.62)

The number of terms retained in the Taylor series determines the order
of the approximation for the density Π. We now can easily transform
the remaining terms in n using (5.60)

n(n - 1)/Ω = c²Ω + 2cξΩ^(1/2) + (ξ² - c) - ξΩ^(-1/2)

(n + 2)(n + 1)/Ω = c²Ω + 2cξΩ^(1/2) + (3c + ξ²) + 3ξΩ^(-1/2) + 2Ω⁻¹

Now we combine all of these ingredients by substituting them into the
master equation (5.59) giving

Π_t - ċΩ^(1/2)Π_ξ = (k̃/2)[4c + 4ξΩ^(-1/2) + 2Ω⁻¹]Π
        + k̃[c²Ω^(1/2) + 2cξ + (3c + ξ²)Ω^(-1/2) + 3ξΩ⁻¹ + 2Ω^(-3/2)]Π_ξ
        + k̃[c² + 2cξΩ^(-1/2) + (3c + ξ²)Ω⁻¹ + 3ξΩ^(-3/2) + 2Ω⁻²]Π_ξξ

in which we have kept up to the second-order term in (5.62).
The third and final step is to extract from this large equation the infor-
mation provided at the different orders of the expansion parameter Ω.

Order Ω^(1/2). Collecting the terms of order Ω^(1/2) gives (ċ + k̃c²)Π_ξ = 0
and, since Π_ξ ≠ 0, we deduce

dc/dt = -k̃c²        (5.63)

which is the macroscopic equation (5.58) after noting that the usual
macroscopic convention absorbs a factor of one-half into the definition
of the rate constant, i.e., k = k̃/2.
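For later reference, note that (5.63) is separable and can be integrated in closed form; a short derivation, with c₀ denoting the initial concentration, is

```latex
\frac{dc}{c^2} = -\tilde{k}\,dt
\quad\Longrightarrow\quad
\frac{1}{c(t)} - \frac{1}{c_0} = \tilde{k}\,t
\quad\Longrightarrow\quad
c(t) = \frac{c_0}{1 + \tilde{k}\,c_0\,t}
```

In particular, with k̃ = 1 and c₀ = 1 (an assumption on our part, consistent with the simulations discussed later in which c = 1/2 at t = 1), the concentration decays to half its initial value at t = 1.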


Order Ω⁰. Collecting the terms of order Ω⁰ gives

Π_t = 2k̃cΠ + 2k̃cξΠ_ξ + k̃c²Π_ξξ

which can be rearranged into

Π_t = ∂(2k̃cξΠ)/∂ξ + ∂²(k̃c²Π)/∂ξ²

This is the familiar Fokker-Planck equation, (5.41), which we can write
as an equivalent SDE

dξ = -2k̃cξ dt + (2k̃c²)^(1/2) dW

Because this is a linear Fokker-Planck equation (the drift term is linear
in ξ and the diffusivity is independent of ξ), this equation is sometimes
referred to as the linear noise approximation.
To simulate the model at this level of approximation, we first solve
(5.63) for c(t), and then perform a random walk simulation for the fluc-
tuation term ξ(t), which depends on c(t). We combine these two parts
for n(t) using (5.60). This description, in which c(t) is deterministic
and ξ(t) is a continuous random walk, is the mesoscopic description.

We see the results in Figure 5.14. The top figure shows the discrete
simulation using KMC for volume Ω = 500 and initial condition of 500
A molecules, n₀ = 500. Note that the plot has a log scale on the time

axis to more clearly show the evolution at early times. These two simulations display quite similar character. To compare them more quantitatively, we could compute several low-order moments of the densities
by computing sample averages over many simulations.
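To make the two-level comparison concrete, the following minimal sketch (in Python, our language choice rather than the Octave/MATLAB used elsewhere in the text; the values k̃ = 1, the sample sizes, and the end time are our illustrative assumptions) runs a discrete KMC simulation of (5.59) alongside the linear noise approximation and compares the mean number of A molecules at a fixed time:

```python
import numpy as np

rng = np.random.default_rng(0)
ktilde, omega, n0, tfinal = 1.0, 500, 500, 2.0

def kmc_2a_to_b(n, t, tstop):
    # Gillespie (KMC) simulation of 2A -> B with propensity
    # a(n) = ktilde * n * (n - 1) / (2 * omega), as in (5.59)
    while True:
        a = ktilde * n * (n - 1) / (2.0 * omega)
        if a == 0.0:
            return n
        t += rng.exponential(1.0 / a)   # exponential waiting time to next event
        if t > tstop:
            return n
        n -= 2                          # each event consumes two A molecules

def linear_noise(tstop, dt=1e-3):
    # mesoscopic model: c(t) solves dc/dt = -ktilde c^2 (closed form used),
    # and xi(t) follows d(xi) = -2 ktilde c xi dt + sqrt(2 ktilde) c dW
    c0 = n0 / omega
    xi = 0.0
    nsteps = int(tstop / dt)
    for k in range(nsteps):
        c = c0 / (1.0 + ktilde * c0 * k * dt)
        xi += (-2.0 * ktilde * c * xi * dt
               + np.sqrt(2.0 * ktilde * dt) * c * rng.standard_normal())
    c = c0 / (1.0 + ktilde * c0 * tstop)
    return c * omega + np.sqrt(omega) * xi   # n = c*Omega + sqrt(Omega)*xi, (5.60)

nsamp = 200
kmc = np.array([kmc_2a_to_b(n0, 0.0, tfinal) for _ in range(nsamp)])
lna = np.array([linear_noise(tfinal) for _ in range(nsamp)])
print(kmc.mean(), lna.mean())
```

Both sample means should cluster near the deterministic value Ωc(t) with relative fluctuations of order Ω^(-1/2), which is the behavior seen in Figure 5.14.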
As a more comprehensive alternative, we compute the correspond-
ing cumulative probability distributions at the selected time t = 1,
shown as the dashed line in Figure 5.14. We obtain the cumulative
distribution for the discrete model by solving the master equation and
summing

F(n, t) = Σ_{m=0}^{n} P(m, t)        0 ≤ n ≤ n₀

We can obtain the density for the omega expansion by solving the PDE
for Π(ξ, t), shifting the mean by the deterministic c(t), and integrating
for the cumulative distribution. Or we can instead derive a correspond-
ing evolution equation for ξ's cumulative density Π̃(ξ, t)

Figure 5.14: Simulation of 2A → B for n₀ = 500, Ω = 500. Top:
discrete simulation; bottom: SDE simulation.

and shift its mean by c(t), F_c(x, t) = Π̃(Ω^(1/2)(x - c(t)), t). Exercise 5.22
discusses this approach in more detail. The results are shown in Fig-
ure 5.15. The staircase function is the solution to the discrete master
equation at time t = 1, at which time the deterministic concentration
is one-half, i.e., c(t) = 1/2 at t = 1. The steps in x = n/Ω are caused
by the zero probability at all the odd integer values of n in the discrete
model. The smooth function is the omega expansion, which we can see
is in reasonably close agreement with the discrete model for Ω = 500.
Finally, in the limit as Ω → ∞, the fluctuation ξ becomes negligible
compared to c, and we have the familiar deterministic macroscopic
description, (5.63) or (5.58). In Figure 5.15, this limit would be observed

by the two functions converging to a unit step function at the value of
x = c(t). See also Exercise 5.22.

Figure 5.15: Cumulative distribution for 2A → B at t = 1 with n₀ =
500, Ω = 500. Discrete master equation (steps) versus
omega expansion (smooth).
There is now an extensive and rapidly growing literature on stochas-
tic kinetics. The book chapter by Anderson and Kurtz (2011) is highly
recommended for those interested in a current and comprehensive
overview of most of the topics covered here, as well as more advanced
topics on relevant central limit theorems for Poisson processes, mar-
tingales, and scaling and model reduction.

5.4 Optimal Linear State Estimation


5.4.1 Introduction
Sensors are how we learn about the world. Our five natural senses
provide us with our first exposure to sensors, i.e., the type built in
by nature. Since humans are very curious about the world, people
have been hard at work for a long time augmenting the natural senses
by constructing artificial or man-made sensors. Some of mankind's
biggest advances in science and engineering were precipitated by a
breakthrough in sensor technology, e.g., the telescope, the microscope,
detectors for electromagnetic radiation outside the visible range, etc.

One of the important things that we know about sensors is that
they are limited and imperfect indicators of the world around us. They
are noisy, which makes it challenging for us to interpret what the sensor is
telling us. Finally, all sensors, as well as the systems that we are trying
to sense, are subject to random and uncontrolled effects.

One of the fundamental problems in systems engineering is to de-
sign methods for taking these imperfect measurements of imperfectly
known systems and extracting the best possible information from them.
We may decide in some situation that a change in a sensor's
signal indicates that the system has changed. But we may de-
cide in a different situation that the same change indicates a
random effect or disturbance to the sensor itself, and the system is
completely unchanged. Optimally combining these two sources of in-
formation: what the sensor tells us and the other knowledge that we
have about the system's behavior, is the task of state estimation.

To make these concepts precise, we consider a linear system. Let
x be an n-vector containing all the relevant information about a
system of interest

x⁺ = Ax + Bu

The u variables are the input variables that also affect the evolution
of the system. If we control the inputs, they are called actuators, i.e.,
the valves in a chemical plant. If the inputs are not controlled by us,
they are regarded as disturbances, and often given another letter to in-
dicate this difference. We use w ∈ Rⁿ to represent the disturbances.
Because of the central limit theorem, these will be considered normally
distributed random variables with zero mean and variance Q. The dy-
namic model is then

x⁺ = Ax + Bu + w

The initial state of the system x(0) is also generally unknown and will be
considered a normally distributed random variable with mean x̄₀ and
variance P(0). Now we consider the sensors. Let y ∈ Rᵖ be the p-vector
of available measurements. Normally p < n, indicating that we are not
measuring every relevant property of the system. Because sensors are
expensive, often p ≪ n, indicating that we have a complex system with
many states, but are information poor with few measurements. The
sensor is also affected by random disturbances, which we denote by v.
Because the input u is considered known, we can remove it from the


model for simplicity without changing any important features of the
state estimation problem. The linear model of interest is then

x⁺ = Ax + w        y = Cx + v

and the disturbances and unknown initial condition satisfy

w ~ N(0, Q)        v ~ N(0, R)        x(0) ~ N(x̄₀, P(0))
If the measurement process is quite noisy, then R is large. If the
measurements are highly accurate, then R is small. Similar considera-
tions apply for the process noise, Q. If the state is subjected to large
disturbances, then Q is large, and if the disturbances are small, Q is
small. Again we choose zero mean for w because the nonzero mean
disturbances should have been accounted for in the system model. The
variance P(0) reflects our confidence in the initial state. If we know how
the system starts off, P(0) is small. If we have little knowledge, we take
P(0) large. Recall the noninformative prior is a uniform distribution,
which we can approximate by taking P(0) very large. In industrial ap-
plications, the initial condition may be known with high accuracy for
batch processes. But the initial condition is usually considered largely
unknown when analyzing a dataset taken from a continuous process.
We require three main results concerning normals, conditional nor-
mals, and linear transformations. These follow directly from the prop-
erties of the normal established in Chapter 4, but see Exercise 5.24 for
some hints if you have difficulty deriving any of these. Recall also the
normal function notation (4.13)

n(x, m, P) = (2π)^(-n/2) (det P)^(-1/2) exp[ -(1/2)(x - m)ᵀ P⁻¹ (x - m) ]

which was introduced in Chapter 4, and will be used frequently in the
following discussion.

Joint independent normals. If p_{x|z}(x|z) is normal, and y is statis-
tically independent of x and z and normally distributed

p_{x|z}(x|z) = n(x, m_x, P_x)        y ~ N(m_y, P_y)        y independent of x and z

then the conditional joint density of (x, y) given z is

p_{x,y|z}(x, y|z) = n(x, m_x, P_x) n(y, m_y, P_y)

p_{x,y|z}(x, y|z) = n( [x; y], [m_x; m_y], [P_x, 0; 0, P_y] )        (5.64)

Linear transformation of a normal. If x and z are jointly normally dis-
tributed with conditional density p_{x|z}(x|z) having mean m and
variance P, and y = Ax

p_{x|z}(x|z) = n(x, m, P)        y = Ax

then the conditional density of y given z is normal
with mean Am and variance APAᵀ

p_{y|z}(y|z) = n(y, Am, APAᵀ)        (5.65)

Conditional of a joint normal. If the joint conditional density of (x, y)
given z is normal

p_{x,y|z}(x, y|z) = n( [x; y], [m_x; m_y], [P_x, P_xy; P_yx, P_y] )

then the conditional density of x given (y, z) is also normal

p_{x|y,z}(x|y, z) = n(x, m, P)

in which

m = m_x + P_xy P_y⁻¹ (y - m_y)        (5.66)
P = P_x - P_xy P_y⁻¹ P_yx

Note that the conditional mean m is itself a random variable because
it depends on the random variable y.
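The conditional-normal result is easy to check by simulation; the following sketch (in Python, with invented numerical values for the means and covariances) compares the formulas in (5.66) against empirical statistics obtained by conditioning on a thin slab of y values:

```python
import numpy as np

rng = np.random.default_rng(1)

# joint normal for (x, y): means and partitioned covariance (illustrative values)
mx, my = 1.0, -1.0
Px, Pxy, Py = 2.0, 0.8, 1.0
mean = np.array([mx, my])
cov = np.array([[Px, Pxy], [Pxy, Py]])

samples = rng.multivariate_normal(mean, cov, size=200_000)
x, y = samples[:, 0], samples[:, 1]

# condition on y near a chosen value by keeping samples in a thin slab
ystar = 0.5
keep = np.abs(y - ystar) < 0.05
empirical_mean = x[keep].mean()
empirical_var = x[keep].var()

# formula (5.66): m = mx + Pxy Py^{-1} (y - my), P = Px - Pxy Py^{-1} Pyx
m = mx + Pxy / Py * (ystar - my)
P = Px - Pxy**2 / Py

print(m, empirical_mean, P, empirical_var)
```

The empirical conditional mean and variance agree with (5.66) to within sampling error, and note that the conditional mean moves with the conditioning value ystar while the conditional variance does not.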
5.4.2 Optimal Dynamic Estimator

We have specified the random process of interest

x⁺ = Ax + w        (5.67)
y = Cx + v        (5.68)

with known densities

w ~ N(0, Q)        v ~ N(0, R)        x(0) ~ N(x̄₀, Q(0))


We next derive the optimal estimator for this process. As part of
this derivation, we will derive the probability densities of the state as a
function of time. This is the same pattern that we followed in the first
two sections on Brownian motion and stochastic kinetics. We started
with the random process (Wiener and Poisson processes), and then we
derived their probability density equations (Fokker-Planck and master
equations). Because we have assumed a prior, the density of x(0), we
are using Bayesian estimation. The overall game plan is as follows. The
initial state x(0) is assumed normal. Our optimal estimate before mea-
surement is denoted x̂⁻(0). The minus sign indicates estimate before mea-
surement. We obtain from the sensor measurement y(0). We then
compute the conditional density of x(0)|y(0). We show that it is also
normal. The maximum of that conditional density is our optimal es-
timate after measurement, denoted x̂(0). We are combining the mea-
surement with the prior to calculate the posterior. Then we use the
random process (5.67) to forecast the state forward one time step to
obtain x(1). We show that the density of x(1) (conditioned on y(0))
is also normal,3 and the maximum of that density is our estimate at
k = 1 before measurement, x̂⁻(1). Then we add measurement y(1)
and compute the conditional density of x(1)|y(0), y(1); its maximum
gives x̂(1), and we continue the iteration. So now we fill in the details.

Combining the measurement. We start off at k = 0 with estimate
x̂⁻(0) = x̄₀ and consider the effect of adding the first measurement.
We obtain noisy measurement y(0) satisfying

y(0) = Cx(0) + v(0)

in which v(0) ~ N(0, R) is the measurement noise. Given the measure-
ment y(0), we next obtain the conditional density p_{x(0)|y(0)}(x(0)|y(0)).
This conditional density describes the change in our knowledge about
x(0) after we obtain measurement y(0). This step is the essence of
state estimation. To derive this conditional density, first consider the
pair of variables (x(0), y(0)) given as

[x(0); y(0)] = [I, 0; C, I] [x(0); v(0)]

We assume that the noise v(0) is statistically independent of x(0),
and use the independent joint normal result (5.64) to express the joint
3Because we have linear transformations of normals at each step of the procedure,
every density in sight will be normal,


density of (x(0), v(0))

[x(0); v(0)] ~ N( [x̄₀; 0], [Q(0), 0; 0, R] )

As shown in the previous equation, the pair (x(0), y(0)) is a linear transfor-
mation of (x(0), v(0)). Therefore, using the linear transfor-
mation result (5.65), the density of (x(0), y(0)) is given by

[x(0); y(0)] ~ N( [x̄₀; Cx̄₀], [Q(0), Q(0)Cᵀ; CQ(0), CQ(0)Cᵀ + R] )

Given this joint density, we then use the conditional of a joint normal
result (5.66) to obtain

p(x(0)|y(0)) = n(x(0), m, P)

in which

m = x̄₀ + L(0)(y(0) - Cx̄₀)
L(0) = Q(0)Cᵀ(CQ(0)Cᵀ + R)⁻¹
P = Q(0) - Q(0)Cᵀ(CQ(0)Cᵀ + R)⁻¹CQ(0)

We see that the conditional density is normal. The optimal
state estimate is the value of x(0) that maximizes this conditional density.
For a normal, that is the mean, and we choose x̂(0) = m. We
also denote the variance in this conditional after measurement y(0)
by P(0) = P with P given in the previous equation. The change in
variance after measurement (Q(0) to P(0)) quantifies the information
increase by obtaining measurement y(0). The variance after measure-
ment, P(0), is always less than or equal to Q(0), which implies that we
can only gain information by measurement; but the information gain
may be small if the measurement device is poor and the measurement
noise variance R is large.
Forecasting the state evolution. Next we consider the state evolution
from k = 0 to k = 1, which satisfies

x(1) = [A, I] [x(0); w(0)]

in which w(0) ~ N(0, Q) is the process noise. We next calculate the
conditional density p(x(1)|y(0)). Now we require the conditional version
of the joint density of (x(0), w(0)). We assume that the process noise
w(0) is statistically independent of both x(0) and v(0), hence it
is also independent of y(0), which is a linear combination of x(0) and
v(0). Therefore we use (5.64) to obtain

[x(0); w(0)] | y(0) ~ N( [x̂(0); 0], [P(0), 0; 0, Q] )

We then use the conditional version of the linear transformation of a
normal (5.65) to obtain

p(x(1)|y(0)) = n(x(1), x̂⁻(1), P⁻(1))

in which the mean and variance are

x̂⁻(1) = Ax̂(0)        P⁻(1) = AP(0)Aᵀ + Q

We see that forecasting forward one time step may increase or decrease
the conditional variance of the state. The term AP(0)Aᵀ may be smaller
or larger than P(0), but the process noise Q always makes a positive
contribution.

Given that p(x(1)|y(0)) is also a normal, we are situated to add mea-
surement y(1) and continue the process of adding measurements fol-
lowed by forecasting forward one time step until we have processed
all the available data. Because this process is recursive, the storage re-
quirements are small. We need to store only the current state estimate
and variance, and can discard the measurements as they are processed.
The required online calculation is minor. These features make the op-
timal linear estimator an ideal candidate for rapid online application.
We next summarize the state estimation recursion.
General time step k. Denote the measurement trajectory by

y(k) = {y(0), y(1), ..., y(k)}

At time k the conditional density with data y(k - 1) is normal

p(x(k)|y(k - 1)) = n(x(k), x̂⁻(k), P⁻(k))

and we denote the mean and variance with a superscript minus to in-
dicate these are the statistics before measurement y(k). At k = 0,
the recursion starts with x̂⁻(0) = x̄₀ and P⁻(0) = Q(0) as discussed
previously. We obtain measurement y(k), which satisfies

[x(k); y(k)] = [I, 0; C, I] [x(k); v(k)]


The conditional density of (x(k), v(k)) follows from (5.64) since
the noise v(k) is independent of x(k) and measurement y(k - 1)

[x(k); v(k)] | y(k - 1) ~ N( [x̂⁻(k); 0], [P⁻(k), 0; 0, R] )

The linear transformation result (5.65) then gives the joint density

[x(k); y(k)] | y(k - 1) ~ N( [x̂⁻(k); Cx̂⁻(k)], [P⁻(k), P⁻(k)Cᵀ; CP⁻(k), CP⁻(k)Cᵀ + R] )

Noting that {y(k - 1), y(k)} = y(k), and using the conditional density
result (5.66) gives

p(x(k)|y(k)) = n(x(k), x̂(k), P(k))

in which

x̂(k) = x̂⁻(k) + L(k)(y(k) - Cx̂⁻(k))
L(k) = P⁻(k)Cᵀ(CP⁻(k)Cᵀ + R)⁻¹
P(k) = P⁻(k) - P⁻(k)Cᵀ(CP⁻(k)Cᵀ + R)⁻¹CP⁻(k)

We forecast from k to k + 1 using the model

x(k + 1) = [A, I] [x(k); w(k)]

Because w(k) is independent of x(k) and y(k), the joint density of
(x(k), w(k)) follows from a second use of (5.64)

[x(k); w(k)] | y(k) ~ N( [x̂(k); 0], [P(k), 0; 0, Q] )

and a second use of the linear transformation result (5.65) gives

p(x(k + 1)|y(k)) = n(x(k + 1), x̂⁻(k + 1), P⁻(k + 1))

in which

x̂⁻(k + 1) = Ax̂(k)        P⁻(k + 1) = AP(k)Aᵀ + Q

and the recursion is complete.


Summary. We place all the required formulas for implementing the
optimal estimator in one place for easy reference. The initial
conditions for k = 0 are

x̂⁻(0) = x̄₀        P⁻(0) = Q(0)

The update equations for time k ≥ 0 are

x̂(k) = x̂⁻(k) + L(k)(y(k) - Cx̂⁻(k))        (5.69)
L(k) = P⁻(k)Cᵀ(CP⁻(k)Cᵀ + R)⁻¹        (5.70)
P(k) = P⁻(k) - P⁻(k)Cᵀ(CP⁻(k)Cᵀ + R)⁻¹CP⁻(k)        (5.71)
x̂⁻(k + 1) = Ax̂(k)        (5.72)
P⁻(k + 1) = AP(k)Aᵀ + Q        (5.73)

The full densities of the state before and after measurement are

p(x(k)|y(k - 1)) = n(x(k), x̂⁻(k), P⁻(k))
p(x(k)|y(k)) = n(x(k), x̂(k), P(k))

These formulas provide the celebrated Kalman filter (Kalman, 1960).
One of Kalman's key contributions was to use the state-space model
to describe the system dynamics. As we see here, after that step, the
solution of the optimal filtering problem reduces to a few well-known
results about normals, linear transformations, and conditional densities.
One of the main practical advantages of the Kalman filter is the ex-
tremely efficient implementation. One can update and store the con-
ditional mean and variance with only a few matrix multiplications and
finding one matrix inverse. This efficient recursion makes the Kalman
filter ideal for online state estimation where one would like to find the
optimal estimate in real time as the sensor measurements become avail-
able.
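The recursion (5.69)-(5.73) translates directly into code; here is a minimal sketch in Python (the system matrices, noise variances, and simulation length are invented for illustration and are not from the text):

```python
import numpy as np

def kalman_filter(A, C, Q, R, xhat0, Q0, ys):
    """Run the optimal linear estimator (5.69)-(5.73) over measurements ys.

    Returns the after-measurement estimates xhat(k) and variances P(k)."""
    xminus, Pminus = xhat0, Q0
    xhats, Ps = [], []
    for y in ys:
        S = C @ Pminus @ C.T + R
        L = Pminus @ C.T @ np.linalg.inv(S)          # gain, (5.70)
        xhat = xminus + L @ (y - C @ xminus)         # measurement update, (5.69)
        P = Pminus - L @ C @ Pminus                  # variance update, (5.71)
        xhats.append(xhat); Ps.append(P)
        xminus = A @ xhat                            # forecast mean, (5.72)
        Pminus = A @ P @ A.T + Q                     # forecast variance, (5.73)
    return np.array(xhats), np.array(Ps)

# small example: two states, one noisy measurement
rng = np.random.default_rng(2)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])
x = np.array([1.0, -1.0])
ys, xs = [], []
for _ in range(200):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
    ys.append(C @ x + rng.multivariate_normal(np.zeros(1), R))
    xs.append(x)
xhats, Ps = kalman_filter(A, C, Q, R, np.zeros(2), 10.0 * np.eye(2), ys)
err = np.array(xs) - xhats
print(np.mean(err**2, axis=0))
```

Only the current estimate and variance are carried between iterations, which is the small-storage, fast-recursion property discussed above.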

5.4.3 Optimal Steady-State Estimator

Notice from (5.70) that the optimal estimator has a time-varying gain,
L(k), coming from the time-varying recursions for P(k) and P⁻(k), given
by (5.71) and (5.73). If we are willing to give up a small amount of
performance during small initial times, we can obtain an even simpler
filter. Assume for the moment that these recursions converge to a
steady state. The steady state then satisfies

P_s = P_s⁻ - P_s⁻Cᵀ(CP_s⁻Cᵀ + R)⁻¹CP_s⁻
P_s⁻ = AP_sAᵀ + Q
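One simple way to solve these coupled steady-state equations is to iterate the variance recursions (5.71) and (5.73) until they converge; a sketch in Python (the example matrices are our own illustrative choices, not from the text):

```python
import numpy as np

def steady_state_gain(A, C, Q, R, tol=1e-10, maxiter=10_000):
    """Iterate the variance recursions (5.71) and (5.73) to convergence;
    return the steady-state prior variance Ps- and the filter gain Ls."""
    n = A.shape[0]
    Pminus = np.eye(n)
    for _ in range(maxiter):
        S = C @ Pminus @ C.T + R
        P = Pminus - Pminus @ C.T @ np.linalg.inv(S) @ C @ Pminus   # (5.71)
        Pnew = A @ P @ A.T + Q                                      # (5.73)
        if np.max(np.abs(Pnew - Pminus)) < tol:
            Pminus = Pnew
            break
        Pminus = Pnew
    Ls = Pminus @ C.T @ np.linalg.inv(C @ Pminus @ C.T + R)         # (5.70)
    return Pminus, Ls

# illustrative observable system
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])
Ps, Ls = steady_state_gain(A, C, Q, R)
# estimator stability: A - A Ls C should have eigenvalues inside the unit circle
eigs = np.linalg.eigvals(A - A @ Ls @ C)
print(Ps, Ls, np.abs(eigs))
```

The eigenvalue check previews the stability result established at the end of this section; for an observable pair (A, C) with Q, R > 0, the converged gain yields a stable estimator.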

Substituting P_s from the first equation into the second equation and
eliminating P_s gives a single equation for P_s⁻. The steady-state filter
gain then follows from (5.70)

L_s = P_s⁻Cᵀ(CP_s⁻Cᵀ + R)⁻¹

and combining (5.69) and (5.72) gives the steady-state filter

x̂⁻(k + 1) = Ax̂⁻(k) + AL_s(y(k) - Cx̂⁻(k))

with the constant steady-state filter gain, L_s. Online one has to store only
the current estimate x̂⁻(k), and implement a few matrix-vector mul-
tiplications and vector additions after y(k) is measured to obtain
the next estimate, x̂⁻(k + 1). We have an ideal algorithm that combines
extremely small storage requirements and extremely fast computation,
making the steady-state Kalman filter ideal for many applications in
many engineering disciplines.

In any design problem, including state estimator design, we usu-
ally have many, sometimes conflicting, design objectives. Optimality is
certainly one desirable objective. But we would also like some perfor-
mance guarantees on the estimator. For example, if the disturbances
to the system are small, does the estimate error become small as we
collect more measurements? We formulate this objective as a stability
question in the final section. To motivate that discussion, consider the
following case: A = I, C = 0, i.e., the system is an integrator and we are
not making any measurements. Even without disturbances, the system
evolution is x⁺ = x, and therefore x(k) = x(0) for all k ≥ 0. But (5.70)
gives that L_s = 0, so the estimator equation is x̂⁺ = x̂ and therefore
x̂⁻(k) = x̄₀ for all k ≥ 0. Since the RV x(0) is not necessarily at its
mean, x(0) ≠ x̄₀, and we see that the state estimate does not converge
to the system state no matter how many "measurements" we make.
This system needs to be redesigned before we can obtain a state esti-
mator that converges to the system state. It is clear what is wrong with
this system since C = 0 provides no information from the sensor, but
to detect all such badly designed systems, we introduce the concept of
observability.


5.4.4 Observability of a Linear System


The basic idea of observability is that any two distinct states can be
distinguished by the measurements they produce. First of all, the
input is irrelevant and we can set it to zero. Consider the
linear time-invariant system (A, C) with zero input and disturb-
ances

x⁺ = Ax
y = Cx

with initial condition x(0) = x₀. The solution for the state
is x(k) = Aᵏx₀, and the output is therefore

y(k) = CAᵏx₀        (5.74)

The system is observable if there exists a finite N, such that for every
x₀, N measurements {y(0), y(1), ..., y(N - 1)} distinguish uniquely
the initial state x₀. As discussed in Exercise 5.26, if we cannot determine
the initial state using n measurements, we cannot determine it using
more than n measurements. So we can develop a convenient test for
observability as follows. For n measurements, the system model gives

[y(0); y(1); ...; y(n - 1)] = [C; CA; ...; CA^(n-1)] x₀        (5.75)

The question of observability is therefore a question of uniqueness of
solutions to these linear equations. The matrix appearing
in this equation is known as the observability matrix O

O = [C; CA; ...; CA^(n-1)]        (5.76)

From Section 1.3.6 of Chapter 1, we know that the solution to (5.75) is
unique if and only if the columns of the np × n observability matrix

are linearly independent, i.e., rank(O) = n. Therefore, the system (A,
C) is observable if and only if rank(O) = n. We show in the
next section that observability is a sufficient condition for
estimator stability.

To illustrate this observability analysis in a chemical engineering
context, we present the following example (Ray, 1981, p. 58).
Example 5.7: Observability of a chemical reactor

Consider an isothermal, continuous well-stirred tank reactor (CSTR)
with first-order liquid-phase reactions

A → B → C

with rate constants k₁ and k₂, respectively. The volumetric
flowrate Q_f and tank volume V_R are constant. The
concentration of A in the feed c_Af is the manipulated variable, and
c_Bf = 0. Let x = (c_A, c_B).

(a) Write down the mass balances for species A and B and show that

dx/dt = A_c x + B_c u

What are matrices A_c and B_c for this problem?

(b) Consider measuring only species A reactor concentration with
sample time Δt > 0. What is matrix C_c in this case? Is the system
with this sampled measurement observable?

(c) Consider measuring only species B reactor concentration. What is
matrix C_c in this case? Is the system with this sampled measure-
ment observable? Provide a physical explanation if this answer
differs from the answer to the previous part.

Solution

(a) Assuming constant density, the mass balances for A and B are

d/dt [c_A; c_B] = [-(F/V + k₁), 0; k₁, -(F/V + k₂)] [c_A; c_B] + [F/V; 0] c_Af

in which F = Q_f and V = V_R. We can convert this continuous time
system into a discrete time system by approximating the time
derivative with a finite difference4

dx/dt ≈ (x(k + 1) - x(k))/Δt

giving

x⁺ = Ax + Bu        y = C_c x

A = [1 - Δt(F/V + k₁), 0; Δt k₁, 1 - Δt(F/V + k₂)]

(b) For measuring only species A we have C_c = [1  0]. We then check
the observability matrix for the DT system, giving

O = [1, 0; 1 - Δt(F/V + k₁), 0]

which has rank one. Since rank(O) < n, the system is not observ-
able.

(c) For measuring only species B we have C_c = [0  1]. This gives the
observability matrix

O = [0, 1; Δt k₁, 1 - Δt(F/V + k₂)]

which has rank two for all sample times Δt > 0. Since rank(O) =
n, the system is observable.
The answers are different because measuring A tells us how much
total B we have produced, but we have no information about how
much B was present initially nor how much was consumed to pro-
duce C. Therefore we cannot reconstruct the B concentration from
the model and the A concentration. Measuring species B, how-
ever, provides information about how much A is in the reactor,
because the A concentration affects the production rate of B. The
B measurement information plus the mass balances enable us to
reconstruct the A concentration. The value of the rank condition
of the observability matrix is that it makes rigorous this kind of
physical intuition and reasoning.
4Improving the numerical approximation does not change the observability analysis
that follows.
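The rank test in parts (b) and (c) is easy to automate; here is a sketch in Python, with illustrative parameter values F/V = 1, k₁ = 2, k₂ = 3, Δt = 0.1 that are our choices rather than the text's:

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) as in (5.76)."""
    n = A.shape[0]
    blocks = [C @ np.linalg.matrix_power(A, k) for k in range(n)]
    return np.vstack(blocks)

# discrete-time A matrix from Example 5.7 with assumed numbers
FV, k1, k2, dt = 1.0, 2.0, 3.0, 0.1
A = np.array([[1 - dt * (FV + k1), 0.0],
              [dt * k1, 1 - dt * (FV + k2)]])

O_A = observability_matrix(A, np.array([[1.0, 0.0]]))  # measure c_A only
O_B = observability_matrix(A, np.array([[0.0, 1.0]]))  # measure c_B only
print(np.linalg.matrix_rank(O_A), np.linalg.matrix_rank(O_B))
```

The rank-deficient case (measuring A only) and the full-rank case (measuring B only) reproduce the conclusions of the example.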

5.4.5 Stability of an Optimal Estimator

Optimality is one desirable filter characteristic, but systems engineers
care about other characteristics as well, such as stability. To analyze
estimator stability, define the estimate error

x̃(k) = x(k) - x̂⁻(k)

The evolution of the estimate error can be given by substituting the
system evolution

x(k + 1) = Ax(k) + w(k)

and the estimator equation x̂⁻(k + 1) = Ax̂⁻(k) + AL_s(y(k) - Cx̂⁻(k))
into the definition of x̃(k + 1). Substituting the system measurement
y(k) = Cx(k) + v(k) and combining terms gives

x̃(k + 1) = (A - AL_sC)x̃(k) + w(k) - AL_s v(k)

Estimator stability is the question of whether (A - AL_sC) is a stable
matrix, i.e., has all its eigenvalues inside the unit circle.

We have the following theorem covering the stability of the steady-
state estimator.

Theorem 5.8 (Riccati iteration and estimator stability). Given (A, C)
observable, Q > 0, R > 0, P⁻(0) ≥ 0, and the discrete Riccati equation

P⁻(k + 1) = Q + AP⁻(k)Aᵀ - AP⁻(k)Cᵀ(CP⁻(k)Cᵀ + R)⁻¹CP⁻(k)Aᵀ

Then

(a) There exists P_s⁻ ≥ 0 such that for every P⁻(0) ≥ 0

lim_{k→∞} P⁻(k) = P_s⁻

and P_s⁻ is the unique solution of the steady-state Riccati equation

P_s⁻ = Q + AP_s⁻Aᵀ - AP_s⁻Cᵀ(CP_s⁻Cᵀ + R)⁻¹CP_s⁻Aᵀ

among the class of positive semidefinite matrices.

Figure 5.16: The change in 95% confidence intervals for x̂(k|k)
versus time for a stable, optimal estimator. We start at
k = 0 with a noninformative prior, which has an infinite
confidence interval.

(b) The matrix A - AL_sC in which

L_s = P_s⁻Cᵀ(CP_s⁻Cᵀ + R)⁻¹

is a stable matrix.
Bertsekas (1987, pp. 59-64) provides a proof of the "dual" of this
theorem, which can be readily translated to this case.

So what is the payoff for knowing how to design a stable, optimal
estimator? Assume we have developed a linear empirical model for a
chemical process describing its normal operation around some nominal
steady state. After some significant unmeasured process disturbance,
we have little knowledge of the state. So we take initial variance P⁻(0)
to be large (the noninformative prior). Figure 5.16 shows the evolu-
tion of our 95% confidence intervals for the state as time increases and
we obtain more measurements. We see that the optimal estimator's
confidence interval returns to its steady-state value after only a short
transient.

Recall that the conditional variances of the state given the measure-
ments, P(k) and P⁻(k), do not depend on the data. Only the optimal es-
timates depend on the data. So we can assess the information in the
sensor system before we even examine the data. But the system and
sensor parameters Q and R almost always need to be estimated from
data before we can perform this analysis. And if we plan to use
feedback control to move the disturbed system back to its optimal operating
point, this analysis also tells us how quickly we can expect to restore
better control and therefore better process performance.

State estimation is a fundamental topic appearing in many branches
of science and engineering, and has a large literature. A nice and brief
annotated bibliography describing the early contributions to optimal
state estimation, as well as treatments of the optimal estimation
problem for linear and nonlinear systems along with the optimal
control problem, can be found in Bryson and Ho (1975) and Stengel
(1994). The moving horizon estimation approach, which can address
system nonlinearity and constraints, is presented by Rawlings and
Mayne (2009, ch. 4).

5.5 Exercises
Exercise 5.1: Random walk with the uniform distribution

Consider again a discrete-time random walk simulation

x(k + 1) = x(k) + w(k)        (5.77)

in which x, w ∈ R², k is the sample number, and Δt is the sample time with t = kΔt. Instead
of using normally distributed steps as in Figure 5.3, let w = 2√3(u - 1/2) in which each
component of u is uniformly distributed

p_u(u) = 1 for 0 ≤ u ≤ 1, and 0 otherwise

We then have that w ~ U(-√3, √3) with zero mean and unit variance. The Octave or
MATLAB function rand generates samples of u, from which we can generate samples of
w with the given transformation.

(a) Calculate a trajectory for this random walk in the plane and compare to Figure 5.3
for the normal distribution.

(b) Calculate the mean square displacement for 500 trajectories and compare to
Figure 5.4 for the normal distribution.

(c) Derive the evolution equation for the probability density p(x, t)
in the limit as Δt goes to zero. How is this model different from the model with
normally distributed steps?

Exercise 5.2: The different diffusion coefficients D and 𝒟

In the chapter we compared two models for the evolution of concentration
undergoing convection and diffusion processes

∂c/∂t = -∂(v(x,t)c)/∂x + ∂/∂x ( D(x,t) ∂c/∂x )

and

∂c/∂t = -∂(v(x,t)c)/∂x + ∂²(𝒟(x,t)c)/∂x²

in which we consider x, v, and D scalars. The first is derived from conservation of mass
with a flux law defined by N = -D ∂c/∂x. The second is the Fokker-Planck
equation for the SDE

dx = v(x,t) dt + √(2𝒟(x,t)) dw

(a) Show that when the diffusivity 𝒟(x, t) does not depend on x, these two models
are equivalent and D(t) = 𝒟(t).

(b) Show that the Fokker-Planck equation can always be written in the following
convection-diffusion form with a modified drift term

∂c/∂t = -∂(ṽ(x,t)c)/∂x + ∂/∂x ( 𝒟(x,t) ∂c/∂x )

and find the expression for ṽ(x, t).

Exercise 5.3: The diffusion coefficient matrices D and 𝒟

Repeat Exercise 5.2 but for the case in which x and v are n-vectors and D and 𝒟 are
n × n diffusion coefficient matrices

∂c/∂t = -∇·(vc) + ∇·(D∇c)

and

∂c/∂t = -∇·(vc) + ∇∇ : (𝒟c)

Exercise 5.4: Continuity of random processes

We know that the Wiener process is too rough to be differentiated, but is it even con-
tinuous? To answer this question, we first have to extend the definition of continuity
to cover random processes such as W(t). We use the following definition. A random
process X(t) is continuous at time t if, for every ε > 0, there exists δ > 0 such that
E(|X(t) - X(s)|) ≤ ε for all t, s satisfying |t - s| ≤ δ. Show that W(t) is continuous
(to establish that integrating the discontinuous white-noise process smooths it and
creates a continuous one).
Exercise 5.5: Multidimensional Itô formula and moments of multidimensional SDEs

(a) Use Itô's formula to derive (5.33) and (5.34).

(b) Derive the multidimensional form of Itô's formula, (5.38), for an SDE in the form

    dXᵢ = Aᵢ(X,t)dt + Bᵢⱼ(X,t)dWⱼ

Recall (5.20).

(c) Use this form to derive the multidimensional versions of (5.33) and (5.34):

    d⟨Xᵢ⟩/dt |ₜ₌ₜ′ = Aᵢ(x′, t′)

    d⟨(Xᵢ − x′ᵢ)(Xⱼ − x′ⱼ)⟩/dt |ₜ₌ₜ′ = 2Dᵢⱼ(x′, t′)

Exercise 5.6: Diffusion equation in one dimension with Laplace transform

We wish to consider the diffusion equation on the line

    ∂c/∂t = D∇²c,    0 < t,  −∞ < x < ∞

and calculate the response c(x,t) to an impulse source term at t = 0, c(x,0) = δ(x).

(a) In Chapter 3, we already solved this problem using the Fourier transform. Here
we try the Laplace transform. Take the Laplace transform of the one-dimensional
diffusion equation with this initial condition and show

    d²c̄(x,s)/dx² − (s/D)c̄(x,s) = −(1/D)δ(x)                              (5.78)

(b) What are the two linearly independent solutions to the homogeneous equation?
Break the problem into two parts and solve the differential equation for x > 0
and x < 0. You have four unknown constants at this point.

(c) Which of the two linearly independent solutions is bounded for x → ∞? Which
of these two solutions is bounded for x → −∞? Use this reasoning to find two
of the unknown constants.

(d) Integrate (5.78) across a small interval containing zero to obtain the jump
condition

    dc̄(x = 0⁺,s)/dx − dc̄(x = 0⁻,s)/dx = −1/D

(e) Use this jump condition to find the last constant and obtain the full transform
c̄(x,s) valid for all x.

(f) Invert this transform and show

    c(x,t) = (1/(2√(πDt))) e^(−x²/(4Dt)),    0 < t,  −∞ < x < ∞           (5.79)

State which inversion formula you are using.

(g) Compute the mean square displacement for this concentration profile

    ⟨x²⟩ = ∫ p(x,t) x² dx,    p(x,t) = c(x,t)
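Part (g) of Exercise 5.6 can be checked numerically. The sketch below (Python, an assumption; the quadrature grid and the values D = 1.5, t = 2 are illustrative) evaluates the normalization and second-moment integrals of the Gaussian profile (5.79), for which the mean square displacement is 2Dt.

```python
import numpy as np

# Numerical check: with c(x,t) = exp(-x^2/(4*D*t))/sqrt(4*pi*D*t), the
# normalization integral is 1 and the second moment is 2*D*t.
D, t = 1.5, 2.0
x = np.linspace(-60.0, 60.0, 200001)     # wide enough that tails are negligible
dx = x[1] - x[0]
c = np.exp(-x**2 / (4 * D * t)) / np.sqrt(4 * np.pi * D * t)
norm = c.sum() * dx                      # should be 1
msd = (x**2 * c).sum() * dx              # should be 2*D*t
print(norm, msd)
```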


Exercise 5.7: Random walk in one dimension

Prepare a simulation of a random walk in one dimension for D = 2. Start the particles
at x = 0 at t = 0 and simulate until t = 1000.

(a) Show the trajectories of the random walks for five particles on the same plot.

(b) Plot the mean square displacement versus time for 1000 particles. Compare this
result to the analytical solution given in Exercise 5.6(g). Describe any differences.

(c) Plot the histogram of particle locations at t = 1000 for 1000 particles. On the
same plot, compare this histogram to the analytical result given in (5.79). Describe
any differences.
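A sketch of the simulation for Exercise 5.7 follows, in Python rather than the Octave/MATLAB used for the book's figures. The time-step discretization x_{k+1} = x_k + √(2DΔt) n_k with n_k ~ N(0,1) is an assumption (any Δt works for this walk), and plotting is omitted.

```python
import numpy as np

# Assumed discretization: x_{k+1} = x_k + sqrt(2*D*dt)*n_k, n_k ~ N(0,1).
rng = np.random.default_rng(1)
D, dt, ntraj = 2.0, 1.0, 1000
nsteps = int(1000.0 / dt)

steps = np.sqrt(2.0 * D * dt) * rng.standard_normal((ntraj, nsteps))
x = np.cumsum(steps, axis=1)
t = dt * np.arange(1, nsteps + 1)
msd = (x**2).mean(axis=0)            # part (b): compare to 2*D*t from Exercise 5.6(g)
print(msd[-1] / (2 * D * t[-1]))     # ratio should be close to 1
```

The ratio fluctuates about one because only 1000 particles are averaged; those fluctuations are exactly the differences part (b) asks you to describe.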

Exercise 5.8: More useful integrals

Use the definition of the complete gamma function and establish the following integral
relationship

    ∫₀^∞ xᵖ e^(−axⁿ) dx = Γ((p+1)/n) / (n a^((p+1)/n))

For the case n = 2, this relation reduces to

    ∫₀^∞ xᵖ e^(−ax²) dx = Γ((p+1)/2) / (2 a^((p+1)/2))                    (5.80)

which proves useful in the next exercises.

Exercise 5.9: Diffusion equation in cylindrical coordinates with Laplace transform

Consider the diffusion equation in cylindrical coordinates with symmetry in the θ
coordinate

    ∂c/∂t = (1/r) ∂(rD ∂c/∂r)/∂r,    0 < t,  0 < r < ∞

We wish to calculate the response c(r,t) to an impulse source term at t = 0,
c(r,0) = δ(r)/(2πr).

(a) Take the Laplace transform of the diffusion equation with this initial condition
and show

    (1/r) d(r dc̄(r,s)/dr)/dr − (s/D)c̄(r,s) = −δ(r)/(2πrD)                (5.81)

(b) What are the two linearly independent solutions to the homogeneous equation?

(c) Which of the two linearly independent solutions is bounded for r → ∞? Use this
reasoning to determine one of the unknown constants.

(d) Integrate (5.81) across a small interval containing zero to obtain a condition on

    lim_(r→0) r dc̄(r,s)/dr

(e) Use this jump condition to find the second constant and obtain the transform

    c̄(r,s) = (1/(2πD)) K₀(r√(s/D))

(f) Invert this transform and show

    c(r,t) = (1/(4πDt)) e^(−r²/(4Dt)),    0 < t,  0 < r < ∞

State which inversion formula you are using.

(g) Compute the mean square displacement for this concentration profile

    ⟨r²⟩ = ∫₀^(2π) ∫₀^∞ r² c(r,t) r dr dθ

Exercise 5.10: Diffusion equation in spherical coordinates with Laplace transform

Consider the diffusion equation in spherical coordinates with symmetry in the θ and φ
coordinates

    ∂c/∂t = (1/r²) ∂(r²D ∂c/∂r)/∂r,    0 < t,  0 < r < ∞

We wish to calculate the response c(r,t) to an impulse source term at t = 0,
c(r,0) = δ(r)/(4πr²).

(a) Take the Laplace transform of the diffusion equation with this initial condition
and show

    (1/r²) d(r² dc̄(r,s)/dr)/dr − (s/D)c̄(r,s) = −δ(r)/(4πr²D)             (5.82)

(b) What are the two linearly independent solutions to the homogeneous equation?

(c) Which of the two linearly independent solutions is bounded for r → ∞? Use this
reasoning to find one of the unknown constants.

(d) Integrate (5.82) across a small interval containing zero to obtain a condition on
the change in the first derivative

    lim_(r→0) r² dc̄(r,s)/dr

(e) Use this jump condition to find the second constant and obtain the full transform
valid for all r.

(f) Invert this transform and show

    c(r,t) = (1/(4πDt))^(3/2) e^(−r²/(4Dt)),    0 < t,  0 < r < ∞

State which inversion formula you are using.

(g) Compute the mean square displacement for this concentration profile

    ⟨r²⟩ = 4π ∫₀^∞ r² c(r,t) r² dr

Exercise 5.11: Probability distributions for diffusion on the plane

This exercise provides another view of the issues raised in Example 5.4. Consider
again the diffusion equation subject to a unit impulse at the origin at t = 0. We
consider solving this equation in the plane using both rectangular coordinates (x,y)
and polar coordinates (r,θ).

(a) Using rectangular coordinates, let p(x,y,t) satisfy (5.30)

    ∂p/∂t = D(∂²p/∂x² + ∂²p/∂y²)

with initial condition

    p(x,y,t) = δ(x)δ(y),    t = 0

Solve this equation and show

    p(x,y,t) = (1/(4πDt)) e^(−(x²+y²)/(4Dt))                              (5.83)

Notice this p(x,y,t) is a valid probability density (positive, normalized).

(b) If we consider the two components (x,y) as time-varying random variables with
the probability density given by (5.83), then we say the pair is distributed as

    [x, y]ᵀ ~ N(0, (2Dt)I)

in which I is a 2 × 2 identity matrix. The position random variable in rectangular
coordinates is normally distributed with zero mean and covariance (2Dt)I.

(c) Next we define a new random variable, the polar-coordinate position η = (r,θ),
via the transformation

    r = √(x² + y²)    θ = tan⁻¹(y/x)
    x = r cos θ       y = r sin θ

for which

    df⁻¹(η)/dη = [cos θ  −r sin θ; sin θ  r cos θ]

Use the rule for finding the probability density of a transformed random variable
and show

    p_η(r,θ,t) = (r/(4πDt)) e^(−r²/(4Dt))

This quantity is the one denoted p_p in Example 5.4. Calculate the marginal
probability density by integration and show

    p_r(r,t) = (r/(2Dt)) e^(−r²/(4Dt))

Note that these are both well-defined probability densities (positive, normalized).
The first is the probability density of the pair of random variables (r,θ), and the
second is the marginal density of the random variable r for particles undergoing
Brownian motion.
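The transformed densities in Exercise 5.11(c) can be checked by sampling. The sketch below (Python, an assumption; D, t, and the sample size are illustrative) draws (x,y) ~ N(0, (2Dt)I), forms r, and compares sample moments against the Rayleigh-type marginal p_r(r,t) = (r/(2Dt)) e^(−r²/(4Dt)), for which ⟨r⟩ = √(πDt) and ⟨r²⟩ = 4Dt.

```python
import numpy as np

# Draw planar Brownian positions at time t and transform to the radius r.
rng = np.random.default_rng(2)
D, t, n = 1.0, 0.5, 200000
xy = rng.normal(0.0, np.sqrt(2 * D * t), size=(n, 2))
r = np.hypot(xy[:, 0], xy[:, 1])
print(np.mean(r**2), 4 * D * t)     # second moment of the marginal density
print(r.mean(), np.sqrt(np.pi * D * t))
```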

Exercise 5.12: Mean and variance of the Poisson distribution

Given that discrete random variable Y has the Poisson density

    p_Y(n) = e^(−a) aⁿ/n!,    n = 0, 1, ...

with parameter a ∈ ℝ, a ≥ 0, show that

    E(Y) = a        var(Y) = a
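The result of Exercise 5.12 is easy to confirm numerically. In the Python sketch below (the parameter a = 3.7 and the truncation point are illustrative assumptions), the Poisson density is built recursively via p(k) = p(k−1)·a/k to avoid large factorials, and the moment sums are evaluated directly.

```python
import numpy as np

a = 3.7
N = 120                        # truncation; the tail beyond N is negligible for this a
p = np.empty(N + 1)
p[0] = np.exp(-a)
for k in range(1, N + 1):
    p[k] = p[k - 1] * a / k    # p_Y(k) = e^{-a} a^k / k!
n = np.arange(N + 1)
mean = (n * p).sum()
var = ((n - mean)**2 * p).sum()
print(mean, var)               # both should equal a
```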

Exercise 5.13: Alternate derivation of Poisson process density

Consider the Poisson process probability Pr(Y(t) = n) for n ≥ 0. Show that¹

    Pr(Y(t) = n) = Pr(Y(t) ≥ n) − Pr(Y(t) ≥ n + 1)

You may want to review Exercise 4.1(a). Using the definition of the event time Tₙ,
show that

    Pr(Y(t) = n) = ∫₀ᵗ p_Tₙ(t')dt' − ∫₀ᵗ p_Tₙ₊₁(t')dt'

Substitute (5.49) and use integration by parts to show (5.50)

    Pr(Y(t) = n) = e^(−λt)(λt)ⁿ/n!

¹See (4.23).


Exercise 5.14: Generating samples from an exponential distribution

Let random variable u be distributed uniformly on [0,1]. Define random variable T by
the transformation

    T = −(1/λ) ln u

Show that T has density (see Section 4.3.2)

    p_T(t) = λe^(−λt)

Thus uniformly distributed random samples can easily be transformed into
exponentially distributed random samples as required for simulating Poisson
processes.
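The inverse transformation in Exercise 5.14 is one line of code. The Python sketch below (an assumption; λ and the sample size are illustrative) checks both the mean and a tail probability against the exponential density.

```python
import numpy as np

# Inverse-transform sampling: T = -(1/lam)*ln(u) with u uniform on (0,1].
rng = np.random.default_rng(3)
lam, n = 2.5, 500000
u = rng.random(n)
T = -np.log(1.0 - u) / lam     # 1-u lies in (0,1], so the log is always finite
print(T.mean(), 1 / lam)                   # exponential mean is 1/lam
print(np.mean(T > 1.0), np.exp(-lam))      # tail probability Pr(T > 1) = e^{-lam}
```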

Exercise 5.15: State-space form for master equation

Write the linear state-space model for the master equation in the extent of reaction
describing the single reaction

    A + B ⇌ 2C                                                            (5.84)

Assume we are not measuring anything.

(a) What are x, A, B, C, D for this model?

(b) What is the dimension of the state vector in terms of the initial numbers of
molecules?

Exercise 5.16: Properties of the kinetic matrix

(a) Show that for a valid master equation the row sum is zero for each column of
the A matrix in (5.56).

(b) Show that this result holds for the A given in (5.55) for the reaction A + B ⇌ C.

(c) What is the row sum for each column of the A₀ matrix in the sensitivity equation?
Show this result.

Exercise 5.17: Reaction probabilities in stochastic kinetics

Consider a stochastic simulation of the following reaction

    aA + bB ⇌ cC + dD    (k₁ forward, k₋₁ reverse)

(a) Write out the two reaction probabilities hᵢ(nⱼ), i = 1, −1, considering the
forward and reverse reactions as separate events.

(b) Compare these to the deterministic rate laws rᵢ(cⱼ), i = 1, −1 for the forward
and reverse reactions considered as elementary reactions. Why are these expressions
different? When do they become close to being the same?

Exercise 5.18: The mean of the master equation

Consider the simple irreversible reaction A → B with rate

    r = k n_A

in which n_A is the number of A molecules and k is a rate constant.

(a) Define p(n_A, t), the probability that the reactor volume contains n_A molecules
of A at time t. For what set of n_A is p(n_A, t) defined? Call this set N. Write the
evolution equation for p(n_A, t).

(b) Define the mean of A's probability density by

    ⟨n_A(t)⟩ = Σ_(n_A ∈ N) n_A p(n_A, t)

Using this definition and the evolution of the probability density, write an
evolution equation for ⟨n_A(t)⟩. The probability density itself should not appear in
the evolution equation for the mean.

(c) Compare the evolution equation for the mean to the usual mass action kinetics.

Exercise 5.20: Stochastic simulation for nonlinear kinetics⁶

Consider the reversible, second-order reaction

    A + B ⇌ C        r = k₁c_A c_B − k₋₁c_C

(a) Solve the deterministic material balance for a constant-volume batch reactor with

    k₁ = 1 L/mol·min        k₋₁ = 1 min⁻¹
    c_A(0) = 1 mol/L        c_B(0) = 0.9 mol/L        c_C(0) = 0 mol/L

Plot the A, B, and C concentrations out to t = 5 min.

(b) Compare the result to a stochastic simulation using an initial condition of 400 A,
360 B and zero C molecules. Notice from the units of the rate constants that k₁
should be divided by 400 to compare simulations. Figure 5.17 is a representative
comparison for one sequence of pseudorandom numbers.

(c) Repeat the stochastic simulation for an initial condition of 4000 A, 3600 B, zero C
molecules. Remember to scale k₁ appropriately. Are the fluctuations noticeable
with this many starting molecules?

⁶See also Exercise 4.17 in (Rawlings and Ekerdt, 2012).
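A minimal Gillespie-type simulation for part (b) of Exercise 5.20 can be sketched as below. This is a Python sketch, not the code used to produce Figure 5.17; the random seed is arbitrary, and the forward constant is scaled by the initial A count as the exercise directs.

```python
import numpy as np

# Stochastic simulation of A + B <=> C starting from 400 A, 360 B, 0 C.
rng = np.random.default_rng(4)
nA, nB, nC = 400, 360, 0
k1, km1 = 1.0 / 400, 1.0            # k1 divided by 400 per the exercise
t, t_end = 0.0, 5.0
while t < t_end:
    h = (k1 * nA * nB, km1 * nC)    # propensities of forward and reverse events
    htot = h[0] + h[1]
    if htot == 0.0:
        break
    t += -np.log(1.0 - rng.random()) / htot   # exponential waiting time
    if rng.random() * htot < h[0]:            # select which reaction fires
        nA, nB, nC = nA - 1, nB - 1, nC + 1
    else:
        nA, nB, nC = nA + 1, nB + 1, nC - 1
print(nA, nB, nC)
```

Note the conservation relations the update rule enforces: nA − nB and nA + nC are invariant, which is a useful sanity check on any implementation.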

Figure 5.17: Deterministic simulation of reaction A + B ⇌ C compared to stochastic
simulation starting with 400 A molecules.

Exercise 5.21: What happened to my rate?

Consider a well-mixed continuum setting in which we have positive, real-valued
concentrations of reacting molecules of two types, A and B, as depicted in
Figure 5.18. Let the concentrations of A and B in the volume of interest be denoted
c_A0, c_B0. Consider the three possible irreversible reactions between these species
using the elementary rate expressions

    A + A → products        r₁ = k₁c_A²
    A + B → products        r₂ = k₂c_A c_B                                 (5.85)
    B + B → products        r₃ = k₃c_B²

Consider also the total rate of reaction

    r = r₁ + r₂ + r₃

Figure 5.18: Species A and B in a well-mixed volume element. Continuum and
molecular settings.

(a) If the A and B species are chemically similar so the different reactions' rate
constants are all similar, k₁ = k₂ = k₃ = k, and the concentrations of A and B are
initially equal, the total rate is given by

    r = k₁c_A0² + k₂c_A0 c_B0 + k₃c_B0² = 3k c_A0²

But if we erase the distinctions between A and B completely and relabel the B
molecules in Figure 5.18 as A molecules, we obtain the new concentrations of A and B
as

    c_A = 2c_A0        c_B = 0

and the total rate is then

    r = k₁c_A² = 4k c_A0²

Why are these two total rates different and which one is correct?

(b) Repeat your analysis of the reaction rates if we reduce the length scale and
consider the molecular kinetic setting in which we have integer-valued numbers
n_A0, n_B0 of molecules of A and B in the volume of interest.

(c) Perform a stochastic simulation of the molecular setting using the following
parameters

    n_A0 = 50        n_B0 = 60        n_C0 = 0
    k₁ = k₂ = k₃ = k = 10 sec⁻¹

Make a plot of all species versus time. Print the plot and the simulation code.

Exercise 5.22: Cumulative distribution for the omega expansion

Given the governing equation for the fluctuation density in the omega expansion

    ∂Π/∂t = 2kc ∂(ξΠ)/∂ξ + kc² ∂²Π/∂ξ²

Define the cumulative distribution

    F(ξ,t) = ∫_(−∞)^ξ Π(ξ',t) dξ'

(a) Derive the PDE governing F's evolution. What are the corresponding boundary
conditions and initial condition?

(b) Solve this PDE numerically and compare to Figure 5.15 in the text. Increase Ω
holding c₀ = n₀/Ω fixed and describe the effect on F.

Figure 5.19: Molecular system of volume V containing molecules of A.

Exercise 5.23: Properties of the Maxwell-Boltzmann distribution

Consider the simple molecular system depicted in Figure 5.19 with a large number of
ideal gas molecules of species A with molecular weight m_A. The system volume is V.
The molecules are labeled i = 1, 2, ..., n. A velocity vector is denoted
v = [v_x  v_y  v_z]ᵀ with corresponding x, y, z components. These velocities are
considered samples of a random variable with fixed and known distribution.

The Maxwell-Boltzmann distribution for the zero-mean fluctuation velocity u in an
ideal gas is

    p_u(u_x,u_y,u_z) = (m/(2πk_B T))^(3/2) exp(−m(u_x² + u_y² + u_z²)/(2k_B T))

in which m is the molecule mass, T is absolute temperature, and k_B is the Boltzmann
constant, k_B = R/N_Av. This distribution is a multivariate normal with zero mean
and variance matrix (k_B T/m)I, which we write as

    u ~ N(0, (k_B T/m)I)

Denote the A species mean velocity (drift term) as v̄_A. The A molecule velocities
are then distributed as

    v_Ai ~ N(v̄_A, (k_B T/m_A)I),    all i                                 (5.86)

Starting from the distribution (5.86), derive the following expectations in terms of
the mean species velocity v̄_A and k_B, T, m_A.

1. ⟨v_Ai⟩

2. ⟨v_Ai · v_Ai⟩

3. ⟨v′_Ai⟩, in which v′_Ai = v_Ai − v̄_A

4. ⟨v′_Ai · v′_Ai⟩

Exercise 5.24: The normal's properties used in deriving the optimal linear estimator

(a) For (5.64), use the independence of y to establish that

    p_(x,y,z)(x,y,z) = p_(x,z)(x,z) p_y(y)

and divide both sides by p_z(z).

(b) For (5.65), we are given that (x,z) is jointly distributed as

    [x; z] ~ N([m_x; m_z], [P_x  P_xz; P_zx  P_z])

Consider the linear transformation

    [y; z] = [A  0; 0  I] [x; z]

and show that

    [y; z] ~ N([Am_x; m_z], [AP_xAᵀ  AP_xz; P_zxAᵀ  P_z])

Now use the conditional density formula to obtain p_(y|z).

(c) For (5.66), note that this property is derived in Example 4.20.

Exercise 5.25: Observability, controllability, and duality

Review the concept of controllability presented in Exercise 1.26. Show that (A,C) is
observable if and only if (Aᵀ,Cᵀ) is controllable. This result marks the beginning of
the interesting story of the duality between regulation and estimation.

Exercise 5.26: Observability with N measurements

Consider the linear system

    x⁺ = Ax        y = Cx

Prove the statement made in the text that if x(0) cannot be uniquely determined by n
measurements {y(0), ..., y(n−1)}, then it cannot be determined by N measurements for
any N.

Bibliography

K. J. Åström. Introduction to Stochastic Control Theory. Academic Press, Inc., New
York, 1970.

D. P. Bertsekas. Dynamic Programming. Prentice-Hall, Englewood Cliffs, New Jersey,
1987.

R. N. Bhattacharya and E. C. Waymire. Stochastic Processes with Applications.
Society for Industrial and Applied Mathematics, Philadelphia, 2009.

A. E. Bryson and Y. Ho. Applied Optimal Control. Hemisphere Publishing, New York,
1975.

W. M. Deen. Analysis of Transport Phenomena. Topics in Chemical Engineering. Oxford
University Press, New York, 1998.

A. Einstein. Über die von der molekularkinetischen Theorie der Wärme geforderte
Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Ann. Phys.,
17:549-560, 1905.

C. W. Gardiner. Handbook of Stochastic Methods for Physics, Chemistry, and the
Natural Sciences. Springer-Verlag, Berlin, Germany, second edition, 1990.

M. A. Gibson and J. Bruck. Efficient exact stochastic simulation of chemical systems
with many species and many channels. J. Phys. Chem. A, 104:1876-1889, 2000.

D. T. Gillespie. Exact stochastic simulation of coupled chemical reactions. J. Phys.
Chem., 81:2340-2361, 1977.

A. H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, New
York, 1970.

T. Kailath. A view of three decades of linear filtering theory. IEEE Trans. Inform.
Theory, March 1974.

R. E. Kalman. A new approach to linear filtering and prediction problems. Trans.
ASME, J. Basic Engineering, pages 35-45, March 1960.

P. E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential
Equations. Springer-Verlag, Berlin, 1992.

A. N. Kolmogorov. Interpolation and extrapolation of stationary random sequences.
Bull. Moscow Univ., USSR, Ser. Math. 5, 1941.

T. G. Kurtz. The relationship between stochastic and deterministic models for
chemical reactions. J. Chem. Phys., 1972.

J. B. Rawlings and J. G. Ekerdt. Chemical Reactor Analysis and Design Fundamentals.
Nob Hill Publishing, Madison, WI, second edition, 2012.

J. B. Rawlings and D. Q. Mayne. Model Predictive Control: Theory and Design. Nob
Hill Publishing, Madison, WI, 2009.

W. H. Ray. Advanced Process Control. McGraw-Hill, New York, 1981.

E. D. Sontag. Mathematical Control Theory. Springer-Verlag, New York, second
edition, 1998.

R. F. Stengel. Optimal Control and Estimation. Dover Publications, Inc., 1994.

N. G. van Kampen. Stochastic Processes in Physics and Chemistry. Elsevier Science
Publishers, Amsterdam, The Netherlands, second edition, 1992.

N. Wiener. The Extrapolation, Interpolation, and Smoothing of Stationary Time Series
with Engineering Applications. Wiley, New York, 1949. Originally issued as a
classified MIT Rad. Lab. Report in February 1942.

Mathematical Tables

A.1 Laplace Transform Table

The Laplace transform pairs used in the text are collected in Table A.1 with a
reference to the page where they are first derived or stated.

     f(t)                                     f̄(s)                            Page

 1   αf(t) + βg(t)                            αf̄(s) + βḡ(s)                   105
 2   df(t)/dt                                 sf̄(s) − f(0)                    105
 3   d²f(t)/dt²                               s²f̄(s) − sf(0) − ḟ(0)           105
 4   dⁿf(t)/dtⁿ                               sⁿf̄(s) − Σᵢ₌₁ⁿ sⁿ⁻ⁱf⁽ⁱ⁻¹⁾(0)     105
 5   tf(t)                                    −df̄(s)/ds                       105
 6   tⁿf(t)                                   (−1)ⁿ dⁿf̄(s)/dsⁿ                105
 7   f(t−a)θ(t−a)                             e^(−as)f̄(s)                     105, 225
 8   e^(at)f(t)                               f̄(s−a)                          106
 9   ∫₀ᵗ f(t−t')g(t')dt'                       f̄(s)ḡ(s)                        106, 223
10   lim_(t→0) f(t) = lim_(s→∞) sf̄(s)         initial value theorem           106, 224
11   lim_(t→∞) f(t) = lim_(s→0) sf̄(s)†        final value theorem             106, 224
12   δ(t)                                     1                               107, 113
13   1                                        1/s                             107
14   tⁿ                                       n!/s^(n+1)                      107
17   e^(At)                                   (sI − A)⁻¹                      109
18   e^(at)                                   1/(s − a)                       107
19   te^(at)                                  1/(s − a)²                      107
20   sin ωt                                   ω/(s² + ω²)                     107
21   cos ωt                                   s/(s² + ω²)                     107
22   sinh ωt                                  ω/(s² − ω²)                     107
23   cosh ωt                                  s/(s² − ω²)                     107
24   e^(at) sin ωt                            ω/((s − a)² + ω²)               107
25   e^(at) cos ωt                            (s − a)/((s − a)² + ω²)         107
26   Σₙ₌₁ᵐ (p(sₙ)/q′(sₙ)) e^(sₙt)              p(s)/q(s),                      308
                                              q(sₙ) simple zero
27   Σₙ₌₁ᵐ Σᵢ₌₁^(rₙ) aₙᵢ t^(i−1) e^(sₙt) ‡     p(s)/q(s),                      308
                                              q(sₙ) zero of order rₙ
28   (k/(2√(πt³))) e^(−k²/(4t))               e^(−k√s),  k > 0                330
29   (1/√(πt)) e^(−k²/(4t))                   e^(−k√s)/√s                     331
30   erfc(k/(2√t))                            e^(−k√s)/s                      331
31   e^(ka) e^(a²t) erfc(a√t + k/(2√t))       e^(−k√s)/(√s(√s + a))           331
32   −e^(ka) e^(a²t) erfc(a√t + k/(2√t))      a e^(−k√s)/(s(√s + a))          331
       + erfc(k/(2√t))
33   1 − 2 Σₙ₌₀^∞ ((−1)ⁿ/((n+1/2)π))           cosh(x√s)/(s cosh √s)           333
       e^(−(n+1/2)²π²t) cos((n+1/2)πx)
34   sinh(x√k)/sinh(√k)                        sinh(x√(s+k))/(s sinh √(s+k))   333
       + 2π Σₙ₌₁^∞ ((−1)ⁿ n/(n²π² + k))
       e^(−(n²π²+k)t) sin(nπx)
35   x + (2/π) Σₙ₌₁^∞ ((−1)ⁿ/n)                sinh(x√s)/(s sinh √s)           335
       e^(−n²π²t) sin(nπx)
36   (1/(2t)) e^(−k²/(4t))                    K₀(k√s)                         331
37   1 − 2 Σₙ₌₁^∞ e^(−aₙ²t) J₀(aₙx)/(aₙJ₁(aₙ)), I₀(x√s)/(s I₀(√s))             335
       J₀(aₙ) = 0
38   2 Σₙ₌₁^∞ (−1)^(n+1) sin(nπa) sin(nπb)     sinh(as) sinh(bs)/sinh(s)       314
       cos(nπt)
39   ab + 2 Σₙ₌₁^∞ ((−1)^(n+1)/(nπ))           sinh(as) sinh(bs)/(s sinh s)    341
       sin(nπa) sin(nπb) sin(nπt)

Table A.1: Larger table of Laplace transforms.

† Final value exists if and only if sf̄(s) is bounded for Re(s) ≥ 0.
‡ The coefficients aₙᵢ are obtained from the Laurent expansion of f̄(s) about
s = sₙ; see Exercise A.2.

A.2 Statistical Distributions

The different probability distributions that have been discussed in the text are
summarized in Table A.2 with a reference to the page in the text where they are
first mentioned.

Distribution       Density                                                   Page

uniform            p(x) = 1/(b − a),  x ∈ [a,b]                              382
normal             p(x) = (1/√(2πσ²)) exp(−(x − m)²/(2σ²))                   352
multivariate       p(x) = (1/((2π)^(n/2)|P|^(1/2)))
  normal                  exp(−(1/2)(x − m)ᵀP⁻¹(x − m))                      358
exponential        p(x) = λe^(−λx),  x ≥ 0, λ > 0                            478
Poisson            p(n) = e^(−a) aⁿ/n!                                       481
chi                p(x) = (2^(1−n/2)/Γ(n/2)) x^(n−1) e^(−x²/2)               441
chi-squared        p(x) = (1/(2^(n/2)Γ(n/2))) x^(n/2−1) e^(−x/2)             441
F                  p(x) = √((xn)ⁿmᵐ/(xn + m)^(n+m)) / (x B(n/2, m/2)),
                          x ≥ 0,  n, m ≥ 1                                   442
Student's t        p(x) = (Γ((n+1)/2)/(√(nπ)Γ(n/2))) (1 + x²/n)^(−(n+1)/2)   441
multivariate t     p(x) = (Γ((n+p)/2)/((nπ)^(p/2)Γ(n/2)|Σ|^(1/2)))
                          (1 + (x − m)ᵀΣ⁻¹(x − m)/n)^(−(n+p)/2)              441
Wishart            p(X) = (|X|^((n−p−1)/2)/(2^(np/2)|R|^(n/2)Γ_p(n/2)))
                          e^(−(1/2)tr(R⁻¹X)),  X > 0                         412
Maxwell            p(x) = √(2/π) x² e^(−x²/2)                                447
Maxwell-           p_u(u_x,u_y,u_z) = (m/(2πk_B T))^(3/2)
  Boltzmann               exp(−m(u_x² + u_y² + u_z²)/(2k_B T))               524

Table A.2: Statistical distributions defined and used in the text and exercises.

A.3 Vector and Matrix Derivatives


Definition. First consider s(t), a real-valued, scalar function of a real-valued
scalar, s : ℝ → ℝ. Assume the derivative ds/dt exists.¹ We wish to extend the
definition of the derivative to vector- and matrix-valued functions of vector- and
matrix-valued arguments. The derivatives of functions with respect to scalars,
vectors, and matrices can be conveniently expressed using the rules of vector/matrix
operations. Other derivatives, such as those of matrix-valued functions with respect
to vectors and matrices, produce tensors having more than two indices and cannot be
expressed with the formulas of matrix/vector calculus. To state how the derivatives
are arranged into vectors and matrices, we require a more precise notation than we
used in the text. Moreover, several different and conflicting notations are in use
in different fields; these are briefly described in Section A.3.1. So we state here
the main results in a descriptive notation, and expect the reader can translate
these results into the conventions of other fields.

We require a few preliminaries. Now let s(x) be a scalar-valued function of vector
x, s : ℝⁿ → ℝ. Assume that all partial derivatives ∂s/∂xᵢ, i = 1, 2, ..., n exist.
The derivative ds/dx is then defined as the column vector

    ds/dx = [∂s/∂x₁, ∂s/∂x₂, ..., ∂s/∂xₙ]ᵀ          scalar-vector derivative

The derivative ds/dxᵀ is defined as the corresponding row vector

    ds/dxᵀ = [∂s/∂x₁, ∂s/∂x₂, ..., ∂s/∂xₙ]

and note that (ds/dx)ᵀ = ds/dxᵀ. Next let s(A) be a scalar-valued function of matrix
A, s : ℝᵐˣⁿ → ℝ. Again, assuming all partial derivatives ∂s/∂Aᵢⱼ exist, the
derivative ds/dA is then defined as

    ds/dA = [∂s/∂A₁₁  ···  ∂s/∂A₁ₙ;                  scalar-matrix derivative
             ∂s/∂A₂₁  ···  ∂s/∂A₂ₙ;
                ⋮              ⋮
             ∂s/∂Aₘ₁  ···  ∂s/∂Aₘₙ]

As in the vector case, we define ds/dAᵀ as the transpose of this result,
ds/dAᵀ = (ds/dA)ᵀ. These more general matrix derivatives do specialize to the two
vector derivatives previously defined.

Next up is the vector-valued function of a vector, f(x). Let f : ℝⁿ → ℝᵐ. The
quantity of most interest is usually the Jacobian matrix, which we denote by df/dxᵀ,
defined by

    df/dxᵀ = [∂f₁/∂x₁  ∂f₁/∂x₂  ···  ∂f₁/∂xₙ;        vector-vector derivative
              ∂f₂/∂x₁  ∂f₂/∂x₂  ···  ∂f₂/∂xₙ;        (Jacobian matrix)
                 ⋮                       ⋮
              ∂fₘ/∂x₁  ∂fₘ/∂x₂  ···  ∂fₘ/∂xₙ]

The notation df/dxᵀ serves as a convenient reminder that the column vector f is
distributed down the column and the row vector xᵀ is distributed across the row in
the entries of the Jacobian matrix. The transpose of the Jacobian is simply
dfᵀ/dx = (df/dxᵀ)ᵀ, which is easy to remember. Note that df/dx is a long column
vector with mn entries coming from stacking the columns of the Jacobian matrix.
This is the vec operator, so we have

    df/dx = vec(df/dxᵀ)

The transpose, denoted dfᵀ/dxᵀ, is a long row vector. These vector arrangements of
the derivatives are not usually of much interest compared to the Jacobian matrix, as
we shall see when we discuss the chain rule.

¹All of the formulas in this section are readily extended to complex-valued
functions of a complex variable.

Inner product. The inner product of two vectors was defined in Chapter 1

    (a, b) = aᵀb = Σᵢ₌₁ⁿ aᵢbᵢ        a, b ∈ ℝⁿ

We can extend this definition to linear spaces of matrices as follows

    (A, B) = tr(AᵀB)

Because tr(C) = tr(Cᵀ) for any square matrix C, the matrix inner product can also be
expressed as (A, B) = tr(BᵀA), which is valid also in the vector special case.

Chain rules. One of the most important uses of these derivative formulas is a
convenient expression of the chain rule. For scalar-valued s we have

    ds = (ds/dx, dx) = (ds/dxᵀ) dx                   scalar-vector
    ds = (ds/dA, dA) = tr((ds/dAᵀ) dA)               scalar-matrix

Notice that when written with inner products, these two formulas are identical. The
vector chain rule can be considered a special case of the matrix chain rule, but
since the vector case arises frequently in applications and doesn't require the
trace, we state it separately. For vector-valued functions we have one additional
form of the chain rule

    df = (df/dxᵀ) dx                                 vector-vector

which is a matrix-vector multiplication of the Jacobian matrix of f with respect to
x with the column vector dx. Because df is a vector, this chain rule is not
expressible by an inner product as in the scalar case. But notice the similarity of
the vector chain rule with the second equalities of the two scalar chain rules.
Because of this similarity, all three important versions of the chain rule are easy
to remember using this notation. There is no chain rule for matrix-valued functions
that does not involve tensors.

Finally, we collect here the different matrix and vector differentiation formulas
that have been used in the text and exercises. These are summarized in Table A.3,
with a reference to the page in the text where they are first mentioned or derived.

     Derivative                     Formula                                  Page

 1   ds (chain rule 1)              ds = (ds/dxᵀ) dx
 2   ds (chain rule 2)              ds = tr((ds/dAᵀ) dA)
 3   df (chain rule 3)              df = (df/dxᵀ) dx
 4   (product rule)                 d(fᵀg)/dx = (dfᵀ/dx) g + (dgᵀ/dx) f
 5   d(xᵀb)/dx                      = b
 6   d(bᵀx)/dxᵀ                     = bᵀ
 7   d(xᵀAx)/dx                     = Ax + Aᵀx
10   d tr(p(A))/dA                  = q(A)ᵀ,  q(λ) = dp(λ)/dλ
11   d det A/dt                     = det(A) tr(A⁻¹ dA/dt),  det A ≠ 0       328
12   d tr(p(A))/dt                  = tr(q(A) dA/dt),  q(λ) = dp(λ)/dλ       328
13   d det A/dA                     = (A⁻¹)ᵀ det A,  det A ≠ 0               431
14   d ln(det A)/dA                 = (A⁻¹)ᵀ,  det A > 0                     431
15   d tr(AB)/dA = d tr(BA)/dA      = Bᵀ                                     431
16   d tr(AᵀB)/dA = d tr(BAᵀ)/dA    = B
17   d tr(ABAᵀ)/dA                  = A(Bᵀ + B)
18   d tr(AᵀBA)/dA                  = (B + Bᵀ)A

Table A.3: Summary of vector and matrix derivatives used in the text and exercises;
s, t ∈ ℝ, x, b ∈ ℝⁿ, f(·) and g(·) are any differentiable functions, and p(·) is any
matrix function defined by a power series.
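Entries like the quadratic-form derivative are easy to verify numerically. The sketch below (Python, an assumption; the dimension, seed, and step size are illustrative) checks d(xᵀAx)/dx = Ax + Aᵀx against a central finite difference, which is exact for a quadratic up to roundoff.

```python
import numpy as np

# Finite-difference check of d(x^T A x)/dx = A x + A^T x.
rng = np.random.default_rng(5)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

analytic = A @ x + A.T @ x
eps = 1e-6
numeric = np.empty(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    # central difference of the scalar x^T A x in the i-th coordinate direction
    numeric[i] = ((x + e) @ A @ (x + e) - (x - e) @ A @ (x - e)) / (2 * eps)
print(np.max(np.abs(numeric - analytic)))   # should be near machine precision
```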

A.3.1 Derivatives: Other Conventions

Given the many scientific fields requiring vector/matrix derivatives, chain rules,
and so on, a correspondingly large number of different and conflicting notations
have also arisen. We point out here some of the other popular conventions and show
how to translate them into the notation used in this section.

Optimization. The dominant convention in the optimization field is to define the
scalar-vector derivative ds/dx as a row vector instead of a column vector. The nabla
notation for gradient, ∇s, is then used to denote the corresponding column vector.
The Jacobian matrix is then denoted df/dx. So the vector chain rule reads in the
optimization literature

    df = (df/dx) dx                                  optimization convention

Given that ds/dx is a row vector in the optimization notation, the first scalar
chain rule reads

    ds = (∇s)ᵀ dx = (ds/dx) dx                       optimization convention

The biggest problem with adopting the optimization field's conventions appears when
considering the scalar-matrix derivative. The derivative ds/dA has the same meaning
in the optimization literature as that used in the text. So the scalar-matrix chain
rule reads

    ds = tr((ds/dA)ᵀ dA)                             optimization convention

Notice the inconsistency in the chain rules: the scalar-matrix version contains a
transpose and the scalar-vector and vector-vector versions do not. The burden rests
on the reader to recall these different forms of the chain rule and remember which
ones require the transpose. The advantage of the notation used in this section is
that all chain rules appear with a transpose, which is what one might anticipate due
to the chain rule's required summation over an index. Also, in the notation used in
this section, the ∇ operator is identical to d/dx, and neither implies a transpose
should be taken. Finally, in the optimization convention there is no hint from the
notation which quantities should be a column vector and which a row vector. The
notation used in this section, ds/dx and ds/dxᵀ, makes that distinction clear.
Field theories of physics (transport phenomena, electromagnetism). As noted in
Chapter 3, the literature in these areas primarily uses Gibbs vector-tensor notation
and index notation. For example, the derivative of a scalar function s with respect
to a vector argument x is written

    ∇s or ∂s/∂x,    where in Cartesian coordinates (∇s)ᵢ = ∂s/∂xᵢ

The derivative of a scalar s with respect to a tensor argument is similar. The
derivative of a vector function f with respect to a vector x is

    ∇f,    where (∇f)ᵢⱼ = ∂fⱼ/∂xᵢ

so ∇f is the transpose of the Jacobian. Therefore, the chain rule becomes

    df = (∇f)ᵀ dx

Consistent with this notation, one can write the Taylor-series expansion of a vector
field f around the origin as

    f(x) = f(0) + x·∇f + (1/2) xx : ∇∇f + ···

where derivatives are evaluated at the origin and

    (∇∇f)ⱼₖᵢ = ∂²fᵢ/∂xⱼ∂xₖ

One must beware, however, that this ordering of indices is not used universally,
primarily because some authors write the Taylor expansion as

    f(x) = f(0) + J·x + (1/2) K : xx + ···

where

    Jᵢⱼ = ∂fᵢ/∂xⱼ        Kᵢⱼₖ = ∂²fᵢ/∂xⱼ∂xₖ

A.4 Exercises
Exercise A.1: Simple and repeated zeros

Assume all the zeros of q(s) are first-order zeros, rₙ = 1, n = 1, 2, ..., m, in
entry 27 of Table A.1, and show that it reduces to entry 26.

Exercise A.2: Deriving the Heaviside expansion theorem for repeated roots

Establish the Heaviside expansion theorem for repeated roots, entry 27 in Table A.1.
Hints: Close the contour of the inverse transform Bromwich integral in (2.7) to the
left side of the complex plane. Show that the integral along the closed contour
except for the Bromwich line goes to zero, leaving only the residues at the
singularities, i.e., the poles s = sₙ, n = 1, 2, ..., m. Since p(s) has no
singularities, expand it in a Taylor series about the root s = sₙ. Find the Laurent
series for f̄(s) and show that the residues are the coefficients aₙᵢ given in the
expansion formula. Note that this procedure remains valid if there are an infinite
number of poles, such as the case with a transcendental function for q(s).

Exercise A.3: Laplace transform relations

Take the limit k → 0 in entry 34 of Table A.1 and show that it produces entry 35.

Exercise A.4: Some invalid derivative formulas

Transposing the scalar numerators in entries 5 and 6 of Table A.3, respectively,
gives

    d(bᵀx)/dx = b        d(xᵀb)/dxᵀ = bᵀ

but you do not find companion forms for these listed in Table A.3 with a general
matrix B replacing b. Show that simply replacing b with general matrix B above does
not generate correct formulas

    d(Bᵀx)/dx ≠ B        d(xᵀB)/dxᵀ ≠ Bᵀ

You may want to use the vec operator to express the correct formulas. Note that the
correct matrix versions of these derivatives do reduce to the above formulas for
B = b, a column vector.

Exercise A.5: Companion trace derivatives

(a) Use the fact that tr(AB) = tr(BA) to establish that Formulas 15 and 16 in Table
A.3 are equivalent formulas, i.e., assuming one of them allows you to establish the
other one.

(b) On the other hand, show that Formulas 17 and 18 are equivalent by taking
transposes of one of them to produce the other one.
