I. Differential equations
1. Linear equations
(a) Systems of linear equations & matrix-vector multiplication
(b) Matrix multiplication
(c) Solving systems of linear equations by elimination
(d) Inverse matrices
(e) Symmetric and orthogonal matrices
(f) Determinants
2. Vector spaces & subspaces
(a) Column space
(b) Nullspace
(c) The complete solution to Ax = b
(d) Linear independence, basis and dimension
(e) The fundamental theorem of linear algebra
(f) Projections
(g) Least squares
(h) Gram-Schmidt orthogonalization
3. Eigenvalues & Eigenvectors
(a) Intro & basic properties
(b) Diagonalizing a matrix
(c) Systems of differential equations
(d) The exponential of a matrix & solutions with inputs
Lecture Notes

Contents

1 Differential equations and where they come from
8 Matrix multiplication
10 Inverse matrices
12 Determinants
13 Column space
14 The nullspace of A
18 Projection
1 Differential equations and where they come from

A differential equation relates the derivative

    y′(t) = dy/dt,

which is the rate of change of y, i.e. the slope of the curve, to y and/or t. The equation y′(t) = f(t, y) means that y is changing according to the rule prescribed by f.
Example 1.1. Free fall. The downward force of gravity acting on an object with mass m is F = mg, where g is a constant, approximately given by g = 32 ft/sec². If you drop a stone, how far will it fall in 1 second? 2 seconds?
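Integrating y″ = g twice with y(0) = y′(0) = 0 gives the distance fallen, y(t) = ½gt². A minimal Python sketch (using the value g = 32 ft/sec² from the text):

```python
# Free fall from rest: m*y'' = m*g, so the distance fallen is y(t) = (1/2)*g*t^2.
g = 32.0  # ft/sec^2, the approximate value used in the text

def distance_fallen(t):
    """Distance (in feet) a dropped stone falls in t seconds."""
    return 0.5 * g * t**2

print(distance_fallen(1))  # 16.0 ft after 1 second
print(distance_fallen(2))  # 64.0 ft after 2 seconds
```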
Example 1.2. The spring. Hooke’s Law for the force acting on a spring is: F = −ky
Example 1.3. Growth and decay. If the amount y of a substance grows or decays in proportion
to the amount present, we have
    dy/dt = ky
If y grows, k > 0; if y is decaying, k < 0.
Example 1.4. Newton’s Law of Cooling. The rate of change of the temperature of an object is
proportional to the difference of its temperature and the temperature of the medium.
    dT/dt = k(T_m − T)
Examples 1.3 and 1.4 are examples of separable equations, that is equations that can be written in
the form
    p(y) dy/dt = q(t)    (1.1)
Recall the technique for solving a separable DE: “Multiply both sides of (1.1) by dt” and integrate.
    ∫ p(y) dy = ∫ q(t) dt
Example 1.5. Solve the IVP

    dy/dt = (1 + y²)/(2ty),  y(1) = 1
Example 1.6. Find the general solution to the exponential growth/decay problem (Example 1.3)
and Newton’s Law of Cooling (Example 1.4).
Example 1.7. Orthogonal projection. Differential equations also arise in many other areas
of mathematics where one wants to know how two quantities relate given that we know how they
change.
A nice geometrical example involves finding families of curves that intersect at right angles. Notice
that if we have an equation
F (x, y, c) = 0
and we hold c constant, then we have an equation in x and y. This describes a curve in the xy-plane. The set of all such curves is a family of curves. For example, if

    F(x, y, c) = x² + y² − c²

then

    F(x, y, c) = 0  ⇔  x² + y² = c²

So, for each c the equation F(x, y, c) = 0 describes a circle. The equation F(x, y, c) = 0 describes the family of all circles centered at the origin in the xy-plane.
Recall that if two lines with slopes m₁, m₂ are perpendicular, then m₁m₂ = −1, or m₂ = −1/m₁. Thus, if the curves F(x, y, c) = 0 have slopes m₁ = dy/dx = f(x, y), then to find the orthogonal family, we need curves with slope

    m₂ = −1/f(x, y)

Since the slope is the same as the derivative, we need to solve the differential equation

    dy/dx = −1/f(x, y)
Example 1.9. Find the set of curves orthogonal to the curves y = cx⁴.
Exercises
1. A stone is dropped from a height of 1600 feet. How long does it take for the stone to hit the
ground?
2. Find the solution to the IVP
    y′ = −2y,  y(0) = 1
How long does it take for the solution to decay to 1/e?
3. (a) Show that y(t) = A cos(ωt − φ) is a solution of the harmonic oscillator

        y″ + ω²y = 0

    for any A and φ.

    (b) Determine the solution with the initial conditions y(0) = a, y′(0) = 0.
4. Find the equation for the orthogonal trajectories to the following families of curves. For each
family, sketch a few of the curves and their orthogonal trajectories on the same graph.
(a) y = cx²
(b) y² = 3x + c
5. Solve these separable equations.
(a) dy/dt = 2ty²
(b) y′ = yt + y + t + 1
(c) dy/dt = t^r y^s
6. A dead body is found in a room with ambient temperature of T_m = 70°F. Suppose you know that it takes 3 hours for a body to cool to 85°F.

(a) Use Newton's law of cooling to write the temperature of the body as a function of time.

(b) Suppose you measure the temperature of the body and find that it is 75°F. How long has the person been dead?
2 First order linear differential equations

The general first order linear differential equation has the form

    b(t) dy/dt + a(t) y = q(t)
If b(t) ≠ 0, we can divide by b(t), relabel and obtain the standard form:

    dy/dt + a(t) y = q(t)    (2.1)
The way to think about linear equations is to think of the RHS term q as a source or input term. q(t)
is the input and y(t) is the response. When solving a linear system we are determining the response
from the input.
Suppose q = 0, and a is constant, so we have the homogeneous equation

    y′ + ay = 0    (hom)

This has the general solution y_n(t) = Ce^(−at). Now, if we add a source term q (supposing it is constant again), we have

    y′ + ay = q    (full)

A solution to this equation is y_p(t) = q/a (check). Now, the complete solution is

    y(t) = y_n(t) + y_p(t) = Ce^(−at) + q/a

Notice that the null solution comes from the starting value y(0). The particular part comes from the input q.
This is generally true. To find the complete solution to (2.1) (the general solution), we find the
general solution of the homogeneous part (the null solution) and add a particular solution. Finding
the general solution of the homogeneous part, we already know how to do: it is separable. So we
just need to be able to find a (any!) particular solution.
Sometimes it can be easy to find the particular solution. In the above instance, for example, when a and q are both constant, we can deduce that a particular solution is just going to be a constant.
But for other inputs, it may not be so obvious, and we want a method for determining the particular
solution in general.
Rather than just pick q out of a hat, we will concentrate on the types of inputs that occur most often
in applications. These are
Then the LHS of equation (2.2) is d/dt [I(t)y(t)], so this can be written

    d/dt [I(t) y(t)] = I(t) q(t)
which we can then integrate and solve for y. The function I(t) is called an integrating factor.
    y(t) = e^(−at) y(0) + e^(−at) ∫₀ᵗ e^(as) q(s) ds
Thus, we get a solution in terms of an integral. Notice that (as long as a > 0) the term e^(−at) y(0)
tends toward zero, so in the long run, all that matters is the response to the input given by the
integral on the RHS.
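The integral formula can be sanity-checked numerically. The sketch below uses illustrative values a = 2, q(t) ≡ 5, y(0) = 4 (my own choices, not from the text) and compares the formula against the closed-form solution for a constant source:

```python
import math

a, q, y0 = 2.0, 5.0, 4.0  # illustrative values, not from the text

def y_formula(t, n=10000):
    """y(t) = e^(-at) y(0) + e^(-at) * integral_0^t e^(as) q(s) ds,
    with the integral approximated by the trapezoid rule."""
    h = t / n
    integral = 0.0
    for i in range(n):
        s0, s1 = i * h, (i + 1) * h
        integral += 0.5 * h * (math.exp(a * s0) * q + math.exp(a * s1) * q)
    return math.exp(-a * t) * (y0 + integral)

def y_exact(t):
    # Null solution decaying from y(0), plus the particular solution q/a.
    return (y0 - q / a) * math.exp(-a * t) + q / a

print(abs(y_formula(1.0) - y_exact(1.0)) < 1e-6)  # True
```

In the long run both tend to the particular solution q/a, which is the point of the remark above.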
Example 2.1. Constant source. Find the solution to Newton’s Law of Cooling by treating it as a
linear equation.
Example 2.2. Exponential source. Here the source is growing (c > 0) or decaying (c < 0) exponentially. What is the response?
    y′ + 2y = 5e^(3t)    and    y′ + 2y = 5e^(−3t)
Example 2.3. Sinusoidal source. What is the response to a sinusoidal input?

    y′ + 2y = 5 cos(3t)
Example 2.4. Step function source. What if a source suddenly "turns on"? Here H(t − T) = 1 if t ≥ T and 0 if t < T.

    y′ + 2y = 3H(t − 1),  y(0) = 4
Example 2.5. Delta function source. The delta ‘function’ δ(t − T ) is defined so that
    ∫_(−∞)^∞ δ(t − T) f(t) dt = f(T)
This is a ‘function’ whose integral is 1, but is zero everywhere except at T . How can it be!? Actually,
such a function can’t exist, but it plays a very useful role in modeling. For example, a sudden
injection of a drug at time T can be modeled by a delta function.
Example 2.6. Lastly, let’s look at a case where a(t) is not a constant. Determine the general
solution to the DE
    dy/dt − (2/t) y = t² sin t,  t > 0.
Exercises
1. Solve the IVP
    (1 + t²) dy/dt + 4ty = 1/(1 + t²)²,  y(0) = 2.
2. Find the linear DE that produces the null and particular solutions
5. Solve y′ + y = 3δ(t − 4) + 2H(t − 5). What happens as t → ∞? Compare with the solution of y′ + y = 2.

6. Solve y′ + 3y = 3H(t − 1)(1 − e^(1−t)). Sketch the solution and determine the long-term behavior.
3 Models of growth and decay
Example 3.1. Suppose you invest $1 in an account paying 10% interest. How long does it take for
your account to reach $100? $200? $1 million?
Exponential growth will occur in any situation where the growth of some substance, whether it be
money, bacteria, etc., is proportional to the amount present.
    y(t) = y(0) · 2^(t/t₂)

where t₂ is the doubling time. What is the doubling time for the account paying 10% interest?
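Since y′ = ky gives y(t) = y(0)e^(kt), the doubling time is t₂ = ln 2 / k. A quick sketch for continuously compounded 10% interest (k = 0.1):

```python
import math

def doubling_time(k):
    """Doubling time t2 = ln(2)/k for y' = k*y (continuous compounding)."""
    return math.log(2) / k

print(round(doubling_time(0.10), 2))  # 6.93 years
```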
Exponential decay
Now suppose the reverse is happening: a substance is decaying in proportion to the amount present.
This occurs in radioactive decay. Some substances, e.g. a certain isotope of Carbon, C-14, contin-
uously decay. The more C-14 there is, the faster it decays; the less there is, the slower it decays.
Thus, we have
    y′ = −ky
Example 3.4. C-14 has a half-life of 5700 years. Suppose you are able to measure the amount of
C-14 in an object, and you are able to determine that the amount currently present is 1/10 of the
amount T years ago. What is T ?
Mixing Problems
Consider the mixing problem of determining the amount of some solute in a tank. Solution flows
into the tank at some rate, with some concentration, and solution flows out of the tank at some rate
with some concentration. Thus, the amount of the solute is modeled by
    dA/dt = flow in − flow out
Solution. We know the initial amount (A(0) = 32). Accounting for the flow in and the flow out gives

    dA/dt = 8 − A/(t + 4)
Logistic growth
The growth model y 0 = ry is not very satisfactory for most populations. It doesn’t account for
the fact that most populations will stop growing after reaching some level. This is due to many
factors, including buildup of waste products, competition for food, and lack of space to grow. These
factors cause the growth rate to slow when the population gets larger. To account for this effect,
we can modify the exponential growth model by adding a term that slows the growth rate when the
population gets large. One of the most popular such models is the logistic model:
    y′ = ry − by²

y small: y′ ≈ ry;  y large: y′ ≈ −by²
Example 3.6. Without solving the equation, sketch representative solution curves. What happens
in the long run?
Notice that the logistic equation is separable, so we can solve it using techniques already learned. This
involves using partial fractions to evaluate an integral. Another approach is to make a transformation
that turns the equation into a linear equation. We can solve the linear equation and get back the
solution to the logistic equation.
Example 3.7. a) Make the transformation z = y⁻¹ = 1/y. Show that the equation for z is

    z′ = −rz + b

b) Solve the equation for z, and then recover the solution for y.

c) Show that lim_(t→∞) y(t) = r/b. The ratio K = r/b is called the carrying capacity of the population.
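A sketch of parts (b) and (c), with illustrative parameters r = 1, b = 0.01, y(0) = 5 (my own choices, not from the text): the linear equation for z is solved in closed form, and y = 1/z approaches K = r/b.

```python
import math

r, b, y0 = 1.0, 0.01, 5.0  # illustrative parameters, not from the text

def y_logistic(t):
    """Solve z' = -r z + b with z(0) = 1/y(0); then y = 1/z."""
    z0 = 1.0 / y0
    z = (z0 - b / r) * math.exp(-r * t) + b / r
    return 1.0 / z

print(abs(y_logistic(50.0) - r / b) < 1e-6)  # True: y(t) -> K = r/b = 100
```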
Exercises
1. A fungus doubles in size every day, and it weighs a pound after 20 days. If another fungus
were twice as large at the start, would it weigh a pound after 10 days?
2. Charcoal is found in a cave. It is determined that the amount of C-14 present is 1/50 the
amount when the wood was burned. How long ago was it burned?
3. The E. coli bacterium has a volume of 6 µm³. In optimal conditions, an E. coli bacterium will double about every 30 minutes. Under these conditions, how long will it take for a single bacterium to grow to fill a thimble with volume 1 cm³? How long will it take for the volume to fill the entire earth (1.08 × 10¹² km³)?
(a) Without solving the equation, determine the amount of chemical in the mixture in the
long run.
(b) Determine the amount A(t) of salt in the container as a function of time. Sketch A(t).
(c) How long will it take for the concentration to reach 90% of its limiting value?
6. The logistic equation has been used to model the spread of technology. Let N ∗ be the number
of ranchers in Uruguay, and N (t) the number who have adopted a new pasture technology.
The rate of adoption dN/dt is proportional to the number who have adopted the technology,
and the fraction who have not (and thus are susceptible to changing). So the equation is
    dN/dt = αN(1 − N/N*)

According to a study by Banks (1993), N* = 17,015, N(0) = 141, α = 0.49 per year. Determine how long it takes for the new technology to be adopted by 80% of the population of ranchers.
4 The harmonic oscillator
The next several sections concern the second order linear equation

    A d²y/dt² + B dy/dt + C y = f(t)
We will start with the most important differential equation of all – the harmonic oscillator.
The DEs that occur most often in applications are second order, meaning that the highest derivative occurring in the equation is the second: y″. The reason for this is Newton's 2nd Law:
F = ma
As the acceleration is the 2nd derivative, this means
    m d²y/dt² = F
Thus, if we know the force, we can write down a 2nd order DE.
Of the 2nd order equations, the most important is the harmonic oscillator. This describes the motion
of an object subject to a restoring force. Examples include the spring attached to a wall, or the small
motions of the pendulum.
In the absence of friction, the restoring force is F = −ky. So the equation of motion is
    Harmonic Oscillator:  d²y/dt² + ω²y = 0,   ω² = k/m    (4.1)
The general solution can be written

    y(t) = A cos(ωt − φ)

A here represents the amplitude of oscillation and φ is the phase shift. Alternatively,

    y(t) = B cos(ωt) + C sin(ωt)

In this form, we can determine B and C from the initial position and velocity:

    B = y(0),  C = y′(0)/ω
Notice that for this second order DE, we need two initial conditions to specify the solution. This is
true in general: An nth order equation requires n initial conditions.
Example 4.1. Find the solution of the harmonic oscillator (4.1) with parameters k = 4, m = 1, and initial conditions y(0) = 2, y′(0) = 1. Write the solution in both forms. Find the amplitude and frequency of oscillation.
Hint:

    B cos θ + C sin θ = √(B² + C²) cos(θ − φ),  where φ = tan⁻¹(C/B)
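For instance, with the data of Example 4.1 (k = 4, m = 1, y(0) = 2, y′(0) = 1), a short sketch checks that the two forms of the solution agree and computes the amplitude from the hint:

```python
import math

k, m = 4.0, 1.0           # parameters from Example 4.1
omega = math.sqrt(k / m)  # natural frequency: omega = 2
B, C = 2.0, 1.0 / omega   # B = y(0), C = y'(0)/omega

A = math.sqrt(B**2 + C**2)  # amplitude, from the hint
phi = math.atan2(C, B)      # phase shift

def y(t):
    return B * math.cos(omega * t) + C * math.sin(omega * t)

# The two forms of the solution agree:
print(abs(y(1.0) - A * math.cos(omega * 1.0 - phi)) < 1e-12)  # True
print(round(A, 4))  # 2.0616
```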
Now suppose an external force f(t) also acts, so the total force is F = −ky + f(t) and the equation of motion becomes

    m d²y/dt² + ky = f(t)    (4.2)
f (t) is the input and y(t), the solution, is the response. This is now an inhomogeneous equation,
since it involves a term that does not include y. It is still linear, even if f (t) is a nonlinear function
of t, since it is linear in y. The solution of such equations is made up of a null and a particular
solution.
We know already the null solution. The question, then, is how to come up with a particular solution.
Example 4.2. Sinusoidal forcing. Suppose the external force is periodic (pushing a child on a swing). Then f(t) = F cos(ω_f t). Show that a particular solution of (4.2) is

    y_p(t) = F/(m(ω² − ω_f²)) · cos(ω_f t)

as long as ω_f ≠ ω. If ω = ω_f, we have resonance, which means the forcing frequency is the same as the natural frequency ω of the system. In this case, show that a particular solution is

    y_p(t) = (F/(2mω)) t sin(ωt)

What happens to y_p(t) as t grows?
Other applications
LC Circuits
The Harmonic oscillator equations also describe the current in an electrical circuit. Consider an
inductor with inductance L in series with a capacitor of capacitance C.
Let I(t) be the current flowing around the circuit. The potential drop across the capacitor is V = Q/C, where Q is the charge stored on the capacitor's positive plate. The current is equal to the rate at which charge accumulates, so I = dQ/dt. Kirchhoff's second law says that the sum of the potential drops around a closed circuit loop is zero, so we have

    L dI/dt + Q/C = 0
dt
Differentiate once and divide by L to get
    d²I/dt² + ω²I = 0
where ω² = 1/(LC). If there is an external current forcing, we have

    d²I/dt² + ω²I = f(t)
The pendulum

For a pendulum of length l, the tangential component of gravity gives the restoring force

    F = −mg sin θ

and the tangential acceleration is

    a = l d²θ/dt²
According to Newton’s law, then,
    d²θ/dt² + ω² sin θ = 0    (4.3)
where ω² = g/l. This is not the harmonic oscillator. However, if θ stays small (θ ≈ 0), which would be the case if the oscillations are fairly small, we have the approximation

    sin θ ≈ θ
and the equation is approximated by

    d²θ/dt² + ω²θ = 0
Again, if there is an external forcing, we get
    d²θ/dt² + ω²θ = f(t)
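How good is the approximation sin θ ≈ θ? A minimal simulation (with illustrative values ω² = 1 and θ(0) = 0.1, my own choices) integrates both (4.3) and its linearization and compares them:

```python
import math

omega2 = 1.0                           # omega^2 = g/l, illustrative value
theta0, dt, steps = 0.1, 0.001, 10000  # small initial angle, t up to 10

def simulate(rhs):
    """Integrate theta'' = rhs(theta) with semi-implicit Euler."""
    theta, v = theta0, 0.0
    for _ in range(steps):
        v += dt * rhs(theta)
        theta += dt * v
    return theta

full = simulate(lambda th: -omega2 * math.sin(th))  # pendulum (4.3)
lin = simulate(lambda th: -omega2 * th)             # harmonic approximation
print(abs(full - lin) < 0.005)  # True: the linearization tracks the pendulum
```

For larger starting angles the two solutions drift apart, since the pendulum's period depends on its amplitude while the harmonic oscillator's does not.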
Exercises
1. Solve y″ + 9y = 0 starting from y(0) = 1 and y′(0) = −1. What are the amplitude and frequency of oscillation?
2. Solve y″ + 9y = 2 cos(2t) starting from y(0) = 0 and y′(0) = 0. Sketch the solution.
5. The spring-mass system (4.1), i.e. the harmonic oscillator, is conservative. That means there is a quantity E, called the energy, that is conserved:

    E = (1/2)(y′)² + (ω²/2) y²

6. Another way to see that the energy is constant is to show that its derivative w.r.t. t is zero. Multiply y″ + ω²y = 0 by y′.
Show that dE/dt = 0.
7. The pendulum equation (4.3) is also conservative. This is a nonlinear equation, but we can use
the same idea as in the previous problem. Here the conserved quantity is
    E = (1/2)(dθ/dt)² + (ω²/2)(1 − cos θ)
To show that E is conserved,
5 Constant coefficient 2nd order differential equations

We now solve the equation

    A d²y/dt² + B dy/dt + C y = 0    (5.1)

We know that solutions of linear DEs are combinations of solutions of the homogeneous equation and a particular solution. So we begin our study by solving the homogeneous equation with constant coefficients: A, B, C constant.
MAIN IDEA FOR HOMOGENEOUS DE: Look for solutions of the form e^(rt).

Example 5.1. Find two solutions and the general solution of the following DE by looking for solutions of the form e^(rt).

    y″ − 3y′ + 2y = 0.
Generally, when we look for a solution of a linear DE of the form e^(rt), each nth derivative is going to give us a term rⁿ. Thus, when we substitute e^(rt) into equation (5.1), we get the equation for r:

    Ar² + Br + C = 0    (5.2)

Each solution of (5.2) gives us a solution e^(rt) of the DE (5.1). The equation (5.2) is called the auxiliary equation, and the polynomial P(r) on the left-hand side is called the auxiliary polynomial. Generally, the auxiliary equation will have two solutions, r₁, r₂, and thus we will get two solutions e^(r₁t) and e^(r₂t) of the DE (5.1). As long as r₁ ≠ r₂, these two solutions are linearly independent (LI), meaning that the only way we can have

    c₁e^(r₁t) + c₂e^(r₂t) = 0

for all t is if c₁ = c₂ = 0. Once we have two LI solutions, we have the general solution:

    y(t) = c₁e^(r₁t) + c₂e^(r₂t)

The two LI solutions will not always be of the form e^(rt), but the idea is that we need two LI solutions. Generally, for an nth order differential equation, we will need to find n LI solutions to form the general solution.
    y″ + y′ − 2y = 0
Complex roots
If the roots of the auxiliary equation are distinct, then e^(r₁t), e^(r₂t) are LI solutions. This is true whether the roots are real or complex. But if there are complex roots we have to put the solutions into real form. To do this, recall Euler's formula:

    e^(iθ) = cos θ + i sin θ

This implies

    e^((a+ib)t) = e^(at)[cos(bt) + i sin(bt)]

If r = a + ib is a complex root of P(r) = 0, and the coefficients are all real, the complex conjugate r̄ = a − ib is also a root. By taking linear combinations, we get the two LI solutions

    e^(at) cos(bt)  and  e^(at) sin(bt)
    y″ + ω²y = 0
Example 5.5. The spring-mass system will generally have some damping due to friction. The force of friction acts in the opposite direction to the motion, so, if there are no external forces,

    F = −ky − γy′

where γ is the coefficient of friction. Newton's Law then gives us the equation of motion

    my″ + γy′ + ky = 0
Find the general solution and determine at what value of the friction the system is critically damped,
i.e. there are no oscillations.
    y″ − 2y′ + y = 0

    y‴ − 2y″ + y′ − 2y = 0.
Exercises
1. Find the general solution.
(a) y = c₁e^(3t) + c₂e^(−t),  (b) y = c₁e^(4t) + c₂te^(4t),  (c) y = c₁e^(−3t) cos(t) + c₂e^(−3t) sin(t)
    y″ + γy′ + y = 0

For which values of the friction coefficient γ will the spring undergo oscillations? Sketch solutions when γ = 1 and γ = 3.
6 Undetermined coefficients & variation of parameters

We return to the second order linear equation

    A d²y/dt² + B dy/dt + C y = f(t)
The complete (general) solution is the null solution (general solution of the homogeneous part) plus
a particular solution:
    y_complete(t) = y_n(t) + y_p(t)
Having found the null solutions (solutions of the homogeneous part), we now need to find a particular solution when f ≠ 0. Recall the terminology: When f(t) ≠ 0, the equation is non-homogeneous. The f(t) represents the input, or forcing term, and the solution y is the response.
Undetermined coefficients
The method of undetermined coefficients is quite useful when f has a particular form. The idea is
to guess a solution of the same form as f , and then determine the coefficients (hence the name!).
Example 6.1. Find the general solution of
    y″ + y′ − 2y = 3t

Look for a solution of the form y_p(t) = c₁ + c₂t.
In the above example, when f (t) is a polynomial of degree 1, a particular solution can be found that
is also of degree 1. This is generally true. Likewise, if f (t) is an exponential or sinusoid, we can find
solutions of the same form.
Take home message: (Part of) the response has the same form as the input.
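For Example 6.1, substituting y_p = c₁ + c₂t into y″ + y′ − 2y = 3t gives c₂ − 2c₁ − 2c₂t = 3t, and matching the coefficients of t and of the constant term determines c₁ and c₂. A sketch:

```python
# Matching coefficients in y_p'' + y_p' - 2*y_p = 3t with y_p = c1 + c2*t:
#   t terms:        -2*c2 = 3
#   constant terms:  c2 - 2*c1 = 0
c2 = 3 / -2   # -1.5
c1 = c2 / 2   # -0.75
print(c1, c2)  # -0.75 -1.5

# Spot-check the residual y_p'' + y_p' - 2*y_p - 3t at a few points.
for t in (0.0, 1.0, 2.5):
    yp = c1 + c2 * t
    assert abs((0.0 + c2 - 2 * yp) - 3 * t) < 1e-12
```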
Example 6.3. Find the general solution of y″ + 2y′ + 5y = 3 sin(t). What happens in the long run?
Variation of parameters
One of the shortcomings of the method of undetermined coefficients is that it only works for certain
types of inputs. For more general forcing terms, we will need a more general method. The method
of variation of parameters is completely general (as long as we can compute certain integrals), but
requires a bit more computation. In fact, the method works even if the coefficients A, B, C are
functions of t!
Here we will assume A = 1, so that we have

    y″ + By′ + Cy = f(t)    (6.1)
The main idea is to take the null solution, which will be a linear combination of two LI solutions:

    y_n(t) = c₁y₁(t) + c₂y₂(t)

and form a particular solution by allowing the constants c₁, c₂ to be functions of t. (Allow the parameters to vary.) That is, we seek a particular solution of the form

    y_p(t) = c₁(t)y₁(t) + c₂(t)y₂(t)    (6.2)
We have some freedom in how we select c1 (t), c2 (t), and we will use this to simplify the computation.
Notice that, by the product rule,

    y_p′(t) = c₁′(t)y₁(t) + c₂′(t)y₂(t) + c₁(t)y₁′(t) + c₂(t)y₂′(t)

One condition we impose is that the first two terms on the RHS above add to zero:

    c₁′(t)y₁(t) + c₂′(t)y₂(t) = 0    (6.3)

Thus, y_p′ = c₁y₁′ + c₂y₂′, and substituting y_p into (6.1) gives the second condition

    c₁′(t)y₁′(t) + c₂′(t)y₂′(t) = f(t)    (6.4)
Now we have two equations (6.3) and (6.4) for two unknowns c01 and c02 . We can solve for c01 and c02
and then integrate these to recover the particular solution yp .
The 2 × 2 system for c₁′, c₂′ can be written

    [ y₁   y₂  ] [ c₁′ ]   [ 0 ]
    [ y₁′  y₂′ ] [ c₂′ ] = [ f ]
Example 6.6. Solve the problem in Example 6.4 using variation of parameters.
Green’s functions
We can write a particular solution in an elegant way as an integral using something called the
Green’s function. Now, instead of stopping at the equations (6.5), we will integrate them using
definite integrals. Integrate the equations (6.5) and use these to form the particular solution (6.2).
We get

    y_p(t) = −y₁(t) ∫ (y₂(t)f(t))/W(t) dt + y₂(t) ∫ (y₁(t)f(t))/W(t) dt    (6.6)

where W(t) = y₁(t)y₂′(t) − y₁′(t)y₂(t) is the Wronskian.
In the above equation (6.6), the integrals are indefinite integrals. We could take them to be definite
integrals from 0 to t. Then we have
    y_p(t) = ∫₀ᵗ [y₁(s)y₂(t) − y₁(t)y₂(s)] / W(s) · f(s) ds
which gives us the response to the input as an integral of the input against a function called a Green’s
function. In other words, if we let

    G(s, t) = [y₁(s)y₂(t) − y₁(t)y₂(s)] / W(s)    (6.7)

then

    y_p(t) = ∫₀ᵗ G(s, t) f(s) ds    (6.8)
Think of the integral as a sum. (It is the limit of a sum, a Riemann sum.) Equation (6.8) says that
the response is a weighted sum of the input. The weights are given in terms of the solutions of the
homogeneous equation by the Green’s function.
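As a sketch of how (6.8) works for y″ + 9y: taking y₁ = cos 3t and y₂ = sin 3t gives W = 3 and G(s, t) = sin(3(t − s))/3, and the code below checks numerically that the resulting y_p solves y″ + 9y = f. The input f(t) = t is my own choice for illustration, not the one in the example that follows.

```python
import math

def G(s, t):
    # Green's function for y'' + 9y, using y1 = cos(3t), y2 = sin(3t), W = 3.
    return math.sin(3 * (t - s)) / 3

f = lambda t: t  # sample input, chosen for illustration

def yp(t, n=20000):
    """y_p(t) = integral_0^t G(s, t) f(s) ds, by the trapezoid rule."""
    h = t / n
    total = 0.5 * (G(0.0, t) * f(0.0) + G(t, t) * f(t))
    for i in range(1, n):
        total += G(i * h, t) * f(i * h)
    return h * total

# Finite-difference check that y_p'' + 9*y_p = f at t = 1.
h = 1e-3
ypp = (yp(1 + h) - 2 * yp(1) + yp(1 - h)) / h**2
print(abs(ypp + 9 * yp(1) - f(1)) < 1e-3)  # True
```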
Example 6.8. Find the Green's function for y″ + 9y, and then solve y″ + 9y = 2 sin(2t). Hint: Recall the trig identities

    sin(A − B) = sin A cos B − cos A sin B  and  sin A sin B = ½[cos(A − B) − cos(A + B)]
Example 6.10. y″ + 9y = 3H(t − 2). Sketch the solution with starting values y(0) = 1, y′(0) = 0.
Exercises
1. Use undetermined coefficients to find a particular solution yp .
3. Resonance occurs when the forcing is a solution of the homogeneous equation. In this case,
we can usually find a particular solution using undetermined coefficients by multiplying the
normal, non-resonant, particular solution by t. Do this for the following.
For (b), show that you get the same thing using variation of parameters.
(a) Show that the null solutions are e^(−at) cos(3t) and e^(−at) sin(3t). (Recall problem 3 from §5.)

(b) Show that the Green's function is G(s, t) = (1/3) e^(a(s−t)) sin(3(t − s)).

(c) Use the Green's function to find a particular solution y_p(t). Sketch the solution for a = 0.1, 0.5, 1.
with starting values y(0) = 3, y 0 (0) = 0. Sketch the solutions. (This corresponds to a mass-
spring system with damping. In (a) the spring is being periodically forced. In (b) a magnet is
turned on at t = 2 that pulls the mass to the right.)
7 Systems of linear equations & matrix-vector multiplication
Consider the system

    x + 2y = 7
    3x + y = 6

[Figure: the row picture (two lines crossing at (1, 3)) and the column picture for this system.]

In the column picture,

    1·[1; 3] + 3·[2; 1] = [7; 6],  so the solution is x = 1, y = 3.
Conceptually, we think of systems of equations in the column picture: We want to find linear com-
binations of the columns to equal the RHS. When we solve the equations using elimination, we use
the row picture.
Let

    v = [1; 3],  w = [2; 1],  b = [7; 6]
Then the problem above is to find the linear combination xv + yw that adds up to b. Linear
combinations are formed using the parallelogram law.
By columns: Ax = x(column 1) + y(column 2), or

By rows: Ax = [ (row 1)·x ; (row 2)·x ]
" #" #
1 2 2
Example 7.2. Find by columns and by rows.
3 1 −1
" # " #
x 6
The system we started with can be written in matrix-vector form by letting x = ,b= , and
y 7
then the system is
Ax = b
When we multiply a vector by a matrix, we get another vector. What kinds of actions can such
multiplication do?
" #
1 3
Example 7.3. Consider the matrix A = . What does Ax do to x? What condition must b
2 6
satisfy in order for Ax = b to have a solution?
Singular matrices The matrix A in Example 7.3 is called singular. The equation Ax = b
generally does not have a solution. The RHS b has to satisfy a certain condition for a solution to
exist. Conceptually, the rows in the row picture represent parallel lines. The columns in the column
picture are pointing in the same direction. A 2×2 matrix will be singular if the columns are multiples
of each other.
" #
0 1
Example 7.4. What does the matrix A = do to vectors x?
−1 0
For an m × n matrix A and a vector x with n entries:

By columns: Ax = x₁(column 1) + x₂(column 2) + ⋯ + xₙ(column n), or

By rows: Ax = [ (row 1)·x ; (row 2)·x ; ⋯ ; (row m)·x ]

That is,

    (Ax)ᵢ = (row i) · x = Σ_(j=1)^n aᵢⱼ xⱼ
# of columns of A = # of elements of x
The number of rows of A can be anything. The size of the vector Ax is the number of rows of A.
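Both pictures are easy to code. This minimal sketch (my own illustration, not from the notes) multiplies the matrix A = [1 2; 3 1] of the opening system by x = (1, 3), once by rows and once by columns:

```python
def matvec_rows(A, x):
    """(Ax)_i = (row i) . x"""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matvec_cols(A, x):
    """Ax = x_1*(column 1) + ... + x_n*(column n)"""
    m, n = len(A), len(x)
    result = [0] * m
    for j in range(n):
        for i in range(m):
            result[i] += x[j] * A[i][j]
    return result

A = [[1, 2], [3, 1]]
print(matvec_rows(A, [1, 3]))  # [7, 6]
print(matvec_cols(A, [1, 3]))  # [7, 6]
```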
" 1 #
1 1 1
Example 7.5. −1
−1 0 1
2
Example 7.6. Let A = [1 2 −1; −1 −2 1; 3 6 −3]. Then Ax = (x₁ + 2x₂ − x₃)[1; −1; 3]. The 2nd column is twice the first, and the third column is −1 times the first. Any linear combination of the columns is a multiple of the first column. What must b be in order for Ax = b to have a solution?
Example 7.7. Find A[1; −1; 2], where A = [1 1 1; −1 0 1; 2 1 0]
Exercises
1. Sketch the row picture and the column picture for the following systems.

    (a) 3x + 2y = 5        (b) x − 2y = 1
        2x + y = 3             2x + y = 7
5. Fill in the missing entries of the 2 × 2 matrix A to make A singular. (Note: There is more than one right answer.)

    A = [ 2 ? ; −1 ? ]
6. Let A = [1 1 1].
The solution set for Ax = 0 represents what kind of region in R3 ? Sketch this region.
Now suppose x is any solution of Ax = 0. Show that x = c1 v + c2 w for some constants c1 and
c2 .
8 Matrix multiplication
We saw how to multiply a matrix and a vector. Multiplying two matrices together follows the same
rules. We can multiply by columns or by rows. To understand why matrix multiplication works the
way it does, let eⱼ be the jth standard basis vector:

    e₁ = [1; 0; 0; …; 0],  e₂ = [0; 1; 0; …; 0],  e₃ = [0; 0; 1; …; 0],  …,  eⱼ = [0; …; 0; 1; 0; …; 0]
This is the vector with a 1 in the jth entry and zeros elsewhere. Then, if the columns of A are aⱼ:

    A = [a₁ a₂ ⋯ aₙ]

Then

    Aeⱼ = aⱼ    (8.1)
That is, the jth column of A is Aej . Thus, to get the jth column of AB, we have
jth column of AB = ABej = A(jth column of B)
Therefore, we have the following rule for multiplying two matrices together:
    If B = [b₁ b₂ ⋯ bₙ],  then AB = [Ab₁ Ab₂ ⋯ Abₙ]    (8.2)
Multiplying by rows, the (i, j) entry of AB is the dot product of row i of A with column j of B:

    (AB)ᵢⱼ = Σ_(k=1)^n aᵢₖ bₖⱼ    (8.3)
The number of columns of A must equal the number of rows of B. The size of AB is
(m × n)(n × p) = m × p
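A sketch (my own illustration) implementing both the entrywise rule (8.3) and the column rule (8.2), applied to the matrices of Example 8.1 below:

```python
def matmul(A, B):
    """(AB)_ij = sum_k a_ik * b_kj; needs #columns of A == #rows of B."""
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

def matmul_by_columns(A, B):
    """AB = [A b_1, A b_2, ..., A b_n], built one column at a time (8.2)."""
    p = len(B[0])
    cols = [[sum(a * b for a, b in zip(row, [br[j] for br in B]))
             for row in A] for j in range(p)]
    # Reassemble the columns into a row-major matrix.
    return [[cols[j][i] for j in range(p)] for i in range(len(A))]

A = [[1, 3], [2, -1]]
B = [[-1, 0, 1], [2, 1, 3]]
print(matmul(A, B))                             # [[5, 3, 10], [-4, -1, -1]]
print(matmul(A, B) == matmul_by_columns(A, B))  # True
```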
" #" #
1 3 −1 0 1
Example 8.1.
2 −1 2 1 3
1 1 1 −1 2 1
Example 8.2. 0 3 2 1 0 −5
0 0 5 0 1 4
1 h i
Example 8.3. 3 −1 4
2
h i 0
Example 8.4. 1 0 1 1
1
Most of the rules for multiplication of numbers carry over to matrices, with one very important exception. The most important property is the associative law of multiplication, which makes multiplication of more than 2 matrices make sense:

    (AB)C = A(BC)

This means that the product ABC makes sense, since we can do the multiplication in any order.

Example 8.5. [1; −1; 2] [1 0 1] [0; 1; 1]
The proof of the distributive law is an exercise. You should convince yourself that all of them are true. One law that is not in the list is the commutative law for multiplication. This is because

    usually AB ≠ BA
Exercises
1. A is 3 × 2, B is 2 × 5, and C is 5 × 3. Which of the following products makes sense? What are
the sizes of the products?
AB BA CA CB CAB
2. Let A = [1; −3; 2] and B = [−4 2 1]. Compute AB and BA.
" # " # " #
1 2 −1 1 1 1
3. Let A = , B = , and C = . Compute (AB)C and A(BC) and show
1 3 2 4 5 −3
that they are equal.
4. The kth power of a square matrix A is

    Aᵏ = AA⋯A  (k factors)

Compute A³ for A = [1 2; 1 3].
5. Prove the distributive law A(B + C) = AB + AC. One way to do this is to use the rule (8.3)
for multiplication by rows. Calculate the (i, j)th entry of A(B + C), and show that it is equal
to the (i, j)th entry of AB + AC.
7. Here is another way to think of matrix multiplication. Let's call the columns of A aᵢ and the rows of B rᵢ. Then the product of A and B is the sum of the matrices aᵢrᵢ. That is,

    AB = [a₁ a₂ ⋯ aₙ] [r₁; r₂; ⋮; rₙ] = a₁r₁ + a₂r₂ + ⋯ + aₙrₙ
Show that this is true by showing that the (i, j)th entry of the above sum is the same as that
obtained in formula (8.3).
9 Solving systems of linear equations by elimination

Consider the system
x + 2y = 7
3x + y = 6
We want to eliminate everything below the x term in the first equation. Let’s subtract 3 times the
first equation from the second:
x + 2y = 7
−5y = −15
Now we have an upper triangular system: everything below the diagonal is zero. Next, solve for y
and substitute back into the first equation (back substitution), to get the solution (x, y) = (1, 3) .
That’s all there is to it! We do elementary row operations to get an upper triangular system, which
we solve by back-substitution. It works the same no matter how many equations and variables you
have. It’s just that there are some situations that require some extra steps. Notice that we don’t
really need to carry along the variables. We could just as well do the operations on the numbers, by
forming the augmented matrix (the matrix of coefficients and the RHS in the last column):
" # " #
1 2 7 1 2 7
becomes, after elimination
3 1 6 0 −5 −15
We operate on the equations (rows) with elementary row operations (EROs). These leave the solution set unchanged. They are:

1. Subtract a multiple of one row from another row.
2. Interchange two rows.
3. Multiply a row by a nonzero constant.
BREAKDOWN: In the first step, the number in the upper left is called the first pivot. This is what
we use to eliminate everything below it. The second pivot is the number on the diagonal after the
next step, the −5. Thus, the two pivots for the above problem are 1 and −5. The pivots are the
numbers on the diagonal after elimination.
The number we use to multiply the first pivot by to eliminate the number below is called the
multiplier. In the example above, the multiplier is l21 = 3.
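The breakdown above can be sketched in a few lines of NumPy, using the example system from this section:

```python
import numpy as np

# Elimination on x + 2y = 7, 3x + y = 6.
# The multiplier l21 = 3 clears the entry below the first pivot.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([7.0, 6.0])

l21 = A[1, 0] / A[0, 0]        # multiplier = 3
A[1, :] -= l21 * A[0, :]       # row2 <- row2 - 3*row1
b[1]    -= l21 * b[0]          # now upper triangular, pivots 1 and -5

y = b[1] / A[1, 1]             # back substitution
x = (b[0] - A[0, 1] * y) / A[0, 0]
print(x, y)                    # 1.0 3.0
```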
Example 9.1. In the first example, do elimination by first interchanging the equations. What are
the pivots, and what is the multiplier? What is the product of pivots in both cases?
Example 9.2. Failure with no solution. Consider
x + 3y = 4
2x + 6y = 9
Example 9.3. Failure with infinitely many solutions. Change b = [ 4 ; 9 ] to [ 4 ; 8 ]:
x + 3y = 4
2x + 6y = 8
2x + 2y + 2z = 4
4x + 4y + 2z = 4
−2x + y + 7z = 11
1. Use the first equation to create zeros below the first pivot.
2. Use the new equation 2 to create zeros below the second pivot.
3. Keep going to create zeros below all of the pivots to find the triangular U .
The one caveat is that we might need to interchange rows along the way. The method may create a
row of zeros in U , in which case there are either no solutions or infinitely many solutions.
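The three steps, including a row interchange when a pivot position becomes zero, can be sketched on the 3 × 3 system above:

```python
import numpy as np

# Elimination to upper triangular U, with a row swap when needed.
M = np.array([[ 2.0, 2.0, 2.0,  4.0],   # augmented matrix [A | b]
              [ 4.0, 4.0, 2.0,  4.0],
              [-2.0, 1.0, 7.0, 11.0]])

for col in range(2):                      # clear below pivots in columns 0, 1
    if M[col, col] == 0:                  # zero pivot: swap with a lower row
        swap = col + np.argmax(M[col:, col] != 0)
        M[[col, swap]] = M[[swap, col]]
    for row in range(col + 1, 3):
        M[row] -= (M[row, col] / M[col, col]) * M[col]

# back substitution on the triangular system
x = np.zeros(3)
for i in (2, 1, 0):
    x[i] = (M[i, 3] - M[i, i+1:3] @ x[i+1:]) / M[i, i]
print(x)   # [ 1. -1.  2.]
```

On this system, the second pivot position really does become zero after the first step, so the swap branch runs.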
Notation: The upper triangular U that we found by elimination is an example of an (upper) echelon
matrix. An echelon matrix has the properties that each nonzero row has a leading nonzero number
called the pivot, and all of the entries below each pivot are zero. The echelon form of a matrix is not
unique. However, in later sections we will find a unique echelon form called the row reduced echelon
form.
Recall that the first multiplier was l21 = 3. We subtracted 3 times the first equation from the second.
This operation can also be accomplished by multiplying by an elimination matrix
E = [ 1 0 ; −3 1 ]
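As a quick numerical check, multiplying by E reproduces the row operation on both A and b:

```python
import numpy as np

# "row2 minus 3 times row1" as multiplication by the elimination matrix E
E = np.array([[ 1, 0],
              [-3, 1]])
A = np.array([[1, 2],
              [3, 1]])
b = np.array([7, 6])

print(E @ A)   # the upper triangular U: [[1, 2], [0, -5]]
print(E @ b)   # the new right hand side: [7, -15]
```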
Exercises
1. Which multiple of equation 1 should be subtracted from equation 2?
2x − 6y = 4
−x + 5y = 0
2. Use the previous exercise to find the LU factorization. Fill in the missing values.
A = LU :   [ 2 −6 ; −1 5 ] = [ 1 0 ; ? 1 ] [ 2 −6 ; 0 ? ]
3. Choose a right hand side for the second equation so that the system has (i) no solution, and
(ii) infinitely many solutions.
2x + 3y = 5
4x + 6y =
4. Derive a test on b1 and b2 so that the system has a solution. How many solutions will the system have if there is a solution? Sketch the column picture for b = [ 1 ; 2 ] and b = [ 0 ; 1 ].
2x + 3y = b1
4x + 6y = b2
5. Alice buys three apples, a dozen bananas, and one cantaloupe for $2.36. Jorge buys a dozen apples and two cantaloupes for $5.26. Quinn buys two bananas and three cantaloupes for $2.77. How much does each piece of fruit cost?
6. Solve the system
2x + 2y + z = 3
4x + 4y + 3z = 9
−2y + 5z = 17
7. Consider the system in the previous problem. Suppose we change the 2nd equation to
4x + 4y + az = b
For which values of a and b is there (i) no solution, and (ii) infinitely many solutions?
10 Inverse matrices
Let’s consider the system we started with:
x + 2y = 7
3x + y = 6
which we can write as Ax = b. Notice that if we multiply on both sides by (1/5)[ −1 2 ; 3 −1 ], we have

(1/5)[ −1 2 ; 3 −1 ] [ 1 2 ; 3 1 ] [ x ; y ] = (1/5)[ −1 2 ; 3 −1 ] [ 7 ; 6 ]
[ 1 0 ; 0 1 ] [ x ; y ] = [ 1 ; 3 ]
[ x ; y ] = [ 1 ; 3 ]

The matrix on the left, A−1 = (1/5)[ −1 2 ; 3 −1 ], is called the inverse of A.
A matrix A is called invertible if there exists a matrix A−1 , called the inverse of A,
such that
AA−1 = I and A−1 A = I
If A is not invertible, it is called singular.
Example 10.2. Find the inverse of A = [ 3 2 ; 1 1 ].
Example 10.3. Find the value of a that makes [ 3 2 ; 1 a ] singular.
(AB)−1 = B −1 A−1
Call the columns of A−1 v1 , v2 , v3 . Then, since
AA−1 = A [ v1 v2 v3 ] = [ Av1 Av2 Av3 ] = I = [ e1 e2 e3 ],
we need Avj = ej for j = 1, 2, 3, where ej is the jth standard basis vector. Thus, if we can solve these three equations, we have the inverse. In fact, we can solve all three at once using Gauss-Jordan elimination. Here we put the three ej 's (the identity) next to A in an augmented matrix and do row operations until the left block is the identity. What is left in the right block is the inverse.
[ A | I ] ∼ [ I | A−1 ]

Example 10.5. To find the inverse of A = [ 1 2 2 ; −1 3 0 ; 0 1 1 ], we form

[ A | I ] = [  1 2 2 | 1 0 0 ]
            [ −1 3 0 | 0 1 0 ]
            [  0 1 1 | 0 0 1 ]

          ∼ [ 1 0 0 |    1    0   −2 ]
            [ 0 1 0 |  1/3  1/3 −2/3 ]
            [ 0 0 1 | −1/3 −1/3  5/3 ]
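Gauss-Jordan on [A | I] is short to code. A sketch on the matrix of Example 10.5 (no pivot ever becomes zero here, so no row swaps are needed):

```python
import numpy as np

# Gauss-Jordan elimination on the augmented matrix [A | I].
A = np.array([[ 1.0, 2.0, 2.0],
              [-1.0, 3.0, 0.0],
              [ 0.0, 1.0, 1.0]])
n = A.shape[0]
M = np.hstack([A, np.eye(n)])        # the augmented matrix [A | I]

for col in range(n):
    M[col] /= M[col, col]            # make the pivot a 1
    for row in range(n):             # clear the rest of the pivot column
        if row != col:
            M[row] -= M[row, col] * M[col]

A_inv = M[:, n:]                     # the right block is A^{-1}
print(np.allclose(A @ A_inv, np.eye(n)))   # True
```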
Notice that this process works as long as A doesn't have any zero pivots. In this case, we can find a matrix B such that AB = I (a right-inverse). In fact, BA = I as well: the same process applied to B produces a right-inverse C with BC = I. (B cannot have a zero row, since a zero row of B would give a zero row in BC, contradicting BC = I.) Then BAB = B(AB) = B, and multiplying on the right by C gives (BA)(BC) = BC, that is, BA = I, as well. Thus,
Invertibility conditions
1. A is invertible.
3. A has n pivots.
Exercises
1. Find the inverse of the permutation matrices by trial and error
P = [ 0 1 0 ; 1 0 0 ; 0 0 1 ]   and   P = [ 0 1 0 ; 0 0 1 ; 1 0 0 ]
11 Symmetric and orthogonal matrices

The transpose
The transpose of a matrix is another matrix. It is obtained by “reflecting” entries across the diagonal:
the rows become the columns and vice versa. If A = [aij ], then AT is the transpose of A, and its
entries are
aTij = aji (switch the indices)
Example 11.1. Find the transpose of A = [ 1 0 7 ; −8 2 1 ; 0 1 0 ].
Symmetric matrices
The most important matrices in applications are the symmetric matrices: those satisfying AT = A.
Example 11.2. The matrix [ 1 0 7 ; 0 2 1 ; 7 1 3 ] is symmetric. The matrix in Example 11.1 is not symmetric.
One very important symmetric matrix is obtained by multiplying any matrix by its transpose.

Example 11.4. [ 1 0 2 ; 0 1 3 ] [ 1 0 ; 0 1 ; 2 3 ]
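The product in Example 11.4, and the general fact behind it, can be checked numerically. A A^T is always symmetric because (A A^T)^T = (A^T)^T A^T = A A^T, and the same goes for A^T A:

```python
import numpy as np

A = np.array([[1, 0, 2],
              [0, 1, 3]])

print(A @ A.T)                                 # [[ 5  6], [ 6 10]]
print(np.array_equal(A @ A.T, (A @ A.T).T))    # True: A A^T is symmetric
print(np.array_equal(A.T @ A, (A.T @ A).T))    # True: A^T A is symmetric
```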
Orthogonal matrices
Orthogonal matrices are another very important type of matrix in applications. They are, in a very
key sense, the “best” kind of matrix to have. Recall that a collection of vectors q1 , q2 , . . . , qn is
orthonormal if they are all orthogonal to each other, and they all have length 1:
qi · qj = 0 if i ≠ j,   and   qi · qj = 1 if i = j.
Here is one reason why orthogonal matrices are so important. Suppose Q is the 2 × 2 matrix Q = [ q1 q2 ], where q1 , q2 are orthonormal. Then
QT Q = [ q1 · q1  q1 · q2 ; q2 · q1  q2 · q2 ] = [ 1 0 ; 0 1 ] = I
In other words,
Q−1 = QT
Another key fact about orthogonal matrices is that they preserve lengths.
Example 11.6. Let Q be an orthogonal matrix. Show that ‖Qx‖ = ‖x‖ for every vector x.
Example 11.7. Let P = [ 0 1 0 ; 0 0 1 ; 1 0 0 ] be a permutation matrix. Show that P is orthogonal. Find P −1 . What does P do to vectors x? What does P T do?
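Both examples can be spot-checked numerically. A rotation matrix is a standard instance of an orthogonal Q (the angle below is arbitrary):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))           # True: Q^T is the inverse
x = np.array([3.0, -4.0])
print(np.linalg.norm(Q @ x), np.linalg.norm(x))  # both 5.0 (up to rounding)

# The permutation matrix of Example 11.7 is orthogonal too.
P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
print(np.array_equal(P.T @ P, np.eye(3, dtype=int)))  # True
```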
Exercises
1. A matrix A is called skew-symmetric, or sometimes anti-symmetric, if AT = −A. Fill in the
missing parts to make A anti-symmetric.
A = [ ? −1 ; ? ? ]
12 Determinants
For square matrices we have a number, a single number, det(A), called the determinant of A with
some remarkable properties. Four of them are as follows.
1. The determinant gives a test for invertibility. A is invertible if and only if det(A) 6= 0.
2. The determinant gives us a measure of how sensitive the system Ax = b is. Even if det(A) 6= 0,
if the determinant is too close to zero, then the solution of Ax = b can change very drastically if
b changes by a small amount. This may cause problems when solving such systems numerically.
3. The determinant gives a volume. The area of a parallelogram with sides a and b is (up to sign) the determinant of the matrix with a and b as the columns. Generally, the determinant is (up to sign) the volume of a parallelepiped in Rn . This property is used in calculus, for example, to calculate the Jacobian of a transformation.
4. The determinant is the product of the eigenvalues.
We can define the determinant of a scalar to be just that scalar, det(a) = a. We have also seen the determinant of a 2 × 2 matrix A = [ a b ; c d ]:

det(A) = det [ a b ; c d ] = | a b ; c d | = ad − bc
Generally, the determinant is defined to satisfy three properties. The rest of the properties follow
from these three. We illustrate them on a 2 × 2 matrix, but they hold generally for n × n.
These three properties above define the determinant of an n × n matrix – the determinant of the
identity is 1, the sign is reversed by a row exchange, and the determinant is linear in each row. The
formulas for computing the determinant are not nearly as important as the properties it possesses,
which all follow from these three.
5. Subtracting a multiple of one row from another leaves the determinant unchanged.

Row operation:   | a − lc  b − ld ; c d | = | a b ; c d |
7. If A is a triangular (upper or lower) matrix, then the determinant is the product of the diagonal entries: det(A) = a11 a22 a33 · · · ann

Triangular matrices:   | a b ; 0 d | = ad   and   | a 0 ; c d | = ad
8. The determinant is (up to a sign) the product of the pivots. If A is singular then det(A) = 0.
If A is invertible then det(A) 6= 0.
Note that this rule implies that all of the properties involving the rows also apply to the columns.
A scalar can be factored out of each column; the determinant is linear in each column; the
determinant of a matrix with equal columns is zero; subtracting a multiple of one column from
another leaves the determinant unchanged; if A has a zero column then its determinant is zero.
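The properties above are easy to spot-check numerically. A minimal sketch (the matrix is made up):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [4.0, 7.0]])                 # det(A) = 2*7 - 1*4 = 10

print(np.linalg.det(np.eye(2)))            # det(I) = 1

swapped = A[[1, 0], :]                     # exchanging rows reverses the sign
print(np.linalg.det(swapped))              # -10 (up to rounding)

B = A.copy()
B[1] -= 2 * B[0]                           # subtract a multiple of a row:
print(np.linalg.det(B))                    # still 10, determinant unchanged

T = np.array([[3.0, 5.0],
              [0.0, 4.0]])                 # triangular: product of diagonal
print(np.linalg.det(T))                    # 12 (up to rounding)
```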
Example 12.3. Determine all values of k for which A = [ 1 −1 3 ; 1 2 −1 ; 3 6 k ] is non-singular.
There is another way to compute determinants using the so-called cofactor expansion. If you enjoy
doing a lot of unnecessary calculations, this is the method for you. It can be useful in some situations
in which a row or column has a lot of zeros. We illustrate the idea on a 3 × 3 matrix. The idea is
that the first row can be written as
( a11 a12 a13 ) = ( a11 0 0 ) + ( 0 a12 0 ) + ( 0 0 a13 )
Now use the linearity property on the first row, EROs to zero out entries below a11 , a12 , a13 , and interchange columns to get

| a11 a12 a13 |   | a11  0   0  |   |  0  a12  0  |   |  0   0  a13 |
| a21 a22 a23 | = | a21 a22 a23 | + | a21 a22 a23 | + | a21 a22 a23 |
| a31 a32 a33 |   | a31 a32 a33 |   | a31 a32 a33 |   | a31 a32 a33 |

                = a11 | a22 a23 ; a32 a33 | − a12 | a21 a23 ; a31 a33 | + a13 | a21 a22 ; a31 a32 |
We have reduced the 3 × 3 determinant to three 2 × 2 determinants. The same idea can be used to
write a 4×4 in terms of 3×3, then in terms of 2×2 determinants, and so on. Here we expanded on the
first row, but we can use the linearity in any row, or any column, to produce the same result. If we
want to expand along a given row, we will use the linearity property, then appropriate interchanges
to get determinants where the first column has zeros below and to the right of the first entry. Then
we reduce further. The upshot is that we get
Let Aij be the matrix A with the ith row and jth column removed. The determinant of Aij
is called the i, j minor and is written Mij . The minor with the sign (−1)i+j is called the i, j
cofactor and is written Cij . That is,
Mij = det (Aij ) , and Cij = (−1)i+j Mij = (−1)i+j det (Aij ) .
Example 12.4. Find the determinant of A = [ 1 1 0 0 ; −2 1 3 1 ; 0 1 0 0 ; 2 7 −1 3 ].
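Cofactor expansion is short to code recursively. Expanding along the third row of the Example 12.4 matrix by hand is faster (it has a single nonzero entry), but the recursion shows the general pattern of the formula:

```python
import numpy as np

# Cofactor expansion along the first row, applied recursively.
def det_cofactor(A):
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # A_{1j}
        total += (-1) ** j * A[0, j] * det_cofactor(minor)     # sign (-1)^{1+j}
    return total

A = np.array([[ 1.0, 1.0,  0.0, 0.0],
              [-2.0, 1.0,  3.0, 1.0],
              [ 0.0, 1.0,  0.0, 0.0],
              [ 2.0, 7.0, -1.0, 3.0]])
print(det_cofactor(A), np.linalg.det(A))   # both -10
```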
Recall:
a · b = ‖a‖ ‖b‖ cos θ,   a × b = ‖a‖ ‖b‖ sin(θ) n,   a × b = | i j k ; a1 a2 a3 ; b1 b2 b3 |
Theorem 12.1.
Example 12.5. Find the area of the parallelogram with vertices (0, 0), (3, 1), (5, 5) and (2, 4).
We conclude this section with a beautiful formula that should never be used.
Theorem 12.2 (Cramer’s Rule). Let A be an n × n matrix such that det(A) 6= 0. Let Bk be
the matrix obtained by replacing the kth column of A with b. Then the unique solution to the
system Ax = b is (x1 , x2 , . . . , xn ), where
xk = det(Bk ) / det(A)
Corollary 12.1. The inverse of A is the transpose of the cofactor matrix, divided by the determinant.
That is,
(A−1)ij = Cji / det(A)
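Cramer's rule is a few lines of code, illustrated on the system from Section 10 (beautiful, and far more expensive than elimination for large systems):

```python
import numpy as np

# Cramer's rule on x + 2y = 7, 3x + y = 6.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([7.0, 6.0])

x = np.empty(2)
for k in range(2):
    Bk = A.copy()
    Bk[:, k] = b                 # replace column k of A with b
    x[k] = np.linalg.det(Bk) / np.linalg.det(A)
print(x)                         # [1. 3.]
```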
Exercises
1. Find the determinant by expanding along an appropriate row or column.

(i) [ 1 0 0 ; 2 −3 7 ; 4 1 11 ]   (ii) [ 1 3 −7 2 ; 0 0 2 0 ; 3 2 7 9 ; 6 2 1 −3 ]
2. Let
A = [ 1 −1 2 ; 3 1 4 ; 0 1 3 ]
Use properties of determinants to calculate det(A), det(AT ), det(−2A) and det(A4 ).
3. Find the area of the parallelogram with vertices (0, 0), (3, −2), (5, 2) and (2, 4).
4. Let A = [ 1 k ; k 9 ]. For which values of k does Ax = b have a unique solution?
5. Use properties of determinants to show that A is singular for any value of a, where
A = [ 1 + 3a 1 3 ; 2 + 2a 2 2 ; 3 3 0 ]
6. Suppose | a b ; c d | = 3. Use properties of determinants to calculate
| −3a + 2c  −3b + 2d ; 2c 2d |
13 Column space
This section will be a bit more abstract. We have seen how to do calculations with matrices, how
to find solutions of systems of equations, how to compute determinants, and so on. Now we will
move to a higher level of understanding, to thinking about various spaces and how they relate to
each other. To understand Ax = b in its entirety, you need to understand vector spaces and their
subspaces. The goal is to understand the “Fundamental Theorem of Linear Algebra” which gives a
picture of linear algebra.
Let’s start with something we are familiar with: Rn . Recall that we can add vectors in Rn (parallel-
ogram law) and we can scale vectors. When we add two vectors we get another vector in the same
space. Likewise, when we scale a vector we get another vector in the same space. We say that Rn is
closed under addition and scalar multiplication.
c1 v1 + c2 v2 + · · · + cm vm
The structure of Rn can be generalized to a very abstract thing called a vector space. A vector space
is a set V of elements, called vectors, that satisfies
. . . plus 8 other conditions, called axioms, below. The two conditions above are called closure under
addition, and closure under scalar multiplication. We can add vectors and scale vectors and stay in
the space. The crucial thing is that we can take linear combinations.
Axiom Meaning
3. Associativity of addition u + (v + w) = (u + v) + w
4. Commutativity of addition u+v=v+u
5. Zero element (identity of addition) There is an element 0 ∈ V such that
v + 0 = v for all v ∈ V .
6. Negative element (additive inverse) For every v ∈ V there is a −v such
that v + (−v) = 0.
7. Associativity of scalar multiplication α(βv) = (αβ)v
8. Identity element 1v = v
9. Distributive property 1 α(u + v) = αu + αv
10. Distributive property 2 (α + β)v = αv + βv
Vector spaces have a structure like Rn , and our intuition is guided by our knowledge of Rn . However,
there are very many other (important) vector spaces. A few of the more important ones are the
following.
Examples of vector spaces:
Example 13.2. Why is the “or less” qualifier necessary in Pn ? Is the set of polynomials of exact
degree 3 a vector space?
SUBSPACES
The spaces associated with a matrix, as well as many other important vector spaces, are in fact
subsets of larger spaces. They are subsets, but also vector spaces on their own. Such spaces are
called subspaces.
One of the nice things about subspaces is that they inherit the properties of the space they live in.
This means that the axioms will all be satisfied as long as we have closure under addition and scalar
multiplication. We only need . . .
Example 13.7. The set of polynomials of exact degree 3 is a subset of P3 , but is not a subspace.
P2 is a subspace of P3 .
Example 13.8. Show that the set of functions satisfying f (0) = 0 is a subspace (and hence a vector
space) of the set of functions. What about the set of functions satisfying f (0) = 1?
The span is the set of all the linear combinations. If a vector space V is the span of v1 , v2 , . . . , vn ,
we say that {v1 , v2 , . . . , vn } is a spanning set for V , and V is spanned by {v1 , v2 , . . . , vn }.
Theorem 13.1. Given any collection of vectors {v1 , v2 , . . . , vn } in some vector space V ,
span{v1 , v2 , . . . , vn }
is a subspace of V .
Proof.
Example 13.9. span{1, x, x2 } = P2 . A spanning set for P2 is {1, x, x2 }. Find another spanning set
for P2 .
Example 13.10. Find a spanning set for the set of symmetric matrices.
COLUMN SPACE
Recall that Ax is
Ax = x1 (column 1) + · · · + xn (column n)
Let a1 , . . . , an be the columns of A, i.e.
h i
A = a1 a2 · · · an
Then
Ax = x1 a1 + x2 a2 + · · · + xn an
Thus, Ax is a linear combination of the vectors a1 , a2 , . . . , an . The set of all such linear combinations
is called the column space of A, and is denoted C(A).
Another way to say this is that the column space is the span of the columns, or is spanned by the
columns of A. In set notation:
C(A) = {Ax : x ∈ Rn }
C(A) is a subspace of Rm .
The column space can also be thought of as the range of the linear operator A – it is everything
you can get from multiplying vectors by A. Some authors use R(A) for the column space, to denote
range of A.
Example 13.13. Find spanning sets for the column spaces of the matrices in Example 13.12.
C(A) = span , C(B) = span
Since Ax is a linear combination of the columns of A, and thus in the column space of A for any
given vector x, solving Ax = b is equivalent to finding a linear combination of the columns to add
up to b. Therefore,
Example 13.14. What conditions do b and d have to satisfy for Ax = b and Bx = d to have a
solution for the matrices in Example 13.12?
Example 13.15. Find a condition for b ∈ C(A) for A = [ 1 2 −1 ; 1 3 −1 ; 3 2 −3 ].
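Membership in the column space can be tested numerically: b is in C(A) exactly when appending b as an extra column does not increase the rank. A sketch on the Example 13.15 matrix (the two trial vectors b are made up):

```python
import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [1.0, 3.0, -1.0],
              [3.0, 2.0, -3.0]])

def in_column_space(A, b):
    # rank([A b]) == rank(A) exactly when b is a combination of the columns
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

print(in_column_space(A, np.array([1.0, 1.0, 3.0])))   # True
print(in_column_space(A, np.array([1.0, 0.0, 0.0])))   # False
```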
Exercises
1. Which of the following subsets of R3 are subspaces?
2. (a) Find an example of a set in the plane where closure under addition holds, but scalar multiplication fails. In particular, find a set S, such that if v, w ∈ S, then v + w ∈ S, but (1/2)v may be outside of S.
(b) Find an example where closure under scalar multiplication holds but closure under addi-
tion fails.
3. Which of the following subsets of P3 are subspaces? Polynomials of the form
c1 + c2 x + c3 x3
and polynomials of the form
1 + c1 x + c2 x3
4. Let U and V be two lines through the origin in the plane. Both U and V are subspaces of R2 .
The set U + V is defined as the set of all sums of elements from U and V . That is,
U + V = {u + v : u ∈ U, v ∈ V }
6. Take the 3 matrices in Example 13.10 and the matrix in Exercise 5. The span of these four
matrices is what?
10. If we add an extra column b to a matrix A, then the column space could get larger or it could stay the same. Give an example where the column space gets larger and an example where it doesn't. Why is Ax = b solvable exactly when the column space doesn't get larger? That is, why is Ax = b solvable when A and [ A b ] have the same column space?
14 The nullspace of A
The next subspace associated with a matrix is the nullspace. We want to completely describe this
space and see how it relates to other spaces.
Definition 14.1. The nullspace of A, denoted N (A), is the set of all solutions to Ax = 0.
In set notation:
N (A) = {x ∈ Rn : Ax = 0}
There is always one solution to Ax = 0, namely x = 0. One of the key questions is, Are there any
other solutions? Sometimes there will be, and other times x = 0 will be the only solution. First:
To construct the nullspace, and make sure that we have all of the solutions of Ax = 0, we will find
the special solutions. We have an efficient way to do this. We do Gauss-Jordan elimination. Now
we will not be content to reduce a matrix to upper triangular (echelon) form. We want to get 1’s on
the pivots, and zeros in the other entries of the pivot columns. This form of the matrix is called the
row reduced echelon form (RREF) of the matrix.
Example 14.1. Find the nullspace of A = [ 1 3 ; 2 6 ]. Sketch N (A) and C(A).
Example 14.2. Find the nullspace of A = [ 1 −3 2 ].
Elimination and reduction to triangular form produces pivot columns and free columns. The free
columns will be associated with free variables.
Example 14.3.
A = [ 2 4 4 6 ; 2 5 6 9 ]   becomes   U = [ 2 4 4 6 ; 0 1 2 3 ]
after elimination. The pivots are 2 and 1. The multiplier is 1. Now we have two pivot columns
(the first two columns) and two free columns (the last two columns). Reduce further to reduced row
echelon form by the following steps.
1. Produce zeros above the pivots by eliminating upwards.
2. Produce ones in the pivots by dividing the whole row by the pivot.
When we do this on U we get the row reduced echelon matrix
R = [ 1 0 −2 −3 ; 0 1 2 3 ]
The free columns are associated with free variables. To obtain the special solutions we do the
following for each free variable. Set a free variable to 1, and the others to 0. Solve for the pivot
variables. In this example, when we set x3 to 1 and x4 to 0, we get the special solution s1 , and when
we set x3 to 0 and x4 to 1, we get the second special solution s2 .
s1 = [ 2 ; −2 ; 1 ; 0 ]   s2 = [ 3 ; −3 ; 0 ; 1 ]
Both s1 and s2 are in the nullspace, and hence any linear combination of them will also be in the
nullspace. In fact, all the combinations make up the entire nullspace. N (A) = span{s1 , s2 }. {s1 , s2 }
forms a basis for N (A). The nullspace of A is the set of all solutions, or the general solution, of
Ax = 0.
General solution to Ax = 0 :   x = x3 [ 2 ; −2 ; 1 ; 0 ] + x4 [ 3 ; −3 ; 0 ; 1 ] = [ 2x3 + 3x4 ; −2x3 − 3x4 ; x3 ; x4 ]
where x3 , x4 are free variables.
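The special solutions of Example 14.3 can be checked directly against A:

```python
import numpy as np

A = np.array([[2, 4, 4, 6],
              [2, 5, 6, 9]])
s1 = np.array([2, -2, 1, 0])     # free variables (x3, x4) = (1, 0)
s2 = np.array([3, -3, 0, 1])     # free variables (x3, x4) = (0, 1)

print(A @ s1, A @ s2)            # both [0 0]

# any combination x3*s1 + x4*s2 is also in the nullspace
x3, x4 = 5, -7
print(A @ (x3 * s1 + x4 * s2))   # [0 0]
```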
This is how we picture elimination. Suppose A is a 4 × 5 matrix. After elimination we get the upper
triangular echelon matrix
U = [ p ? ? ? ? ; 0 0 p ? ? ; 0 0 0 p ? ; 0 0 0 0 0 ]
The stars can be anything. p represents a pivot. The three pivots are in columns 1, 3 and 4. The
pivot variables are x1 , x3 , x4 . The free columns are columns 2 and 5. The free variables are x2 and
x5 . When we reduce to reduced row echelon form, we will have
R = [ 1 ? 0 0 ? ; 0 0 1 0 ? ; 0 0 0 1 ? ; 0 0 0 0 0 ]
We get one special solution s1 by setting x2 = 1, x5 = 0 and solving for the pivot variables x1 , x3 , x4 .
We get another special solution s2 by setting x2 = 0, x5 = 1 and solving for the pivots. The special
solutions {s1 , s2 } form a basis for the nullspace: they are a spanning set and they are independent.
Then the complete, or general, solution to Ax = 0 is
x = x2 s1 + x5 s2
The number of pivot columns + the number of free columns has to add up to the total number of columns. Thus, if A is m × n with rank r (r pivots), we have
r + (n − r) = n,   i.e.   rank + nullity = n.
Example 14.5. Find the rank and nullity of the matrix in Example 14.4.
Example 14.6. If a 2 × 3 matrix has one special solution, what is its rank? Find a 2 × 3 matrix A with the special solution s1 = [ −2 ; 3 ; 1 ].
To end this section, we make a simple but important observation. Suppose A has more columns
than rows. When n > m, there is at least one free variable. This means that Ax = 0 has at least
one special solution, and hence at least one nonzero solution. It is important enough for a box:
Exercises
1. Let
A = [ 1 3 2 4 ; 1 3 3 11 ; 0 0 1 7 ] ,   B = [ 1 2 1 ; 3 1 1 ; 2 −1 0 ]
(a) Find the general solutions to Ax = 0 and Bx = 0 by reducing to Reduced Row Echelon
form and finding the special solutions.
(b) Find spanning sets for N (A) and N (B).
(c) Find the rank & nullity of A and B.
3. Generally, the nullspaces of A and AT are not the same. Give an example where N (A) 6=
N (AT ). For which matrices will N (A) = N (AT )?
4. Is it possible for s1 = [ 2 ; 1 ; 3 ] to be the only special solution of a 2 × 3 matrix?
5. (a) Construct a 2 × 2 matrix whose nullspace is the same as its column space.
(b) Why does no 3 × 3 matrix have a nullspace that is equal to its column space?
6. If AB = 0, then ABx = A(Bx) = 0 for all vectors x. This implies that the column space of B is contained in the ________ of A. Construct an example of an A and B that satisfies AB = 0. Both A and B should have some nonzero entries.
15 The complete solution to Ax = b
Thus, the general, or complete, solution is the sum of the null solution (general solution to Ax = 0)
and a particular solution:
xcomplete = xn + xp
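The particular-plus-nullspace structure is easy to see numerically. A sketch using the matrix of Example 14.3 with a made-up right hand side b (not from the notes):

```python
import numpy as np

A = np.array([[2.0, 4.0, 4.0, 6.0],
              [2.0, 5.0, 6.0, 9.0]])
b = np.array([6.0, 7.0])                    # consistent: A has full row rank

xp = np.linalg.lstsq(A, b, rcond=None)[0]   # one particular solution
s1 = np.array([ 2.0, -2.0, 1.0, 0.0])       # special solutions (nullspace)
s2 = np.array([ 3.0, -3.0, 0.0, 1.0])

# every x = xp + c1*s1 + c2*s2 solves Ax = b
for c1, c2 in [(0, 0), (1, -2), (3.5, 4)]:
    x = xp + c1 * s1 + c2 * s2
    print(np.allclose(A @ x, b))            # True each time
```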
Once we have the echelon form, the nonzero rows cannot be eliminated, so these are independent.
The rank of a matrix thus tells us how many pivots, and how many independent rows we have.
Recall again, that
Depending on the size of A, we can have more than one solution, and if we do, then there are
necessarily infinitely many solutions.
In the example above, A has full row rank, meaning that the rank r of the matrix is equal to the
number of rows m. When r = m, there is always a solution. The number of special solutions is
n − r = n − m = 1 in this case.
The matrix A in Example 15.2 has full column rank, meaning that the rank r is equal to the number
of columns n. When A has full column rank, there may or may not be a solution. If there is a
solution, it is unique. There are no special solutions. The only solution of Ax = 0 is x = 0.
Example 15.3. Repeat Example 15.2 with b = [ 1 ; 1 ; 1 ; 1 ]. Show that the rank of the augmented matrix [ A b ] is larger than the rank of A. The column space has increased.
It is possible for the rank to be smaller than the number of rows and the number of columns.
Example 15.4. Find the general solution to Ax = b, where
A = [ 1 3 3 5 ; −1 2 5 6 ; 0 5 8 11 ] ,   b = [ 4 ; 1 ; 5 ]
Full row rank (r = m):
(a) All rows have pivots, and the RREF R has no zero rows.
(b) Ax = b always has a solution. (It may not be unique.)
(c) C(A) is the whole space Rm .
(d) There are n − r = n − m special solutions in N (A).
Full column rank (r = n):
(a) All columns have pivots, and the RREF has no zero columns.
(b) If a solution to Ax = b exists, it is unique. (A solution might not exist.)
(c) There are no free variables or special solutions.
(d) The only solution to Ax = 0 is x = 0.
Exercises
1. Find the general solution to Ax = b if
A = [ 1 3 3 ; 2 1 0 ] ,   b = [ 1 ; 2 ]
3. (a) The largest possible rank of a 3 × 4 matrix is ______. In this case, there is a pivot in every ______ of U and R. The column space is then ______.
(b) The largest possible rank of a 4 × 3 matrix is ______. In this case, there is a pivot in every ______ of U and R. The nullspace is then ______.
4. Suppose you know that the 3 × 4 matrix A has the single special solution to Ax = 0 the vector s = [ 3 2 1 0 ]T .
16 Linear independence, basis and dimension

The rank measures the ‘true size’ of the matrix. One can see that the 2nd and 3rd columns of A
are the same. So only the first two columns are needed to make the column space of A. It is a little
more difficult to see for matrix B, but
Again, only the first two columns are needed to make the column space of B. We can get the third
column from a linear combination of the first two columns. Thus, for both A and B, the first two
columns are enough for the column space. The column space is spanned by the first two columns.
We also need at least this many. We can’t get rid of either of the first two columns, or make a smaller
set of vectors to span the space. Thus, we say that the first two columns form a basis for the column
space. To understand what this means we need a few definitions.
Vectors v1 , v2 , . . . , vm are linearly dependent (LD) if there is a linear combination
c1 v1 + c2 v2 + · · · + cm vm = 0
where not all ci 's are zero. This is called a linear dependence relation.
Vectors v1 , v2 , . . . , vm are linearly independent (LI) if they are not LD. That is, they are LI
if the only way to make a linear combination of them zero is to set all of the coefficients to zero:
c1 v1 + c2 v2 + · · · + cm vm = 0 ⇐⇒ c1 = c2 = · · · = cm = 0
Two vectors are LI if and only if they are not scalar multiples of each other:
c1 v1 + c2 v2 = 0 ⇐⇒ c1 = c2 = 0
If v1 and v2 are pointing in different directions, they are LI. For two (nonzero) vectors, we need both c1 and c2 to be nonzero to make v1 , v2 linearly dependent.
Example 16.1. The columns of A and the columns of B for the matrices (16.1) are linearly depen-
dent. They have the following linear dependence relations.
The example above shows that we have linear dependence as long as we have a linear dependence
relation where some of the ci ’s are nonzero. We don’t need all of them to be nonzero.
Another way to say this is that the columns of an m × n matrix A are LI if and only if A has n
pivots, i.e. A has full column rank.
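This rank test is one line numerically. A sketch (both matrices are made up for illustration):

```python
import numpy as np

# Columns are LI exactly when the rank equals the number of columns.
A = np.array([[1, 1],
              [0, 1],
              [0, 0]])
print(np.linalg.matrix_rank(A))   # 2 -> full column rank, columns are LI

B = np.array([[1, 2],
              [2, 4],
              [3, 6]])
print(np.linalg.matrix_rank(B))   # 1 -> columns are LD (col2 = 2*col1)
```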
Example 16.2. Change the last column of A in eqn (16.1) so that the columns are LI.
BASIS
Another way of saying this is that a basis is a minimal spanning set. It is a spanning set with the
least number of vectors.
Example 16.3. Any two non-zero vectors that don’t point in the same direction form a basis for
R2 .
Theorem 16.1. There is one and only one way to write x ∈ V as a linear combination of the
basis vectors.
DIMENSION
Theorem 16.2. Let V be a vector space. Every basis for V has the same number of elements.
This unique number is called the dimension of V , written dim[V ].
Proof.
Corollary 16.1. Any n LI vectors in a vector space with dim[V ] = n form a basis for V .
Proof.
Definition 16.3. The span of the rows of A is called the row space, and is denoted C(AT ).
Theorem 16.4. The pivot rows after elimination are LI, so these form a basis for the row space.
Proof.
Example 16.9. Find a basis for the row space of the matrix B in eqn (16.1).
Theorem 16.5. The columns of A corresponding to the pivot columns of an echelon form U are
a basis for C(A).
Proof.
Example 16.10. Find a basis for the column space of the matrix B in eqn (16.1).
Example 16.11. Find bases for C(A), C(AT ) and N (A) for A = [ 0 1 1 1 ; 0 3 2 1 ; 0 2 1 0 ].
We can find a basis for the span of any collection of vectors by either putting them as the rows of a
matrix and finding the pivot rows, or by putting them as the columns and finding the columns of A
corresponding to the pivot columns. When we use the column space to find a basis, we find a basis
in terms of the original vectors. When we use the row space, we find a basis in terms of different,
usually ‘simpler’ vectors.
Notice that the number of vectors in the basis for C(A) and C(AT ) are both equal to the number of
pivots, and hence they are the same! Thus, we have the following:
Exercises
1. Show that v1 , v2 , v3 are LI, but v1 , v2 , v3 , v4 are LD.
v1 = [ 1 ; 0 ; 0 ] ,   v2 = [ 1 ; 1 ; 0 ] ,   v3 = [ 1 ; 1 ; 1 ] ,   v4 = [ 1 ; 2 ; 3 ]
3. Find bases for the column space, row space and nullspace for
(i) A = [ 3 2 1 0 ; 1 1 0 1 ; 2 1 1 −1 ]   and   (ii) B = [ 2 1 3 ; −2 −1 −3 ; 4 2 6 ]
4. Find two LI vectors on the plane x − 2y + 3z = 0 in R3 . Can you find 3? The plane is the
nullspace of what matrix?
5. (a) If the columns of a matrix are LD, then so are the rows.
(b) Any 4 vectors in R3 are LD.
(c) The column space of a 2 × 2 matrix is the same as the row space.
(d) The column space of a 2 × 2 matrix has the same dimension as the row space.
6. Find a basis for, and determine the dimension of, the span of each of these sets of polynomials in P2 :
{2 − x, 4 − 2x, x2 },   {2 − x, 1 + x, x2 }
8. Suppose {v1 , v2 , v3 } is a basis for R3 . Show that if a 3 × 3 matrix A satisfies det(A) 6= 0, then
{Av1 , Av2 , Av3 } is also a basis for R3 .
17 The fundamental theorem of linear algebra
Left nullspace: N (AT ) = { x ∈ Rm : AT x = 0 }
Taking transposes, AT x = 0 is the same as
xT A = 0T
This is why it is called the left nullspace. Row vectors multiplying A from the left give you zero.
Now we have the following four fundamental subspaces of an m × n matrix A with rank r:
space dimension
Example 17.1. Sketch the four fundamental subspaces of A = [ 1 3 ; 2 6 ].
Recall that in the last section, we found that the dimensions of the column space and the row space
are the same, and both are equal to the rank of A:
dim[C(A)] = dim[C(AT )] = r
Moreover, the dimensions of the column space and the nullspace have to add up to n, the number of columns:
dim[C(A)] + dim[N (A)] = r + (n − r) = n
Now, the key thing is that the row space and the nullspace are orthogonal. Every vector in the row
space is orthogonal to every vector in the nullspace, and vice versa. Symbolically,
C(AT ) ⊥ N (A)
In fact, C(AT ) and N (A) are more than just orthogonal. They are orthogonal complements.
Definition 17.1. Let V be a finite dimensional vector space, and S a subspace of V . The orthogonal complement S ⊥ of S is everything in V that is orthogonal to everything in S:
S ⊥ = { v ∈ V : v · s = 0 for all s ∈ S }
Every vector in V can be written as the sum of an element in S and an element in S ⊥ . We say that V is the direct sum of S and S ⊥ . This is often written as V = S ⊕ S ⊥ .
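The orthogonality of the row space and the nullspace can be checked directly on a small made-up matrix:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6]])          # rank 1: row space is spanned by (1, 2, 3)

# two special solutions spanning N(A): set one free variable to 1
s1 = np.array([-2.0, 1.0, 0.0])
s2 = np.array([-3.0, 0.0, 1.0])
print(A @ s1, A @ s2)              # both zero vectors

# every row of A is orthogonal to every nullspace vector
print(A[0] @ s1, A[0] @ s2, A[1] @ s1, A[1] @ s2)   # all 0
```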
C(A) = N (AT )⊥   (17.1)
Proof.
[Figure: the “big picture” of the four fundamental subspaces. C(AT ) and N (A) sit in Rn ; C(A) and N (AT ) sit in Rm . A vector x = xr + xn splits into a row-space part with Axr = b and a nullspace part with Axn = 0, so Ax = b.]
Example 17.3. Sketch the big picture for A = [ 1 3 ; 2 6 ].
Example 17.4. Find bases for the four fundamental subspaces of A, and sketch the big picture, for
A = [ 1 2 3 ; 2 4 6 ] = [ 1 0 ; 2 1 ] [ 1 2 3 ; 0 0 0 ] = E −1 R
The implications of the fundamental theorem are far-reaching. We can also say that R^m is the direct sum of C(A) and N(A^T):
    R^m = C(A) ⊕ N(A^T)
Example 17.5. Let
    A = [  1   0   0   0 ]
        [ −1   1   0   0 ]
        [  0  −1   1   0 ]
        [  0   0  −1   1 ]
        [  0   0   0  −1 ]
(This is the "backward difference" matrix.)
Show that Ax = b has a solution if and only if b is on the plane b1 + b2 + b3 + b4 + b5 = 0.
Example 17.6. What are C(A) and C(A^T) for a rank one matrix A = u v^T ?
Exercises
1. Why can’t a matrix A have (1, 2, 3) in its row space and (3, 2, 1) in its nullspace?
2. Sketch the big picture for
       A = [ 2  10 ]
           [ 1   5 ]
3. Find bases and dimensions for the four fundamental subspaces for
       A = [ 1  2  1 ] ,   and   B = [ 1  2  1 ]
           [ 2  4  2 ]               [ 3  5  2 ]
4. Find bases and dimensions for the four fundamental subspaces. This can be done without
multiplying matrices.
       A = [ 1  0  0 ] [ 1  1  2  1 ]
           [ 2  1  0 ] [ 0  0  1  3 ]
           [ 3  2  1 ] [ 0  0  0  5 ]
5. Without multiplying matrices, find bases for C(A) and C(AT ), for
       A = [ 1  3 ] [ 1  2  1 ]
           [ 0  2 ] [ 3  5  7 ]
           [ 1  2 ]
How do you know that any such product (a (3 × 2)(2 × 3) product) cannot be invertible?
6. For the 4 × 4 backward difference matrix
       A = [  1   0   0  −1 ]
           [ −1   1   0   0 ]
           [  0  −1   1   0 ]
           [  0   0  −1   1 ]
the left nullspace is spanned by (1, 1, 1, 1). Why? So there is a solution to Ax = b if and only
if b1 + b2 + b3 + b4 = 0. Write out the equations and add them together to see this another way.
7. Consider the sum of two rank one matrices:
A = uvT + wzT
(a) Which two vectors span C(A)?
(b) Which two vectors span C(AT )?
(c) What condition must be satisfied to guarantee rank(A) = 2?
8. Show that N (AT A) = N (A). Hint: Suppose AT Ax = 0. Then Ax is in N (AT ) (why?). But
Ax is in C(A) (why?). So Ax is in C(A) and N (AT ). The only vector in both spaces is the
zero vector (why?), so Ax = 0.
9. A basis for N (AT ). We have seen how to find bases for the column space, row space and
nullspace. What about the left nullspace? One way to do it is to find the RREF of AT and
calculate the special solutions. Here is another method. For an m × n matrix A, form the augmented matrix with the m × m identity: [ A  I ]. Then reduce this augmented matrix to RREF. The
rows in the last m columns corresponding to the zero rows in the first n columns are then a
basis for the left nullspace. Explain why this works and illustrate with an example.
18 PROJECTION 105
18 Projection
Many of the systems that arise in applications are inconsistent. That is, they do not have a solution.
But we would still like to “solve” them in some sense. We would like to find a best approximation,
or a “solution” that minimizes the error.
Example 18.1. Consider the system
    2x = 1
    x = 1        (18.1)
This system has no solution. We can write it as ax = b, where a = (2, 1)^T and b = (1, 1)^T. Even though we can't find an x that solves ax = b, we would like to find an approximation that minimizes the error
    e = b − ax
The x̂ that minimizes ‖e‖ occurs at the point ax̂ that is closest to b. This is the projection of b onto a.
[Figure: the vector b = (1, 1), the line through a = (2, 1), and the projection ax̂ of b onto a.]
The projection of b onto a is determined by the condition that e = b − ax̂ is orthogonal to a. Thus,
we have
    a · e = 0,   or   a · (b − ax̂) = 0
Therefore,
    x̂ = (a · b)/(a · a)
In our example, x̂ = 3/5.
This is the least-squares solution to the overdetermined system (18.1). It minimizes, over all possible x's, the length of the error vector e. Notice that it fits our intuition of what a "best approximation" should be, i.e. somewhere between 1/2 and 1.
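As a quick numerical sketch (code not in the notes), the computation above can be checked in a few lines:

```python
# Check of the 1-D least-squares formula x_hat = (a . b)/(a . a)
# for the inconsistent system 2x = 1, x = 1.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a = [2, 1]
b = [1, 1]

x_hat = dot(a, b) / dot(a, a)                    # 3/5
e = [bi - ai * x_hat for ai, bi in zip(a, b)]    # error b - a*x_hat

print(x_hat)       # 0.6
print(dot(a, e))   # ~0: the error is orthogonal to a
```

The error e = (−0.2, 0.4) is perpendicular to a, which is exactly the condition that defined the projection.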
    (projection by dot product)       projection of b onto a:  ((a · b)/(a · a)) a
    (projection by matrix product)    projection of b onto a:  ((a a^T)/(a^T a)) b
Thus, given any point b in the plane, we can find the closest point to b on the line spanned by a by
taking the projection. The projection matrix onto the line spanned by a is
    P = (a a^T)/(a^T a)        (18.2)
and the projection of b onto a is given by P b. In the case a = (2, 1)^T, we have
    P = (1/5) [ 4  2 ]
              [ 2  1 ]
Formula (18.2) gives the projection matrix generally for projecting onto the vector a.
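A small Python sketch (not from the notes) of formula (18.2) for a = (2, 1), which also checks that projecting twice changes nothing (P^2 = P):

```python
# Build the projection matrix P = a a^T / (a^T a) for a = (2, 1)
# and verify numerically that it is idempotent.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a = [2, 1]
aTa = sum(ai * ai for ai in a)                   # a^T a = 5
P = [[ai * aj / aTa for aj in a] for ai in a]    # outer product, divided by 5

print(P)   # [[0.8, 0.4], [0.4, 0.2]] = (1/5)[[4, 2], [2, 1]]
P2 = matmul(P, P)
print(max(abs(P2[i][j] - P[i][j]) for i in range(2) for j in range(2)))  # ~0
```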
Example 18.2. What if b = (4, 2)^T ?
Example 18.3. What if b = (0, 1)^T ?
Example 18.4. Find the matrix that projects the plane onto the line 2x − y = 0.
Example 18.5. Find the projection matrix onto a = (1, 1, 1)^T, and the projection of b = (1, 2, 3)^T onto the line spanned by a.
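A numerical sketch of Example 18.5 (code not in the notes; it uses the dot-product form of the projection):

```python
# Project b = (1, 2, 3) onto the line spanned by a = (1, 1, 1).

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a = [1, 1, 1]
b = [1, 2, 3]

x_hat = dot(a, b) / dot(a, a)            # 6/3 = 2
p = [x_hat * ai for ai in a]             # projection of b onto a
e = [bi - pi for bi, pi in zip(b, p)]    # error

print(p)           # [2.0, 2.0, 2.0]
print(dot(a, e))   # 0.0: the error is orthogonal to a
```

Here the projection matrix is (1/3) times the all-ones 3 × 3 matrix, so P b simply averages the entries of b.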
Least-squares solutions
[Figure: b outside C(A); the error e = b − Ax̂ lies in N(A^T), perpendicular to C(A).]
When Ax = b has no solution, the best we can do is make the error e = b − Ax as short as possible. This happens when Ax is the projection of b onto C(A), i.e. when e is orthogonal to C(A), so that A^T (b − Ax) = 0. We call the solution of this equation x̂, and the equation for x̂ can be written in the following way:
    A^T A x̂ = A^T b        (18.5)
An x̂ that satisfies (18.5) is called a least-squares solution since it minimizes the sum of the squares of the errors:
    ‖e‖^2 = e1^2 + e2^2 + · · · + em^2
The equations in the system (18.5) are referred to as the normal equations. So least-squares solutions
are solutions of normal equations.
It is an amazing fact that, regardless of what A is, even though Ax = b might not have a solution,
when we multiply both sides by AT to get the normal equations (18.5), there is always a solution,
and moreover it is, in a very deep sense, the “best” solution we can find!
The least-squares solution is unique as long as the columns of A are LI. This follows from the next theorem.

Theorem 18.1. N(A^T A) = N(A).
Proof.
The above theorem implies that if A has LI columns, and hence Ax = 0 has only the zero solution, then A^T A x = 0 has only the zero solution, so A^T A is invertible. So we have proven the following.
Projection matrices
We saw in equation (18.2) how to project onto a line. Now we will see how to find the projection
matrix onto a more general subspace.
The normal equations (18.5) are the equations for the closest point to b in the column space C(A).
As long as A has LI columns, the matrix A^T A is invertible, in which case we can write the least-squares solution as
    x̂ = (A^T A)^{-1} A^T b
So the closest point to b in C(A) is Ax̂, or
    p = A x̂ = A (A^T A)^{-1} A^T b
This is the projection of b onto the column space of A. We write the projection matrix P as
    P = A (A^T A)^{-1} A^T
[Figure: P b is the projection of b onto C(A); the error b − P b lies in N(A^T).]
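The general formula can be sanity-checked numerically. In the sketch below (not part of the notes), A is an arbitrary 3 × 2 matrix with LI columns:

```python
# Projection onto C(A) via P = A (A^T A)^{-1} A^T, with a check that P^2 = P.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    # inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 0], [0, 1], [1, 1]]        # two independent columns in R^3
At = transpose(A)
P = matmul(matmul(A, inv2(matmul(At, A))), At)

P2 = matmul(P, P)
err = max(abs(P2[i][j] - P[i][j]) for i in range(3) for j in range(3))
print(err)   # ~0, so P^2 = P
```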
Theorem 18.2. P 2 = P
Proof.
Example 18.6. Find the projection matrix for the projection of a point in R3 onto the xy−plane.
Example 18.7. Find the projection of the point (1, 0, 1) onto the plane spanned by (1, 1, 0) and
(0, 0, 1). Which points project to the point (2, 2, 3)?
Exercises
1. Find the least squares solution to
3x = 10
4x = 5
Check that the error vector is perpendicular to the column (3, 4).
2. Find the least-squares solution to
       x = 1/2
       x = 1
   Why do you get a different value from that obtained in Example 18.1?
4. Show that if A is invertible, then the projection matrix P that projects a point onto the columns
of A is P = I.
5. Find the least-squares solution to Ax = b, and the projection of b onto C(A), for
       A = [ 1  0 ]         b = [ 1 ]
           [ 0  1 ] ,           [ 1 ]
           [ 1  1 ]             [ 0 ]
6. Let S be the subspace of R3 spanned by v1 = (1, 0, 1) and v2 = (0, 1, 1). Find the projection
matrix P onto S, and find a nonzero vector b that is projected to zero.
8. What 2 × 2 matrix projects points in the plane onto the 45◦ line y = x?
9. For the projection matrix P in the previous problem, explain what H = I − 2P does. Explain
why H 2 = I.
19 LEAST SQUARES 113
19 Least squares

Suppose we have data points (x1, y1), (x2, y2), ..., (xn, yn), and we want to fit a line
    y = mx + b
through them; that is, we try to find the slope m and y−intercept b. If all the points lie on the line, we would then be able to solve the system of equations
    m x1 + b = y1
    m x2 + b = y2
        ⋮                (19.1)
    m xn + b = yn
Generally, the points will not all lie on a line, so the above system is inconsistent. We will find a
least-squares fit of the line to the data. The error at each point is the distance from yi to the point
on the line, or ei = yi − (mxi + b). The figure below shows a least squares line through 4 points, and
the errors at each point.
[Figure: a least-squares line through four data points (x1, y1), ..., (x4, y4), with the errors e1, ..., e4 drawn as vertical segments from each point to the line.]
The error vector e = (e1, e2, ..., en) contains all of these errors. The least-squares line is the line that minimizes the sum of the squares of the errors, which is the same as minimizing the length of the error vector. That is, the least-squares line minimizes
    E = ‖e‖^2 = Σ_{i=1}^n e_i^2 = Σ_{i=1}^n (y_i − m x_i − b)^2
To find the least squares line we find the least squares solution to the inconsistent system (19.1).
This inconsistent system can be written in matrix-vector form as Av = y, where
    A = [ x1  1 ]         v = [ m ]         y = [ y1 ]
        [ x2  1 ]             [ b ]             [ y2 ]
        [  ⋮  ⋮ ]                               [  ⋮ ]
        [ xn  1 ]                               [ yn ]        (19.2)
The vector y is given, and has the y−coordinates. The matrix A is given, and has the x−coordinates.
We want to solve for the vector v to give us the slope and y−intercept of the least-squares line. So
we find a least-squares solution by solving the normal equations
    A^T A v = A^T y        (19.3)
The system (19.3) for the least-squares line is the 2 × 2 system
    [ Σ xi^2   Σ xi ] [ m ]   [ Σ xi yi ]
    [ Σ xi      n   ] [ b ] = [ Σ yi    ]
Example 19.1. Find the least squares line through the points (0, 0), (2, 1), (3, 2).
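A numerical sketch of Example 19.1 (not part of the notes), solving the 2 × 2 normal equations by Cramer's rule:

```python
# Least-squares line through (0, 0), (2, 1), (3, 2).

xs = [0, 2, 3]
ys = [0, 1, 2]
n = len(xs)

sxx = sum(x * x for x in xs)              # sum of x_i^2   = 13
sx = sum(xs)                              # sum of x_i     = 5
sxy = sum(x * y for x, y in zip(xs, ys))  # sum of x_i y_i = 8
sy = sum(ys)                              # sum of y_i     = 3

# solve [sxx sx; sx n][m; b] = [sxy; sy]
det = sxx * n - sx * sx                   # 14
m = (sxy * n - sx * sy) / det             # 9/14
b = (sxx * sy - sx * sxy) / det           # -1/14

print(m, b)   # 0.6428571428571429 -0.07142857142857142
```

So the least-squares line is y = (9/14)x − 1/14.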
Curve fitting
In most situations a line is not the best fit for data. Usually we will try to fit some kind of curve to
the data. This can be done to estimate some trend, or uncover some law underlying the observations.
For example, one might try to fit population data with an exponential function to try to predict the
future population. The method of least-squares can be used in these cases as well.
A classic example of curve fitting to discover a natural law was performed by the German astronomer
Johannes Kepler in the early 1600’s. Kepler tried to make sense of the incredibly accurate (naked
eye!) observations of the Danish astronomer Tycho Brahe on the motion of the planets. Kepler found
that the data did not fit the prevailing geocentric models nor the heliocentric model of Copernicus.
These models were based on a 2000+ year old tradition that the motions of the planets were made
up of circles. Kepler proposed the revolutionary idea that the planets not only orbited the sun,
instead of the other way around, but that they did so in ellipses! This model fit the data better,
although not by that much! Kepler formulated three laws of planetary motion based on this model.
His third law states that the square of the orbital period of a planet is proportional to the cube of
the semi-major axis of its orbit. In other words,
    T^2 ∝ x^3        (19.5)
where T is the orbital period and x is the semi-major axis. Notice that we can take square roots on
both sides and write the third law as
    T = C x^(3/2)
The figure below shows the orbital period vs the mean distance from the sun of the first four planets.
The curve is T = 0.0011x3/2 .
[Figure: orbital period (years) vs. mean distance from the sun (million miles) for Mercury, Venus, Earth, and Mars, together with the curve T = 0.0011 x^(3/2).]
Let’s see how we can use the tools we have developed to fit a curve like this to the data. Suppose
we suspect that the data follows some law like this, but we want to show that this is the best fit. So
we can try to fit the data to a curve
    T = C x^m        (19.6)
where T is the period and x is the distance from the sun. If we can show that the best fit is when
m = 3/2, that would be strong evidence for the third law.
So we try to fit a curve (19.6) to the data. In other words, we try to find the best values of C and
m so that the error between the curve and the data is as small as possible. Equation (19.6) is not a
line, so we cannot use least squares directly. But, if we take logs of both sides, we get
m ln x + ln C = ln T
which is linear in m and ln C. Now suppose we have data for 4 of the planets. Then we get the
system of 4 equations
    [ ln x1  1 ]            [ ln T1 ]
    [ ln x2  1 ] [  m   ] = [ ln T2 ]
    [ ln x3  1 ] [ ln C ]   [ ln T3 ]
    [ ln x4  1 ]            [ ln T4 ]
This is a system of the form Av = b, where v = (m, ln C)^T. We then solve the normal equations
    A^T A v = A^T b
using the planetary data:
    Planet     Orbital period (yr)   Avg. distance from sun (million miles)
    Mercury    0.240846              36
    Venus      0.615                 67.2
    Earth      1                     93
    Mars       1.881                 141.6
    Jupiter    11.86                 483.6
    Saturn     29.456                886.7
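The fit itself is a few lines of Python (a sketch, not part of the notes). Using all six rows of the table, the least-squares exponent comes out almost exactly 3/2:

```python
# Fit T = C x^m by least squares on the linearized model m ln x + ln C = ln T.
import math

data = [(36, 0.240846), (67.2, 0.615), (93, 1), (141.6, 1.881),
        (483.6, 11.86), (886.7, 29.456)]        # (distance, period)

lx = [math.log(x) for x, T in data]
lT = [math.log(T) for x, T in data]
n = len(data)

sxx = sum(v * v for v in lx)
sx = sum(lx)
sxy = sum(u * v for u, v in zip(lx, lT))
sy = sum(lT)
det = sxx * n - sx * sx

m = (sxy * n - sx * sy) / det                   # slope = the exponent m
C = math.exp((sxx * sy - sx * sxy) / det)       # intercept = ln C

print(round(m, 4), round(C, 5))   # m close to 1.5, C close to 0.0011
```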
Suppose now we want to fit a parabola
    p(x) = a x^2 + b x + c
to data points (x1, y1), ..., (xn, yn). Our goal is to find the "best" parabola, which means finding the best values of a, b and c. If the parabola fits exactly through all the points then
    p(xi) = yi,   i = 1, 2, ..., n,
but generally the data will not lie exactly on a parabola. To find the best parabola, we set up the inconsistent system p(xi) = yi, which can be written as
    a x1^2 + b x1 + c = y1
    a x2^2 + b x2 + c = y2
              ⋮
    a xn^2 + b xn + c = yn
In matrix-vector form this is Av = y, where
    A = [ x1^2  x1  1 ]         v = [ a ]         y = [ y1 ]
        [ x2^2  x2  1 ]             [ b ]             [ y2 ]
        [   ⋮    ⋮  ⋮ ]             [ c ]             [  ⋮ ]
        [ xn^2  xn  1 ]                               [ yn ]
The matrix A above is called a Vandermonde matrix. As long as the xi's are distinct, the columns will be LI. We then need to find the best approximation to Av = y. This is done by solving the normal equations
    A^T A v = A^T y
Example 19.2. Find the best parabola through the points (0, 0), (1, 2), (2, 1), (3, 0).

Solving the normal equations by elimination gives a = −3/4, b = 43/20, c = 3/20. So the least-squares parabola through the 4 points is
    p(x) = −(3/4) x^2 + (43/20) x + 3/20
The least-squares parabola and the four points are shown below.
[Figure: the least-squares parabola and the four data points.]
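As a check of Example 19.2 (sketch code, not in the notes), we can form the normal equations and solve them with a small elimination routine:

```python
# Least-squares parabola through (0, 0), (1, 2), (2, 1), (3, 0).

def solve(M, r):
    """Gaussian elimination with partial pivoting for a small n x n system."""
    n = len(M)
    M = [row[:] + [r[i]] for i, row in enumerate(M)]   # augmented matrix
    for i in range(n):
        p = max(range(i, n), key=lambda k: abs(M[k][i]))
        M[i], M[p] = M[p], M[i]
        for k in range(i + 1, n):
            f = M[k][i] / M[i][i]
            for j in range(i, n + 1):
                M[k][j] -= f * M[i][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

xs = [0, 1, 2, 3]
ys = [0, 2, 1, 0]
A = [[x * x, x, 1] for x in xs]   # Vandermonde rows [x_i^2, x_i, 1]
AtA = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
Aty = [sum(row[i] * y for row, y in zip(A, ys)) for i in range(3)]

a, b, c = solve(AtA, Aty)
print(a, b, c)   # approximately -0.75, 2.15, 0.15
```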
As a final remark in this section, we note that the method above for finding the least-squares parabola
can be generalized in a natural way to finding the least squares polynomial of any degree through n
points.
Exercises
1. You wish to measure some quantity x, and take a series of measurements from which you obtain
values b1 , b2 , . . . , bn . If they were all the same, you would have x = bi . If they are different,
what is the least squares solution to this problem?
2. Find the least squares line through the points (−1, 0), (0, 1), (1, 1).
3. Find the best horizontal line through the 3 points in problem #2.
4. Find the best parabola of the form y = a + bx2 through the points (−1, 0), (0, 1), (1, −2).
5. Find the least-squares line through the points (0, 0), (1, 2), (3, 3), (4, 4).
6. Find the least-squares parabola through the 4 points in the problem #5. Sketch the points,
the least-squares line, and the least-squares parabola on the same graph.
20 GRAM-SCHMIDT ORTHOGONALIZATION 119
20 Gram-Schmidt orthogonalization
In many cases it is quite useful to obtain a basis consisting of orthogonal vectors. For one thing, it
is easy to find the coefficients of any vector in the span of a set of orthogonal vectors. Generally, if
u = c1 v1 + c2 v2
then we have to solve a system of equations to find the coefficients c1 and c2 . But, if v1 and v2 are
orthogonal, we can take the dot product with v1 on both sides:
    u · v1 = (c1 v1 + c2 v2) · v1
           = c1 (v1 · v1) + c2 (v2 · v1)
           = c1 (v1 · v1)        (since v2 · v1 = 0)
so c1 = (u · v1)/(v1 · v1). The same reasoning works with any number of orthogonal vectors v1, ..., vn:
    u = c1 v1 + c2 v2 + · · · + cn vn,   ci = (u · vi)/(vi · vi)        (20.1)
Example 20.1. Find the coefficients of (2, 1) in the basis {(1, 1), (−1, 1)}.
[Figure: the vector b, its projection ((b · a)/(a · a)) a onto a, and the error e.]
When we subtract the projection off from b we get the vector e, which is orthogonal to a. Thus,
we can start by letting v1 = a, and take v2 = e to be the vector obtained by subtracting off the
projection of b. That is,
v1 = a
b·a
v2 = b − a
a·a
Then v1 and v2 are orthogonal, and they are linear combinations of the vectors a and b, and hence
they have the same span.
Even better than orthogonal is orthonormal, which means orthogonal and length one. A set of vectors q1, ..., qn is orthonormal if
    qi · qj = 0 if i ≠ j,   and   qi · qj = 1 if i = j
It is even easier to find the coefficients if we have an orthonormal basis. Equation (20.1) becomes, if
q1 , q2 , . . . qn are orthonormal,
u = c1 q1 + c2 q2 + · · · + cn qn , ci = u · qi (20.2)
To find an orthonormal basis, we do Gram-Schmidt and then divide each vector by its length.
    v1 = u1
    v2 = u2 − ((u2 · v1)/(v1 · v1)) v1
    v3 = u3 − ((u3 · v1)/(v1 · v1)) v1 − ((u3 · v2)/(v2 · v2)) v2
and so on. Generally,
    vj = uj − Σ_{i=1}^{j−1} ((uj · vi)/(vi · vi)) vi        (20.3)
Example 20.6. Find an orthogonal basis for the span of {(1, 1, 1), (1, 0, 0), (0, 1, 0)}.
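Formula (20.3) translates almost line-for-line into code. The sketch below (not part of the notes) runs it on the vectors of Example 20.6:

```python
# Classical Gram-Schmidt, following formula (20.3).

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def gram_schmidt(us):
    vs = []
    for u in us:
        v = list(u)
        for w in vs:                      # subtract projections onto earlier v_i
            coeff = dot(u, w) / dot(w, w)
            v = [vi - coeff * wi for vi, wi in zip(v, w)]
        vs.append(v)
    return vs

vs = gram_schmidt([(1, 1, 1), (1, 0, 0), (0, 1, 0)])
print(vs)   # v1 = (1,1,1), v2 = (2/3,-1/3,-1/3), v3 = (0,1/2,-1/2), up to rounding
print(dot(vs[0], vs[1]), dot(vs[0], vs[2]), dot(vs[1], vs[2]))   # all ~0
```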
Exercises
1. Find an orthonormal basis for the span of {(−1, 0, 1), (1, 1, 2)}.
2. In this problem we will find the QR factorization of the matrix
       A = [ 1  2 ]
           [ 1  1 ]
   (a) Start with the vectors a = (1, 1)^T and b = (2, 1)^T. Perform the Gram-Schmidt algorithm to obtain two orthogonal vectors v1 and v2 in terms of a and b.
   (b) Normalize the vectors you obtained in part (a) to obtain orthonormal vectors q1, q2.
   (c) Find the coefficients of a and b in terms of q1 and q2. That is, find the rij in
21 Introduction to eigenvalues & eigenvectors

    Av = λv        (21.1)
(λ — the eigenvalue, v — an eigenvector)

Example 21.1. Find the eigenvalues and eigenvectors of
    A = [ 1  1 ]
        [ 1  1 ]
To find the eigenvalues, rewrite (21.1) as
    (A − λI) v = 0        (21.2)
A nonzero solution v exists exactly when A − λI is singular, that is, when
    det(A − λI) = 0        (21.3)
Equation (21.3) is called the characteristic equation of A. Solutions of equation (21.3) are eigenvalues of A. Once one has the eigenvalues of A, one can find the eigenvectors by solving equation (21.2).
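For a 2 × 2 matrix, det(A − λI) = λ^2 − tr(A) λ + det(A), so the quadratic formula gives the eigenvalues directly. A sketch (not in the notes), using the matrix of Example 21.3:

```python
# Eigenvalues of a 2x2 matrix from its characteristic equation.
import math

A = [[0, -3], [-1, 2]]
tr = A[0][0] + A[1][1]                          # 2
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]     # -3

disc = tr * tr - 4 * det                        # 16 > 0: real eigenvalues
l1 = (tr + math.sqrt(disc)) / 2
l2 = (tr - math.sqrt(disc)) / 2
print(l1, l2)   # 3.0 -1.0
```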
The key thing about eigenvectors is the direction. We are free to scale them however we like.
21 INTRODUCTION TO EIGENVALUES & EIGENVECTORS 125
Example 21.2. Show that if v is an eigenvector with associated eigenvalue λ, then any scalar
multiple of v is also an eigenvector associated with λ.
Example 21.3. Find the eigenvalues and eigenvectors of
    A = [  0  −3 ]
        [ −1   2 ]

Example 21.4. Find the eigenvalues and eigenvectors of
    A = [  1  1  0 ]
        [ −2  1  3 ]
        [ −1  1  2 ]
Example 21.5. Find the eigenvalues of
    A = [ 1  −2 ]
        [ 1   3 ]
The last example illustrates an important fact about eigenvalues of real matrices: Eigenvalues don’t
have to be real valued. If they are complex, they come in complex conjugate pairs.
An important quantity associated with eigenvalues is the trace of a matrix. This is the sum of the
diagonal elements.
trace of A: tr(A) = a11 + a22 + · · · + ann
4. The eigenvalues of AT are the same as the eigenvalues of A. (This does not hold
for the eigenvectors!)
Markov matrices
Markov matrices arise in situations where there is change between states and there is conservation
of the total amount of stuff. Such matrices have the property that the sum of the columns is equal
to 1.
Example 21.6. Suppose each year 20% of the population of city A moves to city B. The remaining 80% of city A stays in city A. Also, each year, 10% of the population of city B moves to city A. The other 90% of city B stays in city B. Let xn be the population of city A at year n, and yn the population of city B at year n. Initially the entire population lives in city A (city B is a new city). Determine the population of each city at year n, and the population in the long run as n → ∞.
Solution. First we write down how the population at year n + 1 relates to the population at year n:
    x_{n+1} = 0.8 xn + 0.1 yn
    y_{n+1} = 0.2 xn + 0.9 yn
Let the vector xn = (xn, yn)^T. Then the above equations can be written as
    x_{n+1} = A xn,   where   A = [ 0.8  0.1 ]
                                  [ 0.2  0.9 ]
is a Markov matrix.
Notice that
    x1 = A x0
    x2 = A x1 = A(A x0) = A^2 x0
    x3 = A x2 = A(A^2 x0) = A^3 x0
so in general xn = A^n x0.
If λ1 , v1 and λ2 , v2 are eigenvalue-eigenvector pairs, then write the initial distribution in the basis of
eigenvectors:
x0 = c1 v1 + c2 v2
Using property 7 of eigenvalues, we then have
    xn = c1 λ1^n v1 + c2 λ2^n v2
Thus, determining the distribution at any time n is reduced to finding the eigenvalues and eigenvec-
tors of A.
Let us find the eigenvalues and eigenvectors. We will use properties 1 & 2 of eigenvalues to determine
the eigenvalues. Notice that
tr(A) = 1.7 det(A) = 0.7
Two numbers that add to 1.7 and multiply to 0.7 are
λ1 = 1, λ2 = 0.7
Thus, these are the eigenvalues. To find the eigenvectors, look for vectors in the nullspace of A − λi I:
    λ1 = 1:    A − I = [ −0.2   0.1 ] ~ [ −2  1 ]   ⇒   v1 = [ 1 ]
                       [  0.2  −0.1 ]   [  0  0 ]            [ 2 ]

    λ2 = 0.7:  A − 0.7I = [ 0.1  0.1 ] ~ [ 1  1 ]   ⇒   v2 = [ −1 ]
                          [ 0.2  0.2 ]   [ 0  0 ]            [  1 ]
Now, since the entire population is initially in city A, we can take x0 = (1, 0)^T. (The units are the total population.) We need to write x0 in terms of the eigenvectors:
    x0 = c1 v1 + c2 v2,   or   c1 [ 1 ] + c2 [ −1 ] = [ 1 ]
                                  [ 2 ]      [  1 ]   [ 0 ]
Solving this linear system for c1 , c2 gives us c1 = 1/3, c2 = −2/3. Now we can write down the
solution!
    xn = c1 λ1^n v1 + c2 λ2^n v2
       = c1 v1 + c2 (0.7)^n v2        (since λ1^n = 1^n = 1)
       = (1/3) [ 1 ] − (2/3) (0.7)^n [ −1 ]
               [ 2 ]                 [  1 ]
This gives us the population distribution at any year n. To determine the limiting distribution, we
use the fact that 0.7n → 0 as n → ∞. Thus, the limiting population distribution is
    lim_{n→∞} xn = (1/3) [ 1 ]
                         [ 2 ]
That is, 1/3 of the population ends up in city A and 2/3 ends up in city B. Notice that the limiting
distribution is determined by the eigenvector v1 .
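We can watch this limit happen by simply iterating x_{n+1} = A xn (sketch code, not in the notes):

```python
# Iterate the Markov matrix of Example 21.6; the iterates approach (1/3, 2/3).

A = [[0.8, 0.1], [0.2, 0.9]]
x = [1.0, 0.0]                 # the whole population starts in city A

for n in range(200):           # (0.7)^200 is negligibly small
    x = [A[0][0] * x[0] + A[0][1] * x[1],
         A[1][0] * x[0] + A[1][1] * x[1]]

print(x)   # close to [1/3, 2/3]
```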
Exercises
1. What are the eigenvalues and eigenvectors of the identity matrix I?
2. For A in Example 21.3, calculate the eigenvalues and eigenvectors of A^T. Show that the eigenvalues are the same, but the eigenvectors are different.
3. Some bad news: The eigenvalues of A + B are not the eigenvalues of A plus the eigenvalues of
B. Show this directly for A as in Example 21.3 and
       B = [ 10   −9 ]
           [ 12  −11 ]
That is, calculate the eigenvalues of B and the eigenvalues of A + B. Show that the eigenvalues
of A + B are not obtained by adding the eigenvalues of A with the eigenvalues of B.
5. More good news: The eigenvalues of AB are the same as the eigenvalues of BA. Why? (Hint:
If ABv = λv, then BABv = λBv.) Are the eigenvectors the same?
6. The rotation matrix
       A = [ cos θ  −sin θ ]
           [ sin θ   cos θ ]
   cannot have real eigenvalues for most θ. Why? Show that the eigenvalues are, in fact, λ = cos θ ± i sin θ.
7. Suppose that each year, 40% of iPhone users switch to Android. At the same time, 10% of
Android users switch to iPhone. We want to determine what happens in the long run. Let
xn be the fraction who use iPhone after n years and yn the fraction who prefer Android. (So
xn + yn = 1.) Construct the matrix that gives
       [ x_{n+1} ] = A [ xn ]
       [ y_{n+1} ]     [ yn ]
8. Explain why, if A is a matrix whose columns add to 1, then λ = 1 is an eigenvalue. Thus, for
the 2 × 2 Markov matrices, one can determine both eigenvalues just by calculating the trace.
22 DIAGONALIZING A MATRIX 131
22 Diagonalizing a matrix
One reason eigenvectors are important is that they form the 'best' basis for a given matrix. Suppose A has LI eigenvectors v1, ..., vn, and eigenvalues λ1, ..., λn. That is,
    A vi = λi vi,   i = 1, ..., n
Put the eigenvectors into the columns of a matrix X = [ v1 · · · vn ]. Then
    AX = [ Av1 · · · Avn ] = [ λ1 v1 · · · λn vn ] = XΛ
where Λ is the diagonal matrix with the eigenvalues on the diagonal. In other words, we have
    AX = XΛ
Now, since the eigenvectors are LI, X is invertible. Thus, we can multiply on the left by X −1 to get
a diagonal matrix:
X −1 AX = Λ
We have diagonalized the matrix! We can also multiply on the right to get A in terms of Λ:
A = XΛX −1
Example 22.1. Diagonalize
    A = [  0  −3 ]
        [ −1   2 ]
To see why the eigenvectors are the ‘best basis’, suppose we wish to solve Ax = b. Change basis by
letting
x = Xy and b = Xc
Now you are in the basis of eigenvectors. Then
AXy = Xc ⇒ Λy = c
The system of equations becomes diagonal (and easy to solve) in the basis of eigenvectors. Of course,
you would never want to solve a system of linear equations this way. But, it illustrates the point that
in the basis of eigenvectors, systems become diagonalized. We will use this idea to solve systems of
differential equations.
Powers of a matrix
Generally, computing the powers of a matrix can be cumbersome. There is one case where it is easy,
though:
Example 22.2. Show that if D is the diagonal matrix with diagonal entries d1, d2, ..., dn, then D^k is the diagonal matrix with diagonal entries d1^k, d2^k, ..., dn^k.
If we have a diagonalization A = XΛX^{-1}, we can use this to compute the powers of A:
    A^k = (XΛX^{-1})^k
        = (XΛX^{-1})(XΛX^{-1}) · · · (XΛX^{-1})        (k times)
        = XΛ(X^{-1}X)Λ(X^{-1}X) · · · ΛX^{-1}
        = XΛ^k X^{-1}
so A^k is X times the diagonal matrix with entries λ1^k, ..., λn^k times X^{-1}.
Example 22.3. Compute A^18 for
    A = [ 3  −4 ]
        [ 2  −3 ]
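A brute-force check of Example 22.3 (sketch code, not in the notes). Here tr(A) = 0 and det(A) = −1, so the characteristic equation λ^2 − 1 = 0 gives A^2 = I, and hence A^18 = I:

```python
# Compute A^18 by repeated multiplication (exact integer arithmetic).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[3, -4], [2, -3]]
P = [[1, 0], [0, 1]]           # start from the identity
for _ in range(18):
    P = matmul(P, A)

print(P)   # [[1, 0], [0, 1]]
```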
    A^∞ = lim_{n→∞} A^n
Diagonalizability
Recall that we could diagonalize, and write A = XΛX^{-1}, as long as we could find the invertible X made up of eigenvectors of A. A will be diagonalizable, then, as long as we can find enough LI eigenvectors. In fact, for an n × n matrix A we need n of them.
The problem with the matrix in Example 22.5 is that it has only one eigenvector (v = (1, 0)). But a matrix can have only one eigenvalue and still be diagonalizable, as long as it has enough eigenvectors, e.g.
    A = [ 0  0 ]
        [ 0  0 ]
How can we tell when a matrix is diagonalizable? There is one case, which is actually the generic
case, when we can do it.
Proof.
When A doesn’t have n distinct eigenvalues, it may still be diagonalizable as long as it has enough
LI eigenvectors. To tell how many we need for each eigenvalue, we need the concept of multiplicity.
There are two kinds of multiplicity.
Definition 22.1. Let p(λ) = det(A − λI) be the characteristic polynomial. Then the eigenvalues
are the roots of this polynomial. From the fundamental theorem of algebra, we know that we can
factor p into a product of factors:
    p(λ) = (λ1 − λ)^{n1} (λ2 − λ)^{n2} · · · (λp − λ)^{np}
where n1 + n2 + · · · + np = n and the λi are distinct. The power ni on the (λi − λ)^{ni} term is called the algebraic multiplicity of the eigenvalue λi.
Definition 22.2. For each eigenvalue, the eigenvectors satisfy (A − λi I)v = 0. There may be one or more LI eigenvectors for a given eigenvalue, but there cannot be more than the algebraic multiplicity. The number of LI eigenvectors for λi is called the geometric multiplicity of λi. The geometric multiplicity is the dimension of the nullspace of A − λi I, called the eigenspace of λi. In other words,
    geometric multiplicity of λi = dim N(A − λi I)
Now, there will be enough eigenvectors as long as the number of LI eigenvectors is the same as the algebraic multiplicity. In other words, A is diagonalizable exactly when, for every eigenvalue, the geometric multiplicity equals the algebraic multiplicity.
Example 22.6. Both A and B have eigenvalues λ = 0 and λ = 2. One is diagonalizable, the other is not. Determine which is which.
    A = [  1  1  0 ]         B = [ 2   0   0 ]
        [ −2  1  3 ] ,           [ 1   1  −1 ]
        [ −1  1  2 ]             [ 1  −1   1 ]
Exercises
1. TRUE or FALSE (with reasons): If the eigenvalues of a 3 × 3 matrix A are 1, 3, then A is
(a) invertible (b) diagonalizable (c) defective
5. Show that A^1024 = I for
       A = [  3   2 ]
           [ −5  −3 ]
7. Suppose P is a projection matrix, so that P^2 = P.
   (a) Show that λ = 1 is an eigenvalue. Hint: The columns of P are eigenvectors. Hence every vector in the column space is an eigenvector. If rank(P) = r, what is the geometric multiplicity of the eigenvalue λ = 1?
(b) If P is invertible, then P is diagonalizable by part (a). If P is not invertible, λ = 0 is an
eigenvalue. What is the geometric multiplicity of the eigenvalue λ = 0 in this case?
(c) Conclude that λ = 0 and λ = 1 are the only eigenvalues, and that P is always diagonal-
izable.
8. Suppose A = XΛX^{-1} is diagonalizable, and let p(λ) = (λ1 − λ)(λ2 − λ) · · · (λn − λ) be the characteristic polynomial. Substitute A into this polynomial to get
       p(A) = (λ1 I − A)(λ2 I − A) · · · (λn I − A)
   Show that p(A) is the zero matrix. A matrix satisfies its own characteristic equation. This fact is called the Cayley-Hamilton Theorem. Note that this holds even if A is not diagonalizable, but is a bit more technical to prove in the defective case.
23 SYSTEMS OF DIFFERENTIAL EQUATIONS 137
23 Systems of differential equations

Consider a system of two linear differential equations:
    y1' = a y1 + b y2
    y2' = c y1 + d y2
The methods of this section generalize naturally to systems of 3, 4, etc. equations. Notice that we can write this as a matrix-vector equation:
    d/dt [ y1 ] = [ a  b ] [ y1 ] ,   or   y' = Ay
         [ y2 ]   [ c  d ] [ y2 ]
The key to solving y0 = Ay is to find the eigenvalues and eigenvectors of A. Suppose we have them:
Avi = λi vi
Let X = [ v1  v2 ] be the eigenvector matrix. We change bases to the basis of eigenvectors:
    y = Xz,   or   y = z1 v1 + z2 v2
Substituting into y' = Ay gives Xz' = AXz = XΛz, so z' = Λz. The system decouples:
    z1' = λ1 z1
    z2' = λ2 z2
with solutions
    z1 = c1 e^{λ1 t},   z2 = c2 e^{λ2 t}
Then we can recover the solution of the original equation by using y = Xz:
    y = c1 e^{λ1 t} v1 + c2 e^{λ2 t} v2        (23.1)
Thus, the problem of solving a linear system of DEs is reduced to finding the eigenvalues and
eigenvectors. Let’s look at the different cases that can occur.
Example 23.1. Real, negative eigenvalues – exponential decay. Solve
    y' = [ −2   1 ] y
         [  1  −2 ]
Find the general solution, and the solution with y(0) = (6, 2)^T.
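A numerical sanity check of Example 23.1 (sketch code, not in the notes). The eigenpairs of A are λ1 = −1 with v1 = (1, 1) and λ2 = −3 with v2 = (1, −1), and y(0) = (6, 2) = 4 v1 + 2 v2, so (23.1) gives y(t) = 4 e^{−t}(1, 1) + 2 e^{−3t}(1, −1):

```python
# Verify the candidate solution satisfies y(0) = (6, 2) and y' = Ay.
import math

def y(t):
    return [4 * math.exp(-t) + 2 * math.exp(-3 * t),
            4 * math.exp(-t) - 2 * math.exp(-3 * t)]

print(y(0))   # [6.0, 2.0]

t, h = 0.7, 1e-6
dy = [(p - q) / (2 * h) for p, q in zip(y(t + h), y(t - h))]   # y'(t), numerically
Ay = [-2 * y(t)[0] + y(t)[1], y(t)[0] - 2 * y(t)[1]]
print(max(abs(p - q) for p, q in zip(dy, Ay)))   # ~0
```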
Example 23.2. Complex eigenvalues – spiral decay. Solve
    y' = [ −2   1 ] y
         [ −1  −2 ]
Find the solution with y(0) = (6, 2)^T.
Example 23.3. Imaginary eigenvalues – periodic solutions. Solve
    y' = [  0  1 ] y
         [ −1  0 ]
The last example illustrates an important principle. If the eigenvalues are pure imaginary, solutions
are periodic. Given any initial condition, the solution comes back exactly to that condition and then
starts again. This kind of motion is called conservative, since it conserves lengths over time.
The first two examples above are examples where the origin (0, 0) is stable. Regardless of the initial
condition, y tends to the origin in those cases as t → ∞. This is stable motion, and will occur
whenever the real parts of the eigenvalues are both negative.
In fact, we can tell the qualitative behavior of a (two dimensional) system without even calculating
the eigenvalues. Since we know the solution (23.1) in terms of the eigenvalues, we just need to know
if they are real or complex, and whether the real parts are negative or positive in order to tell if we
have stable, conservative, or unstable motion. Recall that, for a 2 × 2 matrix A, the eigenvalues are
given in terms of the trace T and determinant D by
    λ1,2 = ( T ± √(T^2 − 4D) ) / 2
If D > 0, the sign of the real parts of the eigenvalues is the same as the sign of the trace T. If D < 0, both eigenvalues will be real: one will be negative and the other will be positive. Thus, we have
    D > 0:   T < 0   stable
             T > 0   unstable
             T = 0   conservative
    D < 0:   saddle (unstable)
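The table fits in a few lines of code (a sketch, ignoring the degenerate borderline cases D = 0 and T^2 = 4D):

```python
# Classify y' = Ay for a 2x2 matrix A from its trace and determinant.

def classify(A):
    T = A[0][0] + A[1][1]
    D = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if D < 0:
        return "saddle (unstable)"
    if T < 0:
        return "stable"
    if T > 0:
        return "unstable"
    return "conservative"

print(classify([[-2, 1], [1, -2]]))   # stable       (Example 23.1)
print(classify([[0, 1], [-1, 0]]))    # conservative (Example 23.3)
```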
We get oscillations if we have an imaginary component of the eigenvalue, which occurs when T 2 −
4D < 0. Thus, we have a more complete picture by locating the trace and determinant in the
trace–determinant plane:
[Figure: the trace–determinant plane.]
For a second order equation y'' + By' + Cy = 0 (written as a system, T = −B and D = C), the familiar damping cases are:
    Underdamping       B^2 < 4C
    Critical damping   B^2 = 4C
    Overdamping        B^2 > 4C
The same idea will work for any nth order linear equation. We can always convert it into a system
of n first order equations. For example, the third order equation
    y''' + By'' + Cy' + Dy = 0
can be written, with y1 = y, y2 = y', y3 = y'', as
    d/dt [ y1 ]   [  0   1   0 ] [ y1 ]
         [ y2 ] = [  0   0   1 ] [ y2 ]
         [ y3 ]   [ −D  −C  −B ] [ y3 ]
Exercises
1. Find the general solution y = c1 e^{λ1 t} v1 + c2 e^{λ2 t} v2 and the solution to the initial value problem with y(0) = (1, 2), for
       y' = [ 3  1 ] y
            [ 3  5 ]
2. Find the solutions to
       y' = [ 0  a ] y
            [ b  0 ]
   Assume a, b > 0. In one direction the solution grows, and in the other direction the solution shrinks. Toward which line do the solutions tend as t → ∞?
3. Find the solutions to
       y' = [ a  a ] y
            [ b  b ]
   Assume a + b ≠ 0. Can the motion be stable?
4. Here is a case (the simplest case) where we don't have two LI eigenvectors:
       y' = [ 0  1 ] y
            [ 0  0 ]
   One eigenvector/eigenvalue pair gives us one solution y(t) = c1 (1, 0)^T. To find the other one, use the equations: y1' = y2, y2' = 0. Solve the second and substitute it into the first. Find the general solution by taking linear combinations.
5. For the given matrices, determine if the motion is stable, unstable or conservative.
       A = [ −2  −3 ] ,   B = [ −2   3 ] ,   C = [ 0  −3 ]
           [ −4  −5 ]         [ −4  −5 ]         [ 3   0 ]
6. Two large rooms initially contain 40 and 10 people, respectively. A door opens between the two rooms, and people tend to move from the more crowded room to the less crowded. Let y1 be the number in the first room and y2 the number in the second room. Then the movement between the rooms is proportional to the difference y1 − y2. Assume the proportionality constant is one, so that
       dy1/dt = y2 − y1,   and   dy2/dt = y1 − y2
(a) Show that the total y1 + y2 is constant (50 people).
(b) Find the matrix A in y0 = Ay, and its eigenvalues and eigenvectors.
(c) Find the solution with y1 (0) = 40, y2 (0) = 10. What are y1 and y2 at t = 1 and t = ∞?
7. A real 3 × 3 matrix A will give stable motion in y0 = Ay if the real parts of the eigenvalues are
negative. Use the fact that the trace is the sum of the eigenvalues and the determinant is the
product to show that if we have stable motion, then tr(A) < 0 and det(A) < 0. However, the
converse is not true. Find an example where tr(A) < 0 and det(A) < 0, but the motion is not
stable.
24 THE EXPONENTIAL OF A MATRIX AND SOLUTIONS WITH INPUTS 142
24 The exponential of a matrix and solutions with inputs

We now consider systems with an input term q(t):
    y' = Ay + q(t)        (24.1)
As before, we will find that the general solution is the sum of the null solution and a particular solution. The null solution is the solution of the homogeneous problem y' = Ay, which we did in the last section. The question, then, is how to find the particular solution.
Recall the case when A = a is just a scalar, which we studied in §2:
    y' = ay + q(t)   ⇒   y(t) = e^{at} y(0) + ∫_0^t e^{a(t−s)} q(s) ds
In fact, the solution of equation (24.1) is exactly the same! We just have to change a to A:
    y(t) = e^{At} y(0) + ∫_0^t e^{A(t−s)} q(s) ds
We only have to figure out what is meant by e^{At}, i.e. the exponential of a matrix. To do this, let's recall the series for e^x:
    e^x = 1 + x + x^2/2 + x^3/3! + · · ·
The exponential of a matrix is defined the same way:
    Matrix exponential:   e^{At} = I + At + (1/2)(At)^2 + (1/3!)(At)^3 + · · · = Σ_{n=0}^∞ (1/n!)(At)^n
Notice that e^{At} is another matrix of the same size as A. From this, right away, we see that e^0 = I: the exponential of the zero matrix is the identity matrix. Other key properties are:
    1.  (d/dt) e^{At} = A e^{At}        2.  e^{At} e^{As} = e^{A(t+s)}        (24.2)
If A = XΛX^{-1} is diagonalizable, then (At)^k = X (Λt)^k X^{-1} for every k, so the series gives
    e^{At} = X e^{Λt} X^{-1}
where e^{Λt} = Σ_{k=0}^∞ (1/k!)(Λt)^k is diagonal, with i-th diagonal entry Σ_{k=0}^∞ (1/k!)(λi t)^k = e^{λi t}. Thus
    e^{At} = X [ e^{λ1 t}                    ]
               [          e^{λ2 t}           ] X^{-1}
               [                   ⋱         ]
               [                    e^{λn t} ]
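The series can also be summed numerically. The sketch below (not in the notes) uses A = [0 1; −1 0] from Example 23.3, for which e^{At} = [cos t  sin t; −sin t  cos t]; at t = π this is −I:

```python
# Truncated series e^{At} = sum over n of (At)^n / n!, for a 2x2 matrix.
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(A, t, terms=40):
    E = [[1.0, 0.0], [0.0, 1.0]]        # running sum, starts at I
    term = [[1.0, 0.0], [0.0, 1.0]]     # current term (At)^n / n!
    for n in range(1, terms):
        term = matmul(term, A)
        term = [[x * t / n for x in row] for row in term]
        E = [[E[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return E

A = [[0, 1], [-1, 0]]
E = expm(A, math.pi)
print(E)   # close to [[-1, 0], [0, -1]], i.e. -I
```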
    y'' + 4y = 0,   y(0) = 3,   y'(0) = 8
    y(t) = e^{At} y(0) + ∫_0^t e^{A(t−s)} q(s) ds   solves   y' = Ay + q(t)        (24.3)
Reason:
Exercises
1. Here is an example of a matrix that is not diagonalizable, but the matrix exponential can be found fairly easily. Take
       A = [ 0  1 ]
           [ 0  0 ]
   Then A^2 is the zero matrix. Find e^{At}, and therefore the solution of y' = Ay. Compare this result with the solution you found in §23 #4.
2. Let
       A = [ 1  b ]
           [ 0  0 ]
   Compute A^n. Use this to find
       e^{At} = [ e^t  b(e^t − 1) ]
                [  0        1     ]
   using the series definition.
Use Problem 2 to compute e^A and e^B. Compute e^{A+B}, and then show that e^A e^B, e^B e^A and e^{A+B} are all different.
5. Consider the “room problem”, Exercise 6 from §23. Suppose we introduce some flow into the
first room. Then we have the following system.
       dy1/dt = y2 − y1 + 1,   and   dy2/dt = y1 − y2
   Solve this with the same initial conditions as in problem #6. What happens in the long run?