I. Differential equations
1. Linear equations
(a) Systems of linear equations & matrix-vector multiplication
(b) Matrix multiplication
(c) Solving systems of linear equations by elimination
(d) Inverse matrices
(e) Symmetric and orthogonal matrices
(f) Determinants
2. Vector spaces & subspaces
(a) Column space
(b) Nullspace
(c) The complete solution to Ax = b
(d) Linear independence, basis and dimension
(e) The fundamental theorem of linear algebra
(f) Projections
(g) Least squares
(h) Gram-Schmidt orthogonalization
3. Eigenvalues & Eigenvectors
(a) Intro & basic properties
(b) Diagonalizing a matrix
(c) Systems of differential equations
(d) The exponential of a matrix & solutions with inputs
Lecture Notes

Contents

1 Differential equations and where they come from
8 Matrix multiplication
10 Inverse matrices
12 Determinants
13 Column space
14 The nullspace of A
18 Projection
1 Differential equations and where they come from

A differential equation relates the derivative

    y′(t) = dy/dt,

which is the rate of change of y, i.e. the slope of the curve, to y and/or t. The equation y′(t) = f(t, y) means that y is changing according to the rule prescribed by f.
Example 1.1. Free fall. The downward force of gravity acting on an object with mass m is F = mg, where g is a constant, approximately given by g = 32 ft/sec². If you drop a stone, how far will it fall in 1 second? 2 seconds?
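Integrating y″ = g twice with y(0) = y′(0) = 0 gives the distance fallen, y(t) = ½gt². A minimal Python sketch (using the value g = 32 ft/sec² from the text):

```python
# Free fall from rest: m*y'' = m*g, so the distance fallen is y(t) = (1/2)*g*t^2.
g = 32.0  # ft/sec^2, the approximate value used in the text

def distance_fallen(t):
    """Distance (in feet) a dropped stone falls in t seconds."""
    return 0.5 * g * t**2

print(distance_fallen(1))  # 16.0 ft after 1 second
print(distance_fallen(2))  # 64.0 ft after 2 seconds
```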
Example 1.2. The spring. Hooke’s Law for the force acting on a spring is: F = −ky
Example 1.3. Growth and decay. If the amount y of a substance grows or decays in proportion
to the amount present, we have
    dy/dt = ky
If y grows, k > 0; if y is decaying, k < 0.
Example 1.4. Newton’s Law of Cooling. The rate of change of the temperature of an object is
proportional to the difference of its temperature and the temperature of the medium.
    dT/dt = k(T_m − T)
Examples 1.3 and 1.4 are examples of separable equations, that is equations that can be written in
the form
    p(y) dy/dt = q(t)    (1.1)
Recall the technique for solving a separable DE: “Multiply both sides of (1.1) by dt” and integrate.
    ∫ p(y) dy = ∫ q(t) dt
Example 1.5. Solve the IVP

    dy/dt = (1 + y²)/(2ty),  y(1) = 1
Example 1.6. Find the general solution to the exponential growth/decay problem (Example 1.3)
and Newton’s Law of Cooling (Example 1.4).
Example 1.7. Orthogonal projection. Differential equations also arise in many other areas
of mathematics where one wants to know how two quantities relate given that we know how they
change.
A nice geometrical example involves finding families of curves that intersect at right angles. Notice
that if we have an equation
F (x, y, c) = 0
and we hold c constant, then we have an equation in x and y. This describes a curve in the xy-plane. The set of all such curves is a family of curves. For example, if

    F(x, y, c) = x² + y² − c²

then

    F(x, y, c) = 0  ⇔  x² + y² = c²

So, for each c the equation F(x, y, c) = 0 describes a circle. The equation F(x, y, c) = 0 describes the family of all circles centered at the origin in the xy-plane.
Recall that if two lines with slopes m₁, m₂ are perpendicular, then m₁m₂ = −1, or m₂ = −1/m₁. Thus, if the curves F(x, y, c) = 0 have slopes m₁ = dy/dx = f(x, y), then to find the orthogonal family, we need curves with slope

    m₂ = −1/f(x, y)

Since the slope is the same as the derivative, we need to solve the differential equation

    dy/dx = −1/f(x, y)
Example 1.9. Find the set of curves orthogonal to the curves y = cx⁴.
Exercises
1. A stone is dropped from a height of 1600 feet. How long does it take for the stone to hit the
ground?
2. Find the solution to the IVP
    y′ = −2y,  y(0) = 1
How long does it take for the solution to decay to 1/e?
3. (a) Show that y(t) = A cos(ωt − φ) is a solution of the harmonic oscillator

        y″ + ω²y = 0

    for any A and φ.

    (b) Determine the solution with the initial conditions y(0) = a, y′(0) = 0.
4. Find the equation for the orthogonal trajectories to the following families of curves. For each
family, sketch a few of the curves and their orthogonal trajectories on the same graph.
(a) y = cx²
(b) y² = 3x + c
5. Solve these separable equations.
(a) dy/dt = 2ty²
(b) y′ = yt + y + t + 1
(c) dy/dt = t^r y^s
6. A dead body is found in a room with ambient temperature of T_m = 70°F. Suppose you know that it takes 3 hours for a body to cool to 85°F.

(a) Use Newton's law of cooling to write the temperature of the body as a function of time.

(b) Suppose you measure the temperature of the body and find that it is 75°F. How long has the person been dead?
2 First order linear differential equations

The general first order linear differential equation has the form

    b(t) dy/dt + a(t) y = q(t)
If b(t) ≠ 0, we can divide by b(t), relabel and obtain the standard form:

    dy/dt + a(t) y = q(t)    (2.1)
The way to think about linear equations is to think of the RHS term q as a source or input term. q(t)
is the input and y(t) is the response. When solving a linear system we are determining the response
from the input.
Suppose q = 0, and a is constant, so we have the homogeneous equation

    y′ + ay = 0    (hom)

This has the general solution y_n(t) = Ce^(−at). Now, if we add a source term q (supposing it is constant again), we have

    y′ + ay = q    (full)

A solution to this equation is y_p(t) = q/a (check). Now, the complete solution is

    y(t) = y_n(t) + y_p(t) = Ce^(−at) + q/a

Notice that the null solution comes from the starting value y(0). The particular part comes from the input q.
This is generally true. To find the complete solution to (2.1) (the general solution), we find the
general solution of the homogeneous part (the null solution) and add a particular solution. Finding
the general solution of the homogeneous part, we already know how to do: it is separable. So we
just need to be able to find a (any!) particular solution.
Sometimes it can be easy to find the particular solution. In the above instance, for example, when a and q are both constant, we can deduce that a particular solution is just going to be a constant.
But for other inputs, it may not be so obvious, and we want a method for determining the particular
solution in general.
Rather than just pick q out of a hat, we will concentrate on the types of inputs that occur most often
in applications. These are
Then the LHS of equation (2.2) is d/dt [I(t)y(t)], so this can be written

    d/dt [I(t) y(t)] = I(t) q(t)
which we can then integrate and solve for y. The function I(t) is called an integrating factor.
    y(t) = e^(−at) y(0) + e^(−at) ∫₀ᵗ e^(as) q(s) ds
Thus, we get a solution in terms of an integral. Notice that (as long as a > 0) the term e^(−at) y(0)
tends toward zero, so in the long run, all that matters is the response to the input given by the
integral on the RHS.
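The integral formula can be sanity-checked numerically. The sketch below uses illustrative values a = 2, q(t) ≡ 5, y(0) = 4 (my own choices, not from the text) and compares the formula against the closed-form solution for a constant source:

```python
import math

a, q, y0 = 2.0, 5.0, 4.0  # illustrative values, not from the text

def y_formula(t, n=10000):
    """y(t) = e^(-at) y(0) + e^(-at) * integral_0^t e^(as) q(s) ds,
    with the integral approximated by the trapezoid rule."""
    h = t / n
    integral = 0.0
    for i in range(n):
        s0, s1 = i * h, (i + 1) * h
        integral += 0.5 * h * (math.exp(a * s0) * q + math.exp(a * s1) * q)
    return math.exp(-a * t) * (y0 + integral)

def y_exact(t):
    # Null solution decaying from y(0), plus the particular solution q/a.
    return (y0 - q / a) * math.exp(-a * t) + q / a

print(abs(y_formula(1.0) - y_exact(1.0)) < 1e-6)  # True
```

In the long run both tend to the particular solution q/a, which is the point of the remark above.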
Example 2.1. Constant source. Find the solution to Newton’s Law of Cooling by treating it as a
linear equation.
Example 2.2. Exponential source. Here the source is growing (c > 0) or decaying (c < 0) exponentially. What is the response?
    y′ + 2y = 5e^(3t)    and    y′ + 2y = 5e^(−3t)
Example 2.3. Sinusoidal source. What is the response to a sinusoidal input?

    y′ + 2y = 5 cos(3t)
Example 2.4. Step function source. What if a source suddenly "turns on"? Here H(t − T) = 1 if t ≥ T and 0 if t < T.

    y′ + 2y = 3H(t − 1),  y(0) = 4
Example 2.5. Delta function source. The delta ‘function’ δ(t − T ) is defined so that
    ∫_(−∞)^∞ δ(t − T) f(t) dt = f(T)
This is a ‘function’ whose integral is 1, but is zero everywhere except at T . How can it be!? Actually,
such a function can’t exist, but it plays a very useful role in modeling. For example, a sudden
injection of a drug at time T can be modeled by a delta function.
Example 2.6. Lastly, let’s look at a case where a(t) is not a constant. Determine the general
solution to the DE
    dy/dt − (2/t) y = t² sin t,  t > 0.
Exercises
1. Solve the IVP
    (1 + t²) dy/dt + 4ty = 1/(1 + t²)²,  y(0) = 2.
2. Find the linear DE that produces the null and particular solutions
5. Solve y′ + y = 3δ(t − 4) + 2H(t − 5). What happens as t → ∞? Compare with the solution of y′ + y = 2.

6. Solve y′ + 3y = 3H(t − 1)(1 − e^(1−t)). Sketch the solution and determine the long-term behavior.
3 Models of growth and decay
Example 3.1. Suppose you invest $1 in an account paying 10% interest. How long does it take for
your account to reach $100? $200? $1 million?
Exponential growth will occur in any situation where the growth of some substance, whether it be
money, bacteria, etc., is proportional to the amount present.
    y(t) = y(0) · 2^(t/t₂)

where t₂ is the doubling time. What is the doubling time for the account paying 10% interest?
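Since y′ = ky gives y(t) = y(0)e^(kt), the doubling time is t₂ = ln 2 / k. A quick sketch for continuously compounded 10% interest (k = 0.1):

```python
import math

def doubling_time(k):
    """Doubling time t2 = ln(2)/k for y' = k*y (continuous compounding)."""
    return math.log(2) / k

print(round(doubling_time(0.10), 2))  # 6.93 years
```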
Exponential decay
Now suppose the reverse is happening: a substance is decaying in proportion to the amount present.
This occurs in radioactive decay. Some substances, e.g. a certain isotope of Carbon, C-14, contin-
uously decay. The more C-14 there is, the faster it decays; the less there is, the slower it decays.
Thus, we have
    y′ = −ky
Example 3.4. C-14 has a half-life of 5700 years. Suppose you are able to measure the amount of
C-14 in an object, and you are able to determine that the amount currently present is 1/10 of the
amount T years ago. What is T ?
Mixing Problems
Consider the mixing problem of determining the amount of some solute in a tank. Solution flows
into the tank at some rate, with some concentration, and solution flows out of the tank at some rate
with some concentration. Thus, the amount of the solute is modeled by
    dA/dt = flow in − flow out
Solution. We know the initial amount (A(0) = 32). Accounting for the flow in and the flow out gives

    dA/dt = 8 − A/(t + 4)
Logistic growth
The growth model y 0 = ry is not very satisfactory for most populations. It doesn’t account for
the fact that most populations will stop growing after reaching some level. This is due to many
factors, including buildup of waste products, competition for food, and lack of space to grow. These
factors cause the growth rate to slow when the population gets larger. To account for this effect,
we can modify the exponential growth model by adding a term that slows the growth rate when the
population gets large. One of the most popular such models is the logistic model:
    y′ = ry − by²

y small: y′ ≈ ry;  y large: y′ ≈ −by²
Example 3.6. Without solving the equation, sketch representative solution curves. What happens
in the long run?
Notice that the logistic equation is separable, so we can solve it using techniques already learned. This
involves using partial fractions to evaluate an integral. Another approach is to make a transformation
that turns the equation into a linear equation. We can solve the linear equation and get back the
solution to the logistic equation.
Example 3.7. a) Make the transformation z = y⁻¹ = 1/y. Show that the equation for z is

    z′ = −rz + b

b) Solve the equation for z, and then recover the solution for y.

c) Show that lim_(t→∞) y(t) = r/b. The ratio K = r/b is called the carrying capacity of the population.
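A sketch of parts (b) and (c), with illustrative parameters r = 1, b = 0.01, y(0) = 5 (my own choices, not from the text): the linear equation for z is solved in closed form, and y = 1/z approaches K = r/b.

```python
import math

r, b, y0 = 1.0, 0.01, 5.0  # illustrative parameters, not from the text

def y_logistic(t):
    """Solve z' = -r z + b with z(0) = 1/y(0); then y = 1/z."""
    z0 = 1.0 / y0
    z = (z0 - b / r) * math.exp(-r * t) + b / r
    return 1.0 / z

print(abs(y_logistic(50.0) - r / b) < 1e-6)  # True: y(t) -> K = r/b = 100
```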
Exercises
1. A fungus doubles in size every day, and it weighs a pound after 20 days. If another fungus
were twice as large at the start, would it weigh a pound after 10 days?
2. Charcoal is found in a cave. It is determined that the amount of C-14 present is 1/50 the
amount when the wood was burned. How long ago was it burned?
3. The E. coli bacterium has a volume of 6 µm³. In optimal conditions, an E. coli bacterium will double about every 30 minutes. Under these conditions, how long will it take for a single bacterium to grow to fill a thimble with volume 1 cm³? How long will it take for the volume to fill the entire earth (1.08 × 10¹² km³)?
(a) Without solving the equation, determine the amount of chemical in the mixture in the
long run.
(b) Determine the amount A(t) of salt in the container as a function of time. Sketch A(t).
(c) How long will it take for the concentration to reach 90% of its limiting value?
6. The logistic equation has been used to model the spread of technology. Let N ∗ be the number
of ranchers in Uruguay, and N (t) the number who have adopted a new pasture technology.
The rate of adoption dN/dt is proportional to the number who have adopted the technology,
and the fraction who have not (and thus are susceptible to changing). So the equation is
    dN/dt = αN(1 − N/N*)

According to a study by Banks (1993), N* = 17,015, N(0) = 141, α = 0.49 per year. Determine how long it takes for the new technology to be adopted by 80% of the population of ranchers.
4 The harmonic oscillator
The next several sections concern the second order linear equation

    A d²y/dt² + B dy/dt + C y = f(t)
We will start with the most important differential equation of all – the harmonic oscillator.
The DEs that occur most often in applications are second order, meaning that the highest derivative occurring in the equation is the second: y″. The reason for this is Newton's 2nd Law:
F = ma
As the acceleration is the 2nd derivative, this means
    m d²y/dt² = F
Thus, if we know the force, we can write down a 2nd order DE.
Of the 2nd order equations, the most important is the harmonic oscillator. This describes the motion
of an object subject to a restoring force. Examples include the spring attached to a wall, or the small
motions of the pendulum.
In the absence of friction, the restoring force is F = −ky. So the equation of motion is
    Harmonic Oscillator:  d²y/dt² + ω²y = 0,   ω² = k/m    (4.1)
The general solution can be written

    y(t) = A cos(ωt − φ)

A here represents the amplitude of oscillation and φ is the phase shift. Alternatively,

    y(t) = B cos(ωt) + C sin(ωt)

In this form, we can determine B and C from the initial position and velocity:

    B = y(0),  C = y′(0)/ω
Notice that for this second order DE, we need two initial conditions to specify the solution. This is
true in general: An nth order equation requires n initial conditions.
Example 4.1. Find the solution of the harmonic oscillator (4.1) with parameters k = 4, m = 1, and initial conditions y(0) = 2, y′(0) = 1. Write the solution in both forms. Find the amplitude and frequency of oscillation.
Hint:

    B cos θ + C sin θ = √(B² + C²) cos(θ − φ),  where φ = tan⁻¹(C/B)
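For instance, with the data of Example 4.1 (k = 4, m = 1, y(0) = 2, y′(0) = 1), a short sketch checks that the two forms of the solution agree and computes the amplitude from the hint:

```python
import math

k, m = 4.0, 1.0           # parameters from Example 4.1
omega = math.sqrt(k / m)  # natural frequency: omega = 2
B, C = 2.0, 1.0 / omega   # B = y(0), C = y'(0)/omega

A = math.sqrt(B**2 + C**2)  # amplitude, from the hint
phi = math.atan2(C, B)      # phase shift

def y(t):
    return B * math.cos(omega * t) + C * math.sin(omega * t)

# The two forms of the solution agree:
print(abs(y(1.0) - A * math.cos(omega * 1.0 - phi)) < 1e-12)  # True
print(round(A, 4))  # 2.0616
```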
Now suppose an external force f(t) also acts, so the total force is F = −ky + f(t) and the equation of motion becomes

    m d²y/dt² + ky = f(t)    (4.2)
f (t) is the input and y(t), the solution, is the response. This is now an inhomogeneous equation,
since it involves a term that does not include y. It is still linear, even if f (t) is a nonlinear function
of t, since it is linear in y. The solution of such equations is made up of a null and a particular
solution.
We know already the null solution. The question, then, is how to come up with a particular solution.
Example 4.2. Sinusoidal forcing. Suppose the external force is periodic (pushing a child on a swing). Then f(t) = F cos(ω_f t). Show that a particular solution of (4.2) is

    y_p(t) = F/(m(ω² − ω_f²)) · cos(ω_f t)

as long as ω_f ≠ ω. If ω = ω_f, we have resonance, which means the forcing frequency is the same as the natural frequency ω of the system. In this case, show that a particular solution is

    y_p(t) = (F/(2mω)) t sin(ωt)

What happens to y_p(t) as t grows?
Other applications
LC Circuits
The Harmonic oscillator equations also describe the current in an electrical circuit. Consider an
inductor with inductance L in series with a capacitor of capacitance C.
Let I(t) be the current flowing around the circuit. The potential drop across the capacitor is V = Q/C, where Q is the charge stored on the capacitor's positive plate. The current is equal to the rate at which charge accumulates, so I = dQ/dt. Kirchhoff's second law says that the sum of the potential drops around a closed circuit loop is zero, so we have

    L dI/dt + Q/C = 0
dt
Differentiate once and divide by L to get
    d²I/dt² + ω²I = 0
where ω² = 1/(LC). If there is an external current forcing, we have

    d²I/dt² + ω²I = f(t)
The pendulum

For a pendulum of length l, the tangential component of gravity gives the restoring force

    F = −mg sin θ

and the tangential acceleration is

    a = l d²θ/dt²
According to Newton’s law, then,
    d²θ/dt² + ω² sin θ = 0    (4.3)
where ω² = g/l. This is not the harmonic oscillator. However, if θ stays small (θ ≈ 0), which would be the case if the oscillations are fairly small, we have the approximation

    sin θ ≈ θ
and the equation is approximated by

    d²θ/dt² + ω²θ = 0
Again, if there is an external forcing, we get
    d²θ/dt² + ω²θ = f(t)
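How good is the approximation sin θ ≈ θ? A minimal simulation (with illustrative values ω² = 1 and θ(0) = 0.1, my own choices) integrates both (4.3) and its linearization and compares them:

```python
import math

omega2 = 1.0                           # omega^2 = g/l, illustrative value
theta0, dt, steps = 0.1, 0.001, 10000  # small initial angle, t up to 10

def simulate(rhs):
    """Integrate theta'' = rhs(theta) with semi-implicit Euler."""
    theta, v = theta0, 0.0
    for _ in range(steps):
        v += dt * rhs(theta)
        theta += dt * v
    return theta

full = simulate(lambda th: -omega2 * math.sin(th))  # pendulum (4.3)
lin = simulate(lambda th: -omega2 * th)             # harmonic approximation
print(abs(full - lin) < 0.005)  # True: the linearization tracks the pendulum
```

For larger starting angles the two solutions drift apart, since the pendulum's period depends on its amplitude while the harmonic oscillator's does not.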
Exercises
1. Solve y″ + 9y = 0 starting from y(0) = 1 and y′(0) = −1. What are the amplitude and frequency of oscillation?
2. Solve y″ + 9y = 2 cos(2t) starting from y(0) = 0 and y′(0) = 0. Sketch the solution.
5. The spring-mass system (4.1), i.e. the harmonic oscillator, is conservative. That means there is a quantity E, called the energy, that is conserved:

    E = (1/2)(y′)² + (ω²/2) y²

6. Another way to see that the energy is constant is to show that its derivative w.r.t. t is zero. Multiply y″ + ω²y = 0 by y′.
Show that dE/dt = 0.
7. The pendulum equation (4.3) is also conservative. This is a nonlinear equation, but we can use
the same idea as in the previous problem. Here the conserved quantity is
    E = (1/2)(dθ/dt)² + (ω²/2)(1 − cos θ)
To show that E is conserved,
5 Constant coefficient 2nd order differential equations

We now solve the equation

    A d²y/dt² + B dy/dt + C y = 0    (5.1)

We know that solutions of linear DEs are combinations of solutions of the homogeneous equation and a particular solution. So we begin our study by solving the homogeneous equation with constant coefficients: A, B, C constant.
MAIN IDEA FOR HOMOGENEOUS DE: Look for solutions of the form e^(rt).

Example 5.1. Find two solutions and the general solution of the following DE by looking for solutions of the form e^(rt).

    y″ − 3y′ + 2y = 0.
Generally, when we look for a solution of a linear DE of the form e^(rt), each nth derivative is going to give us a term rⁿ. Thus, when we substitute e^(rt) into equation (5.1), we get the equation for r:

    Ar² + Br + C = 0    (5.2)

Each solution of (5.2) gives us a solution e^(rt) of the DE (5.1). The equation (5.2) is called the auxiliary equation, and the polynomial P(r) on the left-hand side is called the auxiliary polynomial. Generally, the auxiliary equation will have two solutions, r₁, r₂, and thus we will get two solutions e^(r₁t) and e^(r₂t) of the DE (5.1). As long as r₁ ≠ r₂, these two solutions are linearly independent (LI), meaning that the only way we can have

    c₁e^(r₁t) + c₂e^(r₂t) = 0

for all t is if c₁ = c₂ = 0. Once we have two LI solutions, we have the general solution:

    y(t) = c₁e^(r₁t) + c₂e^(r₂t)

The two LI solutions will not always be of the form e^(rt), but the idea is that we need two LI solutions. Generally, for an nth order differential equation, we will need to find n LI solutions to form the general solution.
    y″ + y′ − 2y = 0
Complex roots
If the roots of the auxiliary equation are distinct, then e^(r₁t), e^(r₂t) are LI solutions. This is true whether the roots are real or complex. But if there are complex roots we have to put the solutions into real form. To do this, recall Euler's formula:

    e^(iθ) = cos θ + i sin θ

This implies

    e^((a+ib)t) = e^(at)[cos(bt) + i sin(bt)]

If r = a + ib is a complex root of P(r) = 0, and the coefficients are all real, the complex conjugate r̄ = a − ib is also a root. By taking linear combinations, we get the two LI solutions

    e^(at) cos(bt)  and  e^(at) sin(bt)
    y″ + ω²y = 0
Example 5.5. The spring-mass system will generally have some damping due to friction. The force of friction acts in the opposite direction to the motion, so, if there are no external forces,

    F = −ky − γy′

where γ is the coefficient of friction. Newton's Law then gives us the equation of motion

    my″ + γy′ + ky = 0
Find the general solution and determine at what value of the friction the system is critically damped,
i.e. there are no oscillations.
    y″ − 2y′ + y = 0

    y‴ − 2y″ + y′ − 2y = 0.
Exercises
1. Find the general solution.
(a) y = c₁e^(3t) + c₂e^(−t),  (b) y = c₁e^(4t) + c₂te^(4t),  (c) y = c₁e^(−3t) cos(t) + c₂e^(−3t) sin(t)
    y″ + γy′ + y = 0

For which values of the friction coefficient γ will the spring undergo oscillations? Sketch solutions when γ = 1 and γ = 3.
6 Undetermined coefficients & variation of parameters

We return to the second order linear equation

    A d²y/dt² + B dy/dt + C y = f(t)
The complete (general) solution is the null solution (general solution of the homogeneous part) plus
a particular solution:
    y_complete(t) = y_n(t) + y_p(t)
Having found the null solutions (solutions of the homogeneous part), we now need to find a particular solution when f ≠ 0. Recall the terminology: When f(t) ≠ 0, the equation is non-homogeneous. The f(t) represents the input, or forcing term, and the solution y is the response.
Undetermined coefficients
The method of undetermined coefficients is quite useful when f has a particular form. The idea is
to guess a solution of the same form as f , and then determine the coefficients (hence the name!).
Example 6.1. Find the general solution of
    y″ + y′ − 2y = 3t

Look for a solution of the form y_p(t) = c₁ + c₂t.
In the above example, when f (t) is a polynomial of degree 1, a particular solution can be found that
is also of degree 1. This is generally true. Likewise, if f (t) is an exponential or sinusoid, we can find
solutions of the same form.
Take home message: (Part of) the response has the same form as the input.
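For Example 6.1, substituting y_p = c₁ + c₂t into y″ + y′ − 2y = 3t gives c₂ − 2c₁ − 2c₂t = 3t, and matching the coefficients of t and of the constant term determines c₁ and c₂. A sketch:

```python
# Matching coefficients in y_p'' + y_p' - 2*y_p = 3t with y_p = c1 + c2*t:
#   t terms:        -2*c2 = 3
#   constant terms:  c2 - 2*c1 = 0
c2 = 3 / -2   # -1.5
c1 = c2 / 2   # -0.75
print(c1, c2)  # -0.75 -1.5

# Spot-check the residual y_p'' + y_p' - 2*y_p - 3t at a few points.
for t in (0.0, 1.0, 2.5):
    yp = c1 + c2 * t
    assert abs((0.0 + c2 - 2 * yp) - 3 * t) < 1e-12
```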
Example 6.3. Find the general solution of y″ + 2y′ + 5y = 3 sin(t). What happens in the long run?
Variation of parameters
One of the shortcomings of the method of undetermined coefficients is that it only works for certain
types of inputs. For more general forcing terms, we will need a more general method. The method
of variation of parameters is completely general (as long as we can compute certain integrals), but
requires a bit more computation. In fact, the method works even if the coefficients A, B, C are
functions of t!
Here we will assume A = 1, so that we have

    y″ + By′ + Cy = f(t)    (6.1)
The main idea is to take the null solution, which will be a linear combination of two LI solutions:

    y_n(t) = c₁y₁(t) + c₂y₂(t)

and form a particular solution by allowing the constants c₁, c₂ to be functions of t. (Allow the parameters to vary.) That is, we seek a particular solution of the form

    y_p(t) = c₁(t)y₁(t) + c₂(t)y₂(t)    (6.2)
We have some freedom in how we select c1 (t), c2 (t), and we will use this to simplify the computation.
Notice that, by the product rule,

    y_p′(t) = c₁′(t)y₁(t) + c₂′(t)y₂(t) + c₁(t)y₁′(t) + c₂(t)y₂′(t)

One condition we impose is that the first two terms on the RHS above add to zero:

    c₁′(t)y₁(t) + c₂′(t)y₂(t) = 0    (6.3)

Thus, y_p′ = c₁y₁′ + c₂y₂′, and substituting y_p into (6.1) gives the second condition

    c₁′(t)y₁′(t) + c₂′(t)y₂′(t) = f(t)    (6.4)
Now we have two equations (6.3) and (6.4) for two unknowns c01 and c02 . We can solve for c01 and c02
and then integrate these to recover the particular solution yp .
The 2 × 2 system for c₁′, c₂′ can be written

    [ y₁   y₂  ] [ c₁′ ]   [ 0 ]
    [ y₁′  y₂′ ] [ c₂′ ] = [ f ]
Example 6.6. Solve the problem in Example 6.4 using variation of parameters.
Green’s functions
We can write a particular solution in an elegant way as an integral using something called the
Green’s function. Now, instead of stopping at the equations (6.5), we will integrate them using
definite integrals. Integrate the equations (6.5) and use these to form the particular solution (6.2).
We get

    y_p(t) = −y₁(t) ∫ (y₂(t)f(t))/W(t) dt + y₂(t) ∫ (y₁(t)f(t))/W(t) dt    (6.6)

where W(t) = y₁(t)y₂′(t) − y₁′(t)y₂(t) is the Wronskian.
In the above equation (6.6), the integrals are indefinite integrals. We could take them to be definite
integrals from 0 to t. Then we have
    y_p(t) = ∫₀ᵗ [y₁(s)y₂(t) − y₁(t)y₂(s)] / W(s) · f(s) ds
which gives us the response to the input as an integral of the input against a function called a Green’s
function. In other words, if we let

    G(s, t) = [y₁(s)y₂(t) − y₁(t)y₂(s)] / W(s)    (6.7)

then

    y_p(t) = ∫₀ᵗ G(s, t) f(s) ds    (6.8)
Think of the integral as a sum. (It is the limit of a sum, a Riemann sum.) Equation (6.8) says that
the response is a weighted sum of the input. The weights are given in terms of the solutions of the
homogeneous equation by the Green’s function.
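As a sketch of how (6.8) works for y″ + 9y: taking y₁ = cos 3t and y₂ = sin 3t gives W = 3 and G(s, t) = sin(3(t − s))/3, and the code below checks numerically that the resulting y_p solves y″ + 9y = f. The input f(t) = t is my own choice for illustration, not the one in the example that follows.

```python
import math

def G(s, t):
    # Green's function for y'' + 9y, using y1 = cos(3t), y2 = sin(3t), W = 3.
    return math.sin(3 * (t - s)) / 3

f = lambda t: t  # sample input, chosen for illustration

def yp(t, n=20000):
    """y_p(t) = integral_0^t G(s, t) f(s) ds, by the trapezoid rule."""
    h = t / n
    total = 0.5 * (G(0.0, t) * f(0.0) + G(t, t) * f(t))
    for i in range(1, n):
        total += G(i * h, t) * f(i * h)
    return h * total

# Finite-difference check that y_p'' + 9*y_p = f at t = 1.
h = 1e-3
ypp = (yp(1 + h) - 2 * yp(1) + yp(1 - h)) / h**2
print(abs(ypp + 9 * yp(1) - f(1)) < 1e-3)  # True
```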
Example 6.8. Find the Green's function for y″ + 9y, and then solve y″ + 9y = 2 sin(2t). Hint: Recall the trig identities

    sin(A − B) = sin A cos B − cos A sin B  and  sin A sin B = ½[cos(A − B) − cos(A + B)]
Example 6.10. y″ + 9y = 3H(t − 2). Sketch the solution with starting values y(0) = 1, y′(0) = 0.
Exercises
1. Use undetermined coefficients to find a particular solution yp .
3. Resonance occurs when the forcing is a solution of the homogeneous equation. In this case,
we can usually find a particular solution using undetermined coefficients by multiplying the
normal, non-resonant, particular solution by t. Do this for the following.
For (b), show that you get the same thing using variation of parameters.
(a) Show that the null solutions are e^(−at) cos(3t) and e^(−at) sin(3t). (Recall problem 3 from §5.)

(b) Show that the Green's function is G(s, t) = (1/3) e^(a(s−t)) sin(3(t − s)).

(c) Use the Green's function to find a particular solution y_p(t). Sketch the solution for a = 0.1, 0.5, 1.
with starting values y(0) = 3, y 0 (0) = 0. Sketch the solutions. (This corresponds to a mass-
spring system with damping. In (a) the spring is being periodically forced. In (b) a magnet is
turned on at t = 2 that pulls the mass to the right.)
7 Systems of linear equations & matrix-vector multiplication
Consider the system

    x + 2y = 7
    3x + y = 6

[Figure: the row picture (two lines crossing at (1, 3)) and the column picture for this system.]

In the column picture,

    1·[1; 3] + 3·[2; 1] = [7; 6],  so the solution is x = 1, y = 3.
Conceptually, we think of systems of equations in the column picture: We want to find linear com-
binations of the columns to equal the RHS. When we solve the equations using elimination, we use
the row picture.
Let

    v = [1; 3],  w = [2; 1],  b = [7; 6]
Then the problem above is to find the linear combination xv + yw that adds up to b. Linear
combinations are formed using the parallelogram law.
By columns: Ax = x(column 1) + y(column 2), or

By rows: Ax = [ (row 1)·x ; (row 2)·x ]
" #" #
1 2 2
Example 7.2. Find by columns and by rows.
3 1 −1
" # " #
x 6
The system we started with can be written in matrix-vector form by letting x = ,b= , and
y 7
then the system is
Ax = b
When we multiply a vector by a matrix, we get another vector. What kinds of actions can such
multiplication do?
" #
1 3
Example 7.3. Consider the matrix A = . What does Ax do to x? What condition must b
2 6
satisfy in order for Ax = b to have a solution?
Singular matrices The matrix A in Example 7.3 is called singular. The equation Ax = b
generally does not have a solution. The RHS b has to satisfy a certain condition for a solution to
exist. Conceptually, the rows in the row picture represent parallel lines. The columns in the column
picture are pointing in the same direction. A 2×2 matrix will be singular if the columns are multiples
of each other.
" #
0 1
Example 7.4. What does the matrix A = do to vectors x?
−1 0
For an m × n matrix A and a vector x with n entries:

By columns: Ax = x₁(column 1) + x₂(column 2) + ⋯ + xₙ(column n), or

By rows: Ax = [ (row 1)·x ; (row 2)·x ; ⋯ ; (row m)·x ]

That is,

    (Ax)ᵢ = (row i) · x = Σ_(j=1)^n aᵢⱼ xⱼ
# of columns of A = # of elements of x
The number of rows of A can be anything. The size of the vector Ax is the number of rows of A.
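Both pictures are easy to code. This minimal sketch (my own illustration, not from the notes) multiplies the matrix A = [1 2; 3 1] of the opening system by x = (1, 3), once by rows and once by columns:

```python
def matvec_rows(A, x):
    """(Ax)_i = (row i) . x"""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matvec_cols(A, x):
    """Ax = x_1*(column 1) + ... + x_n*(column n)"""
    m, n = len(A), len(x)
    result = [0] * m
    for j in range(n):
        for i in range(m):
            result[i] += x[j] * A[i][j]
    return result

A = [[1, 2], [3, 1]]
print(matvec_rows(A, [1, 3]))  # [7, 6]
print(matvec_cols(A, [1, 3]))  # [7, 6]
```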
" 1 #
1 1 1
Example 7.5. −1
−1 0 1
2
Example 7.6. Let A = [1 2 −1; −1 −2 1; 3 6 −3]. Then Ax = (x₁ + 2x₂ − x₃)[1; −1; 3]. The 2nd column is twice the first, and the third column is −1 times the first. Any linear combination of the columns is a multiple of the first column. What must b be in order for Ax = b to have a solution?
Example 7.7. Find A[1; −1; 2], where A = [1 1 1; −1 0 1; 2 1 0]
Exercises
1. Sketch the row picture and the column picture for the following systems.

    (a) 3x + 2y = 5        (b) x − 2y = 1
        2x + y = 3             2x + y = 7
5. Fill in the missing entries of the 2 × 2 matrix A to make A singular. (Note: There is more than one right answer.)

    A = [ 2 ? ; −1 ? ]
6. Let A = [1 1 1].
The solution set for Ax = 0 represents what kind of region in R3 ? Sketch this region.
Now suppose x is any solution of Ax = 0. Show that x = c1 v + c2 w for some constants c1 and
c2 .
8 Matrix multiplication
We saw how to multiply a matrix and a vector. Multiplying two matrices together follows the same
rules. We can multiply by columns or by rows. To understand why matrix multiplication works the
way it does, let eⱼ be the jth standard basis vector:

    e₁ = [1; 0; 0; …; 0],  e₂ = [0; 1; 0; …; 0],  e₃ = [0; 0; 1; …; 0],  …,  eⱼ = [0; …; 0; 1; 0; …; 0]
This is the vector with a 1 in the jth entry and zeros elsewhere. Then, if the columns of A are aⱼ:

    A = [a₁ a₂ ⋯ aₙ]

Then

    Aeⱼ = aⱼ    (8.1)
That is, the jth column of A is Aej . Thus, to get the jth column of AB, we have
jth column of AB = ABej = A(jth column of B)
Therefore, we have the following rule for multiplying two matrices together:
    If B = [b₁ b₂ ⋯ bₙ],  then AB = [Ab₁ Ab₂ ⋯ Abₙ]    (8.2)
Multiplying by rows, the (i, j) entry of AB is the dot product of row i of A with column j of B:

    (AB)ᵢⱼ = Σ_(k=1)^n aᵢₖ bₖⱼ    (8.3)
The number of columns of A must equal the number of rows of B. The size of AB is
(m × n)(n × p) = m × p
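A sketch (my own illustration) implementing both the entrywise rule (8.3) and the column rule (8.2), applied to the matrices of Example 8.1 below:

```python
def matmul(A, B):
    """(AB)_ij = sum_k a_ik * b_kj; needs #columns of A == #rows of B."""
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

def matmul_by_columns(A, B):
    """AB = [A b_1, A b_2, ..., A b_n], built one column at a time (8.2)."""
    p = len(B[0])
    cols = [[sum(a * b for a, b in zip(row, [br[j] for br in B]))
             for row in A] for j in range(p)]
    # Reassemble the columns into a row-major matrix.
    return [[cols[j][i] for j in range(p)] for i in range(len(A))]

A = [[1, 3], [2, -1]]
B = [[-1, 0, 1], [2, 1, 3]]
print(matmul(A, B))                             # [[5, 3, 10], [-4, -1, -1]]
print(matmul(A, B) == matmul_by_columns(A, B))  # True
```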
" #" #
1 3 −1 0 1
Example 8.1.
2 −1 2 1 3
1 1 1 −1 2 1
Example 8.2. 0 3 2 1 0 −5
0 0 5 0 1 4
1 h i
Example 8.3. 3 −1 4
2
h i 0
Example 8.4. 1 0 1 1
1
Most of the rules for multiplication of numbers carry over to matrices, with one very important exception. The most important property is the associative law of multiplication, which makes multiplication of more than 2 matrices make sense:

    (AB)C = A(BC)

This means that the product ABC makes sense, since we can do the multiplication in any order.

Example 8.5. [1; −1; 2] [1 0 1] [0; 1; 1]
The proof of the distributive law is an exercise. You should convince yourself that all of them are true. One law that is not in the list is the commutative law for multiplication. This is because

    usually AB ≠ BA
Exercises
1. A is 3 × 2, B is 2 × 5, and C is 5 × 3. Which of the following products makes sense? What are
the sizes of the products?
AB BA CA CB CAB
2. Let A = [1; −3; 2] and B = [−4 2 1]. Compute AB and BA.
" # " # " #
1 2 −1 1 1 1
3. Let A = , B = , and C = . Compute (AB)C and A(BC) and show
1 3 2 4 5 −3
that they are equal.
4. The kth power of a square matrix A is

    Aᵏ = AA⋯A  (k factors)

Compute A³ for A = [1 2; 1 3].
5. Prove the distributive law A(B + C) = AB + AC. One way to do this is to use the rule (8.3)
for multiplication by rows. Calculate the (i, j)th entry of A(B + C), and show that it is equal
to the (i, j)th entry of AB + AC.
7. Here is another way to think of matrix multiplication. Let's call the columns of A aᵢ and the rows of B rᵢ. Then the product of A and B is the sum of the matrices aᵢrᵢ. That is,

    AB = [a₁ a₂ ⋯ aₙ] [r₁; r₂; ⋮; rₙ] = a₁r₁ + a₂r₂ + ⋯ + aₙrₙ
Show that this is true by showing that the (i, j)th entry of the above sum is the same as that
obtained in formula (8.3).
9 Solving systems of linear equations by elimination

Consider the system
x + 2y = 7
3x + y = 6
We want to eliminate everything below the x term in the first equation. Let’s subtract 3 times the
first equation from the second:
x + 2y = 7
−5y = −15
Now we have an upper triangular system: everything below the diagonal is zero. Next, solve for y
and substitute back into the first equation (back substitution), to get the solution (x, y) = (1, 3) .
That’s all there is to it! We do elementary row operations to get an upper triangular system, which
we solve by back-substitution. It works the same no matter how many equations and variables you
have. It’s just that there are some situations that require some extra steps. Notice that we don’t
really need to carry along the variables. We could just as well do the operations on the numbers, by
forming the augmented matrix (the matrix of coefficients and the RHS in the last column):
" # " #
1 2 7 1 2 7
becomes, after elimination
3 1 6 0 −5 −15
We operate on the equations (rows) with elementary row operations (EROs). These leave the solution set unchanged. They are:

1. Subtract a multiple of one row from another row.
2. Interchange two rows.
3. Multiply a row by a nonzero constant.
BREAKDOWN: In the first step, the number in the upper left is called the first pivot. This is what
we use to eliminate everything below it. The second pivot is the number on the diagonal after the
next step, the −5. Thus, the two pivots for the above problem are 1 and −5. The pivots are the
numbers on the diagonal after elimination.
The number we use to multiply the first pivot by to eliminate the number below is called the
multiplier. In the example above, the multiplier is l21 = 3.
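The breakdown above can be sketched in a few lines of NumPy, using the example system from this section:

```python
import numpy as np

# Elimination on x + 2y = 7, 3x + y = 6.
# The multiplier l21 = 3 clears the entry below the first pivot.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([7.0, 6.0])

l21 = A[1, 0] / A[0, 0]        # multiplier = 3
A[1, :] -= l21 * A[0, :]       # row2 <- row2 - 3*row1
b[1]    -= l21 * b[0]          # now upper triangular, pivots 1 and -5

y = b[1] / A[1, 1]             # back substitution
x = (b[0] - A[0, 1] * y) / A[0, 0]
print(x, y)                    # 1.0 3.0
```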
Example 9.1. In the first example, do elimination by first interchanging the equations. What are
the pivots, and what is the multiplier? What is the product of pivots in both cases?
Example 9.2. Failure with no solution. Consider
x + 3y = 4
2x + 6y = 9
Example 9.3. Failure with infinitely many solutions. Change b = [ 4 ; 9 ] to [ 4 ; 8 ]:
x + 3y = 4
2x + 6y = 8
2x + 2y + 2z = 4
4x + 4y + 2z = 4
−2x + y + 7z = 11
1. Use the first equation to create zeros below the first pivot.
2. Use the new equation 2 to create zeros below the second pivot.
3. Keep going to create zeros below all of the pivots to find the triangular U .
The one caveat is that we might need to interchange rows along the way. The method may create a
row of zeros in U , in which case there are either no solutions or infinitely many solutions.
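The three steps, including a row interchange when a pivot position becomes zero, can be sketched on the 3 × 3 system above:

```python
import numpy as np

# Elimination to upper triangular U, with a row swap when needed.
M = np.array([[ 2.0, 2.0, 2.0,  4.0],   # augmented matrix [A | b]
              [ 4.0, 4.0, 2.0,  4.0],
              [-2.0, 1.0, 7.0, 11.0]])

for col in range(2):                      # clear below pivots in columns 0, 1
    if M[col, col] == 0:                  # zero pivot: swap with a lower row
        swap = col + np.argmax(M[col:, col] != 0)
        M[[col, swap]] = M[[swap, col]]
    for row in range(col + 1, 3):
        M[row] -= (M[row, col] / M[col, col]) * M[col]

# back substitution on the triangular system
x = np.zeros(3)
for i in (2, 1, 0):
    x[i] = (M[i, 3] - M[i, i+1:3] @ x[i+1:]) / M[i, i]
print(x)   # [ 1. -1.  2.]
```

On this system, the second pivot position really does become zero after the first step, so the swap branch runs.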
Notation: The upper triangular U that we found by elimination is an example of an (upper) echelon
matrix. An echelon matrix has the properties that each nonzero row has a leading nonzero number
called the pivot, and all of the entries below each pivot are zero. The echelon form of a matrix is not
unique. However, in later sections we will find a unique echelon form called the row reduced echelon
form.
Recall that the first multiplier was l21 = 3. We subtracted 3 times the first equation from the second.
This operation can also be accomplished by multiplying by an elimination matrix
E = [ 1 0 ; −3 1 ]
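As a quick numerical check, multiplying by E reproduces the row operation on both A and b:

```python
import numpy as np

# "row2 minus 3 times row1" as multiplication by the elimination matrix E
E = np.array([[ 1, 0],
              [-3, 1]])
A = np.array([[1, 2],
              [3, 1]])
b = np.array([7, 6])

print(E @ A)   # the upper triangular U: [[1, 2], [0, -5]]
print(E @ b)   # the new right hand side: [7, -15]
```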
Exercises
1. Which multiple of equation 1 should be subtracted from equation 2?
2x − 6y = 4
−x + 5y = 0
2. Use the previous exercise to find the LU factorization. Fill in the missing values.
A = LU :   [ 2 −6 ; −1 5 ] = [ 1 0 ; ? 1 ] [ 2 −6 ; 0 ? ]
3. Choose a right hand side for the second equation so that the system has (i) no solution, and
(ii) infinitely many solutions.
2x + 3y = 5
4x + 6y =
4. Derive a test on b1 and b2 so that the system has a solution. How many solutions will the system have if there is a solution? Sketch the column picture for b = [ 1 ; 2 ] and b = [ 0 ; 1 ].
2x + 3y = b1
4x + 6y = b2
5. Alice buys three apples, a dozen bananas, and one cantaloupe for $2.36. Jorge buys a dozen apples and two cantaloupes for $5.26. Quinn buys two bananas and three cantaloupes for $2.77. How much does each piece of fruit cost?
6. Solve the system
2x + 2y + z = 3
4x + 4y + 3z = 9
−2y + 5z = 17
7. Consider the system in the previous problem. Suppose we change the 2nd equation to
4x + 4y + az = b
For which values of a and b is there (i) no solution, and (ii) infinitely many solutions?
10 Inverse matrices
Let’s consider the system we started with:
x + 2y = 7
3x + y = 6
which we can write as Ax = b. Notice that if we multiply on both sides by (1/5)[ −1 2 ; 3 −1 ], we have

(1/5)[ −1 2 ; 3 −1 ] [ 1 2 ; 3 1 ] [ x ; y ] = (1/5)[ −1 2 ; 3 −1 ] [ 7 ; 6 ]
[ 1 0 ; 0 1 ] [ x ; y ] = [ 1 ; 3 ]
[ x ; y ] = [ 1 ; 3 ]

The matrix on the left, A−1 = (1/5)[ −1 2 ; 3 −1 ], is called the inverse of A.
A matrix A is called invertible if there exists a matrix A−1 , called the inverse of A,
such that
AA−1 = I and A−1 A = I
If A is not invertible, it is called singular.
Example 10.2. Find the inverse of A = [ 3 2 ; 1 1 ].
Example 10.3. Find the value of a that makes [ 3 2 ; 1 a ] singular.
(AB)−1 = B −1 A−1
Call the columns of A−1 v1 , v2 , v3 . Then, since
AA−1 = A [ v1 v2 v3 ] = [ Av1 Av2 Av3 ] = I = [ e1 e2 e3 ],
we need Avj = ej for j = 1, 2, 3, where ej is the jth standard basis vector. Thus, if we can solve these three equations, we have the inverse. In fact, we can solve all three at once using Gauss-Jordan elimination. Here we put the three ej 's (the identity) next to A in an augmented matrix and do row operations until the left block is the identity. What is left in the right block is the inverse.
[ A | I ] ∼ [ I | A−1 ]

Example 10.5. To find the inverse of A = [ 1 2 2 ; −1 3 0 ; 0 1 1 ], we form

[ A | I ] = [  1 2 2 | 1 0 0 ]
            [ −1 3 0 | 0 1 0 ]
            [  0 1 1 | 0 0 1 ]

          ∼ [ 1 0 0 |    1    0   −2 ]
            [ 0 1 0 |  1/3  1/3 −2/3 ]
            [ 0 0 1 | −1/3 −1/3  5/3 ]
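Gauss-Jordan on [A | I] is short to code. A sketch on the matrix of Example 10.5 (no pivot ever becomes zero here, so no row swaps are needed):

```python
import numpy as np

# Gauss-Jordan elimination on the augmented matrix [A | I].
A = np.array([[ 1.0, 2.0, 2.0],
              [-1.0, 3.0, 0.0],
              [ 0.0, 1.0, 1.0]])
n = A.shape[0]
M = np.hstack([A, np.eye(n)])        # the augmented matrix [A | I]

for col in range(n):
    M[col] /= M[col, col]            # make the pivot a 1
    for row in range(n):             # clear the rest of the pivot column
        if row != col:
            M[row] -= M[row, col] * M[col]

A_inv = M[:, n:]                     # the right block is A^{-1}
print(np.allclose(A @ A_inv, np.eye(n)))   # True
```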
Notice that this process works as long as A doesn't have any zero pivots. In this case, we can find a matrix B such that AB = I (a right-inverse). In fact, BA = I as well: the same process applied to B produces a right-inverse C with BC = I. (B cannot have a zero row, since a zero row of B would give a zero row in BC, contradicting BC = I.) Then BAB = B(AB) = B, and multiplying on the right by C gives (BA)(BC) = BC, that is, BA = I, as well. Thus,
Invertibility conditions
1. A is invertible.
3. A has n pivots.
Exercises
1. Find the inverse of the permutation matrices by trial and error
P = [ 0 1 0 ; 1 0 0 ; 0 0 1 ]   and   P = [ 0 1 0 ; 0 0 1 ; 1 0 0 ]
11 Symmetric and orthogonal matrices

The transpose
The transpose of a matrix is another matrix. It is obtained by “reflecting” entries across the diagonal:
the rows become the columns and vice versa. If A = [aij ], then AT is the transpose of A, and its
entries are
aTij = aji (switch the indices)
Example 11.1. Find the transpose of A = [ 1 0 7 ; −8 2 1 ; 0 1 0 ].
Symmetric matrices
The most important matrices in applications are the symmetric matrices: those satisfying AT = A.
Example 11.2. The matrix [ 1 0 7 ; 0 2 1 ; 7 1 3 ] is symmetric. The matrix in Example 11.1 is not symmetric.
One very important symmetric matrix is obtained by multiplying any matrix by its transpose.

Example 11.4. [ 1 0 2 ; 0 1 3 ] [ 1 0 ; 0 1 ; 2 3 ]
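The product in Example 11.4, and the general fact behind it, can be checked numerically. A A^T is always symmetric because (A A^T)^T = (A^T)^T A^T = A A^T, and the same goes for A^T A:

```python
import numpy as np

A = np.array([[1, 0, 2],
              [0, 1, 3]])

print(A @ A.T)                                 # [[ 5  6], [ 6 10]]
print(np.array_equal(A @ A.T, (A @ A.T).T))    # True: A A^T is symmetric
print(np.array_equal(A.T @ A, (A.T @ A).T))    # True: A^T A is symmetric
```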
Orthogonal matrices
Orthogonal matrices are another very important type of matrix in applications. They are, in a very
key sense, the “best” kind of matrix to have. Recall that a collection of vectors q1 , q2 , . . . , qn is
orthonormal if they are all orthogonal to each other, and they all have length 1:
qi · qj = 0 if i ≠ j,   and   qi · qj = 1 if i = j.
Here is one reason why orthogonal matrices are so important. Suppose Q is the 2 × 2 matrix Q = [ q1 q2 ], where q1 , q2 are orthonormal. Then
QT Q = [ q1 · q1  q1 · q2 ; q2 · q1  q2 · q2 ] = [ 1 0 ; 0 1 ] = I
In other words,
Q−1 = QT
Another key fact about orthogonal matrices is that they preserve lengths.
Example 11.6. Let Q be an orthogonal matrix. Show that ‖Qx‖ = ‖x‖ for every vector x.
Example 11.7. Let P = [ 0 1 0 ; 0 0 1 ; 1 0 0 ] be a permutation matrix. Show that P is orthogonal. Find P −1 . What does P do to vectors x? What does P T do?
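Both examples can be spot-checked numerically. A rotation matrix is a standard instance of an orthogonal Q (the angle below is arbitrary):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))           # True: Q^T is the inverse
x = np.array([3.0, -4.0])
print(np.linalg.norm(Q @ x), np.linalg.norm(x))  # both 5.0 (up to rounding)

# The permutation matrix of Example 11.7 is orthogonal too.
P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
print(np.array_equal(P.T @ P, np.eye(3, dtype=int)))  # True
```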
Exercises
1. A matrix A is called skew-symmetric, or sometimes anti-symmetric, if AT = −A. Fill in the
missing parts to make A anti-symmetric.
A = [ ? −1 ; ? ? ]
12 Determinants
For square matrices we have a number, a single number, det(A), called the determinant of A with
some remarkable properties. Four of them are as follows.
1. The determinant gives a test for invertibility. A is invertible if and only if det(A) 6= 0.
2. The determinant gives us a measure of how sensitive the system Ax = b is. Even if det(A) 6= 0,
if the determinant is too close to zero, then the solution of Ax = b can change very drastically if
b changes by a small amount. This may cause problems when solving such systems numerically.
3. The determinant gives a volume. The area of a parallelogram with sides a and b is (up to sign) the determinant of the matrix with a and b as the columns. Generally, the determinant is (up to sign) the volume of a parallelepiped in Rn . This property is used in calculus, for example, to calculate the Jacobian of a transformation.
4. The determinant is the product of the eigenvalues.
We can define the determinant of a scalar to be just that scalar, det(a) = a. We have also seen the determinant of a 2 × 2 matrix A = [ a b ; c d ]:

det(A) = det [ a b ; c d ] = | a b ; c d | = ad − bc
Generally, the determinant is defined to satisfy three properties. The rest of the properties follow
from these three. We illustrate them on a 2 × 2 matrix, but they hold generally for n × n.
These three properties above define the determinant of an n × n matrix – the determinant of the
identity is 1, the sign is reversed by a row exchange, and the determinant is linear in each row. The
formulas for computing the determinant are not nearly as important as the properties it possesses,
which all follow from these three.
5. Subtracting a multiple of one row from another leaves the determinant unchanged.

Row operation:   | a − lc  b − ld ; c d | = | a b ; c d |
7. If A is a triangular (upper or lower) matrix, then the determinant is the product of the diagonal entries: det(A) = a11 a22 a33 · · · ann

Triangular matrices:   | a b ; 0 d | = ad   and   | a 0 ; c d | = ad
8. The determinant is (up to a sign) the product of the pivots. If A is singular then det(A) = 0.
If A is invertible then det(A) 6= 0.
Note that this rule implies that all of the properties involving the rows also apply to the columns.
A scalar can be factored out of each column; the determinant is linear in each column; the
determinant of a matrix with equal columns is zero; subtracting a multiple of one column from
another leaves the determinant unchanged; if A has a zero column then its determinant is zero.
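The properties above are easy to spot-check numerically. A minimal sketch (the matrix is made up):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [4.0, 7.0]])                 # det(A) = 2*7 - 1*4 = 10

print(np.linalg.det(np.eye(2)))            # det(I) = 1

swapped = A[[1, 0], :]                     # exchanging rows reverses the sign
print(np.linalg.det(swapped))              # -10 (up to rounding)

B = A.copy()
B[1] -= 2 * B[0]                           # subtract a multiple of a row:
print(np.linalg.det(B))                    # still 10, determinant unchanged

T = np.array([[3.0, 5.0],
              [0.0, 4.0]])                 # triangular: product of diagonal
print(np.linalg.det(T))                    # 12 (up to rounding)
```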
Example 12.3. Determine all values of k for which A = [ 1 −1 3 ; 1 2 −1 ; 3 6 k ] is non-singular.
There is another way to compute determinants using the so-called cofactor expansion. If you enjoy
doing a lot of unnecessary calculations, this is the method for you. It can be useful in some situations
in which a row or column has a lot of zeros. We illustrate the idea on a 3 × 3 matrix. The idea is
that the first row can be written as
( a11 a12 a13 ) = ( a11 0 0 ) + ( 0 a12 0 ) + ( 0 0 a13 )
Now use the linearity property on the first row, EROs to zero out entries below a11 , a12 , a13 , and interchange columns to get

| a11 a12 a13 |   | a11  0   0  |   |  0  a12  0  |   |  0   0  a13 |
| a21 a22 a23 | = | a21 a22 a23 | + | a21 a22 a23 | + | a21 a22 a23 |
| a31 a32 a33 |   | a31 a32 a33 |   | a31 a32 a33 |   | a31 a32 a33 |

                = a11 | a22 a23 ; a32 a33 | − a12 | a21 a23 ; a31 a33 | + a13 | a21 a22 ; a31 a32 |
We have reduced the 3 × 3 determinant to three 2 × 2 determinants. The same idea can be used to
write a 4×4 in terms of 3×3, then in terms of 2×2 determinants, and so on. Here we expanded on the
first row, but we can use the linearity in any row, or any column, to produce the same result. If we
want to expand along a given row, we will use the linearity property, then appropriate interchanges
to get determinants where the first column has zeros below and to the right of the first entry. Then
we reduce further. The upshot is that we get
Let Aij be the matrix A with the ith row and jth column removed. The determinant of Aij
is called the i, j minor and is written Mij . The minor with the sign (−1)i+j is called the i, j
cofactor and is written Cij . That is,
Mij = det (Aij ) , and Cij = (−1)i+j Mij = (−1)i+j det (Aij ) .
Example 12.4. Find the determinant of A = [ 1 1 0 0 ; −2 1 3 1 ; 0 1 0 0 ; 2 7 −1 3 ].
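Cofactor expansion is short to code recursively. Expanding along the third row of the Example 12.4 matrix by hand is faster (it has a single nonzero entry), but the recursion shows the general pattern of the formula:

```python
import numpy as np

# Cofactor expansion along the first row, applied recursively.
def det_cofactor(A):
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # A_{1j}
        total += (-1) ** j * A[0, j] * det_cofactor(minor)     # sign (-1)^{1+j}
    return total

A = np.array([[ 1.0, 1.0,  0.0, 0.0],
              [-2.0, 1.0,  3.0, 1.0],
              [ 0.0, 1.0,  0.0, 0.0],
              [ 2.0, 7.0, -1.0, 3.0]])
print(det_cofactor(A), np.linalg.det(A))   # both -10
```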
Recall:
a · b = ‖a‖ ‖b‖ cos θ,   a × b = ‖a‖ ‖b‖ sin(θ) n,   a × b = | i j k ; a1 a2 a3 ; b1 b2 b3 |
Theorem 12.1.
Example 12.5. Find the area of the parallelogram with vertices (0, 0), (3, 1), (5, 5) and (2, 4).
We conclude this section with a beautiful formula that should never be used.
Theorem 12.2 (Cramer’s Rule). Let A be an n × n matrix such that det(A) 6= 0. Let Bk be
the matrix obtained by replacing the kth column of A with b. Then the unique solution to the
system Ax = b is (x1 , x2 , . . . , xn ), where
xk = det(Bk ) / det(A)
Corollary 12.1. The inverse of A is the transpose of the cofactor matrix, divided by the determinant.
That is,
(A−1)ij = Cji / det(A)
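Cramer's rule is a few lines of code, illustrated on the system from Section 10 (beautiful, and far more expensive than elimination for large systems):

```python
import numpy as np

# Cramer's rule on x + 2y = 7, 3x + y = 6.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([7.0, 6.0])

x = np.empty(2)
for k in range(2):
    Bk = A.copy()
    Bk[:, k] = b                 # replace column k of A with b
    x[k] = np.linalg.det(Bk) / np.linalg.det(A)
print(x)                         # [1. 3.]
```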
Exercises
1. Find the determinant by expanding along an appropriate row or column.

(i) [ 1 0 0 ; 2 −3 7 ; 4 1 11 ]   (ii) [ 1 3 −7 2 ; 0 0 2 0 ; 3 2 7 9 ; 6 2 1 −3 ]
2. Let
A = [ 1 −1 2 ; 3 1 4 ; 0 1 3 ]
Use properties of determinants to calculate det(A), det(AT ), det(−2A) and det(A4 ).
3. Find the area of the parallelogram with vertices (0, 0), (3, −2), (5, 2) and (2, 4).
4. Let A = [ 1 k ; k 9 ]. For which values of k does Ax = b have a unique solution?
5. Use properties of determinants to show that A is singular for any value of a, where
A = [ 1 + 3a 1 3 ; 2 + 2a 2 2 ; 3 3 0 ]
6. Suppose | a b ; c d | = 3. Use properties of determinants to calculate
| −3a + 2c  −3b + 2d ; 2c 2d |
13 Column space
This section will be a bit more abstract. We have seen how to do calculations with matrices, how
to find solutions of systems of equations, how to compute determinants, and so on. Now we will
move to a higher level of understanding, to thinking about various spaces and how they relate to
each other. To understand Ax = b in its entirety, you need to understand vector spaces and their
subspaces. The goal is to understand the “Fundamental Theorem of Linear Algebra” which gives a
picture of linear algebra.
Let’s start with something we are familiar with: Rn . Recall that we can add vectors in Rn (parallel-
ogram law) and we can scale vectors. When we add two vectors we get another vector in the same
space. Likewise, when we scale a vector we get another vector in the same space. We say that Rn is
closed under addition and scalar multiplication.
c1 v1 + c2 v2 + · · · + cm vm
The structure of Rn can be generalized to a very abstract thing called a vector space. A vector space
is a set V of elements, called vectors, that satisfies
. . . plus 8 other conditions, called axioms, below. The two conditions above are called closure under
addition, and closure under scalar multiplication. We can add vectors and scale vectors and stay in
the space. The crucial thing is that we can take linear combinations.
Axiom Meaning
3. Associativity of addition u + (v + w) = (u + v) + w
4. Commutativity of addition u+v=v+u
5. Zero element (identity of addition) There is an element 0 ∈ V such that
v + 0 = v for all v ∈ V .
6. Negative element (additive inverse) For every v ∈ V there is a −v such
that v + (−v) = 0.
7. Associativity of scalar multiplication α(βv) = (αβ)v
8. Identity element 1v = v
9. Distributive property 1 α(u + v) = αu + αv
10. Distributive property 2 (α + β)v = αv + βv
Vector spaces have a structure like Rn , and our intuition is guided by our knowledge of Rn . However,
there are very many other (important) vector spaces. A few of the more important ones are the
following.
Examples of vector spaces:
Example 13.2. Why is the “or less” qualifier necessary in Pn ? Is the set of polynomials of exact
degree 3 a vector space?
SUBSPACES
The spaces associated with a matrix, as well as many other important vector spaces, are in fact
subsets of larger spaces. They are subsets, but also vector spaces on their own. Such spaces are
called subspaces.
One of the nice things about subspaces is that they inherit the properties of the space they live in.
This means that the axioms will all be satisfied as long as we have closure under addition and scalar
multiplication. We only need . . .
Example 13.7. The set of polynomials of exact degree 3 is a subset of P3 , but is not a subspace.
P2 is a subspace of P3 .
Example 13.8. Show that the set of functions satisfying f (0) = 0 is a subspace (and hence a vector
space) of the set of functions. What about the set of functions satisfying f (0) = 1?
The span is the set of all the linear combinations. If a vector space V is the span of v1 , v2 , . . . , vn ,
we say that {v1 , v2 , . . . , vn } is a spanning set for V , and V is spanned by {v1 , v2 , . . . , vn }.
Theorem 13.1. Given any collection of vectors {v1 , v2 , . . . , vn } in some vector space V ,
span{v1 , v2 , . . . , vn }
is a subspace of V .
Proof.
Example 13.9. span{1, x, x2 } = P2 . A spanning set for P2 is {1, x, x2 }. Find another spanning set
for P2 .
Example 13.10. Find a spanning set for the set of symmetric matrices.
COLUMN SPACE
Recall that Ax is
Ax = x1 (column 1) + · · · + xn (column n)
Let a1 , . . . , an be the columns of A, i.e.
h i
A = a1 a2 · · · an
Then
Ax = x1 a1 + x2 a2 + · · · + xn an
Thus, Ax is a linear combination of the vectors a1 , a2 , . . . , an . The set of all such linear combinations
is called the column space of A, and is denoted C(A).
Another way to say this is that the column space is the span of the columns, or is spanned by the
columns of A. In set notation:
C(A) = {Ax : x ∈ Rn }
C(A) is a subspace of Rm .
The column space can also be thought of as the range of the linear operator A – it is everything
you can get from multiplying vectors by A. Some authors use R(A) for the column space, to denote
range of A.
Example 13.13. Find spanning sets for the column spaces of the matrices in Example 13.12.
C(A) = span , C(B) = span
Since Ax is a linear combination of the columns of A, and thus in the column space of A for any
given vector x, solving Ax = b is equivalent to finding a linear combination of the columns to add
up to b. Therefore,
Example 13.14. What conditions do b and d have to satisfy for Ax = b and Bx = d to have a
solution for the matrices in Example 13.12?
Example 13.15. Find a condition for b ∈ C(A) for A = [ 1 2 −1 ; 1 3 −1 ; 3 2 −3 ].
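Membership in the column space can be tested numerically: b is in C(A) exactly when appending b as an extra column does not increase the rank. A sketch on the Example 13.15 matrix (the two trial vectors b are made up):

```python
import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [1.0, 3.0, -1.0],
              [3.0, 2.0, -3.0]])

def in_column_space(A, b):
    # rank([A b]) == rank(A) exactly when b is a combination of the columns
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

print(in_column_space(A, np.array([1.0, 1.0, 3.0])))   # True
print(in_column_space(A, np.array([1.0, 0.0, 0.0])))   # False
```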
Exercises
1. Which of the following subsets of R3 are subspaces?
2. (a) Find an example of a set in the plane where closure under addition holds, but scalar multiplication fails. In particular, find a set S, such that if v, w ∈ S, then v + w ∈ S, but (1/2)v may be outside of S.
(b) Find an example where closure under scalar multiplication holds but closure under addi-
tion fails.
3. Which of the following subsets of P3 are subspaces? Polynomials of the form
c1 + c2 x + c3 x3
and polynomials of the form
1 + c1 x + c2 x3
4. Let U and V be two lines through the origin in the plane. Both U and V are subspaces of R2 .
The set U + V is defined as the set of all sums of elements from U and V . That is,
U + V = {u + v : u ∈ U, v ∈ V }
6. Take the 3 matrices in Example 13.10 and the matrix in Exercise 5. The span of these four
matrices is what?
10. If we add an extra column b to a matrix A, then the column space could get larger or it could stay the same. Give an example where the column space gets larger and an example where it doesn't. Why is Ax = b solvable exactly when the column space doesn't get larger? That is, why is Ax = b solvable when A and [ A b ] have the same column space?
14 The nullspace of A
The next subspace associated with a matrix is the nullspace. We want to completely describe this
space and see how it relates to other spaces.
Definition 14.1. The nullspace of A, denoted N (A), is the set of all solutions to Ax = 0.
In set notation:
N (A) = {x ∈ Rn : Ax = 0}
There is always one solution to Ax = 0, namely x = 0. One of the key questions is, Are there any
other solutions? Sometimes there will be, and other times x = 0 will be the only solution. First:
To construct the nullspace, and make sure that we have all of the solutions of Ax = 0, we will find
the special solutions. We have an efficient way to do this. We do Gauss-Jordan elimination. Now
we will not be content to reduce a matrix to upper triangular (echelon) form. We want to get 1’s on
the pivots, and zeros in the other entries of the pivot columns. This form of the matrix is called the
row reduced echelon form (RREF) of the matrix.
Example 14.1. Find the nullspace of A = [ 1 3 ; 2 6 ]. Sketch N (A) and C(A).
Example 14.2. Find the nullspace of A = [ 1 −3 2 ].
Elimination and reduction to triangular form produces pivot columns and free columns. The free
columns will be associated with free variables.
Example 14.3.
A = [ 2 4 4 6 ; 2 5 6 9 ]   becomes   U = [ 2 4 4 6 ; 0 1 2 3 ]
after elimination. The pivots are 2 and 1. The multiplier is 1. Now we have two pivot columns
(the first two columns) and two free columns (the last two columns). Reduce further to reduced row
echelon form by the following steps.
1. Produce zeros above the pivots by eliminating upwards.
2. Produce ones in the pivots by dividing the whole row by the pivot.
When we do this on U we get the row reduced echelon matrix
R = [ 1 0 −2 −3 ; 0 1 2 3 ]
The free columns are associated with free variables. To obtain the special solutions we do the
following for each free variable. Set a free variable to 1, and the others to 0. Solve for the pivot
variables. In this example, when we set x3 to 1 and x4 to 0, we get the special solution s1 , and when
we set x3 to 0 and x4 to 1, we get the second special solution s2 .
s1 = [ 2 ; −2 ; 1 ; 0 ]   s2 = [ 3 ; −3 ; 0 ; 1 ]
Both s1 and s2 are in the nullspace, and hence any linear combination of them will also be in the
nullspace. In fact, all the combinations make up the entire nullspace. N (A) = span{s1 , s2 }. {s1 , s2 }
forms a basis for N (A). The nullspace of A is the set of all solutions, or the general solution, of
Ax = 0.
General solution to Ax = 0 :   x = x3 [ 2 ; −2 ; 1 ; 0 ] + x4 [ 3 ; −3 ; 0 ; 1 ] = [ 2x3 + 3x4 ; −2x3 − 3x4 ; x3 ; x4 ]
where x3 , x4 are free variables.
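The special solutions of Example 14.3 can be checked directly against A:

```python
import numpy as np

A = np.array([[2, 4, 4, 6],
              [2, 5, 6, 9]])
s1 = np.array([2, -2, 1, 0])     # free variables (x3, x4) = (1, 0)
s2 = np.array([3, -3, 0, 1])     # free variables (x3, x4) = (0, 1)

print(A @ s1, A @ s2)            # both [0 0]

# any combination x3*s1 + x4*s2 is also in the nullspace
x3, x4 = 5, -7
print(A @ (x3 * s1 + x4 * s2))   # [0 0]
```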
This is how we picture elimination. Suppose A is a 4 × 5 matrix. After elimination we get the upper
triangular echelon matrix
U = [ p ? ? ? ? ; 0 0 p ? ? ; 0 0 0 p ? ; 0 0 0 0 0 ]
The stars can be anything. p represents a pivot. The three pivots are in columns 1, 3 and 4. The
pivot variables are x1 , x3 , x4 . The free columns are columns 2 and 5. The free variables are x2 and
x5 . When we reduce to reduced row echelon form, we will have
R = [ 1 ? 0 0 ? ; 0 0 1 0 ? ; 0 0 0 1 ? ; 0 0 0 0 0 ]
We get one special solution s1 by setting x2 = 1, x5 = 0 and solving for the pivot variables x1 , x3 , x4 .
We get another special solution s2 by setting x2 = 0, x5 = 1 and solving for the pivots. The special
solutions {s1 , s2 } form a basis for the nullspace: they are a spanning set and they are independent.
Then the complete, or general, solution to Ax = 0 is
x = x2 s1 + x5 s2
The number of pivot columns + the number of free columns has to add up to the total number of columns. Thus, if A is m × n with rank r (r pivots), we have
r + (n − r) = n,   i.e.   rank + nullity = n.
Example 14.5. Find the rank and nullity of the matrix in Example 14.4.
Example 14.6. If a 2 × 3 matrix has one special solution, what is its rank? Find a 2 × 3 matrix A with the special solution s1 = [ −2 ; 3 ; 1 ].
To end this section, we make a simple but important observation. Suppose A has more columns
than rows. When n > m, there is at least one free variable. This means that Ax = 0 has at least
one special solution, and hence at least one nonzero solution. It is important enough for a box:
Exercises
1. Let
A = [ 1 3 2 4 ; 1 3 3 11 ; 0 0 1 7 ] ,   B = [ 1 2 1 ; 3 1 1 ; 2 −1 0 ]
(a) Find the general solutions to Ax = 0 and Bx = 0 by reducing to Reduced Row Echelon
form and finding the special solutions.
(b) Find spanning sets for N (A) and N (B).
(c) Find the rank & nullity of A and B.
3. Generally, the nullspaces of A and AT are not the same. Give an example where N (A) 6=
N (AT ). For which matrices will N (A) = N (AT )?
4. Is it possible for s1 = [ 2 ; 1 ; 3 ] to be the only special solution of a 2 × 3 matrix?
5. (a) Construct a 2 × 2 matrix whose nullspace is the same as its column space.
(b) Why does no 3 × 3 matrix have a nullspace that is equal to its column space?
6. If AB = 0, then ABx = A(Bx) = 0 for all vectors x. This implies that the column space of B is contained in the ________ of A. Construct an example of an A and B that satisfies AB = 0. Both A and B should have some nonzero entries.
15 The complete solution to Ax = b
Thus, the general, or complete, solution is the sum of the null solution (general solution to Ax = 0)
and a particular solution:
xcomplete = xn + xp
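The particular-plus-nullspace structure is easy to see numerically. A sketch using the matrix of Example 14.3 with a made-up right hand side b (not from the notes):

```python
import numpy as np

A = np.array([[2.0, 4.0, 4.0, 6.0],
              [2.0, 5.0, 6.0, 9.0]])
b = np.array([6.0, 7.0])                    # consistent: A has full row rank

xp = np.linalg.lstsq(A, b, rcond=None)[0]   # one particular solution
s1 = np.array([ 2.0, -2.0, 1.0, 0.0])       # special solutions (nullspace)
s2 = np.array([ 3.0, -3.0, 0.0, 1.0])

# every x = xp + c1*s1 + c2*s2 solves Ax = b
for c1, c2 in [(0, 0), (1, -2), (3.5, 4)]:
    x = xp + c1 * s1 + c2 * s2
    print(np.allclose(A @ x, b))            # True each time
```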
Once we have the echelon form, the nonzero rows cannot be eliminated, so these are independent.
The rank of a matrix thus tells us how many pivots, and how many independent rows we have.
Recall again, that
Depending on the size of A, we can have more than one solution, and if we do, then there are
necessarily infinitely many solutions.
In the example above, A has full row rank, meaning that the rank r of the matrix is equal to the
number of rows m. When r = m, there is always a solution. The number of special solutions is
n − r = n − m = 1 in this case.
The matrix A in Example 15.2 has full column rank, meaning that the rank r is equal to the number
of columns n. When A has full column rank, there may or may not be a solution. If there is a
solution, it is unique. There are no special solutions. The only solution of Ax = 0 is x = 0.
Example 15.3. Repeat Example 15.2 with b = [ 1 ; 1 ; 1 ; 1 ]. Show that the rank of the augmented matrix [ A b ] is larger than the rank of A. The column space has increased.
It is possible for the rank to be smaller than the number of rows and the number of columns.
Example 15.4. Find the general solution to Ax = b, where
A = [ 1 3 3 5 ; −1 2 5 6 ; 0 5 8 11 ] ,   b = [ 4 ; 1 ; 5 ]
Full row rank (r = m):
(a) All rows have pivots, and the RREF R has no zero rows.
(b) Ax = b always has a solution. (It may not be unique.)
(c) C(A) is the whole space Rm .
(d) There are n − r = n − m special solutions in N (A).
Full column rank (r = n):
(a) All columns have pivots, and the RREF has no zero columns.
(b) If a solution to Ax = b exists, it is unique. (A solution might not exist.)
(c) There are no free variables or special solutions.
(d) The only solution to Ax = 0 is x = 0.
Exercises
1. Find the general solution to Ax = b if
A = [ 1 3 3 ; 2 1 0 ] ,   b = [ 1 ; 2 ]
3. (a) The largest possible rank of a 3 × 4 matrix is ______. In this case, there is a pivot in every ______ of U and R. The column space is then ______.
(b) The largest possible rank of a 4 × 3 matrix is ______. In this case, there is a pivot in every ______ of U and R. The nullspace is then ______.
4. Suppose you know that the 3 × 4 matrix A has the single special solution to Ax = 0 the vector s = [ 3 2 1 0 ]T .
16 Linear independence, basis and dimension

The rank measures the ‘true size’ of the matrix. One can see that the 2nd and 3rd columns of A
are the same. So only the first two columns are needed to make the column space of A. It is a little
more difficult to see for matrix B, but
Again, only the first two columns are needed to make the column space of B. We can get the third
column from a linear combination of the first two columns. Thus, for both A and B, the first two
columns are enough for the column space. The column space is spanned by the first two columns.
We also need at least this many. We can’t get rid of either of the first two columns, or make a smaller
set of vectors to span the space. Thus, we say that the first two columns form a basis for the column
space. To understand what this means we need a few definitions.
Vectors v1 , v2 , . . . , vm are linearly dependent (LD) if there is a linear combination
c1 v1 + c2 v2 + · · · + cm vm = 0
where not all ci 's are zero. This is called a linear dependence relation.
Vectors v1 , v2 , . . . , vm are linearly independent (LI) if they are not LD. That is, they are LI
if the only way to make a linear combination of them zero is to set all of the coefficients to zero:
c1 v1 + c2 v2 + · · · + cm vm = 0 ⇐⇒ c1 = c2 = · · · = cm = 0
Two vectors are LI if and only if they are not scalar multiples of each other:
c1 v1 + c2 v2 = 0 ⇐⇒ c1 = c2 = 0
If v1 and v2 are pointing in different directions, they are LI. For two (nonzero) vectors, we need both c1 and c2 to be nonzero to make v1 , v2 linearly dependent.
Example 16.1. The columns of A and the columns of B for the matrices (16.1) are linearly depen-
dent. They have the following linear dependence relations.
The example above shows that we have linear dependence as long as we have a linear dependence
relation where some of the ci ’s are nonzero. We don’t need all of them to be nonzero.
Another way to say this is that the columns of an m × n matrix A are LI if and only if A has n
pivots, i.e. A has full column rank.
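This rank test is one line numerically. A sketch (both matrices are made up for illustration):

```python
import numpy as np

# Columns are LI exactly when the rank equals the number of columns.
A = np.array([[1, 1],
              [0, 1],
              [0, 0]])
print(np.linalg.matrix_rank(A))   # 2 -> full column rank, columns are LI

B = np.array([[1, 2],
              [2, 4],
              [3, 6]])
print(np.linalg.matrix_rank(B))   # 1 -> columns are LD (col2 = 2*col1)
```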
Example 16.2. Change the last column of A in eqn (16.1) so that the columns are LI.
BASIS
Another way of saying this is that a basis is a minimal spanning set. It is a spanning set with the
least number of vectors.
Example 16.3. Any two non-zero vectors that don’t point in the same direction form a basis for
R2 .
Theorem 16.1. There is one and only one way to write x ∈ V as a linear combination of the
basis vectors.
DIMENSION
Theorem 16.2. Let V be a vector space. Every basis for V has the same number of elements.
This unique number is called the dimension of V , written dim[V ].
Proof.
Corollary 16.1. Any n LI vectors in a vector space with dim[V ] = n form a basis for V .
Proof.
Definition 16.3. The span of the rows of A is called the row space, and is denoted C(AT ).
Theorem 16.4. The pivot rows after elimination are LI, so these form a basis for the row space.
Proof.
Example 16.9. Find a basis for the row space of the matrix B in eqn (16.1).
Theorem 16.5. The columns of A corresponding to the pivot columns of an echelon form U are
a basis for C(A).
Proof.
Example 16.10. Find a basis for the column space of the matrix B in eqn (16.1).
Example 16.11. Find bases for C(A), C(AT ) and N (A) for A = [ 0 1 1 1 ; 0 3 2 1 ; 0 2 1 0 ].
We can find a basis for the span of any collection of vectors by either putting them as the rows of a
matrix and finding the pivot rows, or by putting them as the columns and finding the columns of A
corresponding to the pivot columns. When we use the column space to find a basis, we find a basis
in terms of the original vectors. When we use the row space, we find a basis in terms of different,
usually ‘simpler’ vectors.
Notice that the number of vectors in the basis for C(A) and C(AT ) are both equal to the number of
pivots, and hence they are the same! Thus, we have the following:
Exercises
1. Show that v1 , v2 , v3 are LI, but v1 , v2 , v3 , v4 are LD.
v1 = [ 1 ; 0 ; 0 ] ,   v2 = [ 1 ; 1 ; 0 ] ,   v3 = [ 1 ; 1 ; 1 ] ,   v4 = [ 1 ; 2 ; 3 ]
3. Find bases for the column space, row space and nullspace for
(i) A = [ 3 2 1 0 ; 1 1 0 1 ; 2 1 1 −1 ]   and   (ii) B = [ 2 1 3 ; −2 −1 −3 ; 4 2 6 ]
4. Find two LI vectors on the plane x − 2y + 3z = 0 in R3 . Can you find 3? The plane is the
nullspace of what matrix?
5. (a) If the columns of a matrix are LD, then so are the rows.
(b) Any 4 vectors in R3 are LD.
(c) The column space of a 2 × 2 matrix is the same as the row space.
(d) The column space of a 2 × 2 matrix has the same dimension as the row space.
6. Find a basis for, and determine the dimension of, the span of each of these sets of polynomials in P2 :
{2 − x, 4 − 2x, x2 },   {2 − x, 1 + x, x2 }
8. Suppose {v1 , v2 , v3 } is a basis for R3 . Show that if a 3 × 3 matrix A satisfies det(A) 6= 0, then
{Av1 , Av2 , Av3 } is also a basis for R3 .
17 The fundamental theorem of linear algebra
Left nullspace: N (AT ) = { x ∈ Rm : AT x = 0 }
Taking transposes, AT x = 0 is the same as
xT A = 0T
This is why it is called the left nullspace. Row vectors multiplying A from the left give you zero.
Now we have the following four fundamental subspaces of an m × n matrix A with rank r:
space dimension
Example 17.1. Sketch the four fundamental subspaces of A = [ 1 3 ; 2 6 ].
Recall that in the last section, we found that the dimensions of the column space and the row space
are the same, and both are equal to the rank of A:
dim[C(A)] = dim[C(AT )] = r
Moreover, the dimensions of the column space and the nullspace have to add up to n, the number of columns:
dim[C(A)] + dim[N (A)] = r + (n − r) = n
Now, the key thing is that the row space and the nullspace are orthogonal. Every vector in the row
space is orthogonal to every vector in the nullspace, and vice versa. Symbolically,
C(AT ) ⊥ N (A)
In fact, C(AT ) and N (A) are more than just orthogonal. They are orthogonal complements.
Definition 17.1. Let V be a finite dimensional vector space, and S a subspace of V . The orthogonal complement S ⊥ of S is everything in V that is orthogonal to everything in S:
S ⊥ = { v ∈ V : v · s = 0 for all s ∈ S }
Every vector in V can be written as the sum of an element in S and an element in S ⊥ . We say that V is the direct sum of S and S ⊥ . This is often written as V = S ⊕ S ⊥ .
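The orthogonality of the row space and the nullspace can be checked directly on a small made-up matrix:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6]])          # rank 1: row space is spanned by (1, 2, 3)

# two special solutions spanning N(A): set one free variable to 1
s1 = np.array([-2.0, 1.0, 0.0])
s2 = np.array([-3.0, 0.0, 1.0])
print(A @ s1, A @ s2)              # both zero vectors

# every row of A is orthogonal to every nullspace vector
print(A[0] @ s1, A[0] @ s2, A[1] @ s1, A[1] @ s2)   # all 0
```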
C(A) = N (AT )⊥   (17.1)
Proof.
[Figure: the “big picture” of the four fundamental subspaces. C(AT ) and N (A) sit in Rn ; C(A) and N (AT ) sit in Rm . A vector x = xr + xn splits into a row-space part with Axr = b and a nullspace part with Axn = 0, so Ax = b.]
Example 17.3. Sketch the big picture for A = [ 1 3 ; 2 6 ].
Example 17.4. Find bases for the four fundamental subspaces of A, and sketch the big picture, for
A = [ 1 2 3 ; 2 4 6 ] = [ 1 0 ; 2 1 ] [ 1 2 3 ; 0 0 0 ] = E −1 R
The implications of the fundamental theorem are far-reaching. We can also say that R^m is the direct sum of C(A) and N(A^T):
    R^m = C(A) ⊕ N(A^T)
Example 17.5. Let
    A = [  1   0   0   0 ]
        [ −1   1   0   0 ]
        [  0  −1   1   0 ]
        [  0   0  −1   1 ]
        [  0   0   0  −1 ]
(This is the "backward difference" matrix.)
Show that Ax = b has a solution if and only if b is on the plane b1 + b2 + b3 + b4 + b5 = 0.
Example 17.6. What are C(A) and C(A^T) for a rank one matrix A = u v^T ?
Exercises
1. Why can’t a matrix A have (1, 2, 3) in its row space and (3, 2, 1) in its nullspace?
2. Sketch the big picture for
       A = [ 2  10 ]
           [ 1   5 ]
3. Find bases and dimensions for the four fundamental subspaces for
       A = [ 1  2  1 ] ,   and   B = [ 1  2  1 ]
           [ 2  4  2 ]               [ 3  5  2 ]
4. Find bases and dimensions for the four fundamental subspaces. This can be done without
multiplying matrices.
       A = [ 1  0  0 ] [ 1  1  2  1 ]
           [ 2  1  0 ] [ 0  0  1  3 ]
           [ 3  2  1 ] [ 0  0  0  5 ]
5. Without multiplying matrices, find bases for C(A) and C(AT ), for
       A = [ 1  3 ] [ 1  2  1 ]
           [ 0  2 ] [ 3  5  7 ]
           [ 1  2 ]
How do you know that any such product (a (3 × 2)(2 × 3) product) cannot be invertible?
6. For the 4 × 4 backward difference matrix
       A = [  1   0   0  −1 ]
           [ −1   1   0   0 ]
           [  0  −1   1   0 ]
           [  0   0  −1   1 ]
the left nullspace is spanned by (1, 1, 1, 1). Why? So there is a solution to Ax = b if and only
if b1 + b2 + b3 + b4 = 0. Write out the equations and add them together to see this another way.
7. Consider the sum of two rank one matrices:
A = uvT + wzT
(a) Which two vectors span C(A)?
(b) Which two vectors span C(AT )?
(c) What condition must be satisfied to guarantee rank(A) = 2?
8. Show that N (AT A) = N (A). Hint: Suppose AT Ax = 0. Then Ax is in N (AT ) (why?). But
Ax is in C(A) (why?). So Ax is in C(A) and N (AT ). The only vector in both spaces is the
zero vector (why?), so Ax = 0.
9. A basis for N (AT ). We have seen how to find bases for the column space, row space and
nullspace. What about the left nullspace? One way to do it is to find the RREF of AT and
calculate the special solutions. Here is another method. For an m × n matrix A, form the augmented matrix with the m × m identity: [ A  I ]. Then reduce this augmented matrix to RREF. The
rows in the last m columns corresponding to the zero rows in the first n columns are then a
basis for the left nullspace. Explain why this works and illustrate with an example.
18 PROJECTION 105
18 Projection
Many of the systems that arise in applications are inconsistent. That is, they do not have a solution.
But we would still like to “solve” them in some sense. We would like to find a best approximation,
or a “solution” that minimizes the error.
Example 18.1. Consider the system
    2x = 1
    x = 1        (18.1)
This system has no solution. We can write it as ax = b, where a = (2, 1)^T and b = (1, 1)^T. Even though we can't find an x that solves ax = b, we would like to find an approximation that minimizes the error
    e = b − ax
The x̂ that minimizes ‖e‖ occurs at the point ax̂ that is closest to b. This is the projection of b onto a.
[Figure: the vector b = (1, 1), the line through a = (2, 1), and the projection ax̂ of b onto a.]
The projection of b onto a is determined by the condition that e = b − ax̂ is orthogonal to a. Thus,
we have
    a · e = 0,   or   a · (b − ax̂) = 0
Therefore,
    x̂ = (a · b)/(a · a)
In our example, x̂ = 3/5.
This is the least-squares solution to the overdetermined system (18.1). It minimizes, over all possible x's, the length of the error vector e. Notice that it fits our intuition of what a "best approximation" should be, i.e. somewhere between 1/2 and 1.
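As a quick numerical sketch (code not in the notes), the computation above can be checked in a few lines:

```python
# Check of the 1-D least-squares formula x_hat = (a . b)/(a . a)
# for the inconsistent system 2x = 1, x = 1.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a = [2, 1]
b = [1, 1]

x_hat = dot(a, b) / dot(a, a)                    # 3/5
e = [bi - ai * x_hat for ai, bi in zip(a, b)]    # error b - a*x_hat

print(x_hat)       # 0.6
print(dot(a, e))   # ~0: the error is orthogonal to a
```

The error e = (−0.2, 0.4) is perpendicular to a, which is exactly the condition that defined the projection.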
    (projection by dot product)       projection of b onto a:  ((a · b)/(a · a)) a
    (projection by matrix product)    projection of b onto a:  ((a a^T)/(a^T a)) b
Thus, given any point b in the plane, we can find the closest point to b on the line spanned by a by
taking the projection. The projection matrix onto the line spanned by a is
    P = (a a^T)/(a^T a)        (18.2)
and the projection of b onto a is given by P b. In the case a = (2, 1)^T, we have
    P = (1/5) [ 4  2 ]
              [ 2  1 ]
Formula (18.2) gives the projection matrix generally for projecting onto the vector a.
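A small Python sketch (not from the notes) of formula (18.2) for a = (2, 1), which also checks that projecting twice changes nothing (P^2 = P):

```python
# Build the projection matrix P = a a^T / (a^T a) for a = (2, 1)
# and verify numerically that it is idempotent.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a = [2, 1]
aTa = sum(ai * ai for ai in a)                   # a^T a = 5
P = [[ai * aj / aTa for aj in a] for ai in a]    # outer product, divided by 5

print(P)   # [[0.8, 0.4], [0.4, 0.2]] = (1/5)[[4, 2], [2, 1]]
P2 = matmul(P, P)
print(max(abs(P2[i][j] - P[i][j]) for i in range(2) for j in range(2)))  # ~0
```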
Example 18.2. What if b = (4, 2)^T ?
Example 18.3. What if b = (0, 1)^T ?
Example 18.4. Find the matrix that projects the plane onto the line 2x − y = 0.
Example 18.5. Find the projection matrix onto a = (1, 1, 1)^T, and the projection of b = (1, 2, 3)^T onto the line spanned by a.
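A numerical sketch of Example 18.5 (code not in the notes; it uses the dot-product form of the projection):

```python
# Project b = (1, 2, 3) onto the line spanned by a = (1, 1, 1).

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a = [1, 1, 1]
b = [1, 2, 3]

x_hat = dot(a, b) / dot(a, a)            # 6/3 = 2
p = [x_hat * ai for ai in a]             # projection of b onto a
e = [bi - pi for bi, pi in zip(b, p)]    # error

print(p)           # [2.0, 2.0, 2.0]
print(dot(a, e))   # 0.0: the error is orthogonal to a
```

Here the projection matrix is (1/3) times the all-ones 3 × 3 matrix, so P b simply averages the entries of b.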
Least-squares solutions
[Figure: b outside C(A); the error e = b − Ax̂ lies in N(A^T), perpendicular to C(A).]
When Ax = b has no solution, the best we can do is make the error e = b − Ax as short as possible. This happens when Ax is the projection of b onto C(A), i.e. when e is orthogonal to C(A), so that A^T (b − Ax) = 0. We call the solution of this equation x̂, and the equation for x̂ can be written in the following way:
    A^T A x̂ = A^T b        (18.5)
An x̂ that satisfies (18.5) is called a least-squares solution since it minimizes the sum of the squares of the errors:
    ‖e‖^2 = e1^2 + e2^2 + · · · + em^2
The equations in the system (18.5) are referred to as the normal equations. So least-squares solutions
are solutions of normal equations.
It is an amazing fact that, regardless of what A is, even though Ax = b might not have a solution,
when we multiply both sides by AT to get the normal equations (18.5), there is always a solution,
and moreover it is, in a very deep sense, the “best” solution we can find!
The least-squares solution is unique as long as the columns of A are LI. This follows from the next theorem.

Theorem 18.1. N(A^T A) = N(A).
Proof.
The above theorem implies that if A has LI columns, and hence Ax = 0 has only the zero solution, then A^T A x = 0 has only the zero solution, so A^T A is invertible. So we have proven the following.
Projection matrices
We saw in equation (18.2) how to project onto a line. Now we will see how to find the projection
matrix onto a more general subspace.
The normal equations (18.5) are the equations for the closest point to b in the column space C(A).
As long as A has LI columns, the matrix A^T A is invertible, in which case we can write the least-squares solution as
    x̂ = (A^T A)^{-1} A^T b
So the closest point to b in C(A) is Ax̂, or
    p = A x̂ = A (A^T A)^{-1} A^T b
This is the projection of b onto the column space of A. We write the projection matrix P as
    P = A (A^T A)^{-1} A^T
[Figure: P b is the projection of b onto C(A); the error b − P b lies in N(A^T).]
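The general formula can be sanity-checked numerically. In the sketch below (not part of the notes), A is an arbitrary 3 × 2 matrix with LI columns:

```python
# Projection onto C(A) via P = A (A^T A)^{-1} A^T, with a check that P^2 = P.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    # inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 0], [0, 1], [1, 1]]        # two independent columns in R^3
At = transpose(A)
P = matmul(matmul(A, inv2(matmul(At, A))), At)

P2 = matmul(P, P)
err = max(abs(P2[i][j] - P[i][j]) for i in range(3) for j in range(3))
print(err)   # ~0, so P^2 = P
```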
Theorem 18.2. P 2 = P
Proof.
Example 18.6. Find the projection matrix for the projection of a point in R3 onto the xy−plane.
Example 18.7. Find the projection of the point (1, 0, 1) onto the plane spanned by (1, 1, 0) and
(0, 0, 1). Which points project to the point (2, 2, 3)?
Exercises
1. Find the least squares solution to
3x = 10
4x = 5
Check that the error vector is perpendicular to the column (3, 4).
2. Find the least-squares solution to
       x = 1/2
       x = 1
   Why do you get a different value from that obtained in Example 18.1?
4. Show that if A is invertible, then the projection matrix P that projects a point onto the columns
of A is P = I.
5. Find the least-squares solution to Ax = b, and the projection of b onto C(A), for
       A = [ 1  0 ]         b = [ 1 ]
           [ 0  1 ] ,           [ 1 ]
           [ 1  1 ]             [ 0 ]
6. Let S be the subspace of R3 spanned by v1 = (1, 0, 1) and v2 = (0, 1, 1). Find the projection
matrix P onto S, and find a nonzero vector b that is projected to zero.
8. What 2 × 2 matrix projects points in the plane onto the 45◦ line y = x?
9. For the projection matrix P in the previous problem, explain what H = I − 2P does. Explain
why H 2 = I.
19 LEAST SQUARES 113
19 Least squares

Suppose we have data points (x1, y1), (x2, y2), ..., (xn, yn), and we want to fit a line
    y = mx + b
through them; that is, we try to find the slope m and y−intercept b. If all the points lie on the line, we would then be able to solve the system of equations
    m x1 + b = y1
    m x2 + b = y2
        ⋮                (19.1)
    m xn + b = yn
Generally, the points will not all lie on a line, so the above system is inconsistent. We will find a
least-squares fit of the line to the data. The error at each point is the distance from yi to the point
on the line, or ei = yi − (mxi + b). The figure below shows a least squares line through 4 points, and
the errors at each point.
[Figure: a least-squares line through four data points (x1, y1), ..., (x4, y4), with the errors e1, ..., e4 drawn as vertical segments from each point to the line.]
The error vector e = (e1, e2, ..., en) contains all of these errors. The least-squares line is the line that minimizes the sum of the squares of the errors, which is the same as minimizing the length of the error vector. That is, the least-squares line minimizes
    E = ‖e‖^2 = Σ_{i=1}^n e_i^2 = Σ_{i=1}^n (y_i − m x_i − b)^2
To find the least squares line we find the least squares solution to the inconsistent system (19.1).
This inconsistent system can be written in matrix-vector form as Av = y, where
    A = [ x1  1 ]         v = [ m ]         y = [ y1 ]
        [ x2  1 ]             [ b ]             [ y2 ]
        [  ⋮  ⋮ ]                               [  ⋮ ]
        [ xn  1 ]                               [ yn ]        (19.2)
The vector y is given, and has the y−coordinates. The matrix A is given, and has the x−coordinates.
We want to solve for the vector v to give us the slope and y−intercept of the least-squares line. So
we find a least-squares solution by solving the normal equations
    A^T A v = A^T y        (19.3)
The system (19.3) for the least-squares line is the 2 × 2 system
    [ Σ xi^2   Σ xi ] [ m ]   [ Σ xi yi ]
    [ Σ xi      n   ] [ b ] = [ Σ yi    ]
Example 19.1. Find the least squares line through the points (0, 0), (2, 1), (3, 2).
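A numerical sketch of Example 19.1 (not part of the notes), solving the 2 × 2 normal equations by Cramer's rule:

```python
# Least-squares line through (0, 0), (2, 1), (3, 2).

xs = [0, 2, 3]
ys = [0, 1, 2]
n = len(xs)

sxx = sum(x * x for x in xs)              # sum of x_i^2   = 13
sx = sum(xs)                              # sum of x_i     = 5
sxy = sum(x * y for x, y in zip(xs, ys))  # sum of x_i y_i = 8
sy = sum(ys)                              # sum of y_i     = 3

# solve [sxx sx; sx n][m; b] = [sxy; sy]
det = sxx * n - sx * sx                   # 14
m = (sxy * n - sx * sy) / det             # 9/14
b = (sxx * sy - sx * sxy) / det           # -1/14

print(m, b)   # 0.6428571428571429 -0.07142857142857142
```

So the least-squares line is y = (9/14)x − 1/14.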
Curve fitting
In most situations a line is not the best fit for data. Usually we will try to fit some kind of curve to
the data. This can be done to estimate some trend, or uncover some law underlying the observations.
For example, one might try to fit population data with an exponential function to try to predict the
future population. The method of least-squares can be used in these cases as well.
A classic example of curve fitting to discover a natural law was performed by the German astronomer
Johannes Kepler in the early 1600’s. Kepler tried to make sense of the incredibly accurate (naked
eye!) observations of the Danish astronomer Tycho Brahe on the motion of the planets. Kepler found
that the data did not fit the prevailing geocentric models nor the heliocentric model of Copernicus.
These models were based on a 2000+ year old tradition that the motions of the planets were made
up of circles. Kepler proposed the revolutionary idea that the planets not only orbited the sun,
instead of the other way around, but that they did so in ellipses! This model fit the data better,
although not by that much! Kepler formulated three laws of planetary motion based on this model.
His third law states that the square of the orbital period of a planet is proportional to the cube of
the semi-major axis of its orbit. In other words,
    T^2 ∝ x^3        (19.5)
where T is the orbital period and x is the semi-major axis. Notice that we can take square roots on
both sides and write the third law as
    T = C x^(3/2)
The figure below shows the orbital period vs the mean distance from the sun of the first four planets.
The curve is T = 0.0011x3/2 .
[Figure: orbital period (years) vs. mean distance from the sun (million miles) for Mercury, Venus, Earth, and Mars, together with the curve T = 0.0011 x^(3/2).]
Let’s see how we can use the tools we have developed to fit a curve like this to the data. Suppose
we suspect that the data follows some law like this, but we want to show that this is the best fit. So
we can try to fit the data to a curve
    T = C x^m        (19.6)
where T is the period and x is the distance from the sun. If we can show that the best fit is when
m = 3/2, that would be strong evidence for the third law.
So we try to fit a curve (19.6) to the data. In other words, we try to find the best values of C and
m so that the error between the curve and the data is as small as possible. Equation (19.6) is not a
line, so we cannot use least squares directly. But, if we take logs of both sides, we get
m ln x + ln C = ln T
which is linear in m and ln C. Now suppose we have data for 4 of the planets. Then we get the
system of 4 equations
    [ ln x1  1 ]            [ ln T1 ]
    [ ln x2  1 ] [  m   ] = [ ln T2 ]
    [ ln x3  1 ] [ ln C ]   [ ln T3 ]
    [ ln x4  1 ]            [ ln T4 ]
This is a system of the form Av = b, where v = (m, ln C)^T. We then solve the normal equations
    A^T A v = A^T b
using the planetary data:
    Planet     Orbital period (yr)   Avg. distance from sun (million miles)
    Mercury    0.240846              36
    Venus      0.615                 67.2
    Earth      1                     93
    Mars       1.881                 141.6
    Jupiter    11.86                 483.6
    Saturn     29.456                886.7
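The fit itself is a few lines of Python (a sketch, not part of the notes). Using all six rows of the table, the least-squares exponent comes out almost exactly 3/2:

```python
# Fit T = C x^m by least squares on the linearized model m ln x + ln C = ln T.
import math

data = [(36, 0.240846), (67.2, 0.615), (93, 1), (141.6, 1.881),
        (483.6, 11.86), (886.7, 29.456)]        # (distance, period)

lx = [math.log(x) for x, T in data]
lT = [math.log(T) for x, T in data]
n = len(data)

sxx = sum(v * v for v in lx)
sx = sum(lx)
sxy = sum(u * v for u, v in zip(lx, lT))
sy = sum(lT)
det = sxx * n - sx * sx

m = (sxy * n - sx * sy) / det                   # slope = the exponent m
C = math.exp((sxx * sy - sx * sxy) / det)       # intercept = ln C

print(round(m, 4), round(C, 5))   # m close to 1.5, C close to 0.0011
```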
Suppose now we want to fit a parabola
    p(x) = a x^2 + b x + c
to data points (x1, y1), ..., (xn, yn). Our goal is to find the "best" parabola, which means finding the best values of a, b and c. If the parabola fits exactly through all the points then
    p(xi) = yi,   i = 1, 2, ..., n,
but generally the data will not lie exactly on a parabola. To find the best parabola, we set up the inconsistent system p(xi) = yi, which can be written as
    a x1^2 + b x1 + c = y1
    a x2^2 + b x2 + c = y2
              ⋮
    a xn^2 + b xn + c = yn
In matrix-vector form this is Av = y, where
    A = [ x1^2  x1  1 ]         v = [ a ]         y = [ y1 ]
        [ x2^2  x2  1 ]             [ b ]             [ y2 ]
        [   ⋮    ⋮  ⋮ ]             [ c ]             [  ⋮ ]
        [ xn^2  xn  1 ]                               [ yn ]
The matrix A above is called a Vandermonde matrix. As long as the xi's are distinct, the columns will be LI. We then need to find the best approximation to Av = y. This is done by solving the normal equations
    A^T A v = A^T y
Example 19.2. Find the best parabola through the points (0, 0), (1, 2), (2, 1), (3, 0).

Solving the normal equations by elimination gives a = −3/4, b = 43/20, c = 3/20. So the least-squares parabola through the 4 points is
    p(x) = −(3/4) x^2 + (43/20) x + 3/20
The least-squares parabola and the four points are shown below.
[Figure: the least-squares parabola and the four data points.]
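As a check of Example 19.2 (sketch code, not in the notes), we can form the normal equations and solve them with a small elimination routine:

```python
# Least-squares parabola through (0, 0), (1, 2), (2, 1), (3, 0).

def solve(M, r):
    """Gaussian elimination with partial pivoting for a small n x n system."""
    n = len(M)
    M = [row[:] + [r[i]] for i, row in enumerate(M)]   # augmented matrix
    for i in range(n):
        p = max(range(i, n), key=lambda k: abs(M[k][i]))
        M[i], M[p] = M[p], M[i]
        for k in range(i + 1, n):
            f = M[k][i] / M[i][i]
            for j in range(i, n + 1):
                M[k][j] -= f * M[i][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

xs = [0, 1, 2, 3]
ys = [0, 2, 1, 0]
A = [[x * x, x, 1] for x in xs]   # Vandermonde rows [x_i^2, x_i, 1]
AtA = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
Aty = [sum(row[i] * y for row, y in zip(A, ys)) for i in range(3)]

a, b, c = solve(AtA, Aty)
print(a, b, c)   # approximately -0.75, 2.15, 0.15
```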
As a final remark in this section, we note that the method above for finding the least-squares parabola
can be generalized in a natural way to finding the least squares polynomial of any degree through n
points.
Exercises
1. You wish to measure some quantity x, and take a series of measurements from which you obtain
values b1 , b2 , . . . , bn . If they were all the same, you would have x = bi . If they are different,
what is the least squares solution to this problem?
2. Find the least squares line through the points (−1, 0), (0, 1), (1, 1).
3. Find the best horizontal line through the 3 points in problem #2.
4. Find the best parabola of the form y = a + bx2 through the points (−1, 0), (0, 1), (1, −2).
5. Find the least-squares line through the points (0, 0), (1, 2), (3, 3), (4, 4).
6. Find the least-squares parabola through the 4 points in the problem #5. Sketch the points,
the least-squares line, and the least-squares parabola on the same graph.
20 GRAM-SCHMIDT ORTHOGONALIZATION 119
20 Gram-Schmidt orthogonalization
In many cases it is quite useful to obtain a basis consisting of orthogonal vectors. For one thing, it
is easy to find the coefficients of any vector in the span of a set of orthogonal vectors. Generally, if
u = c1 v1 + c2 v2
then we have to solve a system of equations to find the coefficients c1 and c2 . But, if v1 and v2 are
orthogonal, we can take the dot product with v1 on both sides:
    u · v1 = (c1 v1 + c2 v2) · v1
           = c1 (v1 · v1) + c2 (v2 · v1)
           = c1 (v1 · v1)        (since v2 · v1 = 0)
so c1 = (u · v1)/(v1 · v1). The same reasoning works with any number of orthogonal vectors v1, ..., vn:
    u = c1 v1 + c2 v2 + · · · + cn vn,   ci = (u · vi)/(vi · vi)        (20.1)
Example 20.1. Find the coefficients of (2, 1) in the basis {(1, 1), (−1, 1)}.
[Figure: the vector b, its projection ((b · a)/(a · a)) a onto a, and the error e.]
When we subtract the projection off from b we get the vector e, which is orthogonal to a. Thus,
we can start by letting v1 = a, and take v2 = e to be the vector obtained by subtracting off the
projection of b. That is,
v1 = a
b·a
v2 = b − a
a·a
Then v1 and v2 are orthogonal, and they are linear combinations of the vectors a and b, and hence
they have the same span.
Even better than orthogonal is orthonormal, which means orthogonal and length one. A set of vectors q1, ..., qn is orthonormal if
    qi · qj = 0 if i ≠ j,   and   qi · qj = 1 if i = j
It is even easier to find the coefficients if we have an orthonormal basis. Equation (20.1) becomes, if
q1 , q2 , . . . qn are orthonormal,
u = c1 q1 + c2 q2 + · · · + cn qn , ci = u · qi (20.2)
To find an orthonormal basis, we do Gram-Schmidt and then divide each vector by its length.
    v1 = u1
    v2 = u2 − ((u2 · v1)/(v1 · v1)) v1
    v3 = u3 − ((u3 · v1)/(v1 · v1)) v1 − ((u3 · v2)/(v2 · v2)) v2
and so on. Generally,
    vj = uj − Σ_{i=1}^{j−1} ((uj · vi)/(vi · vi)) vi        (20.3)
Example 20.6. Find an orthogonal basis for the span of {(1, 1, 1), (1, 0, 0), (0, 1, 0)}.
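Formula (20.3) translates almost line-for-line into code. The sketch below (not part of the notes) runs it on the vectors of Example 20.6:

```python
# Classical Gram-Schmidt, following formula (20.3).

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def gram_schmidt(us):
    vs = []
    for u in us:
        v = list(u)
        for w in vs:                      # subtract projections onto earlier v_i
            coeff = dot(u, w) / dot(w, w)
            v = [vi - coeff * wi for vi, wi in zip(v, w)]
        vs.append(v)
    return vs

vs = gram_schmidt([(1, 1, 1), (1, 0, 0), (0, 1, 0)])
print(vs)   # v1 = (1,1,1), v2 = (2/3,-1/3,-1/3), v3 = (0,1/2,-1/2), up to rounding
print(dot(vs[0], vs[1]), dot(vs[0], vs[2]), dot(vs[1], vs[2]))   # all ~0
```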
Exercises
1. Find an orthonormal basis for the span of {(−1, 0, 1), (1, 1, 2)}.
2. In this problem we will find the QR factorization of the matrix
       A = [ 1  2 ]
           [ 1  1 ]
   (a) Start with the vectors a = (1, 1)^T and b = (2, 1)^T. Perform the Gram-Schmidt algorithm to obtain two orthogonal vectors v1 and v2 in terms of a and b.
   (b) Normalize the vectors you obtained in part (a) to obtain orthonormal vectors q1, q2.
   (c) Find the coefficients of a and b in terms of q1 and q2. That is, find the rij in
21 Introduction to eigenvalues & eigenvectors

    Av = λv        (21.1)
(λ — the eigenvalue, v — an eigenvector)

Example 21.1. Find the eigenvalues and eigenvectors of
    A = [ 1  1 ]
        [ 1  1 ]
To find the eigenvalues, rewrite (21.1) as
    (A − λI) v = 0        (21.2)
A nonzero solution v exists exactly when A − λI is singular, that is, when
    det(A − λI) = 0        (21.3)
Equation (21.3) is called the characteristic equation of A. Solutions of equation (21.3) are eigenvalues of A. Once one has the eigenvalues of A, one can find the eigenvectors by solving equation (21.2).
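For a 2 × 2 matrix, det(A − λI) = λ^2 − tr(A) λ + det(A), so the quadratic formula gives the eigenvalues directly. A sketch (not in the notes), using the matrix of Example 21.3:

```python
# Eigenvalues of a 2x2 matrix from its characteristic equation.
import math

A = [[0, -3], [-1, 2]]
tr = A[0][0] + A[1][1]                          # 2
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]     # -3

disc = tr * tr - 4 * det                        # 16 > 0: real eigenvalues
l1 = (tr + math.sqrt(disc)) / 2
l2 = (tr - math.sqrt(disc)) / 2
print(l1, l2)   # 3.0 -1.0
```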
The key thing about eigenvectors is the direction. We are free to scale them however we like.
21 INTRODUCTION TO EIGENVALUES & EIGENVECTORS 125
Example 21.2. Show that if v is an eigenvector with associated eigenvalue λ, then any scalar
multiple of v is also an eigenvector associated with λ.
Example 21.3. Find the eigenvalues and eigenvectors of
    A = [  0  −3 ]
        [ −1   2 ]

Example 21.4. Find the eigenvalues and eigenvectors of
    A = [  1  1  0 ]
        [ −2  1  3 ]
        [ −1  1  2 ]
Example 21.5. Find the eigenvalues of
    A = [ 1  −2 ]
        [ 1   3 ]
The last example illustrates an important fact about eigenvalues of real matrices: Eigenvalues don’t
have to be real valued. If they are complex, they come in complex conjugate pairs.
An important quantity associated with eigenvalues is the trace of a matrix. This is the sum of the
diagonal elements.
trace of A: tr(A) = a11 + a22 + · · · + ann
4. The eigenvalues of AT are the same as the eigenvalues of A. (This does not hold
for the eigenvectors!)
Markov matrices
Markov matrices arise in situations where there is change between states and there is conservation
of the total amount of stuff. Such matrices have the property that the sum of the columns is equal
to 1.
Example 21.6. Suppose each year 20% of the population of city A moves to city B. The remaining 80% of city A stays in city A. Also, each year, 10% of the population of city B moves to city A. The other 90% of city B stays in city B. Let xn be the population of city A at year n, and yn the population of city B at year n. Initially the entire population lives in city A (city B is a new city). Determine the population of each city at year n, and the population in the long run as n → ∞.
Solution. First we write down how the population at year n + 1 relates to the population at year n:
    x_{n+1} = 0.8 xn + 0.1 yn
    y_{n+1} = 0.2 xn + 0.9 yn
Let the vector xn = (xn, yn)^T. Then the above equations can be written as
    x_{n+1} = A xn,   where   A = [ 0.8  0.1 ]
                                  [ 0.2  0.9 ]
is a Markov matrix.
Notice that
    x1 = A x0
    x2 = A x1 = A(A x0) = A^2 x0
    x3 = A x2 = A(A^2 x0) = A^3 x0
so in general xn = A^n x0.
If λ1 , v1 and λ2 , v2 are eigenvalue-eigenvector pairs, then write the initial distribution in the basis of
eigenvectors:
x0 = c1 v1 + c2 v2
Using property 7 of eigenvalues, we then have
    xn = c1 λ1^n v1 + c2 λ2^n v2
Thus, determining the distribution at any time n is reduced to finding the eigenvalues and eigenvec-
tors of A.
Let us find the eigenvalues and eigenvectors. We will use properties 1 & 2 of eigenvalues to determine
the eigenvalues. Notice that
tr(A) = 1.7 det(A) = 0.7
Two numbers that add to 1.7 and multiply to 0.7 are
λ1 = 1, λ2 = 0.7
Thus, these are the eigenvalues. To find the eigenvectors, look for vectors in the nullspace of A − λi I:
    λ1 = 1:    A − I = [ −0.2   0.1 ] ~ [ −2  1 ]   ⇒   v1 = [ 1 ]
                       [  0.2  −0.1 ]   [  0  0 ]            [ 2 ]

    λ2 = 0.7:  A − 0.7I = [ 0.1  0.1 ] ~ [ 1  1 ]   ⇒   v2 = [ −1 ]
                          [ 0.2  0.2 ]   [ 0  0 ]            [  1 ]
Now, since the entire population is initially in city A, we can take x0 = (1, 0)^T. (The units are the total population.) We need to write x0 in terms of the eigenvectors:
    x0 = c1 v1 + c2 v2,   or   c1 [ 1 ] + c2 [ −1 ] = [ 1 ]
                                  [ 2 ]      [  1 ]   [ 0 ]
Solving this linear system for c1 , c2 gives us c1 = 1/3, c2 = −2/3. Now we can write down the
solution!
    xn = c1 λ1^n v1 + c2 λ2^n v2
       = c1 v1 + c2 (0.7)^n v2        (since λ1^n = 1^n = 1)
       = (1/3) [ 1 ] − (2/3) (0.7)^n [ −1 ]
               [ 2 ]                 [  1 ]
This gives us the population distribution at any year n. To determine the limiting distribution, we
use the fact that 0.7n → 0 as n → ∞. Thus, the limiting population distribution is
    lim_{n→∞} xn = (1/3) [ 1 ]
                         [ 2 ]
That is, 1/3 of the population ends up in city A and 2/3 ends up in city B. Notice that the limiting
distribution is determined by the eigenvector v1 .
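We can watch this limit happen by simply iterating x_{n+1} = A xn (sketch code, not in the notes):

```python
# Iterate the Markov matrix of Example 21.6; the iterates approach (1/3, 2/3).

A = [[0.8, 0.1], [0.2, 0.9]]
x = [1.0, 0.0]                 # the whole population starts in city A

for n in range(200):           # (0.7)^200 is negligibly small
    x = [A[0][0] * x[0] + A[0][1] * x[1],
         A[1][0] * x[0] + A[1][1] * x[1]]

print(x)   # close to [1/3, 2/3]
```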
Exercises
1. What are the eigenvalues and eigenvectors of the identity matrix I?
2. For A in Example 21.3, calculate the eigenvalues and eigenvectors of A^T. Show that the eigenvalues are the same, but the eigenvectors are different.
3. Some bad news: The eigenvalues of A + B are not the eigenvalues of A plus the eigenvalues of
B. Show this directly for A as in Example 21.3 and
       B = [ 10   −9 ]
           [ 12  −11 ]
That is, calculate the eigenvalues of B and the eigenvalues of A + B. Show that the eigenvalues
of A + B are not obtained by adding the eigenvalues of A with the eigenvalues of B.
5. More good news: The eigenvalues of AB are the same as the eigenvalues of BA. Why? (Hint:
If ABv = λv, then BABv = λBv.) Are the eigenvectors the same?
6. The rotation matrix
       A = [ cos θ  −sin θ ]
           [ sin θ   cos θ ]
   cannot have real eigenvalues for most θ. Why? Show that the eigenvalues are, in fact, λ = cos θ ± i sin θ.
7. Suppose that each year, 40% of iPhone users switch to Android. At the same time, 10% of
Android users switch to iPhone. We want to determine what happens in the long run. Let
xn be the fraction who use iPhone after n years and yn the fraction who prefer Android. (So
xn + yn = 1.) Construct the matrix that gives
       [ x_{n+1} ] = A [ xn ]
       [ y_{n+1} ]     [ yn ]
8. Explain why, if A is a matrix whose columns add to 1, then λ = 1 is an eigenvalue. Thus, for
the 2 × 2 Markov matrices, one can determine both eigenvalues just by calculating the trace.
22 DIAGONALIZING A MATRIX 131
22 Diagonalizing a matrix
One reason eigenvectors are important is that they form the 'best' basis for a given matrix. Suppose A has LI eigenvectors v1, ..., vn, and eigenvalues λ1, ..., λn. That is,
    A vi = λi vi,   i = 1, ..., n
Put the eigenvectors into the columns of a matrix X = [ v1 · · · vn ]. Then
    AX = [ Av1 · · · Avn ] = [ λ1 v1 · · · λn vn ] = XΛ
where Λ is the diagonal matrix with the eigenvalues on the diagonal. In other words, we have
    AX = XΛ
Now, since the eigenvectors are LI, X is invertible. Thus, we can multiply on the left by X −1 to get
a diagonal matrix:
X −1 AX = Λ
We have diagonalized the matrix! We can also multiply on the right to get A in terms of Λ:
A = XΛX −1
Example 22.1. Diagonalize
    A = [  0  −3 ]
        [ −1   2 ]
To see why the eigenvectors are the ‘best basis’, suppose we wish to solve Ax = b. Change basis by
letting
x = Xy and b = Xc
Now you are in the basis of eigenvectors. Then
AXy = Xc ⇒ Λy = c
The system of equations becomes diagonal (and easy to solve) in the basis of eigenvectors. Of course,
you would never want to solve a system of linear equations this way. But, it illustrates the point that
in the basis of eigenvectors, systems become diagonalized. We will use this idea to solve systems of
differential equations.
Powers of a matrix
Generally, computing the powers of a matrix can be cumbersome. There is one case where it is easy,
though:
Example 22.2. Show that if D is the diagonal matrix with diagonal entries d1, d2, ..., dn, then D^k is the diagonal matrix with diagonal entries d1^k, d2^k, ..., dn^k.
If we have a diagonalization A = XΛX^{-1}, we can use this to compute the powers of A:
    A^k = (XΛX^{-1})^k
        = (XΛX^{-1})(XΛX^{-1}) · · · (XΛX^{-1})        (k times)
        = XΛ(X^{-1}X)Λ(X^{-1}X) · · · ΛX^{-1}
        = XΛ^k X^{-1}
so A^k is X times the diagonal matrix with entries λ1^k, ..., λn^k times X^{-1}.
Example 22.3. Compute A^18 for
    A = [ 3  −4 ]
        [ 2  −3 ]
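A brute-force check of Example 22.3 (sketch code, not in the notes). Here tr(A) = 0 and det(A) = −1, so the characteristic equation λ^2 − 1 = 0 gives A^2 = I, and hence A^18 = I:

```python
# Compute A^18 by repeated multiplication (exact integer arithmetic).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[3, -4], [2, -3]]
P = [[1, 0], [0, 1]]           # start from the identity
for _ in range(18):
    P = matmul(P, A)

print(P)   # [[1, 0], [0, 1]]
```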
    A^∞ = lim_{n→∞} A^n
Diagonalizability
Recall that we could diagonalize, and write A = XΛX^{-1}, as long as we could find the invertible X made up of eigenvectors of A. A will be diagonalizable, then, as long as we can find enough LI eigenvectors. In fact, for an n × n matrix A we need n of them.
The problem with the matrix in Example 22.5 is that it has only one eigenvector (v = (1, 0)). But a matrix can have only one eigenvalue and still be diagonalizable, as long as it has enough eigenvectors, e.g.
    A = [ 0  0 ]
        [ 0  0 ]
How can we tell when a matrix is diagonalizable? There is one case, which is actually the generic
case, when we can do it.
Proof.
When A doesn’t have n distinct eigenvalues, it may still be diagonalizable as long as it has enough
LI eigenvectors. To tell how many we need for each eigenvalue, we need the concept of multiplicity.
There are two kinds of multiplicity.
Definition 22.1. Let p(λ) = det(A − λI) be the characteristic polynomial. Then the eigenvalues
are the roots of this polynomial. From the fundamental theorem of algebra, we know that we can
factor p into a product of factors:
    p(λ) = (λ1 − λ)^{n1} (λ2 − λ)^{n2} · · · (λp − λ)^{np}
where n1 + n2 + · · · + np = n and the λi are distinct. The power ni on the (λi − λ)^{ni} term is called the algebraic multiplicity of the eigenvalue λi.
Definition 22.2. For each eigenvalue, the eigenvectors satisfy (A − λi I)v = 0. There may be one or more LI eigenvectors for a given eigenvalue, but there cannot be more than the algebraic multiplicity. The number of LI eigenvectors for λi is called the geometric multiplicity of λi. The geometric multiplicity is the dimension of the nullspace of A − λi I, called the eigenspace of λi. In other words,
    geometric multiplicity of λi = dim N(A − λi I)
Now, there will be enough eigenvectors as long as the number of LI eigenvectors is the same as the algebraic multiplicity. In other words, A is diagonalizable exactly when, for every eigenvalue, the geometric multiplicity equals the algebraic multiplicity.
Example 22.6. Both A and B have eigenvalues λ = 0 and λ = 2. One is diagonalizable, the other is not. Determine which is which.
    A = [  1  1  0 ]         B = [ 2   0   0 ]
        [ −2  1  3 ] ,           [ 1   1  −1 ]
        [ −1  1  2 ]             [ 1  −1   1 ]
Exercises
1. TRUE or FALSE (with reasons): If the eigenvalues of a 3 × 3 matrix A are 1, 3, then A is
(a) invertible (b) diagonalizable (c) defective
5. Show that A^1024 = I for
       A = [  3   2 ]
           [ −5  −3 ]
7. Suppose P is a projection matrix, so that P^2 = P.
   (a) Show that λ = 1 is an eigenvalue. Hint: The columns of P are eigenvectors. Hence every vector in the column space is an eigenvector. If rank(P) = r, what is the geometric multiplicity of the eigenvalue λ = 1?
(b) If P is invertible, then P is diagonalizable by part (a). If P is not invertible, λ = 0 is an
eigenvalue. What is the geometric multiplicity of the eigenvalue λ = 0 in this case?
(c) Conclude that λ = 0 and λ = 1 are the only eigenvalues, and that P is always diagonal-
izable.
8. Suppose A = XΛX^{-1} is diagonalizable, and let p(λ) = (λ1 − λ)(λ2 − λ) · · · (λn − λ) be the characteristic polynomial. Substitute A into this polynomial to get
       p(A) = (λ1 I − A)(λ2 I − A) · · · (λn I − A)
   Show that p(A) is the zero matrix. A matrix satisfies its own characteristic equation. This fact is called the Cayley-Hamilton Theorem. Note that this holds even if A is not diagonalizable, but is a bit more technical to prove in the defective case.
23 SYSTEMS OF DIFFERENTIAL EQUATIONS 137
23 Systems of differential equations

Consider a system of two linear differential equations:
    y1' = a y1 + b y2
    y2' = c y1 + d y2
The methods of this section generalize naturally to systems of 3, 4, etc. equations. Notice that we can write this as a matrix-vector equation:
    d/dt [ y1 ] = [ a  b ] [ y1 ] ,   or   y' = Ay
         [ y2 ]   [ c  d ] [ y2 ]
The key to solving y0 = Ay is to find the eigenvalues and eigenvectors of A. Suppose we have them:
Avi = λi vi
Let X = [ v1  v2 ] be the eigenvector matrix. We change bases to the basis of eigenvectors:
    y = Xz,   or   y = z1 v1 + z2 v2
Substituting into y' = Ay gives Xz' = AXz = XΛz, so z' = Λz. The system decouples:
    z1' = λ1 z1
    z2' = λ2 z2
with solutions
    z1 = c1 e^{λ1 t},   z2 = c2 e^{λ2 t}
Then we can recover the solution of the original equation by using y = Xz:
    y = c1 e^{λ1 t} v1 + c2 e^{λ2 t} v2        (23.1)
Thus, the problem of solving a linear system of DEs is reduced to finding the eigenvalues and
eigenvectors. Let’s look at the different cases that can occur.
Example 23.1. Real, negative eigenvalues – exponential decay. Solve
    y' = [ −2   1 ] y
         [  1  −2 ]
Find the general solution, and the solution with y(0) = (6, 2)^T.
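A numerical sanity check of Example 23.1 (sketch code, not in the notes). The eigenpairs of A are λ1 = −1 with v1 = (1, 1) and λ2 = −3 with v2 = (1, −1), and y(0) = (6, 2) = 4 v1 + 2 v2, so (23.1) gives y(t) = 4 e^{−t}(1, 1) + 2 e^{−3t}(1, −1):

```python
# Verify the candidate solution satisfies y(0) = (6, 2) and y' = Ay.
import math

def y(t):
    return [4 * math.exp(-t) + 2 * math.exp(-3 * t),
            4 * math.exp(-t) - 2 * math.exp(-3 * t)]

print(y(0))   # [6.0, 2.0]

t, h = 0.7, 1e-6
dy = [(p - q) / (2 * h) for p, q in zip(y(t + h), y(t - h))]   # y'(t), numerically
Ay = [-2 * y(t)[0] + y(t)[1], y(t)[0] - 2 * y(t)[1]]
print(max(abs(p - q) for p, q in zip(dy, Ay)))   # ~0
```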
Example 23.2. Complex eigenvalues – spiral decay. Solve
    y' = [ −2   1 ] y
         [ −1  −2 ]
Find the solution with y(0) = (6, 2)^T.
Example 23.3. Imaginary eigenvalues – periodic solutions. Solve
    y' = [  0  1 ] y
         [ −1  0 ]
The last example illustrates an important principle. If the eigenvalues are pure imaginary, solutions
are periodic. Given any initial condition, the solution comes back exactly to that condition and then
starts again. This kind of motion is called conservative, since it conserves lengths over time.
The first two examples above are examples where the origin (0, 0) is stable. Regardless of the initial
condition, y tends to the origin in those cases as t → ∞. This is stable motion, and will occur
whenever the real parts of the eigenvalues are both negative.
In fact, we can tell the qualitative behavior of a (two dimensional) system without even calculating
the eigenvalues. Since we know the solution (23.1) in terms of the eigenvalues, we just need to know
if they are real or complex, and whether the real parts are negative or positive in order to tell if we
have stable, conservative, or unstable motion. Recall that, for a 2 × 2 matrix A, the eigenvalues are
given in terms of the trace T and determinant D by
    λ1,2 = ( T ± √(T^2 − 4D) ) / 2
If D > 0, the sign of the real parts of the eigenvalues is the same as the sign of the trace T. If D < 0, both eigenvalues will be real: one will be negative and the other will be positive. Thus, we have
    D > 0:   T < 0   stable
             T > 0   unstable
             T = 0   conservative
    D < 0:   saddle (unstable)
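The table fits in a few lines of code (a sketch, ignoring the degenerate borderline cases D = 0 and T^2 = 4D):

```python
# Classify y' = Ay for a 2x2 matrix A from its trace and determinant.

def classify(A):
    T = A[0][0] + A[1][1]
    D = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if D < 0:
        return "saddle (unstable)"
    if T < 0:
        return "stable"
    if T > 0:
        return "unstable"
    return "conservative"

print(classify([[-2, 1], [1, -2]]))   # stable       (Example 23.1)
print(classify([[0, 1], [-1, 0]]))    # conservative (Example 23.3)
```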
We get oscillations if we have an imaginary component of the eigenvalue, which occurs when T 2 −
4D < 0. Thus, we have a more complete picture by locating the trace and determinant in the
trace–determinant plane:
[Figure: the trace–determinant plane.]
For a second order equation y'' + By' + Cy = 0 (written as a system, T = −B and D = C), the familiar damping cases are:
    Underdamping       B^2 < 4C
    Critical damping   B^2 = 4C
    Overdamping        B^2 > 4C
The same idea will work for any nth order linear equation. We can always convert it into a system
of n first order equations. For example, the third order equation
    y''' + By'' + Cy' + Dy = 0
can be written, with y1 = y, y2 = y', y3 = y'', as
    d/dt [ y1 ]   [  0   1   0 ] [ y1 ]
         [ y2 ] = [  0   0   1 ] [ y2 ]
         [ y3 ]   [ −D  −C  −B ] [ y3 ]
Exercises
1. Find the general solution y = c1 e^{λ1 t} v1 + c2 e^{λ2 t} v2 and the solution to the initial value problem with y(0) = (1, 2), for
       y' = [ 3  1 ] y
            [ 3  5 ]
2. Find the solutions to
       y' = [ 0  a ] y
            [ b  0 ]
   Assume a, b > 0. In one direction the solution grows, and in the other direction the solution shrinks. Toward which line do the solutions tend as t → ∞?
3. Find the solutions to
       y' = [ a  a ] y
            [ b  b ]
   Assume a + b ≠ 0. Can the motion be stable?
4. Here is a case (the simplest case) where we don't have two LI eigenvectors:
       y' = [ 0  1 ] y
            [ 0  0 ]
   One eigenvector/eigenvalue pair gives us one solution y(t) = c1 (1, 0)^T. To find the other one, use the equations: y1' = y2, y2' = 0. Solve the second and substitute it into the first. Find the general solution by taking linear combinations.
5. For the given matrices, determine if the motion is stable, unstable or conservative.
       A = [ −2  −3 ] ,   B = [ −2   3 ] ,   C = [ 0  −3 ]
           [ −4  −5 ]         [ −4  −5 ]         [ 3   0 ]
6. Two large rooms initially contain 40 and 10 people, respectively. A door opens between the two rooms, and people tend to move from the more crowded room to the less crowded. Let y1 be the number in the first room and y2 the number in the second room. Then the movement between the rooms is proportional to the difference y1 − y2. Assume the proportionality constant is one, so that
       dy1/dt = y2 − y1,   and   dy2/dt = y1 − y2
(a) Show that the total y1 + y2 is constant (50 people).
(b) Find the matrix A in y0 = Ay, and its eigenvalues and eigenvectors.
(c) Find the solution with y1 (0) = 40, y2 (0) = 10. What are y1 and y2 at t = 1 and t = ∞?
7. A real 3 × 3 matrix A will give stable motion in y0 = Ay if the real parts of the eigenvalues are
negative. Use the fact that the trace is the sum of the eigenvalues and the determinant is the
product to show that if we have stable motion, then tr(A) < 0 and det(A) < 0. However, the
converse is not true. Find an example where tr(A) < 0 and det(A) < 0, but the motion is not
stable.
24 THE EXPONENTIAL OF A MATRIX AND SOLUTIONS WITH INPUTS 142
24 The exponential of a matrix and solutions with inputs

We now consider systems with an input term q(t):
    y' = Ay + q(t)        (24.1)
As before, we will find that the general solution is the sum of the null solution and a particular solution. The null solution is the solution of the homogeneous problem y' = Ay, which we did in the last section. The question, then, is how to find the particular solution.
Recall the case when A = a is just a scalar, which we studied in §2:
    y' = ay + q(t)   ⇒   y(t) = e^{at} y(0) + ∫_0^t e^{a(t−s)} q(s) ds
In fact, the solution of equation (24.1) is exactly the same! We just have to change a to A:
    y(t) = e^{At} y(0) + ∫_0^t e^{A(t−s)} q(s) ds
We only have to figure out what is meant by e^{At}, i.e. the exponential of a matrix. To do this, let's recall the series for e^x:
    e^x = 1 + x + x^2/2 + x^3/3! + · · ·
The exponential of a matrix is defined the same way:
    Matrix exponential:   e^{At} = I + At + (1/2)(At)^2 + (1/3!)(At)^3 + · · · = Σ_{n=0}^∞ (1/n!)(At)^n
Notice that e^{At} is another matrix of the same size as A. From this, right away, we see that e^0 = I: the exponential of the zero matrix is the identity matrix. Other key properties are:
    1.  (d/dt) e^{At} = A e^{At}        2.  e^{At} e^{As} = e^{A(t+s)}        (24.2)
If A = XΛX^{-1} is diagonalizable, then (At)^k = X (Λt)^k X^{-1} for every k, so the series gives
    e^{At} = X e^{Λt} X^{-1}
where e^{Λt} = Σ_{k=0}^∞ (1/k!)(Λt)^k is diagonal, with i-th diagonal entry Σ_{k=0}^∞ (1/k!)(λi t)^k = e^{λi t}. Thus
    e^{At} = X [ e^{λ1 t}                    ]
               [          e^{λ2 t}           ] X^{-1}
               [                   ⋱         ]
               [                    e^{λn t} ]
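The series can also be summed numerically. The sketch below (not in the notes) uses A = [0 1; −1 0] from Example 23.3, for which e^{At} = [cos t  sin t; −sin t  cos t]; at t = π this is −I:

```python
# Truncated series e^{At} = sum over n of (At)^n / n!, for a 2x2 matrix.
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(A, t, terms=40):
    E = [[1.0, 0.0], [0.0, 1.0]]        # running sum, starts at I
    term = [[1.0, 0.0], [0.0, 1.0]]     # current term (At)^n / n!
    for n in range(1, terms):
        term = matmul(term, A)
        term = [[x * t / n for x in row] for row in term]
        E = [[E[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return E

A = [[0, 1], [-1, 0]]
E = expm(A, math.pi)
print(E)   # close to [[-1, 0], [0, -1]], i.e. -I
```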
    y'' + 4y = 0,   y(0) = 3,   y'(0) = 8
    y(t) = e^{At} y(0) + ∫_0^t e^{A(t−s)} q(s) ds   solves   y' = Ay + q(t)        (24.3)
Reason:
Exercises
1. Here is an example of a matrix that is not diagonalizable, but the matrix exponential can be found fairly easily. Take
       A = [ 0  1 ]
           [ 0  0 ]
   Then A^2 is the zero matrix. Find e^{At}, and therefore the solution of y' = Ay. Compare this result with the solution you found in §23 #4.
2. Let
       A = [ 1  b ]
           [ 0  0 ]
   Compute A^n. Use this to find
       e^{At} = [ e^t  b(e^t − 1) ]
                [  0        1     ]
   using the series definition.
Use Problem 2 to compute e^A and e^B. Compute e^{A+B}, and then show that e^A e^B, e^B e^A and e^{A+B} are all different.
5. Consider the “room problem”, Exercise 6 from §23. Suppose we introduce some flow into the
first room. Then we have the following system.
       dy1/dt = y2 − y1 + 1,   and   dy2/dt = y1 − y2
   Solve this with the same initial conditions as in problem #6. What happens in the long run?