
LINEAR ALGEBRA

W W L CHEN
© W W L Chen, 1982, 2008.
This chapter originates from material used by the author at Imperial College, University of London, between 1981 and 1990.
It is available free to all individuals, on the understanding that it is not to be used for financial gain,
and may be downloaded and/or photocopied, with or without permission from the author.
However, this document may not be kept on any information storage and retrieval system without permission
from the author, unless such system is not accessible to any individuals other than its owners.
Chapter 1
LINEAR EQUATIONS
1.1. Introduction
Example 1.1.1. Try to draw the two lines
3x + 2y = 5,
6x + 4y = 5.
It is easy to see that the two lines are parallel and do not intersect, so that this system of two linear
equations has no solution.
Example 1.1.2. Try to draw the two lines
3x + 2y = 5,
x + y = 2.
It is easy to see that the two lines are not parallel and intersect at the point (1, 1), so that this system
of two linear equations has exactly one solution.
Example 1.1.3. Try to draw the two lines
3x + 2y = 5,
6x + 4y = 10.
It is easy to see that the two lines overlap completely, so that this system of two linear equations has
infinitely many solutions.
In these three examples, we have shown that a system of two linear equations on the plane R^2 may
have no solution, one solution or infinitely many solutions. A natural question to ask is whether there
can be any other conclusion. Well, we can see geometrically that two lines cannot intersect at more than
one point without overlapping completely. Hence there can be no other conclusion.
In general, we shall study a system of m linear equations of the form

a_11 x_1 + a_12 x_2 + . . . + a_1n x_n = b_1,
a_21 x_1 + a_22 x_2 + . . . + a_2n x_n = b_2,
  :
a_m1 x_1 + a_m2 x_2 + . . . + a_mn x_n = b_m,        (1)

with n variables x_1, x_2, . . . , x_n. Here we may not be so lucky as to be able to see geometrically what is
going on. We therefore need to study the problem from a more algebraic viewpoint. In this chapter, we
shall confine ourselves to the simpler aspects of the problem. In Chapter 6, we shall study the problem
again from the viewpoint of vector spaces.
If we omit reference to the variables, then system (1) can be represented by the array

[ a_11  a_12  . . .  a_1n | b_1 ]
[   :     :            :  |  :  ]
[ a_m1  a_m2  . . .  a_mn | b_m ]        (2)

of all the coefficients. This is known as the augmented matrix of the system. Here the first row of the
array represents the first linear equation, and so on.

We also write Ax = b, where

A = [ a_11  a_12  . . .  a_1n ]
    [   :     :            :  ]
    [ a_m1  a_m2  . . .  a_mn ]

and

b = [ b_1 ]
    [  :  ]
    [ b_m ]

represent the coefficients and

x = [ x_1 ]
    [ x_2 ]
    [  :  ]
    [ x_n ]

represents the variables.
Example 1.1.4. The array

[ 1  3  1  5  1 | 5 ]
[ 0  1  1  2  1 | 4 ]
[ 2  4  0  7  1 | 3 ]        (3)

represents the system of 3 linear equations

 x_1 + 3x_2 + x_3 + 5x_4 + x_5 = 5,
       x_2 + x_3 + 2x_4 + x_5 = 4,
2x_1 + 4x_2      + 7x_4 + x_5 = 3,        (4)

with 5 variables x_1, x_2, x_3, x_4, x_5. We can also write

[ 1  3  1  5  1 ]  [ x_1 ]   [ 5 ]
[ 0  1  1  2  1 ]  [ x_2 ] = [ 4 ].
[ 2  4  0  7  1 ]  [ x_3 ]   [ 3 ]
                   [ x_4 ]
                   [ x_5 ]
1.2. Elementary Row Operations
Let us continue with Example 1.1.4.
Example 1.2.1. Consider the array (3). Let us interchange the first and second rows to obtain

[ 0  1  1  2  1 | 4 ]
[ 1  3  1  5  1 | 5 ]
[ 2  4  0  7  1 | 3 ].

Then this represents the system of equations

       x_2 + x_3 + 2x_4 + x_5 = 4,
 x_1 + 3x_2 + x_3 + 5x_4 + x_5 = 5,
2x_1 + 4x_2      + 7x_4 + x_5 = 3,        (5)

essentially the same as the system (4), the only difference being that the first and second equations have
been interchanged. Any solution of the system (4) is a solution of the system (5), and vice versa.
Example 1.2.2. Consider the array (3). Let us add 2 times the second row to the first row to obtain

[ 1  5  3  9  3 | 13 ]
[ 0  1  1  2  1 |  4 ]
[ 2  4  0  7  1 |  3 ].

Then this represents the system of equations

 x_1 + 5x_2 + 3x_3 + 9x_4 + 3x_5 = 13,
       x_2 +  x_3 + 2x_4 +  x_5 = 4,
2x_1 + 4x_2       + 7x_4 +  x_5 = 3,        (6)

essentially the same as the system (4), the only difference being that we have added 2 times the second
equation to the first equation. Any solution of the system (4) is a solution of the system (6), and vice
versa.
Example 1.2.3. Consider the array (3). Let us multiply the second row by 2 to obtain

[ 1  3  1  5  1 | 5 ]
[ 0  2  2  4  2 | 8 ]
[ 2  4  0  7  1 | 3 ].

Then this represents the system of equations

 x_1 + 3x_2 +  x_3 + 5x_4 +  x_5 = 5,
      2x_2 + 2x_3 + 4x_4 + 2x_5 = 8,
2x_1 + 4x_2       + 7x_4 +  x_5 = 3,        (7)

essentially the same as the system (4), the only difference being that the second equation has been
multiplied through by 2. Any solution of the system (4) is a solution of the system (7), and vice versa.
In the general situation, it is not difficult to see the following.
PROPOSITION 1A. (ELEMENTARY ROW OPERATIONS) Consider the array (2) corresponding
to the system (1).
(a) Interchanging the i-th and j-th rows of (2) corresponds to interchanging the i-th and j-th equations
in (1).
(b) Adding a multiple of the i-th row of (2) to the j-th row corresponds to adding the same multiple of
the i-th equation in (1) to the j-th equation.
(c) Multiplying the i-th row of (2) by a non-zero constant corresponds to multiplying the i-th equation
in (1) by the same non-zero constant.
In all three cases, the collection of solutions to the system (1) remains unchanged.
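
For readers who wish to experiment on a computer, the three elementary row operations are easy to
express in a few lines of code. The following sketch is a minimal illustration of ours (the chapter itself
assumes hand calculation); it is written in Python, stores an augmented matrix as a list of rows, and
uses 0-based indices, so row 1 of the text is row 0 here. The function names are our own invention.

def interchange(a, i, j):
    # (a) interchange the i-th and j-th rows
    a[i], a[j] = a[j], a[i]

def add_multiple(a, c, i, j):
    # (b) add c times the i-th row to the j-th row
    a[j] = [y + c * x for x, y in zip(a[i], a[j])]

def multiply(a, i, c):
    # (c) multiply the i-th row by a non-zero constant c
    assert c != 0
    a[i] = [c * x for x in a[i]]

# the array (3) of Example 1.1.4, with the right hand side stored as the last column
array3 = [[1, 3, 1, 5, 1, 5],
          [0, 1, 1, 2, 1, 4],
          [2, 4, 0, 7, 1, 3]]
interchange(array3, 0, 1)    # reproduces Example 1.2.1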
Let us investigate how we may use elementary row operations to help us solve a system of linear
equations. As a first step, let us continue again with Example 1.1.4.
Example 1.2.4. Consider again the system of linear equations

 x_1 + 3x_2 + x_3 + 5x_4 + x_5 = 5,
       x_2 + x_3 + 2x_4 + x_5 = 4,
2x_1 + 4x_2      + 7x_4 + x_5 = 3,        (8)

represented by the array

[ 1  3  1  5  1 | 5 ]
[ 0  1  1  2  1 | 4 ]
[ 2  4  0  7  1 | 3 ].        (9)

Let us now perform elementary row operations on the array (9). At this point, do not worry if you do
not understand why we are taking the following steps. Adding -2 times the first row of (9) to the third
row, we obtain

[ 1   3   1   5   1 |  5 ]
[ 0   1   1   2   1 |  4 ]
[ 0  -2  -2  -3  -1 | -7 ].

From here, we add 2 times the second row to the third row to obtain

[ 1  3  1  5  1 | 5 ]
[ 0  1  1  2  1 | 4 ]
[ 0  0  0  1  1 | 1 ].        (10)

Next, we add -3 times the second row to the first row to obtain

[ 1  0  -2  -1  -2 | -7 ]
[ 0  1   1   2   1 |  4 ]
[ 0  0   0   1   1 |  1 ].

Next, we add the third row to the first row to obtain

[ 1  0  -2  0  -1 | -6 ]
[ 0  1   1  2   1 |  4 ]
[ 0  0   0  1   1 |  1 ].

Finally, we add -2 times the third row to the second row to obtain

[ 1  0  -2  0  -1 | -6 ]
[ 0  1   1  0  -1 |  2 ]
[ 0  0   0  1   1 |  1 ].        (11)
We remark here that the array (10) is said to be in row echelon form, while the array (11) is said to be
in reduced row echelon form; precise definitions will follow in Sections 1.3 and 1.4. Let us see how we may
solve the system (8) by using the arrays (10) or (11). First consider (10). Note that this represents the
system
 x_1 + 3x_2 + x_3 + 5x_4 + x_5 = 5,
       x_2 + x_3 + 2x_4 + x_5 = 4,
                    x_4 + x_5 = 1.        (12)

First of all, take the third equation

x_4 + x_5 = 1.

If we let x_5 = t, then x_4 = 1 - t. Substituting these into the second equation, we obtain (you must do
the calculation here)

x_2 + x_3 = 2 + t.

If we let x_3 = s, then x_2 = 2 + t - s. Substituting all these into the first equation, we obtain (you must
do the calculation here)

x_1 = -6 + t + 2s.

Hence

x = (x_1, x_2, x_3, x_4, x_5) = (-6 + t + 2s, 2 + t - s, s, 1 - t, t)

is a solution of the system (12) for every s, t ∈ R. In view of Proposition 1A, these are also precisely the
solutions of the system (8). Alternatively, consider (11) instead. Note that this represents the system

x_1       - 2x_3       - x_5 = -6,
      x_2 +  x_3       - x_5 = 2,
                  x_4 + x_5 = 1.        (13)

First of all, take the third equation

x_4 + x_5 = 1.

If we let x_5 = t, then x_4 = 1 - t. Substituting these into the second equation, we obtain (you must do
the calculation here)

x_2 + x_3 = 2 + t.

If we let x_3 = s, then x_2 = 2 + t - s. Substituting all these into the first equation, we obtain (you must
do the calculation here)

x_1 = -6 + t + 2s.

Hence

x = (x_1, x_2, x_3, x_4, x_5) = (-6 + t + 2s, 2 + t - s, s, 1 - t, t)

is a solution of the system (13) for every s, t ∈ R. In view of Proposition 1A, these are also precisely
the solutions of the system (8). However, if you have done the calculations as suggested, you will notice
that the calculation is easier for the system (13) than for the system (12). This is clearly a case of the
array (11) in reduced row echelon form having more 0s than the array (10) in row echelon form, so that
the system (13) has fewer non-zero coefficients than the system (12).
1.3. Row Echelon Form
Definition. A rectangular array of numbers is said to be in row echelon form if the following conditions
are satisfied:
(1) The left-most non-zero entry of any non-zero row has value 1. These are called the pivot entries.
(2) All zero rows are grouped together at the bottom of the array.
(3) The pivot entry of a non-zero row occurring lower in the array is to the right of the pivot entry of
a non-zero row occurring higher in the array.
Next, we investigate how we may reduce a given array to row echelon form. We shall illustrate the
ideas by working on an example.
Example 1.3.1. Consider the array

[ 0  0  5  0  15  5 ]
[ 0  2  4  7   1  3 ]
[ 0  1  2  3   0  1 ]
[ 0  1  2  4   1  2 ].

Step 1: Locate the left-most non-zero column and cover all columns to the left of this column (in our
illustration here, ∗ denotes an entry that has been covered). We now have

[ ∗  0  5  0  15  5 ]
[ ∗  2  4  7   1  3 ]
[ ∗  1  2  3   0  1 ]
[ ∗  1  2  4   1  2 ].

Step 2: Consider the part of the array that remains uncovered. By interchanging rows if necessary,
ensure that the top-left entry is non-zero. So let us interchange rows 1 and 4 to obtain

[ ∗  1  2  4   1  2 ]
[ ∗  2  4  7   1  3 ]
[ ∗  1  2  3   0  1 ]
[ ∗  0  5  0  15  5 ].

Step 3: If the top entry on the left-most uncovered column is a, then we multiply the top uncovered row
by 1/a to ensure that this entry becomes 1. So let us divide row 1 by 1 to obtain

[ ∗  1  2  4   1  2 ]
[ ∗  2  4  7   1  3 ]
[ ∗  1  2  3   0  1 ]
[ ∗  0  5  0  15  5 ].

Step 4: We now try to make all entries below the top entry on the left-most uncovered column zero.
This can be achieved by adding suitable multiples of row 1 to the other rows. So let us add -2 times
row 1 to row 2 to obtain

[ ∗  1  2   4   1   2 ]
[ ∗  0  0  -1  -1  -1 ]
[ ∗  1  2   3   0   1 ]
[ ∗  0  5   0  15   5 ].

Then let us add -1 times row 1 to row 3 to obtain

[ ∗  1  2   4   1   2 ]
[ ∗  0  0  -1  -1  -1 ]
[ ∗  0  0  -1  -1  -1 ]
[ ∗  0  5   0  15   5 ].

Step 5: Now cover the top row. We then obtain

[ ∗  ∗  ∗   ∗   ∗   ∗ ]
[ ∗  0  0  -1  -1  -1 ]
[ ∗  0  0  -1  -1  -1 ]
[ ∗  0  5   0  15   5 ].

Step 6: Repeat Steps 1–5 on the uncovered array, and as many times as necessary so that eventually the
whole array gets covered. So let us continue. Following Step 1, we locate the left-most non-zero column
and cover all columns to the left of this column. We now have

[ ∗  ∗  ∗   ∗   ∗   ∗ ]
[ ∗  ∗  0  -1  -1  -1 ]
[ ∗  ∗  0  -1  -1  -1 ]
[ ∗  ∗  5   0  15   5 ].

Following Step 2, we interchange rows if necessary to ensure that the top-left entry is non-zero. So let
us interchange rows 1 and 3 (here we do not count any covered rows) to obtain

[ ∗  ∗  ∗   ∗   ∗   ∗ ]
[ ∗  ∗  5   0  15   5 ]
[ ∗  ∗  0  -1  -1  -1 ]
[ ∗  ∗  0  -1  -1  -1 ].

Following Step 3, we multiply the top row by a suitable number to ensure that the top entry on the
left-most uncovered column becomes 1. So let us multiply row 1 by 1/5 to obtain

[ ∗  ∗  ∗   ∗   ∗   ∗ ]
[ ∗  ∗  1   0   3   1 ]
[ ∗  ∗  0  -1  -1  -1 ]
[ ∗  ∗  0  -1  -1  -1 ].

Following Step 4, we do nothing! Following Step 5, we cover the top row. We then obtain

[ ∗  ∗  ∗   ∗   ∗   ∗ ]
[ ∗  ∗  ∗   ∗   ∗   ∗ ]
[ ∗  ∗  0  -1  -1  -1 ]
[ ∗  ∗  0  -1  -1  -1 ].

Following Step 1, we locate the left-most non-zero column and cover all columns to the left of this
column. We now have

[ ∗  ∗  ∗   ∗   ∗   ∗ ]
[ ∗  ∗  ∗   ∗   ∗   ∗ ]
[ ∗  ∗  ∗  -1  -1  -1 ]
[ ∗  ∗  ∗  -1  -1  -1 ].

Following Step 2, we do nothing! Following Step 3, we multiply the top row by a suitable number to
ensure that the top entry on the left-most uncovered column becomes 1. So let us multiply row 1 by -1
to obtain

[ ∗  ∗  ∗   ∗   ∗   ∗ ]
[ ∗  ∗  ∗   ∗   ∗   ∗ ]
[ ∗  ∗  ∗   1   1   1 ]
[ ∗  ∗  ∗  -1  -1  -1 ].

Following Step 4, we now try to make all entries below the top entry on the left-most uncovered column
zero. So let us add row 1 to row 2 to obtain

[ ∗  ∗  ∗  ∗  ∗  ∗ ]
[ ∗  ∗  ∗  ∗  ∗  ∗ ]
[ ∗  ∗  ∗  1  1  1 ]
[ ∗  ∗  ∗  0  0  0 ].

Following Step 5, we cover the top row. We then obtain

[ ∗  ∗  ∗  ∗  ∗  ∗ ]
[ ∗  ∗  ∗  ∗  ∗  ∗ ]
[ ∗  ∗  ∗  ∗  ∗  ∗ ]
[ ∗  ∗  ∗  0  0  0 ].

Following Step 1, we locate the left-most non-zero column and cover all columns to the left of this
column. We now have

[ ∗  ∗  ∗  ∗  ∗  ∗ ]
[ ∗  ∗  ∗  ∗  ∗  ∗ ]
[ ∗  ∗  ∗  ∗  ∗  ∗ ]
[ ∗  ∗  ∗  ∗  ∗  ∗ ].

Final step: Uncover everything! We then have

[ 0  1  2  4  1  2 ]
[ 0  0  1  0  3  1 ]
[ 0  0  0  1  1  1 ]
[ 0  0  0  0  0  0 ],

in row echelon form.
In practice, we do not actually cover any entries of the array, so let us repeat here the same argument
without covering anything; the reader is advised to compare this with the earlier discussion. We start
with the array

[ 0  0  5  0  15  5 ]
[ 0  2  4  7   1  3 ]
[ 0  1  2  3   0  1 ]
[ 0  1  2  4   1  2 ].

Interchanging rows 1 and 4, we obtain

[ 0  1  2  4   1  2 ]
[ 0  2  4  7   1  3 ]
[ 0  1  2  3   0  1 ]
[ 0  0  5  0  15  5 ].

Adding -2 times row 1 to row 2, and adding -1 times row 1 to row 3, we obtain

[ 0  1  2   4   1   2 ]
[ 0  0  0  -1  -1  -1 ]
[ 0  0  0  -1  -1  -1 ]
[ 0  0  5   0  15   5 ].

Interchanging rows 2 and 4, we obtain

[ 0  1  2   4   1   2 ]
[ 0  0  5   0  15   5 ]
[ 0  0  0  -1  -1  -1 ]
[ 0  0  0  -1  -1  -1 ].

Multiplying row 2 by 1/5, we obtain

[ 0  1  2   4   1   2 ]
[ 0  0  1   0   3   1 ]
[ 0  0  0  -1  -1  -1 ]
[ 0  0  0  -1  -1  -1 ].

Multiplying row 3 by -1, we obtain

[ 0  1  2   4   1   2 ]
[ 0  0  1   0   3   1 ]
[ 0  0  0   1   1   1 ]
[ 0  0  0  -1  -1  -1 ].

Adding row 3 to row 4, we obtain

[ 0  1  2  4  1  2 ]
[ 0  0  1  0  3  1 ]
[ 0  0  0  1  1  1 ]
[ 0  0  0  0  0  0 ],

in row echelon form.
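
The covering procedure described above is already an algorithm, and it can be transcribed almost line
by line into code. The sketch below is again an illustration of ours, not part of the chapter; it is written
in Python, uses exact fractions to avoid rounding, and the counter r plays the role of the number of
covered rows. Since row echelon forms are not unique, it may well produce a different, equally valid,
answer from the one obtained above.

from fractions import Fraction

def row_echelon_form(a):
    a = [[Fraction(x) for x in row] for row in a]
    m, n = len(a), len(a[0])
    r = 0                                       # number of rows already covered
    for c in range(n):                          # Step 1: scan columns from the left
        p = next((i for i in range(r, m) if a[i][c] != 0), None)
        if p is None:
            continue                            # uncovered part of this column is zero
        a[r], a[p] = a[p], a[r]                 # Step 2: interchange rows
        a[r] = [x / a[r][c] for x in a[r]]      # Step 3: make the pivot entry 1
        for i in range(r + 1, m):               # Step 4: clear the entries below
            a[i] = [x - a[i][c] * y for x, y in zip(a[i], a[r])]
        r += 1                                  # Step 5: cover the top row
    return a

Applied to the array at the start of Example 1.3.1, the function returns a row echelon form of that array.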
Remarks. (1) As already observed earlier, we do not actually physically cover rows or columns. In any
practical situation, we simply copy these entries without changes.

(2) The steps indicated in the first part of the last example are for guidance only. In practice, we
do not have to follow the steps above religiously, and what we do is to a great extent dictated by good
common sense. For instance, suppose that we are faced with the array

[ 2  3  2  1 ]
[ 3  2  0  2 ].

If we follow the steps religiously, then we shall multiply row 1 by 1/2. However, note that this will
introduce fractions to some entries of the array, and any subsequent calculation will become rather
messy. Instead, let us multiply row 1 by 3 to obtain

[ 6  9  6  3 ]
[ 3  2  0  2 ].

Then let us multiply row 2 by 2 to obtain

[ 6  9  6  3 ]
[ 6  4  0  4 ].

Adding -1 times row 1 to row 2, we obtain

[ 6   9   6  3 ]
[ 0  -5  -6  1 ].

In this way, we have avoided the introduction of fractions until later in the process. In general, if we start
with an array with integer entries, then it is possible to delay the introduction of fractions by omitting
Step 3 until the very end.
Example 1.3.2. Consider the array

[ 2  1  3  2  5 ]
[ 1  3  2  4  1 ]
[ 3  2  0  0  2 ].

Try following the steps indicated in the first part of the previous example religiously and try to see how
complicated the calculations get. On the other hand, we can modify the steps with some common sense.
First of all, we interchange rows 1 and 2 to obtain

[ 1  3  2  4  1 ]
[ 2  1  3  2  5 ]
[ 3  2  0  0  2 ].

The reason for taking this step is to put an entry 1 at the top left without introducing fractions anywhere.
When we next add multiples of row 1 to the other rows to make 0s below this 1, we do not introduce
fractions either. Now adding -2 times row 1 to row 2, we obtain

[ 1   3   2   4  1 ]
[ 0  -5  -1  -6  3 ]
[ 3   2   0   0  2 ].

Adding -3 times row 1 to row 3, we obtain

[ 1   3   2    4   1 ]
[ 0  -5  -1   -6   3 ]
[ 0  -7  -6  -12  -1 ].

Next, multiplying row 2 by 7, we obtain

[ 1    3   2    4   1 ]
[ 0  -35  -7  -42  21 ]
[ 0   -7  -6  -12  -1 ].

Multiplying row 3 by 5, we obtain

[ 1    3    2    4   1 ]
[ 0  -35   -7  -42  21 ]
[ 0  -35  -30  -60  -5 ].

Note that here we are essentially covering up row 1. Also, we have multiplied rows 2 and 3 by suitable
multiples so that their leading non-zero entries are the same, in preparation for taking the next step
without introducing fractions. Now adding -1 times row 2 to row 3, we obtain

[ 1    3    2    4    1 ]
[ 0  -35   -7  -42   21 ]
[ 0    0  -23  -18  -26 ].

Here, the array is almost in row echelon form, except that the leading non-zero entries in rows 2 and 3
are not equal to 1. However, we can always multiply row 2 by -1/35 and row 3 by -1/23 if we want to
obtain the row echelon form

[ 1  3   2      4      1   ]
[ 0  1  1/5    6/5   -3/5  ]
[ 0  0   1   18/23  26/23  ].

If this differs from the answer you got when you followed the steps indicated in the previous example
religiously, do not worry; row echelon forms are not unique!
1.4. Reduced Row Echelon Form
Definition. A rectangular array of numbers is said to be in reduced row echelon form if the following
conditions are satisfied:
(1) The left-most non-zero entry of any non-zero row has value 1. These are called the pivot entries.
(2) All zero rows are grouped together at the bottom of the array.
(3) The pivot entry of a non-zero row occurring lower in the array is to the right of the pivot entry of
a non-zero row occurring higher in the array.
(4) Each column containing a pivot entry has 0s everywhere else in the column.
We now investigate how we may reduce a given array to reduced row echelon form. Here, we basically
take an extra step to convert an array from row echelon form to reduced row echelon form. We shall
illustrate the ideas by continuing on an earlier example.
Example 1.4.1. Consider again the array

[ 0  0  5  0  15  5 ]
[ 0  2  4  7   1  3 ]
[ 0  1  2  3   0  1 ]
[ 0  1  2  4   1  2 ].

We have already shown in Example 1.3.1 that this array can be reduced to row echelon form

[ 0  1  2  4  1  2 ]
[ 0  0  1  0  3  1 ]
[ 0  0  0  1  1  1 ]
[ 0  0  0  0  0  0 ].

Step 1: Cover all zero rows at the bottom of the array. We now have

[ 0  1  2  4  1  2 ]
[ 0  0  1  0  3  1 ]
[ 0  0  0  1  1  1 ]
[ ∗  ∗  ∗  ∗  ∗  ∗ ].

Step 2: We now try to make all the entries above the pivot entry on the bottom row zero (here again we
do not count any covered rows). This can be achieved by adding suitable multiples of the bottom row
to the other rows. So let us add -4 times row 3 to row 1 to obtain

[ 0  1  2  0  -3  -2 ]
[ 0  0  1  0   3   1 ]
[ 0  0  0  1   1   1 ]
[ ∗  ∗  ∗  ∗   ∗   ∗ ].

Step 3: Now cover the bottom row. We then obtain

[ 0  1  2  0  -3  -2 ]
[ 0  0  1  0   3   1 ]
[ ∗  ∗  ∗  ∗   ∗   ∗ ]
[ ∗  ∗  ∗  ∗   ∗   ∗ ].

Step 4: Repeat Steps 2–3 on the uncovered array, and as many times as necessary so that eventually
the whole array gets covered. So let us continue. Following Step 2, we add -2 times row 2 to row 1 to
obtain

[ 0  1  0  0  -9  -4 ]
[ 0  0  1  0   3   1 ]
[ ∗  ∗  ∗  ∗   ∗   ∗ ]
[ ∗  ∗  ∗  ∗   ∗   ∗ ].

Following Step 3, we cover row 2 to obtain

[ 0  1  0  0  -9  -4 ]
[ ∗  ∗  ∗  ∗   ∗   ∗ ]
[ ∗  ∗  ∗  ∗   ∗   ∗ ]
[ ∗  ∗  ∗  ∗   ∗   ∗ ].

Following Step 2, we do nothing! Following Step 3, we cover row 1 to obtain

[ ∗  ∗  ∗  ∗  ∗  ∗ ]
[ ∗  ∗  ∗  ∗  ∗  ∗ ]
[ ∗  ∗  ∗  ∗  ∗  ∗ ]
[ ∗  ∗  ∗  ∗  ∗  ∗ ].

Final step: Uncover everything! We then have

[ 0  1  0  0  -9  -4 ]
[ 0  0  1  0   3   1 ]
[ 0  0  0  1   1   1 ]
[ 0  0  0  0   0   0 ],

in reduced row echelon form.
Again, in practice, we do not actually cover any entries of the array, so let us repeat here the same
argument without covering anything; the reader is advised to compare this with the earlier discussion.
We start with the row echelon form

[ 0  1  2  4  1  2 ]
[ 0  0  1  0  3  1 ]
[ 0  0  0  1  1  1 ]
[ 0  0  0  0  0  0 ].

Adding -4 times row 3 to row 1, we obtain

[ 0  1  2  0  -3  -2 ]
[ 0  0  1  0   3   1 ]
[ 0  0  0  1   1   1 ]
[ 0  0  0  0   0   0 ].

Adding -2 times row 2 to row 1, we obtain

[ 0  1  0  0  -9  -4 ]
[ 0  0  1  0   3   1 ]
[ 0  0  0  1   1   1 ]
[ 0  0  0  0   0   0 ],

in reduced row echelon form.
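
The extra pass from row echelon form to reduced row echelon form is just as mechanical. Continuing the
illustrative sketch from Section 1.3 (again an addition of ours, not part of the chapter), we clear the
entries above each pivot, starting from the bottom non-zero row, exactly as in Steps 1–4 above:

def reduced_row_echelon_form(a):
    a = row_echelon_form(a)                      # the sketch from Section 1.3
    m, n = len(a), len(a[0])
    for r in range(m - 1, -1, -1):               # bottom row first
        c = next((j for j in range(n) if a[r][j] != 0), None)
        if c is None:
            continue                             # a zero row: nothing to clear
        for i in range(r):                       # clear the entries above the pivot
            a[i] = [x - a[i][c] * y for x, y in zip(a[i], a[r])]
    return a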
1.5. Solving a System of Linear Equations
Let us first summarize what we have done so far. We study a system (1) of m linear equations in n
variables x_1, . . . , x_n. If we omit reference to the variables, then the system (1) can be represented by
the array (2), with m rows and n + 1 columns. We next reduce the array (2) to row echelon form or
reduced row echelon form by elementary row operations.
By Proposition 1A, the system of linear equations represented by the array in row echelon form or
reduced row echelon form has the same solution set as the system (1). It follows that to solve the system
(1), it remains to solve the system represented by the array in row echelon form or reduced row echelon
form. We now describe a simple way to obtain all solutions of this system.
Definition. Any column of an array (2) in row echelon form or reduced row echelon form containing a
pivot entry is called a pivot column.

First of all, let us eliminate the situation when the system has no solutions. Suppose that the array
(2) has been reduced to row echelon form, and that this contains a row of the form

( 0  . . .  0 | 1 ),

with n zeros, corresponding to the last column of the array being a pivot column. This row represents
the equation

0 x_1 + . . . + 0 x_n = 1;

clearly the system cannot have any solution.

Definition. Suppose that the array (2) in row echelon form or reduced row echelon form satisfies the
condition that its last column is not a pivot column. Then any variable x_i corresponding to a pivot
column is called a pivot variable. All other variables are called free variables.
Example 1.5.1. Consider the array

[ 0  1  0  0  -9 | -4 ]
[ 0  0  1  0   3 |  1 ]
[ 0  0  0  1   1 |  1 ]
[ 0  0  0  0   0 |  0 ],

representing the system

x_2 - 9x_5 = -4,
x_3 + 3x_5 = 1,
x_4 +  x_5 = 1.

Note that the zero row in the array represents an equation which is trivial! Here the last column of the
array is not a pivot column. Now columns 2, 3, 4 are the pivot columns, so that x_2, x_3, x_4 are the pivot
variables and x_1, x_5 are the free variables.

To solve the system, we allow the free variables to take any values we choose, and then solve for the
pivot variables in terms of the values of these free variables.
Example 1.5.2. Consider the system of 4 linear equations

             5x_3        + 15x_5 = 5,
      2x_2 + 4x_3 + 7x_4 +   x_5 = 3,
       x_2 + 2x_3 + 3x_4         = 1,
       x_2 + 2x_3 + 4x_4 +   x_5 = 2,        (14)

in the 5 variables x_1, x_2, x_3, x_4, x_5. If we omit reference to the variables, then the system can be
represented by the array

[ 0  0  5  0  15 | 5 ]
[ 0  2  4  7   1 | 3 ]
[ 0  1  2  3   0 | 1 ]
[ 0  1  2  4   1 | 2 ].        (15)

As in Example 1.3.1, we can reduce the array (15) to row echelon form

[ 0  1  2  4  1 | 2 ]
[ 0  0  1  0  3 | 1 ]
[ 0  0  0  1  1 | 1 ]
[ 0  0  0  0  0 | 0 ],        (16)

representing the system

x_2 + 2x_3 + 4x_4 + x_5 = 2,
       x_3       + 3x_5 = 1,
             x_4 +  x_5 = 1.        (17)

Alternatively, as in Example 1.4.1, we can reduce the array (15) to reduced row echelon form

[ 0  1  0  0  -9 | -4 ]
[ 0  0  1  0   3 |  1 ]
[ 0  0  0  1   1 |  1 ]
[ 0  0  0  0   0 |  0 ],        (18)

representing the system

x_2 - 9x_5 = -4,
x_3 + 3x_5 = 1,
x_4 +  x_5 = 1.        (19)

By Proposition 1A, the three systems (14), (17) and (19) have exactly the same solution set. Now, we
observe from (16) or (18) that columns 2, 3, 4 are the pivot columns, so that x_2, x_3, x_4 are the pivot
variables and x_1, x_5 are the free variables. If we assign values x_1 = s and x_5 = t, then we have, from
(17) (harder) or (19) (easier), that

(x_1, x_2, x_3, x_4, x_5) = (s, 9t - 4, -3t + 1, -t + 1, t).        (20)

It follows that (20) is a solution of the system (14) for every s, t ∈ R.
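
Calculations such as these can also be checked mechanically with a computer algebra system. The
following sketch uses the Python library sympy, an aid we are adding purely for illustration; its rref
method returns the reduced row echelon form together with the indices of the pivot columns (counted
from 0).

from sympy import Matrix

aug = Matrix([[0, 0, 5, 0, 15, 5],
              [0, 2, 4, 7, 1, 3],
              [0, 1, 2, 3, 0, 1],
              [0, 1, 2, 4, 1, 2]])    # the array (15), with its last column
rref, pivots = aug.rref()
print(pivots)   # (1, 2, 3): columns 2, 3, 4 of the text, so x2, x3, x4 are pivot variables
print(rref)     # matches (18): x2 - 9*x5 = -4, x3 + 3*x5 = 1, x4 + x5 = 1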
Example 1.5.3. Let us return to Example 1.2.4, and consider again the system (8) of 3 linear equations
in the 5 variables x_1, x_2, x_3, x_4, x_5. If we omit reference to the variables, then the system can be
represented by the array (9). We can reduce the array (9) to row echelon form (10), representing the
system (12). Alternatively, we can reduce the array (9) to reduced row echelon form (11), representing
the system (13). By Proposition 1A, the three systems (8), (12) and (13) have exactly the same solution
set. Now, we observe from (10) or (11) that columns 1, 2, 4 are the pivot columns, so that x_1, x_2, x_4
are the pivot variables and x_3, x_5 are the free variables. If we assign values x_3 = s and x_5 = t, then
we have, from (12) (harder) or (13) (easier), that

(x_1, x_2, x_3, x_4, x_5) = (-6 + t + 2s, 2 + t - s, s, 1 - t, t).        (21)

It follows that (21) is a solution of the system (8) for every s, t ∈ R.
Example 1.5.4. In this example, we do not bother even to reduce the matrix to row echelon form.
Consider the system of 3 linear equations

2x_1 +  x_2 + 3x_3 + 2x_4 = 5,
 x_1 + 3x_2 + 2x_3 + 4x_4 = 1,
3x_1 + 2x_2               = 2,        (22)

in the 4 variables x_1, x_2, x_3, x_4. If we omit reference to the variables, then the system can be
represented by the array

[ 2  1  3  2 | 5 ]
[ 1  3  2  4 | 1 ]
[ 3  2  0  0 | 2 ].        (23)

As in Example 1.3.2, we can reduce the array (23) to the form

[ 1    3    2    4 |   1 ]
[ 0  -35   -7  -42 |  21 ]
[ 0    0  -23  -18 | -26 ],        (24)

representing the system

 x_1 +  3x_2 + 2x_3 +  4x_4 = 1,
      -35x_2 - 7x_3 - 42x_4 = 21,
             -23x_3 - 18x_4 = -26.        (25)

Note that the array (24) is almost in row echelon form, except that the pivot entries are not 1. By
Proposition 1A, the two systems (22) and (25) have exactly the same solution set. Now, we observe
from (24) that columns 1, 2, 3 are the pivot columns, so that x_1, x_2, x_3 are the pivot variables and
x_4 is the free variable. If we assign the value x_4 = s, then we have, from (25), that

(x_1, x_2, x_3, x_4) = ( (16/23)s + 28/23, -(24/23)s - 19/23, -(18/23)s + 26/23, s ).        (26)

It follows that (26) is a solution of the system (22) for every s ∈ R.
1.6. Homogeneous Systems
Consider a homogeneous system of m linear equations of the form

a_11 x_1 + a_12 x_2 + . . . + a_1n x_n = 0,
a_21 x_1 + a_22 x_2 + . . . + a_2n x_n = 0,
  :
a_m1 x_1 + a_m2 x_2 + . . . + a_mn x_n = 0,        (27)

with n variables x_1, x_2, . . . , x_n. If we omit reference to the variables, then system (27) can be
represented by the array

[ a_11  a_12  . . .  a_1n | 0 ]
[   :     :            :  | : ]
[ a_m1  a_m2  . . .  a_mn | 0 ]        (28)

of all the coefficients.

Note that the system (27) always has a solution, namely the trivial solution

x_1 = x_2 = . . . = x_n = 0.

Indeed, if we reduce the array (28) to row echelon form or reduced row echelon form, then it is not
difficult to see that the last column is a zero column and so cannot be a pivot column.
On the other hand, if the system (27) has a non-trivial solution, then we can multiply this solution
by any non-zero real number different from 1 to obtain another non-trivial solution. We have therefore
proved the following simple result.

PROPOSITION 1B. The homogeneous system (27) either has the trivial solution as its only solution
or has infinitely many solutions.

The purpose of this section is to discuss the following stronger result.

PROPOSITION 1C. Suppose that the system (27) has more variables than equations; in other words,
suppose that n > m. Then there are infinitely many solutions.

To see this, let us consider the array (28) representing the system (27). Note that (28) has m rows,
corresponding to the number of equations. Also (28) has n + 1 columns, where n is the number of
variables. However, the column of (28) on the extreme right is a zero column, corresponding to the
fact that the system is homogeneous. Furthermore, this column remains a zero column if we perform
elementary row operations on the array (28). If we now reduce (28) to row echelon form by elementary
row operations, then there are at most m pivot columns, since there are only m equations in (27) and m
rows in (28). It follows that if we exclude the zero column on the extreme right, then the remaining n
columns cannot all be pivot columns. Hence at least one of the variables is a free variable. By assigning
this free variable arbitrary real values, we end up with infinitely many solutions for the system (27).
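
The counting argument above is easy to watch in action. In the following small illustration of ours (the
coefficient matrix is chosen arbitrarily, and we again lean on the sympy library), a homogeneous system
of m = 2 equations in n = 4 variables has at most 2 pivot columns, hence at least 2 free variables, and
the nullspace method exhibits the corresponding non-trivial solutions:

from sympy import Matrix

A = Matrix([[1, 2, 3, 4],
            [5, 6, 7, 8]])    # m = 2 equations, n = 4 variables
rref, pivots = A.rref()
print(len(pivots))            # at most 2 pivot columns, since there are only 2 rows
print(A.nullspace())          # 4 - 2 = 2 basis solutions of Ax = 0: infinitely many solutions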
1.7. Application to Network Flow
Systems of linear equations arise when we investigate the flow of some quantity through a network.
Such networks arise in science, engineering and economics. Two such examples are the pattern of traffic
flow through a city and the distribution of products from manufacturers to consumers through a network
of wholesalers and retailers.

A network consists of a set of points, called the nodes, and directed lines connecting some or all of the
nodes. The flow is indicated by a number or a variable. We observe the following basic assumptions:
• The total flow into a node is equal to the total flow out of a node.
• The total flow into the network is equal to the total flow out of the network.
Example 1.7.1. The picture below represents a system of one way streets in a particular part of some
city and the traffic flow along the streets between the junctions:

[Diagram: one way streets joining junctions A, B, C and D, with flows 200 into A and x_1 out of A,
200 into B and 300 out of B, 400 into C and 300 out of C, 500 into D and 300 out of D, and streets
carrying x_2 from A to B, x_3 from C to A, x_4 from B to D and x_5 from D to C.]
We first equate the total flow into each node with the total flow out of the same node:

node A: 200 + x_3 = x_1 + x_2,
node B: 200 + x_2 = 300 + x_4,
node C: 400 + x_5 = 300 + x_3,
node D: 500 + x_4 = 300 + x_5.

We then equate the total flow into and out of the network:

400 + 200 + 200 + 500 = 300 + 300 + x_1 + 300.

These give rise to a system of 5 linear equations

x_1 + x_2 - x_3 = 200,
x_2 - x_4 = 100,
x_3 - x_5 = 100,
x_4 - x_5 = -200,
x_1 = 400,

in the 5 variables x_1, . . . , x_5, with augmented matrix

[ 1  1  -1   0   0 |  200 ]
[ 0  1   0  -1   0 |  100 ]
[ 0  0   1   0  -1 |  100 ]
[ 0  0   0   1  -1 | -200 ]
[ 1  0   0   0   0 |  400 ].

This can be reduced by elementary row operations to

[ 1  0  0   0   0 |  400 ]
[ 0  1  0  -1   0 |  100 ]
[ 0  0  1   0  -1 |  100 ]
[ 0  0  0   1  -1 | -200 ]
[ 0  0  0   0   0 |    0 ].

We have general solution (x_1, . . . , x_5) = (400, t - 100, t + 100, t - 200, t), where t is a parameter. Since
one way streets do not permit negative flow, all the coordinates have to be non-negative. It follows that
t ≥ 200.
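
As a check, the traffic system can be handed to a computer algebra system. The sketch below is an
illustration of ours using sympy's linsolve (not part of the text); it reproduces the general solution, with
x5 itself serving as the parameter t:

from sympy import Matrix, linsolve, symbols

x1, x2, x3, x4, x5 = symbols('x1 x2 x3 x4 x5')
A = Matrix([[1, 1, -1, 0, 0],
            [0, 1, 0, -1, 0],
            [0, 0, 1, 0, -1],
            [0, 0, 0, 1, -1],
            [1, 0, 0, 0, 0]])
b = Matrix([200, 100, 100, -200, 400])
(sol,) = linsolve((A, b), x1, x2, x3, x4, x5)
print(sol)    # (400, x5 - 100, x5 + 100, x5 - 200, x5)
# non-negativity of x4 = x5 - 200 then forces x5 >= 200, as in the text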
Example 1.7.2. The picture below represents the quantities of a particular product that flow from
manufacturers M_1, M_2, M_3, through wholesalers W_1, W_2, W_3 and retailers R_1, R_2, R_3, R_4, to
consumers:

[Diagram: product flows 200, x_1, x_2, 300, x_3, x_4, x_5, 300, x_6, 100, x_7, 400, x_8, 200 and 500
between the manufacturers M_1, M_2, M_3, the wholesalers W_1, W_2, W_3 and the retailers R_1, R_2,
R_3, R_4; the node equations below record the details.]
We first equate the total flow into each node with the total flow out of the same node:

node W_1: 200 + x_1 = x_4 + x_5,
node W_2: 300 + x_2 = 300 + x_6,
node W_3: x_3 = 100 + x_7,
node R_1: x_4 = 400,
node R_2: 300 + x_5 = x_8,
node R_3: 100 + x_6 = 200,
node R_4: x_7 = 500.

We then equate the total flow into and out of the network:

200 + x_1 + x_2 + 300 + x_3 = 400 + x_8 + 200 + 500.

These give rise to a system of 8 linear equations

x_1 - x_4 - x_5 = -200,
x_2 - x_6 = 0,
x_3 - x_7 = 100,
x_4 = 400,
x_5 - x_8 = -300,
x_6 = 100,
x_7 = 500,
x_1 + x_2 + x_3 - x_8 = 600,

in the 8 variables x_1, . . . , x_8, with augmented matrix

[ 1  0  0  -1  -1   0   0   0 | -200 ]
[ 0  1  0   0   0  -1   0   0 |    0 ]
[ 0  0  1   0   0   0  -1   0 |  100 ]
[ 0  0  0   1   0   0   0   0 |  400 ]
[ 0  0  0   0   1   0   0  -1 | -300 ]
[ 0  0  0   0   0   1   0   0 |  100 ]
[ 0  0  0   0   0   0   1   0 |  500 ]
[ 1  1  1   0   0   0   0  -1 |  600 ].

This has row echelon form

[ 1  0  0  -1  -1   0   0   0 | -200 ]
[ 0  1  0   0   0  -1   0   0 |    0 ]
[ 0  0  1   0   0   0  -1   0 |  100 ]
[ 0  0  0   1   0   0   0   0 |  400 ]
[ 0  0  0   0   1   0   0  -1 | -300 ]
[ 0  0  0   0   0   1   0   0 |  100 ]
[ 0  0  0   0   0   0   1   0 |  500 ]
[ 0  0  0   0   0   0   0   0 |    0 ].

We have general solution (x_1, . . . , x_8) = (t - 100, 100, 600, 400, t - 300, 100, 500, t), where t is a
parameter. If no goods are returned, then all the coordinates have to be non-negative. It follows that
t ≥ 300.
1.8. Application to Electrical Networks
A simple electric circuit consists of two basic components, electrical sources where the electrical potential
E is measured in volts (V), and resistors where the resistance R is measured in ohms (Ω). We are
interested in determining the current I measured in amperes (A).

The electrical potential between two points is sometimes called the voltage drop between these two
points. Currents and voltage drops can be positive or negative.

The current flow in an electrical circuit is governed by three basic rules:
• Ohm's law: The voltage drop E across a resistor with resistance R with a current I passing through
it is given by E = IR.
• Current law: The sum of the currents flowing into any point is the same as the sum of the currents
flowing out of the point.
• Voltage law: The sum of the voltage drops around any closed loop is equal to zero.

Around any loop, we select a positive direction, clockwise or anticlockwise, as we see fit. We have the
following convention:
• The voltage drop across a resistor is taken to be positive if the current flows in the positive direction
of the loop, and negative if the current flows in the negative direction of the loop.
• The voltage drop across an electrical source is taken to be positive if the positive direction of the
loop is from + to -, and negative if the positive direction of the loop is from - to +.
Example 1.8.1. Consider the electric circuit shown in the diagram below:

[Diagram: a two-loop circuit with points A and B; a current I_1 through an 8Ω resistor and a 20V
source on the left branch, a current I_2 through a 4Ω resistor on the middle branch, and a current I_3
through a 20Ω resistor and a 16V source on the right branch.]
We wish to determine the currents I_1, I_2 and I_3. Applying the Current law to the point A, we obtain
I_1 = I_2 + I_3. Applying the Current law to the point B, we obtain the same. Hence we have the linear
equation

I_1 - I_2 - I_3 = 0.

Next, let us consider the left hand loop, and let us take the positive direction to be clockwise. By
Ohm's law, the voltage drop across the 8Ω resistor is 8I_1, while the voltage drop across the 4Ω resistor is
4I_2. On the other hand, the voltage drop across the 20V electrical source is negative, since the positive
direction of the loop is from - to +. The Voltage law applied to this loop now gives 8I_1 + 4I_2 - 20 = 0,
and we have the linear equation

8I_1 + 4I_2 = 20,   or   2I_1 + I_2 = 5.

Next, let us consider the right hand loop, and let us take the positive direction to be clockwise. By Ohm's
law, the voltage drop across the 20Ω resistor is 20I_3, while the voltage drop across the 4Ω resistor is
-4I_2. On the other hand, the voltage drop across the 16V electrical source is negative, since the positive
direction of the loop is from - to +. The Voltage law applied to this loop now gives 20I_3 - 4I_2 - 16 = 0,
and we have the linear equation

4I_2 - 20I_3 = -16,   or   I_2 - 5I_3 = -4.

We now have a system of three linear equations

I_1 - I_2 - I_3 = 0,
2I_1 + I_2 = 5,
I_2 - 5I_3 = -4.        (29)

The augmented matrix is given by

[ 1  -1  -1 |  0 ]
[ 2   1   0 |  5 ]
[ 0   1  -5 | -4 ],

with reduced row echelon form

[ 1  0  0 | 2 ]
[ 0  1  0 | 1 ]
[ 0  0  1 | 1 ].

Hence I_1 = 2 and I_2 = I_3 = 1. Note here that we have not considered the outer loop. Suppose again that
we take the positive direction to be clockwise. By Ohm's law, the voltage drop across the 8Ω resistor is
8I_1, while the voltage drop across the 20Ω resistor is 20I_3. On the other hand, the voltage drops across
the 20V and 16V electrical sources are both negative. The Voltage law applied to this loop then gives
8I_1 + 20I_3 - 36 = 0. But this equation can be obtained by combining the last two equations in (29).
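
Since the system (29) is small, it is also a convenient test case for the machinery of the earlier sections.
A possible check with sympy (an illustration of ours):

from sympy import Matrix

aug = Matrix([[1, -1, -1, 0],
              [2, 1, 0, 5],
              [0, 1, -5, -4]])    # the augmented matrix of (29)
print(aug.rref()[0])
# the last column reads (2, 1, 1): I1 = 2 and I2 = I3 = 1, as found above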
Example 1.8.2. Consider the electric circuit shown in the diagram below:

[Diagram: a two-loop circuit with points A and B; a current I_1 through an 8Ω resistor, a 5Ω resistor
and a 20V source on the left branch, a current I_2 through a 6Ω resistor on the middle branch, and a
current I_3 through an 8Ω resistor and a 30V source on the right branch.]
We wish to determine the currents I_1, I_2 and I_3. Applying the Current law to the point A, we obtain
I_1 + I_2 = I_3. Applying the Current law to the point B, we obtain the same. Hence we have the linear
equation

I_1 + I_2 - I_3 = 0.

Next, let us consider the left hand loop, and let us take the positive direction to be clockwise. By Ohm's
law, the voltage drop across the 8Ω resistor is 8I_1, the voltage drop across the 6Ω resistor is -6I_2, while
the voltage drop across the 5Ω resistor is 5I_1. On the other hand, the voltage drop across the 20V
electrical source is negative, since the positive direction of the loop is from - to +. The Voltage law
applied to this loop now gives 8I_1 - 6I_2 + 5I_1 - 20 = 0, and we have the linear equation

13I_1 - 6I_2 = 20.

Next, let us consider the outer loop, and let us take the positive direction to be clockwise. By Ohm's
law, the voltage drop across the 8Ω resistor on the top is 8I_1, the voltage drop across the 8Ω resistor
on the right is 8I_3, while the voltage drop across the 5Ω resistor is 5I_1. On the other hand, the voltage
drops across the 30V and 20V electrical sources are both negative, since the positive direction of the loop
is from - to + in each case. The Voltage law applied to this loop now gives 8I_1 + 8I_3 + 5I_1 - 50 = 0,
and we have the linear equation

13I_1 + 8I_3 = 50.

We now have a system of three linear equations

I_1 + I_2 - I_3 = 0,
13I_1 - 6I_2 = 20,
13I_1 + 8I_3 = 50.        (30)

The augmented matrix is given by

[  1   1  -1 |  0 ]
[ 13  -6   0 | 20 ]
[ 13   0   8 | 50 ],

with reduced row echelon form

[ 1  0  0 | 2 ]
[ 0  1  0 | 1 ]
[ 0  0  1 | 3 ].

Hence I_1 = 2, I_2 = 1 and I_3 = 3. Note here that we have not considered the right hand loop. Suppose
again that we take the positive direction to be clockwise. By Ohm's law, the voltage drop across the
8Ω resistor is 8I_3, while the voltage drop across the 6Ω resistor is 6I_2. On the other hand, the voltage
drop across the 30V electrical source is negative. The Voltage law applied to this loop then gives
8I_3 + 6I_2 - 30 = 0. But this equation can be obtained by combining the last two equations in (30).
1.9. Application to Economics
In this section, we describe a simple exchange model due to the economist Leontief. An economy is
divided into sectors. We know the total output for each sector as well as how outputs are exchanged
among the sectors. The value of the total output of a given sector is known as the price of the output.
Leontief has shown that there exist equilibrium prices that can be assigned to the total output of the
sectors in such a way that the income for each sector is exactly the same as its expenses.
Example 1.9.1. An economy consists of three sectors A, B, C which purchase from each other according
to the table below:

                          proportion of output from sector
                                 A      B      C
purchased by sector A           0.2    0.6    0.1
purchased by sector B           0.4    0.1    0.5
purchased by sector C           0.4    0.3    0.4

Let p_A, p_B, p_C denote respectively the value of the total output of sectors A, B, C. For the expense to
match the value for each sector, we must have

0.2p_A + 0.6p_B + 0.1p_C = p_A,
0.4p_A + 0.1p_B + 0.5p_C = p_B,
0.4p_A + 0.3p_B + 0.4p_C = p_C,

leading to the homogeneous linear equations

0.8p_A - 0.6p_B - 0.1p_C = 0,
0.4p_A - 0.9p_B + 0.5p_C = 0,
0.4p_A + 0.3p_B - 0.6p_C = 0,

giving rise to the augmented matrix

[ 0.8  -0.6  -0.1 | 0 ]
[ 0.4  -0.9   0.5 | 0 ]
[ 0.4   0.3  -0.6 | 0 ],

or simply

[ 8  -6  -1 | 0 ]
[ 4  -9   5 | 0 ]
[ 4   3  -6 | 0 ].

This can be reduced by elementary row operations to

[ 16   0  -13 | 0 ]
[  0  12  -11 | 0 ]
[  0   0    0 | 0 ],

leading to the solution (p_A, p_B, p_C) = t(13/16, 11/12, 1) if we assign the free variable p_C = t, or to
the solution (p_A, p_B, p_C) = t(39, 44, 48) if we assign the free variable p_C = 48t, where t is a real
parameter. For the latter, the choice t = 10^6 gives rise to the prices of 39, 44 and 48 million for the
three sectors A, B, C respectively.
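
Finding equilibrium prices is thus a null space computation: any non-zero solution of the homogeneous
system gives a set of prices. A sketch with sympy (ours, for illustration), using the integer array from
the example:

from sympy import Matrix

A = Matrix([[8, -6, -1],
            [4, -9, 5],
            [4, 3, -6]])    # 10 times the homogeneous system above
(v,) = A.nullspace()        # one free variable, so a single basis vector
print(v.T)                  # proportional to (13/16, 11/12, 1)
print((48 * v / v[2]).T)    # scaled so that p_C = 48: gives (39, 44, 48)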
1.10. Application to Chemistry
Chemical equations consist of reactants and products. The problem is to balance such equations so that
the following two rules apply:
• Conservation of mass: No atoms are produced or destroyed in a chemical reaction.
• Conservation of charge: The total charge of the reactants is equal to the total charge of the products.
Example 1.10.1. Consider the oxidation of ammonia to form nitric oxide and water, given by the
chemical equation

(x_1)NH_3 + (x_2)O_2 → (x_3)NO + (x_4)H_2O.

Here the reactants are ammonia (NH_3) and oxygen (O_2), while the products are nitric oxide (NO) and
water (H_2O). Our problem is to find the smallest positive integer values of x_1, x_2, x_3, x_4 such that the
equation balances. To do this, the technique is to equate the total number of each type of atom on the
two sides of the chemical equation:

atom N: x_1 = x_3,
atom H: 3x_1 = 2x_4,
atom O: 2x_2 = x_3 + x_4.

These give rise to a homogeneous system of 3 linear equations

x_1 - x_3 = 0,
3x_1 - 2x_4 = 0,
2x_2 - x_3 - x_4 = 0,

in the 4 variables x_1, . . . , x_4, with augmented matrix

[ 1  0  -1   0 | 0 ]
[ 3  0   0  -2 | 0 ]
[ 0  2  -1  -1 | 0 ],

which can be simplified by elementary row operations to

[ 1  0  -1   0 | 0 ]
[ 0  2  -1  -1 | 0 ]
[ 0  0   3  -2 | 0 ],

leading to the general solution (x_1, . . . , x_4) = t(2/3, 5/6, 2/3, 1) if we assign the free variable x_4 = t.
The choice t = 6 gives rise to the smallest positive integer solution (x_1, . . . , x_4) = (4, 5, 4, 6), leading to
the balanced chemical equation

4NH_3 + 5O_2 → 4NO + 6H_2O.
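
Balancing is therefore a null space computation followed by clearing denominators. A sketch with sympy
(our own illustration; the lcm call scales the basis vector to the smallest positive integers):

from sympy import Matrix, lcm

A = Matrix([[1, 0, -1, 0],     # atom N: x1 - x3 = 0
            [3, 0, 0, -2],     # atom H: 3*x1 - 2*x4 = 0
            [0, 2, -1, -1]])   # atom O: 2*x2 - x3 - x4 = 0
(v,) = A.nullspace()           # direction (2/3, 5/6, 2/3, 1)
scale = lcm([entry.q for entry in v])   # least common multiple of the denominators
print((scale * v).T)           # (4, 5, 4, 6): 4NH3 + 5O2 -> 4NO + 6H2O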
Example 1.10.2. Consider the chemical equation

(x_1)CO + (x_2)CO_2 + (x_3)H_2 → (x_4)CH_4 + (x_5)H_2O.

We equate the total number of each type of atom on the two sides of the chemical equation:

atom C: x_1 + x_2 = x_4,
atom O: x_1 + 2x_2 = x_5,
atom H: 2x_3 = 4x_4 + 2x_5.

These give rise to a homogeneous system of 3 linear equations

x_1 + x_2 - x_4 = 0,
x_1 + 2x_2 - x_5 = 0,
2x_3 - 4x_4 - 2x_5 = 0,

in the 5 variables x_1, . . . , x_5, with augmented matrix

[ 1  1  0  -1   0 | 0 ]
[ 1  2  0   0  -1 | 0 ]
[ 0  0  2  -4  -2 | 0 ],

with reduced row echelon form

[ 1  0  0  -2   1 | 0 ]
[ 0  1  0   1  -1 | 0 ]
[ 0  0  1  -2  -1 | 0 ],

leading to the general solution (x_1, . . . , x_5) = s(2, -1, 2, 1, 0) + t(-1, 1, 1, 0, 1) if we assign the two free
variables x_4 = s and x_5 = t. The choice s = 2 and t = 3 leads to the solution (x_1, . . . , x_5) = (1, 1, 7, 2, 3),
with balanced chemical equation

CO + CO_2 + 7H_2 → 2CH_4 + 3H_2O;

the choice s = 3 and t = 4 leads to the solution (x_1, . . . , x_5) = (2, 1, 10, 3, 4), with balanced chemical
equation

2CO + CO_2 + 10H_2 → 3CH_4 + 4H_2O;

while the choice s = 3 and t = 5 leads to the solution (x_1, . . . , x_5) = (1, 2, 11, 3, 5), with balanced
chemical equation

CO + 2CO_2 + 11H_2 → 3CH_4 + 5H_2O.

All these are known to happen.
1.11. Application to Mechanics
In this section, we consider the problem of systems of weights, light ropes and smooth light pulleys,
subject to the following two main principles:
• If a light rope passes around one or more smooth light pulleys, then the tension at the two ends is
the same.
• Newton's second law of motion: We have F = mẍ, where F denotes force, m denotes mass and ẍ
denotes acceleration.
Example 1.11.1. Two particles, of mass 2 and 4 (kilograms), are attached to the ends of a light rope
passing around a smooth light pulley suspended from the ceiling as shown in the diagram below:

[Diagram: a pulley on the ceiling, with the particle of mass 2 hanging at depth x_1 on one side of the
rope and the particle of mass 4 hanging at depth x_2 on the other side.]

We would like to find the tension in the rope and the acceleration of each particle. Here it will be
convenient that the distances x_1 and x_2 are measured downwards, and we take this as the positive
direction, so that any positive acceleration is downwards. We first apply Newton's law of motion to each
particle. The picture below summarizes the forces acting on the two particles:

[Diagram: the particle of mass 2 with tension T acting upwards and weight 2g acting downwards, and
the particle of mass 4 with tension T acting upwards and weight 4g acting downwards.]
Here T denotes the tension in the rope, and g denotes acceleration due to gravity. Newton's law of
motion applied to the two particles (downwards) then gives the equations

2ẍ_1 = 2g - T   and   4ẍ_2 = 4g - T.

We also have the conservation of the length of the rope, in the form x_1 + x_2 = C, so that ẍ_1 + ẍ_2 = 0.
To summarize, for the three variables ẍ_1, ẍ_2, T, we have the system of linear equations

2ẍ_1 + T = 2g,
4ẍ_2 + T = 4g,
ẍ_1 + ẍ_2 = 0,

with augmented matrix

[ 2  0  1 | 2g ]
[ 0  4  1 | 4g ]
[ 1  1  0 |  0 ],

which can be reduced by elementary row operations to

[ 1   1  0 |  0 ]
[ 0  -2  1 | 2g ]
[ 0   0  3 | 8g ].
This leads to the solution (ẍ_1, ẍ_2, T) = (-g/3, g/3, 8g/3).
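
As a quick check, the same system can be solved symbolically; the sketch below (ours, using sympy,
with g kept as a symbol) recovers the solution:

from sympy import Matrix, symbols

g = symbols('g')
aug = Matrix([[2, 0, 1, 2*g],
              [0, 4, 1, 4*g],
              [1, 1, 0, 0]])
print(aug.rref()[0][:, -1])    # Matrix([[-g/3], [g/3], [8*g/3]])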
Example 1.11.2. We now generalize the problem in the previous example. Two particles, of mass m_1
and m_2, are attached to the ends of a light rope passing around a smooth light pulley suspended from
the ceiling as shown in the diagram below:

[Diagram: the same pulley arrangement as before, with particles of mass m_1 and m_2 hanging at
depths x_1 and x_2 respectively.]
For the three variables ẍ_1, ẍ_2, T, we now have the system of linear equations

m_1 ẍ_1 + T = m_1 g,
m_2 ẍ_2 + T = m_2 g,
ẍ_1 + ẍ_2 = 0,

with augmented matrix

[ m_1    0   1 | m_1 g ]
[   0  m_2   1 | m_2 g ]
[   1    1   0 |     0 ],

which can be reduced by elementary row operations to

[ 1        1          0 |           0 ]
[ 0  m_1 m_2        m_1 |   m_1 m_2 g ]
[ 0        0  m_1 + m_2 |  2m_1 m_2 g ].

This leads to the solution

(ẍ_1, ẍ_2, T) = ( (m_1 - m_2)g / (m_1 + m_2), (m_2 - m_1)g / (m_1 + m_2), 2m_1 m_2 g / (m_1 + m_2) ).
Note that if m_1 = m_2, then ẍ_1 = ẍ_2 = 0, so that the particles are stationary. On the other hand, if
m_2 > m_1, then ẍ_2 > 0 and ẍ_1 < 0. Then

T < 2m_1 m_2 g / (m_1 + m_1) = m_2 g   and   T > 2m_1 m_2 g / (m_2 + m_2) = m_1 g.

Hence m_1 g < T < m_2 g.
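
The general solution can also be verified symbolically. A sketch of ours using sympy's solve, writing a1
and a2 for the accelerations:

from sympy import symbols, solve, factor

m1, m2, g = symbols('m1 m2 g', positive=True)
a1, a2, T = symbols('a1 a2 T')
sol = solve([m1*a1 + T - m1*g,
             m2*a2 + T - m2*g,
             a1 + a2], [a1, a2, T], dict=True)[0]
print(factor(sol[a1]))    # g*(m1 - m2)/(m1 + m2)
print(factor(sol[T]))     # 2*g*m1*m2/(m1 + m2)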
Problems for Chapter 1
1. Consider the system of linear equations

2x_1 + 5x_2 + 8x_3 = 2,
 x_1 + 2x_2 + 3x_3 = 4,
3x_1 + 4x_2 + 4x_3 = 1.

a) Write down the augmented matrix for this system.
b) Reduce the augmented matrix by elementary row operations to row echelon form.
c) Use your answer in part (b) to solve the system of linear equations.
2. Consider the system of linear equations

4x_1 + 5x_2 + 8x_3 = 0,
 x_1        + 3x_3 = 6,
3x_1 + 4x_2 + 6x_3 = 9.

a) Write down the augmented matrix for this system.
b) Reduce the augmented matrix by elementary row operations to row echelon form.
c) Use your answer in part (b) to solve the system of linear equations.
3. Consider the system of linear equations

  x_1 -  x_2 -  7x_3 +  7x_4 = 5,
 -x_1 +  x_2 +  8x_3 -  5x_4 = -7,
 3x_1 - 2x_2 - 17x_3 + 13x_4 = 14,
 2x_1 -  x_2 - 11x_3 +  8x_4 = 7.

a) Write down the augmented matrix for this system.
b) Reduce the augmented matrix by elementary row operations to row echelon form.
c) Use your answer in part (b) to solve the system of linear equations.
4. Solve the system of linear equations

 x + 3y - 2z = 4,
2x + 7y + 2z = 10.
5. For each of the augmented matrices below, reduce the matrix to row echelon or reduced row echelon
form, and solve the system of linear equations represented by the matrix:

a)
[ 1  1  2  1 |  5 ]
[ 3  2  1  3 |  6 ]
[ 4  3  1  4 | 11 ]
[ 2  1  3  2 |  1 ]

b)
[ 1  2  3   3 | 1 ]
[ 2  5  3  12 | 2 ]
[ 7  1  8   5 | 7 ]
6. Reduce each of the following arrays by elementary row operations to reduced row echelon form:

a)
[ 1  2  3  4  5 ]
[ 0  2  3  4  5 ]
[ 0  0  3  4  5 ]
[ 0  0  0  4  5 ]

b)
[ 1  1  0  0  0 ]
[ 0  1  1  0  0 ]
[ 0  0  1  1  0 ]
[ 0  0  0  1  1 ]

c)
[ 1  11  21  31  41  51 ]
[ 2  12  22  32  42  52 ]
[ 3  13  23  33  43  53 ]
7. Consider a system of linear equations in five variables x = (x_1, x_2, x_3, x_4, x_5) and expressed in
matrix form Ax = b, where x is written as a column matrix. Suppose that the augmented matrix (A|b)
can be reduced by elementary row operations to the row echelon form

[ 1  3  2  0  6 | 4 ]
[ 0  0  1  1  2 | 1 ]
[ 0  0  0  1  1 | 7 ]
[ 0  0  0  0  0 | 0 ].

a) Which are the pivot variables and which are the free variables?
b) Determine all the solutions of the system of linear equations.
8. Consider a system of linear equations in five variables x = (x_1, x_2, x_3, x_4, x_5) and expressed in
matrix form Ax = b, where x is written as a column matrix. Suppose that the augmented matrix (A|b)
can be reduced by elementary row operations to the row echelon form

[ 1  2  0  3  1 | 5 ]
[ 0  1  3  1  2 | 3 ]
[ 0  0  0  1  1 | 4 ]
[ 0  0  0  0  0 | 0 ].

a) Which are the pivot variables and which are the free variables?
b) Determine all the solutions of the system of linear equations.
9. Consider the system of linear equations

 x_1 +  λx_2 -  x_3 = 1,
2x_1 +  λx_2 + 2x_3 = 5λ + 1,
 x_1 -  λx_2 + 3x_3 = 4λ + 2,
 x_1 - 2λx_2 + 7x_3 = 10λ - 1.

a) Reduce its associated augmented matrix to row echelon form.
[Hint: After one or two steps, we will find the calculations extremely unpleasant, particularly
since we do not know whether λ is zero or non-zero. Try rewriting the system of equations as
a system in the variables x_1, x_3, x_2, so that columns 2 and 3 of the augmented matrix are now
swapped.]
b) Find a value of λ for which the system is soluble.
c) Solve the system.
10. Find the minimum value for x_4 in the following system of one way streets:

[Diagram: junctions A, B, C, D joined by one way streets carrying flows x_1, x_2, x_3, x_4, with external
flows 120, 150, 50, 80, 100 and 100 at the junctions.]
11. Consider the traffic flow in the following system of one way streets:

[Diagram: junctions A, B, C, D joined by one way streets carrying flows x_1, x_2, x_3, x_4, x_5, with
external flows 50, 80 and 10 at the junctions.]

a) Find the general solution of the system.
b) Find the range for x_5, and then determine the range for each of the other four variables.
12. Consider the traffic flow in the following system of one way streets:

[Diagram: junctions A, B, C, D, E, F, G, H joined by one way streets carrying flows x_1, . . . , x_9, with
external flows 30, 30, 60, 40, 10, 20, 50, 50 and 40 at the junctions.]

a) Find the general solution of the system.
b) Find the range for x_8, and then determine the range for each of the other eight variables.
13. Consider the electric circuit shown in the diagram below:

[Diagram: a two-loop circuit with points A and B, currents I_1, I_2, I_3, resistors of 10Ω, 5Ω and 6Ω,
and electrical sources of 20V and 40V.]

Determine the currents I_1, I_2 and I_3. You must explain each step carefully, quoting all the relevant
laws on electric circuits. In particular, you must clearly indicate the positive direction of each loop
you are considering, and ensure that the voltage drop across every resistor and electrical source on
the loop carries the correct sign.
14. Consider the electric circuit shown in the diagram below:

[Diagram: a two-loop circuit with points A and B, currents I_1, I_2, I_3, resistors of 8Ω, 1Ω and 8Ω,
and electrical sources of 60V and 20V.]

Determine the currents I_1, I_2 and I_3. You must explain each step carefully, quoting all the relevant
laws on electric circuits. In particular, you must clearly indicate the positive direction of each loop
you are considering, and ensure that the voltage drop across every resistor and electrical source on
the loop carries the correct sign.
15. Consider the electric circuit shown in the diagram below:

[Diagram: a circuit with points A, B, C, D, currents I_1, I_2, I_3, I_4, I_5, I_6, resistors of 8Ω, 20Ω,
5Ω and 5Ω, and electrical sources of 50V and 10V.]

Determine the currents I_1, I_2, I_3, I_4, I_5 and I_6. You must explain each step carefully, quoting all the
relevant laws on electric circuits. In particular, you must clearly indicate the positive direction of
each loop you are considering, and ensure that the voltage drop across every resistor and electrical
source on the loop carries the correct sign.
16. Three industries A, B, C consume their own outputs and also buy from each other according to the
table below:

                        proportion of output of industry
                               A      B      C
bought by industry A          0.35   0.50   0.30
bought by industry B          0.25   0.20   0.30
bought by industry C          0.40   0.30   0.40

Use the simple exchange model due to the economist Leontief to determine equilibrium prices that
they can charge each other so that no money changes hands.
17. An arrangement exists for three colleagues A, B, C who work for themselves and each other according
to the table below:

                        percentage of time spent by
                            A     B     C
working for A              50    40    10
working for B              10    20    60
working for C              40    40    30

Use the simple exchange model due to the economist Leontief to determine equilibrium fees that
they can charge each other so that no money changes hands.
18. Three farmers A, B, C grow bananas, oranges and apples respectively, and buy off each other.
Farmer A buys 50% of the oranges and 20% of the apples, farmer B buys 30% of the bananas and
40% of the apples, while farmer C buys 50% of the bananas and 20% of the oranges. Use the simple
exchange model due to the economist Leontief to determine equilibrium prices that they can charge
each other so that no money changes hands.
19. For each of the following chemical reactions, determine the balanced chemical equation:
a) reactants Al and O_2; product Al_2O_3
b) reactants C_2H_6 and O_2; products CO_2 and H_2O
c) reactants PbO_2 and HCl; products PbCl_2, Cl_2 and H_2O
d) reactants C_2H_5OH and O_2; products CO_2 and H_2O
e) reactants MnO_2, H_2SO_4 and H_2C_2O_4; products MnSO_4, CO_2 and H_2O
20. Two particles, of mass m_1 and m_2 (kilograms), are arranged with light ropes and smooth light
pulleys as shown in the diagram below:

[Diagram: a system of light ropes and smooth light pulleys, with particles of mass m_1 and m_2 at
depths x_1 and x_2 respectively.]

a) Consider first of all the case when m_1 = m_2 = 3.
(i) Show that the augmented matrix for a system of linear equations in the three variables ẍ_1, ẍ_2, T,
where T denotes the tension of the rope, has reduced row echelon form

[ 1  0  0 | -g/5 ]
[ 0  1  0 | 2g/5 ]
[ 0  0  1 | 9g/5 ].

(ii) Determine ẍ_1, ẍ_2, T in this case.
(iii) In which direction is the particle on the right moving?
b) Show that in the general case, we have

(ẍ_1, ẍ_2, T) = ( (m_1 - 2m_2)g / (m_1 + 4m_2), 2(2m_2 - m_1)g / (m_1 + 4m_2), 3m_1 m_2 g / (m_1 + 4m_2) ).

c) What relationship must m_1 and m_2 have in order to achieve equilibrium?
21. Three particles, of mass m_1, m_2 and m_3 (kilograms), are arranged with light ropes and smooth
light pulleys as shown in the diagram below:

[Diagram: a system of light ropes and smooth light pulleys, with particles of mass m_1, m_2 and m_3
at depths x_1, x_2 and x_3 respectively.]

a) Show that we have

(ẍ_1, ẍ_2, ẍ_3, T) = ( (1 - 4m_2 m_3 / M) g, (1 - 8m_1 m_3 / M) g, (1 - 4m_1 m_2 / M) g, 4m_1 m_2 m_3 g / M ),

where T denotes the tension of the rope and M = m_1 m_2 + m_2 m_3 + 4m_1 m_3.
b) Show that equilibrium occurs precisely when m_2 = 2m_1 = 2m_3.
LINEAR ALGEBRA
W W L CHEN
© W W L Chen, 1982, 2008.
This chapter originates from material used by the author at Imperial College, University of London, between 1981 and 1990.
It is available free to all individuals, on the understanding that it is not to be used for financial gain,
and may be downloaded and/or photocopied, with or without permission from the author.
However, this document may not be kept on any information storage and retrieval system without permission
from the author, unless such system is not accessible to any individuals other than its owners.
Chapter 2
MATRICES
2.1. Introduction
A rectangular array of numbers of the form
\[ \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \tag{1} \]
is called an m × n matrix, with m rows and n columns. We count rows from the top and columns from
the left. Hence
\[ ( a_{i1} \ \dots \ a_{in} ) \quad \text{and} \quad \begin{pmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{pmatrix} \]
represent respectively the i-th row and the j-th column of the matrix (1), and a_{ij} represents the entry
in the matrix (1) on the i-th row and j-th column.
Example 2.1.1. Consider the 3 × 4 matrix
\[ \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix}. \]
Here
\[ ( 3 \ 1 \ 5 \ 2 ) \quad \text{and} \quad \begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix} \]
represent respectively the 2-nd row and the 3-rd column of the matrix, and 5 represents the entry in the
matrix on the 2-nd row and 3-rd column.
We now consider the question of arithmetic involving matrices. First of all, let us study the problem
of addition. A reasonable theory can be derived from the following definition.
Definition. Suppose that the two matrices
\[ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_{11} & \dots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \dots & b_{mn} \end{pmatrix} \]
both have m rows and n columns. Then we write
\[ A + B = \begin{pmatrix} a_{11} + b_{11} & \dots & a_{1n} + b_{1n} \\ \vdots & & \vdots \\ a_{m1} + b_{m1} & \dots & a_{mn} + b_{mn} \end{pmatrix} \]
and call this the sum of the two matrices A and B.
Example 2.1.2. Suppose that
\[ A = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 2 & -2 & 7 \\ 0 & 2 & 4 & -1 \\ -2 & 1 & 3 & 3 \end{pmatrix}. \]
Then
\[ A + B = \begin{pmatrix} 2+1 & 4+2 & 3-2 & -1+7 \\ 3+0 & 1+2 & 5+4 & 2-1 \\ -1-2 & 0+1 & 7+3 & 6+3 \end{pmatrix} = \begin{pmatrix} 3 & 6 & 1 & 6 \\ 3 & 3 & 9 & 1 \\ -3 & 1 & 10 & 9 \end{pmatrix}. \]
Example 2.1.3. We do not have a definition for adding the matrices
\[ \begin{pmatrix} 2 & 4 & 3 & -1 \\ -1 & 0 & 7 & 6 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 2 & 4 & 3 \\ 3 & 1 & 5 \\ -1 & 0 & 7 \end{pmatrix}. \]
PROPOSITION 2A. (MATRIX ADDITION) Suppose that A, B, C are m × n matrices. Suppose
further that O represents the m × n matrix with all entries zero. Then
(a) A + B = B + A;
(b) A + (B + C) = (A + B) + C;
(c) A + O = A; and
(d) there is an m × n matrix A' such that A + A' = O.
Proof. Parts (a)-(c) are easy consequences of ordinary addition, as matrix addition is simply entry-wise
addition. For part (d), we can consider the matrix A' obtained from A by multiplying each entry of A
by -1.
The theory of multiplication is rather more complicated, and includes multiplication of a matrix by a
scalar as well as multiplication of two matrices.
We first study the simpler case of multiplication by scalars.
Definition. Suppose that the matrix
\[ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \]
has m rows and n columns, and that c ∈ R. Then we write
\[ cA = \begin{pmatrix} ca_{11} & \dots & ca_{1n} \\ \vdots & & \vdots \\ ca_{m1} & \dots & ca_{mn} \end{pmatrix} \]
and call this the product of the matrix A by the scalar c.
Example 2.1.4. Suppose that
\[ A = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix}. \]
Then
\[ 2A = \begin{pmatrix} 4 & 8 & 6 & -2 \\ 6 & 2 & 10 & 4 \\ -2 & 0 & 14 & 12 \end{pmatrix}. \]
PROPOSITION 2B. (MULTIPLICATION BY SCALAR) Suppose that A, B are m × n matrices, and
that c, d ∈ R. Suppose further that O represents the m × n matrix with all entries zero. Then
(a) c(A + B) = cA + cB;
(b) (c + d)A = cA + dA;
(c) 0A = O; and
(d) c(dA) = (cd)A.
Proof. These are all easy consequences of ordinary multiplication, as multiplication by scalar c is simply
entry-wise multiplication by the number c.
The question of multiplication of two matrices is rather more complicated. To motivate this, let us
consider the representation of a system of linear equations
\[ \begin{aligned} a_{11}x_1 + \dots + a_{1n}x_n &= b_1, \\ &\;\;\vdots \\ a_{m1}x_1 + \dots + a_{mn}x_n &= b_m, \end{aligned} \tag{2} \]
in the form Ax = b, where
\[ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} \tag{3} \]
represent the coefficients and
\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \tag{4} \]
represents the variables. This can be written in full matrix notation by
\[ \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}. \]
Can you work out the meaning of this representation?
Now let us define matrix multiplication more formally.
Definition. Suppose that
\[ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_{11} & \dots & b_{1p} \\ \vdots & & \vdots \\ b_{n1} & \dots & b_{np} \end{pmatrix} \]
are respectively an m × n matrix and an n × p matrix. Then the matrix product AB is given by the
m × p matrix
\[ AB = \begin{pmatrix} q_{11} & \dots & q_{1p} \\ \vdots & & \vdots \\ q_{m1} & \dots & q_{mp} \end{pmatrix}, \]
where for every i = 1, ..., m and j = 1, ..., p, we have
\[ q_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = a_{i1}b_{1j} + \dots + a_{in}b_{nj}. \]
Remark. Note first of all that the number of columns of the first matrix must be equal to the number
of rows of the second matrix. On the other hand, for a simple way to work out q_{ij}, the entry in the i-th
row and j-th column of AB, we observe that the i-th row of A and the j-th column of B are respectively
\[ ( a_{i1} \ \dots \ a_{in} ) \quad \text{and} \quad \begin{pmatrix} b_{1j} \\ \vdots \\ b_{nj} \end{pmatrix}. \]
We now multiply the corresponding entries, a_{i1} with b_{1j}, and so on, until a_{in} with b_{nj}, and then
add these products to obtain q_{ij}.
Example 2.1.5. Consider the matrices
\[ A = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 4 \\ 2 & 3 \\ 0 & -2 \\ 3 & 1 \end{pmatrix}. \]
Note that A is a 3 × 4 matrix and B is a 4 × 2 matrix, so that the product AB is a 3 × 2 matrix. Let
us calculate the product
\[ AB = \begin{pmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \\ q_{31} & q_{32} \end{pmatrix}. \]
Consider first of all q_{11}. To calculate this, we need the 1-st row of A and the 1-st column of B, so let us
cover up all unnecessary information (covered entries shown as ∗), so that
\[ \begin{pmatrix} 2 & 4 & 3 & -1 \\ \ast & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast \end{pmatrix} \begin{pmatrix} 1 & \ast \\ 2 & \ast \\ 0 & \ast \\ 3 & \ast \end{pmatrix} = \begin{pmatrix} q_{11} & \ast \\ \ast & \ast \\ \ast & \ast \end{pmatrix}. \]
From the definition, we have
\[ q_{11} = 2 \cdot 1 + 4 \cdot 2 + 3 \cdot 0 + (-1) \cdot 3 = 2 + 8 + 0 - 3 = 7. \]
Consider next q_{12}. To calculate this, we need the 1-st row of A and the 2-nd column of B, so let us cover
up all unnecessary information, so that
\[ \begin{pmatrix} 2 & 4 & 3 & -1 \\ \ast & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast \end{pmatrix} \begin{pmatrix} \ast & 4 \\ \ast & 3 \\ \ast & -2 \\ \ast & 1 \end{pmatrix} = \begin{pmatrix} \ast & q_{12} \\ \ast & \ast \\ \ast & \ast \end{pmatrix}. \]
From the definition, we have
\[ q_{12} = 2 \cdot 4 + 4 \cdot 3 + 3 \cdot (-2) + (-1) \cdot 1 = 8 + 12 - 6 - 1 = 13. \]
Consider next q_{21}. To calculate this, we need the 2-nd row of A and the 1-st column of B, so let us cover
up all unnecessary information, so that
\[ \begin{pmatrix} \ast & \ast & \ast & \ast \\ 3 & 1 & 5 & 2 \\ \ast & \ast & \ast & \ast \end{pmatrix} \begin{pmatrix} 1 & \ast \\ 2 & \ast \\ 0 & \ast \\ 3 & \ast \end{pmatrix} = \begin{pmatrix} \ast & \ast \\ q_{21} & \ast \\ \ast & \ast \end{pmatrix}. \]
From the definition, we have
\[ q_{21} = 3 \cdot 1 + 1 \cdot 2 + 5 \cdot 0 + 2 \cdot 3 = 3 + 2 + 0 + 6 = 11. \]
Consider next q_{22}. To calculate this, we need the 2-nd row of A and the 2-nd column of B, so let us
cover up all unnecessary information, so that
\[ \begin{pmatrix} \ast & \ast & \ast & \ast \\ 3 & 1 & 5 & 2 \\ \ast & \ast & \ast & \ast \end{pmatrix} \begin{pmatrix} \ast & 4 \\ \ast & 3 \\ \ast & -2 \\ \ast & 1 \end{pmatrix} = \begin{pmatrix} \ast & \ast \\ \ast & q_{22} \\ \ast & \ast \end{pmatrix}. \]
From the definition, we have
\[ q_{22} = 3 \cdot 4 + 1 \cdot 3 + 5 \cdot (-2) + 2 \cdot 1 = 12 + 3 - 10 + 2 = 7. \]
Consider next q_{31}. To calculate this, we need the 3-rd row of A and the 1-st column of B, so let us cover
up all unnecessary information, so that
\[ \begin{pmatrix} \ast & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast \\ -1 & 0 & 7 & 6 \end{pmatrix} \begin{pmatrix} 1 & \ast \\ 2 & \ast \\ 0 & \ast \\ 3 & \ast \end{pmatrix} = \begin{pmatrix} \ast & \ast \\ \ast & \ast \\ q_{31} & \ast \end{pmatrix}. \]
From the definition, we have
\[ q_{31} = (-1) \cdot 1 + 0 \cdot 2 + 7 \cdot 0 + 6 \cdot 3 = -1 + 0 + 0 + 18 = 17. \]
Consider finally q_{32}. To calculate this, we need the 3-rd row of A and the 2-nd column of B, so let us
cover up all unnecessary information, so that
\[ \begin{pmatrix} \ast & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast \\ -1 & 0 & 7 & 6 \end{pmatrix} \begin{pmatrix} \ast & 4 \\ \ast & 3 \\ \ast & -2 \\ \ast & 1 \end{pmatrix} = \begin{pmatrix} \ast & \ast \\ \ast & \ast \\ \ast & q_{32} \end{pmatrix}. \]
From the definition, we have
\[ q_{32} = (-1) \cdot 4 + 0 \cdot 3 + 7 \cdot (-2) + 6 \cdot 1 = -4 + 0 - 14 + 6 = -12. \]
We therefore conclude that
\[ AB = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix} \begin{pmatrix} 1 & 4 \\ 2 & 3 \\ 0 & -2 \\ 3 & 1 \end{pmatrix} = \begin{pmatrix} 7 & 13 \\ 11 & 7 \\ 17 & -12 \end{pmatrix}. \]
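Since q_{ij} is simply a sum over k for each pair (i, j), the definition translates directly into three nested loops. The following Python sketch is our own illustration (the function name matmul is our own choice, not part of the notes); it reproduces the product AB just computed.

    def matmul(A, B):
        """Multiply an m x n matrix A by an n x p matrix B, given as lists
        of rows, by computing each q_ij = sum_k a_ik * b_kj directly."""
        m, n, p = len(A), len(B), len(B[0])
        assert all(len(row) == n for row in A), "columns of A must equal rows of B"
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
                for i in range(m)]

    A = [[2, 4, 3, -1],
         [3, 1, 5, 2],
         [-1, 0, 7, 6]]
    B = [[1, 4],
         [2, 3],
         [0, -2],
         [3, 1]]

    print(matmul(A, B))   # [[7, 13], [11, 7], [17, -12]]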
Example 2.1.6. Consider again the matrices
\[ A = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 4 \\ 2 & 3 \\ 0 & -2 \\ 3 & 1 \end{pmatrix}. \]
Note that B is a 4 × 2 matrix and A is a 3 × 4 matrix, so that we do not have a definition for the
product BA.
We leave the proofs of the following results as exercises for the interested reader.
PROPOSITION 2C. (ASSOCIATIVE LAW) Suppose that A is an m × n matrix, B is an n × p matrix
and C is a p × r matrix. Then A(BC) = (AB)C.
PROPOSITION 2D. (DISTRIBUTIVE LAWS)
(a) Suppose that A is an m × n matrix and B and C are n × p matrices. Then A(B + C) = AB + AC.
(b) Suppose that A and B are m × n matrices and C is an n × p matrix. Then (A + B)C = AC + BC.
PROPOSITION 2E. Suppose that A is an m × n matrix, B is an n × p matrix, and that c ∈ R. Then
c(AB) = (cA)B = A(cB).
2.2. Systems of Linear Equations
Note that the system (2) of linear equations can be written in matrix form as
\[ Ax = b, \]
where the matrices A, x and b are given by (3) and (4). In this section, we shall establish the following
important result.
PROPOSITION 2F. Every system of linear equations of the form (2) has either no solution, one
solution or infinitely many solutions.
Proof. Clearly the system (2) has either no solution, exactly one solution, or more than one solution.
It remains to show that if the system (2) has two distinct solutions, then it must have infinitely many
solutions. Suppose that x = u and x = v represent two distinct solutions. Then
\[ Au = b \quad \text{and} \quad Av = b, \]
so that
\[ A(u - v) = Au - Av = b - b = 0, \]
where 0 is the zero m × 1 matrix. It now follows that for every c ∈ R, we have
\[ A(u + c(u - v)) = Au + A(c(u - v)) = Au + c(A(u - v)) = b + c0 = b, \]
so that x = u + c(u - v) is a solution for every c ∈ R. Clearly we have infinitely many solutions.
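The construction in this proof is easy to see numerically. The following Python sketch is our own illustration, using an invented one-equation system with two distinct solutions u and v; every x = u + c(u - v) passes the check.

    A = [[1, 1]]          # the single equation x1 + x2 = 2
    b = [2]
    u = [0, 2]            # one solution
    v = [1, 1]            # another solution

    def apply(A, x):
        return [sum(a * xi for a, xi in zip(row, x)) for row in A]

    for c in [0, 1, 2.5, -3]:
        x = [ui + c * (ui - vi) for ui, vi in zip(u, v)]
        print(c, x, apply(A, x))   # the last entry is always [2]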
2.3. Inversion of Matrices
For the remainder of this chapter, we shall deal with square matrices, those where the number of rows
equals the number of columns.
Definition. The n × n matrix
\[ I_n = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}, \quad \text{where} \quad a_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases} \]
is called the identity matrix of order n.
Remark. Note that
\[ I_1 = ( 1 ) \quad \text{and} \quad I_4 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]
The following result is relatively easy to check. It shows that the identity matrix I_n acts as the identity
for multiplication of n × n matrices.
PROPOSITION 2G. For every n × n matrix A, we have AI_n = I_n A = A.
This raises the following question: Given an n × n matrix A, is it possible to find another n × n matrix
B such that AB = BA = I_n?
We shall postpone the full answer to this question until the next chapter. In Section 2.5, however, we
shall be content with finding such a matrix B if it exists. In Section 2.6, we shall relate the existence of
such a matrix B to some properties of the matrix A.
Definition. An n × n matrix A is said to be invertible if there exists an n × n matrix B such that
AB = BA = I_n. In this case, we say that B is the inverse of A and write B = A^{-1}.
PROPOSITION 2H. Suppose that A is an invertible n × n matrix. Then its inverse A^{-1} is unique.
Proof. Suppose that B satisfies the requirements for being the inverse of A. Then AB = BA = I_n. It
follows that
\[ A^{-1} = A^{-1} I_n = A^{-1}(AB) = (A^{-1}A)B = I_n B = B. \]
Hence the inverse A^{-1} is unique.
PROPOSITION 2J. Suppose that A and B are invertible n × n matrices. Then (AB)^{-1} = B^{-1}A^{-1}.
Proof. In view of the uniqueness of inverse, it is sufficient to show that B^{-1}A^{-1} satisfies the require-
ments for being the inverse of AB. Note that
\[ (AB)(B^{-1}A^{-1}) = A(B(B^{-1}A^{-1})) = A((BB^{-1})A^{-1}) = A(I_n A^{-1}) = AA^{-1} = I_n \]
and
\[ (B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}(AB)) = B^{-1}((A^{-1}A)B) = B^{-1}(I_n B) = B^{-1}B = I_n \]
as required.
PROPOSITION 2K. Suppose that A is an invertible n × n matrix. Then (A^{-1})^{-1} = A.
Proof. Note that both (A^{-1})^{-1} and A satisfy the requirements for being the inverse of A^{-1}. Equality
follows from the uniqueness of inverse.
2.4. Application to Matrix Multiplication
In this section, we shall discuss an application of invertible matrices. Detailed discussion of the technique
involved will be covered in Chapter 7.
Definition. An n × n matrix
\[ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}, \]
where a_{ij} = 0 whenever i ≠ j, is called a diagonal matrix of order n.
Example 2.4.1. The 3 × 3 matrices
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \]
are both diagonal.
Given an n × n matrix A, it is usually rather complicated to calculate
\[ A^k = \underbrace{A \cdots A}_{k}. \]
However, the calculation is rather simple when A is a diagonal matrix, as we shall see in the following
example.
Example 2.4.2. Consider the 3 × 3 matrix
\[ A = \begin{pmatrix} 17 & -10 & -5 \\ 45 & -28 & -15 \\ -30 & 20 & 12 \end{pmatrix}. \]
Suppose that we wish to calculate A^{98}. It can be checked that if we take
\[ P = \begin{pmatrix} 1 & 1 & 2 \\ 3 & 0 & 3 \\ -2 & 3 & 0 \end{pmatrix}, \quad \text{then} \quad P^{-1} = \begin{pmatrix} -3 & 2 & 1 \\ -2 & 4/3 & 1 \\ 3 & -5/3 & -1 \end{pmatrix}. \]
Furthermore, if we write
\[ D = \begin{pmatrix} -3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \]
then it can be checked that A = PDP^{-1}, so that
\[ A^{98} = \underbrace{(PDP^{-1}) \cdots (PDP^{-1})}_{98} = P D^{98} P^{-1} = P \begin{pmatrix} (-3)^{98} & 0 & 0 \\ 0 & 2^{98} & 0 \\ 0 & 0 & 2^{98} \end{pmatrix} P^{-1}. \]
This is much simpler than calculating A^{98} directly. Note that this example is only an illustration. We
have not discussed here how the matrices P and D are found.
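As an added check (ours, not part of the original discussion), the factorization can be verified numerically, using exact rational arithmetic for the fractional entries of P^{-1}:

    from fractions import Fraction as F

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))] for i in range(len(X))]

    P    = [[1, 1, 2], [3, 0, 3], [-2, 3, 0]]
    Pinv = [[-3, 2, 1], [-2, F(4, 3), 1], [3, F(-5, 3), -1]]
    D    = [[-3, 0, 0], [0, 2, 0], [0, 0, 2]]

    print(matmul(matmul(P, D), Pinv))   # recovers the matrix A above

    k = 98
    Dk = [[(-3)**k, 0, 0], [0, 2**k, 0], [0, 0, 2**k]]
    Ak = matmul(matmul(P, Dk), Pinv)    # A^98 via powers of the diagonal entries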
2.5. Finding Inverses by Elementary Row Operations
In this section, we shall discuss a technique by which we can find the inverse of a square matrix, if the
inverse exists. Before we discuss this technique, let us recall the three elementary row operations we
discussed in the previous chapter. These are: (1) interchanging two rows; (2) adding a multiple of one
row to another row; and (3) multiplying one row by a non-zero constant.
Let us now consider the following example.
Example 2.5.1. Consider the matrices
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad \text{and} \quad I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Let us interchange rows 1 and 2 of A and do likewise for I_3. We obtain respectively
\[ \begin{pmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Note that
\[ \begin{pmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}. \]
Let us interchange rows 2 and 3 of A and do likewise for I_3. We obtain respectively
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}. \]
Note that
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}. \]
Let us add 3 times row 1 to row 2 of A and do likewise for I_3. We obtain respectively
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 3a_{11}+a_{21} & 3a_{12}+a_{22} & 3a_{13}+a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Note that
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 3a_{11}+a_{21} & 3a_{12}+a_{22} & 3a_{13}+a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}. \]
Let us add 2 times row 3 to row 1 of A and do likewise for I_3. We obtain respectively
\[ \begin{pmatrix} 2a_{31}+a_{11} & 2a_{32}+a_{12} & 2a_{33}+a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Note that
\[ \begin{pmatrix} 2a_{31}+a_{11} & 2a_{32}+a_{12} & 2a_{33}+a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}. \]
Let us multiply row 2 of A by 5 and do likewise for I_3. We obtain respectively
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 5a_{21} & 5a_{22} & 5a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Note that
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 5a_{21} & 5a_{22} & 5a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}. \]
Let us multiply row 3 of A by -1 and do likewise for I_3. We obtain respectively
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ -a_{31} & -a_{32} & -a_{33} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \]
Note that
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ -a_{31} & -a_{32} & -a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}. \]
Let us now consider the problem in general.
Definition. By an elementary n × n matrix, we mean an n × n matrix obtained from I_n by an elementary
row operation.
We state without proof the following important result. The interested reader may wish to construct
a proof, taking into account the different types of elementary row operations.
PROPOSITION 2L. Suppose that A is an n × n matrix, and suppose that B is obtained from A by
an elementary row operation. Suppose further that E is an elementary matrix obtained from I_n by the
same elementary row operation. Then B = EA.
We now adopt the following strategy. Consider an n × n matrix A. Suppose that it is possible to reduce
the matrix A by a sequence ρ_1, ρ_2, ..., ρ_k of elementary row operations to the identity matrix I_n. If
E_1, E_2, ..., E_k are respectively the elementary n × n matrices obtained from I_n by the same elementary
row operations ρ_1, ρ_2, ..., ρ_k, then
\[ I_n = E_k \cdots E_2 E_1 A. \]
We therefore must have
\[ A^{-1} = E_k \cdots E_2 E_1 = E_k \cdots E_2 E_1 I_n. \]
It follows that the inverse A^{-1} can be obtained from I_n by performing the same elementary row operations
ρ_1, ρ_2, ..., ρ_k. Since we are performing the same elementary row operations on A and I_n, it makes sense
to put them side by side. The process can then be described pictorially by
\[ (A \mid I_n) \xrightarrow{\rho_1} (E_1 A \mid E_1 I_n) \xrightarrow{\rho_2} (E_2 E_1 A \mid E_2 E_1 I_n) \xrightarrow{\rho_3} \cdots \xrightarrow{\rho_k} (E_k \cdots E_2 E_1 A \mid E_k \cdots E_2 E_1 I_n) = (I_n \mid A^{-1}). \]
In other words, we consider an array with the matrix A on the left and the matrix I_n on the right. We
now perform elementary row operations on the array and try to reduce the left hand half to the matrix
I_n. If we succeed in doing so, then the right hand half of the array gives the inverse A^{-1}.
Example 2.5.2. Consider the matrix
\[ A = \begin{pmatrix} 1 & 1 & 2 \\ 3 & 0 & 3 \\ -2 & 3 & 0 \end{pmatrix}. \]
To find A^{-1}, we consider the array
\[ (A \mid I_3) = \left( \begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 3 & 0 & 3 & 0 & 1 & 0 \\ -2 & 3 & 0 & 0 & 0 & 1 \end{array} \right). \]
We now perform elementary row operations on this array and try to reduce the left hand half to the
matrix I_3. Note that if we succeed, then the final array is clearly in reduced row echelon form. We
therefore follow the same procedure as reducing an array to reduced row echelon form. Adding -3 times
row 1 to row 2, we obtain
\[ \left( \begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ -2 & 3 & 0 & 0 & 0 & 1 \end{array} \right). \]
Adding 2 times row 1 to row 3, we obtain
\[ \left( \begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 5 & 4 & 2 & 0 & 1 \end{array} \right). \]
Multiplying row 3 by 3, we obtain
\[ \left( \begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 15 & 12 & 6 & 0 & 3 \end{array} \right). \]
Adding 5 times row 2 to row 3, we obtain
\[ \left( \begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array} \right). \]
Multiplying row 1 by 3, we obtain
\[ \left( \begin{array}{ccc|ccc} 3 & 3 & 6 & 3 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array} \right). \]
Adding 2 times row 3 to row 1, we obtain
\[ \left( \begin{array}{ccc|ccc} 3 & 3 & 0 & -15 & 10 & 6 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array} \right). \]
Adding -1 times row 3 to row 2, we obtain
\[ \left( \begin{array}{ccc|ccc} 3 & 3 & 0 & -15 & 10 & 6 \\ 0 & -3 & 0 & 6 & -4 & -3 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array} \right). \]
Adding 1 times row 2 to row 1, we obtain
\[ \left( \begin{array}{ccc|ccc} 3 & 0 & 0 & -9 & 6 & 3 \\ 0 & -3 & 0 & 6 & -4 & -3 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array} \right). \]
Multiplying row 1 by 1/3, we obtain
\[ \left( \begin{array}{ccc|ccc} 1 & 0 & 0 & -3 & 2 & 1 \\ 0 & -3 & 0 & 6 & -4 & -3 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array} \right). \]
Multiplying row 2 by -1/3, we obtain
\[ \left( \begin{array}{ccc|ccc} 1 & 0 & 0 & -3 & 2 & 1 \\ 0 & 1 & 0 & -2 & 4/3 & 1 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array} \right). \]
Multiplying row 3 by -1/3, we obtain
\[ \left( \begin{array}{ccc|ccc} 1 & 0 & 0 & -3 & 2 & 1 \\ 0 & 1 & 0 & -2 & 4/3 & 1 \\ 0 & 0 & 1 & 3 & -5/3 & -1 \end{array} \right). \]
Note now that the array is in reduced row echelon form, and that the left hand half is the identity matrix
I_3. It follows that the right hand half of the array represents the inverse A^{-1}. Hence
\[ A^{-1} = \begin{pmatrix} -3 & 2 & 1 \\ -2 & 4/3 & 1 \\ 3 & -5/3 & -1 \end{pmatrix}. \]
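The whole procedure of this example can be automated. The following Python sketch is our own illustration of the (A | I_n) method, using exact rational arithmetic; the function name invert is our own choice.

    from fractions import Fraction as F

    def invert(A):
        """Reduce (A | I_n) to reduced row echelon form; if the left half
        becomes I_n, the right half is the inverse."""
        n = len(A)
        M = [[F(x) for x in row] + [F(i == j) for j in range(n)]
             for i, row in enumerate(A)]
        for col in range(n):
            pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
            if pivot is None:
                raise ValueError("matrix is not invertible")
            M[col], M[pivot] = M[pivot], M[col]      # interchange two rows
            p = M[col][col]
            M[col] = [x / p for x in M[col]]         # scale the pivot row to 1
            for r in range(n):
                if r != col and M[r][col] != 0:      # clear the rest of the column
                    M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
        return [row[n:] for row in M]

    print(invert([[1, 1, 2], [3, 0, 3], [-2, 3, 0]]))
    # [[-3, 2, 1], [-2, 4/3, 1], [3, -5/3, -1]], as computed above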
Example 2.5.3. Consider the matrix
\[ A = \begin{pmatrix} 1 & 1 & 2 & 3 \\ 2 & 2 & 4 & 5 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]
To find A^{-1}, we consider the array
\[ (A \mid I_4) = \left( \begin{array}{cccc|cccc} 1 & 1 & 2 & 3 & 1 & 0 & 0 & 0 \\ 2 & 2 & 4 & 5 & 0 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array} \right). \]
We now perform elementary row operations on this array and try to reduce the left hand half to the
matrix I_4. Adding -2 times row 1 to row 2, we obtain
\[ \left( \begin{array}{cccc|cccc} 1 & 1 & 2 & 3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & -2 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array} \right). \]
Adding 1 times row 2 to row 4, we obtain
\[ \left( \begin{array}{cccc|cccc} 1 & 1 & 2 & 3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & -2 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array} \right). \]
Interchanging rows 2 and 3, we obtain
\[ \left( \begin{array}{cccc|cccc} 1 & 1 & 2 & 3 & 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & -2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array} \right). \]
At this point, we observe that it is impossible to reduce the left hand half of the array to I_4. For those
who remain unconvinced, let us continue. Adding 3 times row 3 to row 1, we obtain
\[ \left( \begin{array}{cccc|cccc} 1 & 1 & 2 & 0 & -5 & 3 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & -2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array} \right). \]
Adding -1 times row 4 to row 3, we obtain
\[ \left( \begin{array}{cccc|cccc} 1 & 1 & 2 & 0 & -5 & 3 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array} \right). \]
Multiplying row 1 by 6 (here we want to avoid fractions in the next two steps), we obtain
\[ \left( \begin{array}{cccc|cccc} 6 & 6 & 12 & 0 & -30 & 18 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array} \right). \]
Adding -15 times row 4 to row 1, we obtain
\[ \left( \begin{array}{cccc|cccc} 6 & 6 & 12 & 0 & 0 & 3 & 0 & -15 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array} \right). \]
Adding -2 times row 2 to row 1, we obtain
\[ \left( \begin{array}{cccc|cccc} 6 & 0 & 12 & 0 & 0 & 3 & -2 & -15 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array} \right). \]
Multiplying row 1 by 1/6, multiplying row 2 by 1/3, multiplying row 3 by -1 and multiplying row 4 by
-1/2, we obtain
\[ \left( \begin{array}{cccc|cccc} 1 & 0 & 2 & 0 & 0 & 1/2 & -1/3 & -5/2 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1/3 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & -1/2 & 0 & -1/2 \end{array} \right). \]
Note now that the array is in reduced row echelon form, and that the left hand half is not the identity
matrix I_4. Our technique has failed. In fact, the matrix A is not invertible.
2.6. Criteria for Invertibility
Examples 2.5.2 and 2.5.3 raise the question of when a given matrix is invertible. In this section, we shall
obtain some partial answers to this question. Our first step here is the following simple observation.
PROPOSITION 2M. Every elementary matrix is invertible.
Proof. Let us consider elementary row operations. Recall that these are: (1) interchanging two rows;
(2) adding a multiple of one row to another row; and (3) multiplying one row by a non-zero constant.
These elementary row operations can clearly be reversed by elementary row operations. For (1), we
interchange the two rows again. For (2), if we have originally added c times row i to row j, then we can
reverse this by adding -c times row i to row j. For (3), if we have multiplied any row by a non-zero
constant c, we can reverse this by multiplying the same row by the constant 1/c. Note now that each
elementary matrix is obtained from I_n by an elementary row operation. The inverse of this elementary
matrix is clearly the elementary matrix obtained from I_n by the elementary row operation that reverses
the original elementary row operation.
Suppose that an n × n matrix B can be obtained from an n × n matrix A by a finite sequence of
elementary row operations. Then since these elementary row operations can be reversed, the matrix A
can be obtained from the matrix B by a finite sequence of elementary row operations.
Definition. An n × n matrix A is said to be row equivalent to an n × n matrix B if there exist a finite
number of elementary n × n matrices E_1, ..., E_k such that B = E_k ... E_1 A.
Remark. Note that B = E_k ... E_1 A implies that A = E_1^{-1} ... E_k^{-1} B. It follows that if A is row
equivalent to B, then B is row equivalent to A. We usually say that A and B are row equivalent.
The following result gives conditions equivalent to the invertibility of an n × n matrix A.
PROPOSITION 2N. Suppose that
\[ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}, \]
and that
\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad 0 = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix} \]
are n × 1 matrices, where x_1, ..., x_n are variables.
(a) Suppose that the matrix A is invertible. Then the system Ax = 0 of linear equations has only the
trivial solution.
(b) Suppose that the system Ax = 0 of linear equations has only the trivial solution. Then the matrices
A and I_n are row equivalent.
(c) Suppose that the matrices A and I_n are row equivalent. Then A is invertible.
Proof. (a) Suppose that x_0 is a solution of the system Ax = 0. Then since A is invertible, we have
\[ x_0 = I_n x_0 = (A^{-1}A)x_0 = A^{-1}(Ax_0) = A^{-1}0 = 0. \]
It follows that the trivial solution is the only solution.
(b) Note that if the system Ax = 0 of linear equations has only the trivial solution, then it can be
reduced by elementary row operations to the system
\[ x_1 = 0, \quad \dots, \quad x_n = 0. \]
This is equivalent to saying that the array
\[ \left( \begin{array}{ccc|c} a_{11} & \dots & a_{1n} & 0 \\ \vdots & & \vdots & \vdots \\ a_{n1} & \dots & a_{nn} & 0 \end{array} \right) \]
can be reduced by elementary row operations to the reduced row echelon form
\[ \left( \begin{array}{ccc|c} 1 & \dots & 0 & 0 \\ \vdots & & \vdots & \vdots \\ 0 & \dots & 1 & 0 \end{array} \right). \]
Hence the matrices A and I_n are row equivalent.
(c) Suppose that the matrices A and I_n are row equivalent. Then there exist elementary n × n matrices
E_1, ..., E_k such that I_n = E_k ... E_1 A. By Proposition 2M, the matrices E_1, ..., E_k are all invertible, so
that
\[ A = E_1^{-1} \cdots E_k^{-1} I_n = E_1^{-1} \cdots E_k^{-1} \]
is a product of invertible matrices, and is therefore itself invertible.
2.7. Consequences of Invertibility
Suppose that the matrix
\[ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix} \]
is invertible. Consider the system Ax = b, where
\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \]
are n × 1 matrices, where x_1, ..., x_n are variables and b_1, ..., b_n ∈ R are arbitrary. Since A is invertible,
let us consider x = A^{-1}b. Clearly
\[ Ax = A(A^{-1}b) = (AA^{-1})b = I_n b = b, \]
so that x = A^{-1}b is a solution of the system. On the other hand, let x_0 be any solution of the system.
Then Ax_0 = b, so that
\[ x_0 = I_n x_0 = (A^{-1}A)x_0 = A^{-1}(Ax_0) = A^{-1}b. \]
It follows that the system has unique solution. We have proved the following important result.
PROPOSITION 2P. Suppose that
\[ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}, \]
and that
\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \]
are n × 1 matrices, where x_1, ..., x_n are variables and b_1, ..., b_n ∈ R are arbitrary. Suppose further
that the matrix A is invertible. Then the system Ax = b of linear equations has the unique solution
x = A^{-1}b.
We next attempt to study the question in the opposite direction.
PROPOSITION 2Q. Suppose that
\[ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}, \]
and that
\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \]
are n × 1 matrices, where x_1, ..., x_n are variables. Suppose further that for every b_1, ..., b_n ∈ R, the
system Ax = b of linear equations is soluble. Then the matrix A is invertible.
Proof. Suppose that
\[ b_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \quad \dots, \quad b_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. \]
In other words, for every j = 1, ..., n, b_j is an n × 1 matrix with entry 1 on row j and entry 0 elsewhere.
Now let
\[ x_1 = \begin{pmatrix} x_{11} \\ \vdots \\ x_{n1} \end{pmatrix}, \quad \dots, \quad x_n = \begin{pmatrix} x_{1n} \\ \vdots \\ x_{nn} \end{pmatrix} \]
denote respectively solutions of the systems of linear equations
\[ Ax = b_1, \quad \dots, \quad Ax = b_n. \]
It is easy to check that
\[ A ( x_1 \ \dots \ x_n ) = ( b_1 \ \dots \ b_n ); \]
in other words,
\[ A \begin{pmatrix} x_{11} & \dots & x_{1n} \\ \vdots & & \vdots \\ x_{n1} & \dots & x_{nn} \end{pmatrix} = I_n, \]
so that A is invertible.
We can now summarize Propositions 2N, 2P and 2Q as follows.
PROPOSITION 2R. In the notation of Proposition 2N, the following four statements are equivalent:
(a) The matrix A is invertible.
(b) The system Ax = 0 of linear equations has only the trivial solution.
(c) The matrices A and I_n are row equivalent.
(d) The system Ax = b of linear equations is soluble for every n × 1 matrix b.
2.8. Application to Economics
In this section, we describe briefly the Leontief input-output model, where an economy is divided into n
sectors.
For every i = 1, ..., n, let x_i denote the monetary value of the total output of sector i over a fixed
period, and let d_i denote the output of sector i needed to satisfy outside demand over the same fixed
period. Collecting together x_i and d_i for i = 1, ..., n, we obtain the vectors
\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in R^n \quad \text{and} \quad d = \begin{pmatrix} d_1 \\ \vdots \\ d_n \end{pmatrix} \in R^n, \]
known respectively as the production vector and demand vector of the economy.
On the other hand, each of the n sectors requires material from some or all of the sectors to produce
its output. For i, j = 1, ..., n, let c_{ij} denote the monetary value of the output of sector i needed by
sector j to produce one unit of monetary value of output. For every j = 1, ..., n, the vector
\[ c_j = \begin{pmatrix} c_{1j} \\ \vdots \\ c_{nj} \end{pmatrix} \in R^n \]
is known as the unit consumption vector of sector j. Note that the column sum
\[ c_{1j} + \dots + c_{nj} \le 1 \tag{5} \]
in order to ensure that sector j does not make a loss. Collecting together the unit consumption vectors,
we obtain the matrix
\[ C = ( c_1 \ \dots \ c_n ) = \begin{pmatrix} c_{11} & \dots & c_{1n} \\ \vdots & & \vdots \\ c_{n1} & \dots & c_{nn} \end{pmatrix}, \]
known as the consumption matrix of the economy.
Consider the matrix product
\[ Cx = \begin{pmatrix} c_{11}x_1 + \dots + c_{1n}x_n \\ \vdots \\ c_{n1}x_1 + \dots + c_{nn}x_n \end{pmatrix}. \]
For every i = 1, ..., n, the entry c_{i1}x_1 + ... + c_{in}x_n represents the monetary value of the output of sector
i needed by all the sectors to produce their output. This leads to the production equation
\[ x = Cx + d. \tag{6} \]
Here Cx represents the part of the total output that is required by the various sectors of the economy
to produce the output in the first place, and d represents the part of the total output that is available
to satisfy outside demand.
Clearly (I - C)x = d. If the matrix I - C is invertible, then
\[ x = (I - C)^{-1} d \]
represents the perfect production level. We state without proof the following fundamental result.
PROPOSITION 2S. Suppose that the entries of the consumption matrix C and the demand vector d
are non-negative. Suppose further that the inequality (5) holds for each column of C. Then the inverse
matrix (I - C)^{-1} exists, and the production vector x = (I - C)^{-1} d has non-negative entries and is the
unique solution of the production equation (6).
Let us indulge in some heuristics. Initially, we have demand d. To produce d, we need Cd as input.
To produce this extra Cd, we need C(Cd) = C^2 d as input. To produce this extra C^2 d, we need
C(C^2 d) = C^3 d as input. And so on. Hence we need to produce
\[ d + Cd + C^2 d + C^3 d + \dots = (I + C + C^2 + C^3 + \dots)d \]
in total. Now it is not difficult to check that for every positive integer k, we have
\[ (I - C)(I + C + C^2 + C^3 + \dots + C^k) = I - C^{k+1}. \]
If the entries of C^{k+1} are all very small, then
\[ (I - C)(I + C + C^2 + C^3 + \dots + C^k) \approx I, \]
so that
\[ (I - C)^{-1} \approx I + C + C^2 + C^3 + \dots + C^k. \]
This gives a practical way of approximating (I - C)^{-1}, and also suggests that
\[ (I - C)^{-1} = I + C + C^2 + C^3 + \dots. \]
Example 2.8.1. An economy consists of three sectors. Their dependence on each other is summarized
in the table below:

                                                      To produce one unit of monetary
                                                        value of output in sector
                                                          1        2        3
  monetary value of output required from sector 1       0.3      0.2      0.1
  monetary value of output required from sector 2       0.4      0.5      0.2
  monetary value of output required from sector 3       0.1      0.1      0.3

Suppose that the final demand from sectors 1, 2 and 3 are respectively 30, 50 and 20. Then the production
vector and demand vector are respectively
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \quad \text{and} \quad d = \begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix} = \begin{pmatrix} 30 \\ 50 \\ 20 \end{pmatrix}, \]
while the consumption matrix is given by
\[ C = \begin{pmatrix} 0.3 & 0.2 & 0.1 \\ 0.4 & 0.5 & 0.2 \\ 0.1 & 0.1 & 0.3 \end{pmatrix}, \quad \text{so that} \quad I - C = \begin{pmatrix} 0.7 & -0.2 & -0.1 \\ -0.4 & 0.5 & -0.2 \\ -0.1 & -0.1 & 0.7 \end{pmatrix}. \]
The production equation (I - C)x = d has augmented matrix
\[ \left( \begin{array}{ccc|c} 0.7 & -0.2 & -0.1 & 30 \\ -0.4 & 0.5 & -0.2 & 50 \\ -0.1 & -0.1 & 0.7 & 20 \end{array} \right), \quad \text{equivalent to} \quad \left( \begin{array}{ccc|c} 7 & -2 & -1 & 300 \\ -4 & 5 & -2 & 500 \\ -1 & -1 & 7 & 200 \end{array} \right), \]
and which can be converted to reduced row echelon form
\[ \left( \begin{array}{ccc|c} 1 & 0 & 0 & 3200/27 \\ 0 & 1 & 0 & 6100/27 \\ 0 & 0 & 1 & 700/9 \end{array} \right). \]
This gives x_1 ≈ 119, x_2 ≈ 226 and x_3 ≈ 78, to the nearest integers.
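As an added check (ours, not part of the original example), the result can be confirmed numerically. The iteration x ← Cx + d computes exactly the partial sums (I + C + ... + C^k)d of the series from the preceding heuristics, and converges here because every column sum of C is less than 1.

    C = [[0.3, 0.2, 0.1],
         [0.4, 0.5, 0.2],
         [0.1, 0.1, 0.3]]
    d = [30.0, 50.0, 20.0]

    # x <- Cx + d: after k steps this equals (I + C + ... + C^k)d
    x = d[:]
    for _ in range(100):
        x = [sum(C[i][j] * x[j] for j in range(3)) + d[i] for i in range(3)]

    print([round(v) for v in x])   # [119, 226, 78], matching 3200/27, 6100/27, 700/9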
2.9. Matrix Transformation on the Plane
Let A be a 2 × 2 matrix with real entries. A matrix transformation T : R^2 → R^2 can be defined as
follows: For every x = (x_1, x_2) ∈ R^2, we write T(x) = y, where y = (y_1, y_2) ∈ R^2 satisfies
\[ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \]
Such a transformation is linear, in the sense that T(x' + x'') = T(x') + T(x'') for every x', x'' ∈ R^2 and
T(cx) = cT(x) for every x ∈ R^2 and every c ∈ R. To see this, simply observe that
\[ A \begin{pmatrix} x_1' + x_1'' \\ x_2' + x_2'' \end{pmatrix} = A \begin{pmatrix} x_1' \\ x_2' \end{pmatrix} + A \begin{pmatrix} x_1'' \\ x_2'' \end{pmatrix} \quad \text{and} \quad A \begin{pmatrix} cx_1 \\ cx_2 \end{pmatrix} = cA \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \]
We shall study linear transformations in greater detail in Chapter 8. Here we confine ourselves to looking
at a few simple matrix transformations on the plane.
Example 2.9.1. The matrix
\[ A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad \text{satisfies} \quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ -x_2 \end{pmatrix} \]
for every (x_1, x_2) ∈ R^2, and so represents reflection across the x_1-axis, whereas the matrix
\[ A = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{satisfies} \quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -x_1 \\ x_2 \end{pmatrix} \]
for every (x_1, x_2) ∈ R^2, and so represents reflection across the x_2-axis. On the other hand, the matrix
\[ A = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \quad \text{satisfies} \quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -x_1 \\ -x_2 \end{pmatrix} \]
for every (x_1, x_2) ∈ R^2, and so represents reflection across the origin, whereas the matrix
\[ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{satisfies} \quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_2 \\ x_1 \end{pmatrix} \]
for every (x_1, x_2) ∈ R^2, and so represents reflection across the line x_1 = x_2. We give a summary in the
table below:

  Transformation                 Equations                        Matrix
  Reflection across x_1-axis     y_1 = x_1,   y_2 = -x_2          (1 0; 0 -1)
  Reflection across x_2-axis     y_1 = -x_1,  y_2 = x_2           (-1 0; 0 1)
  Reflection across origin       y_1 = -x_1,  y_2 = -x_2          (-1 0; 0 -1)
  Reflection across x_1 = x_2    y_1 = x_2,   y_2 = x_1           (0 1; 1 0)
Example 2.9.2. Let k be a fixed positive real number. The matrix
\[ A = \begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix} \quad \text{satisfies} \quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} kx_1 \\ kx_2 \end{pmatrix} \]
for every (x_1, x_2) ∈ R^2, and so represents a dilation if k > 1 and a contraction if 0 < k < 1. On the
other hand, the matrix
\[ A = \begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix} \quad \text{satisfies} \quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} kx_1 \\ x_2 \end{pmatrix} \]
for every (x_1, x_2) ∈ R^2, and so represents an expansion in the x_1-direction if k > 1 and a compression
in the x_1-direction if 0 < k < 1, whereas the matrix
\[ A = \begin{pmatrix} 1 & 0 \\ 0 & k \end{pmatrix} \quad \text{satisfies} \quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & k \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ kx_2 \end{pmatrix} \]
for every (x_1, x_2) ∈ R^2, and so represents an expansion in the x_2-direction if k > 1 and a compression in
the x_2-direction if 0 < k < 1. We give a summary in the table below:

  Transformation                                              Equations                    Matrix
  Dilation or contraction by factor k > 0                     y_1 = kx_1,  y_2 = kx_2      (k 0; 0 k)
  Expansion or compression in x_1-direction by factor k > 0   y_1 = kx_1,  y_2 = x_2       (k 0; 0 1)
  Expansion or compression in x_2-direction by factor k > 0   y_1 = x_1,   y_2 = kx_2      (1 0; 0 k)
Example 2.9.3. Let k be a fixed real number. The matrix
\[ A = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix} \quad \text{satisfies} \quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + kx_2 \\ x_2 \end{pmatrix} \]
for every (x_1, x_2) ∈ R^2, and so represents a shear in the x_1-direction. For the case k = 1, we have the
following:
[Figure: the unit square and its image under the shear T with k = 1.]
For the case k = -1, we have the following:
[Figure: the unit square and its image under the shear T with k = -1.]
Similarly, the matrix
\[ A = \begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix} \quad \text{satisfies} \quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ kx_1 + x_2 \end{pmatrix} \]
for every (x_1, x_2) ∈ R^2, and so represents a shear in the x_2-direction. We give a summary in the table
below:

  Transformation            Equations                           Matrix
  Shear in x_1-direction    y_1 = x_1 + kx_2,  y_2 = x_2        (1 k; 0 1)
  Shear in x_2-direction    y_1 = x_1,  y_2 = kx_1 + x_2        (1 0; k 1)
Example 2.9.4. For anticlockwise rotation by an angle θ, we have T(x_1, x_2) = (y_1, y_2), where
\[ y_1 + \mathrm{i}y_2 = (x_1 + \mathrm{i}x_2)(\cos\theta + \mathrm{i}\sin\theta), \]
and so
\[ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \]
It follows that the matrix in question is given by
\[ A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]
We give a summary in the table below:

  Transformation                      Equations                                                Matrix
  Anticlockwise rotation by angle θ   y_1 = x_1 cos θ - x_2 sin θ,  y_2 = x_1 sin θ + x_2 cos θ   (cos θ -sin θ; sin θ cos θ)
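To make the geometric effect of these matrices concrete, here is a small Python sketch of our own that applies a few of them to a sample point; the dictionary labels are our own.

    from math import cos, sin, pi

    def apply(A, x):
        return (A[0][0]*x[0] + A[0][1]*x[1], A[1][0]*x[0] + A[1][1]*x[1])

    k, t = 2.0, pi / 2
    transforms = {
        "reflect across x1-axis": [[1, 0], [0, -1]],
        "reflect across x1 = x2": [[0, 1], [1, 0]],
        "dilate by factor k":     [[k, 0], [0, k]],
        "shear in x1-direction":  [[1, k], [0, 1]],
        "rotate anticlockwise 90 degrees": [[cos(t), -sin(t)], [sin(t), cos(t)]],
    }
    for name, A in transforms.items():
        print(name, apply(A, (1.0, 2.0)))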
We conclude this section by establishing the following result which reinforces the linearity of matrix
transformations on the plane.
PROPOSITION 2T. Suppose that a matrix transformation T : R^2 → R^2 is given by an invertible
matrix A. Then
(a) the image under T of a straight line is a straight line;
(b) the image under T of a straight line through the origin is a straight line through the origin; and
(c) the images under T of parallel straight lines are parallel straight lines.
Proof. Suppose that T(x_1, x_2) = (y_1, y_2). Since A is invertible, we have x = A^{-1}y, where
\[ x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}. \]
The equation of a straight line is given by αx_1 + βx_2 = γ or, in matrix form, by
\[ ( \alpha \ \beta ) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = ( \gamma ). \]
Hence
\[ ( \alpha \ \beta ) A^{-1} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = ( \gamma ). \]
Let
\[ ( \alpha' \ \beta' ) = ( \alpha \ \beta ) A^{-1}. \]
Then
\[ ( \alpha' \ \beta' ) \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = ( \gamma ). \]
In other words, the image under T of the straight line αx_1 + βx_2 = γ is α'y_1 + β'y_2 = γ, clearly another
straight line. This proves (a). To prove (b), note that straight lines through the origin correspond to
γ = 0. To prove (c), note that parallel straight lines correspond to different values of γ for the same
values of α and β.
2.10. Application to Computer Graphics
Example 2.10.1. Consider the letter M in the diagram below:
[Figure: the letter M, drawn with its boundary on a coordinate grid.]
Following the boundary in the anticlockwise direction starting at the origin, the 12 vertices can be
represented by the coordinates
\[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 6 \end{pmatrix}, \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \begin{pmatrix} 7 \\ 6 \end{pmatrix}, \begin{pmatrix} 7 \\ 0 \end{pmatrix}, \begin{pmatrix} 8 \\ 0 \end{pmatrix}, \begin{pmatrix} 8 \\ 8 \end{pmatrix}, \begin{pmatrix} 7 \\ 8 \end{pmatrix}, \begin{pmatrix} 4 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 8 \end{pmatrix}, \begin{pmatrix} 0 \\ 8 \end{pmatrix}. \]
Let us apply a matrix transformation to these vertices, using the matrix
\[ A = \begin{pmatrix} 1 & \frac{1}{2} \\ 0 & 1 \end{pmatrix}, \]
representing a shear in the x_1-direction with factor 0.5, so that
\[ A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + \frac{1}{2}x_2 \\ x_2 \end{pmatrix} \quad \text{for every } (x_1, x_2) \in R^2. \]
Then the images of the 12 vertices are respectively
\[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 4 \\ 6 \end{pmatrix}, \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \begin{pmatrix} 10 \\ 6 \end{pmatrix}, \begin{pmatrix} 7 \\ 0 \end{pmatrix}, \begin{pmatrix} 8 \\ 0 \end{pmatrix}, \begin{pmatrix} 12 \\ 8 \end{pmatrix}, \begin{pmatrix} 11 \\ 8 \end{pmatrix}, \begin{pmatrix} 5 \\ 2 \end{pmatrix}, \begin{pmatrix} 5 \\ 8 \end{pmatrix}, \begin{pmatrix} 4 \\ 8 \end{pmatrix}, \]
noting that
\[ \begin{pmatrix} 1 & \frac{1}{2} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 1 & 4 & 7 & 7 & 8 & 8 & 7 & 4 & 1 & 0 \\ 0 & 0 & 6 & 0 & 6 & 0 & 0 & 8 & 8 & 2 & 8 & 8 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 4 & 4 & 10 & 7 & 8 & 12 & 11 & 5 & 5 & 4 \\ 0 & 0 & 6 & 0 & 6 & 0 & 0 & 8 & 8 & 2 & 8 & 8 \end{pmatrix}. \]
In view of Proposition 2T, the image of any line segment that joins two vertices is a line segment that
joins the images of the two vertices. Hence the image of the letter M under the shear looks like the
following:
[Figure: the sheared letter M.]
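The display above is a single matrix product, so all twelve images can be computed at once. The following Python sketch is our own illustration; it multiplies the 2 × 2 shear matrix by the 2 × 12 array of coordinates.

    A = [[1, 0.5],
         [0, 1]]
    V = [[0, 1, 1, 4, 7, 7, 8, 8, 7, 4, 1, 0],
         [0, 0, 6, 0, 6, 0, 0, 8, 8, 2, 8, 8]]

    image = [[sum(A[i][k] * V[k][j] for k in range(2)) for j in range(12)]
             for i in range(2)]
    print(image[0])   # matches the first row of the image array above
    print(image[1])   # matches the second row, which the shear leaves unchanged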
Next, we may wish to translate this image. However, a translation by vector h = (h_1, h_2) ∈ R^2 is of
the form
\[ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} \quad \text{for every } (x_1, x_2) \in R^2, \]
and this cannot be described by a matrix transformation on the plane. To overcome this deficiency,
we introduce homogeneous coordinates. For every point (x_1, x_2) ∈ R^2, we identify it with the point
(x_1, x_2, 1) ∈ R^3. Now we wish to translate a point (x_1, x_2) to (x_1, x_2) + (h_1, h_2) = (x_1 + h_1, x_2 + h_2), so
we attempt to find a 3 × 3 matrix A* such that
\[ \begin{pmatrix} x_1 + h_1 \\ x_2 + h_2 \\ 1 \end{pmatrix} = A^* \begin{pmatrix} x_1 \\ x_2 \\ 1 \end{pmatrix} \quad \text{for every } (x_1, x_2) \in R^2. \]
It is easy to check that
\[ \begin{pmatrix} x_1 + h_1 \\ x_2 + h_2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & h_1 \\ 0 & 1 & h_2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ 1 \end{pmatrix} \quad \text{for every } (x_1, x_2) \in R^2. \]
It follows that using homogeneous coordinates, translation by vector h = (h_1, h_2) ∈ R^2 can be described
by the matrix
\[ A^* = \begin{pmatrix} 1 & 0 & h_1 \\ 0 & 1 & h_2 \\ 0 & 0 & 1 \end{pmatrix}. \]
Remark. Consider a matrix transformation T : R^2 → R^2 on the plane given by a matrix
\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}. \]
Suppose that T(x_1, x_2) = (y_1, y_2). Then
\[ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \]
Under homogeneous coordinates, the image of the point (x_1, x_2, 1) is now (y_1, y_2, 1). Note that
\[ \begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ 1 \end{pmatrix}. \]
It follows that homogeneous coordinates can also be used to study all the matrix transformations we
have discussed in Section 2.9. By moving over to homogeneous coordinates, we simply replace the 2 × 2
matrix A by the 3 × 3 matrix
\[ A^* = \begin{pmatrix} A & 0 \\ 0 & 1 \end{pmatrix}. \]
Example 2.10.2. Returning to Example 2.10.1 of the letter M, the 12 vertices are now represented by
homogeneous coordinates, put in an array in the form
\[ \begin{pmatrix} 0 & 1 & 1 & 4 & 7 & 7 & 8 & 8 & 7 & 4 & 1 & 0 \\ 0 & 0 & 6 & 0 & 6 & 0 & 0 & 8 & 8 & 2 & 8 & 8 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}. \]
Then the 2 × 2 matrix
\[ A = \begin{pmatrix} 1 & \frac{1}{2} \\ 0 & 1 \end{pmatrix} \]
is now replaced by the 3 × 3 matrix
\[ A^* = \begin{pmatrix} 1 & \frac{1}{2} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Note that
\[ A^* \begin{pmatrix} 0 & 1 & 1 & 4 & 7 & 7 & 8 & 8 & 7 & 4 & 1 & 0 \\ 0 & 0 & 6 & 0 & 6 & 0 & 0 & 8 & 8 & 2 & 8 & 8 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 4 & 4 & 10 & 7 & 8 & 12 & 11 & 5 & 5 & 4 \\ 0 & 0 & 6 & 0 & 6 & 0 & 0 & 8 & 8 & 2 & 8 & 8 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}. \]
Next, let us consider a translation by the vector (2, 3). The matrix under homogeneous coordinates for
this translation is given by
\[ B^* = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{pmatrix}. \]
Note that applying B* to the sheared array gives
\[ \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 4 & 4 & 10 & 7 & 8 & 12 & 11 & 5 & 5 & 4 \\ 0 & 0 & 6 & 0 & 6 & 0 & 0 & 8 & 8 & 2 & 8 & 8 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 3 & 6 & 6 & 12 & 9 & 10 & 14 & 13 & 7 & 7 & 6 \\ 3 & 3 & 9 & 3 & 9 & 3 & 3 & 11 & 11 & 5 & 11 & 11 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}, \]
giving rise to coordinates in R^2, displayed as an array
\[ \begin{pmatrix} 2 & 3 & 6 & 6 & 12 & 9 & 10 & 14 & 13 & 7 & 7 & 6 \\ 3 & 3 & 9 & 3 & 9 & 3 & 3 & 11 & 11 & 5 & 11 & 11 \end{pmatrix}. \]
Hence the image of the letter M under the shear followed by translation looks like the following:
[Figure: the sheared and translated letter M.]
Example 2.10.3. Under homogeneous coordinates, the transformation representing a reflection across
the x_1-axis, followed by a shear by factor 2 in the x_1-direction, followed by anticlockwise rotation by
90°, and followed by translation by vector (2, -1), has matrix
\[ \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 2 \\ 1 & -2 & -1 \\ 0 & 0 & 1 \end{pmatrix}. \]
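Since composition of transformations corresponds to multiplication of the homogeneous-coordinate matrices, the product above can be checked mechanically. The following Python sketch is our own; the variable names are our own labels, and the right-most factor acts first.

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    reflect   = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]   # across the x1-axis
    shear     = [[1, 2, 0], [0, 1, 0], [0, 0, 1]]    # factor 2 in the x1-direction
    rotate    = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # anticlockwise by 90 degrees
    translate = [[1, 0, 2], [0, 1, -1], [0, 0, 1]]   # by the vector (2, -1)

    M = matmul(translate, matmul(rotate, matmul(shear, reflect)))
    print(M)   # [[0, 1, 2], [1, -2, -1], [0, 0, 1]], as above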
2.11. Complexity of a Non-Homogeneous System
Consider the problem of solving a system of linear equations of the form Ax = b, where A is an n × n
invertible matrix. We are interested in the number of operations required to solve such a system. By an
operation, we mean interchanging, adding or multiplying two real numbers.
One way of solving the system Ax = b is to write down the augmented matrix
\[ \left( \begin{array}{ccc|c} a_{11} & \dots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{n1} & \dots & a_{nn} & b_n \end{array} \right), \tag{7} \]
and then convert it to reduced row echelon form by elementary row operations.
The first step is to reduce it to row echelon form:
(I) First of all, we may need to interchange two rows in order to ensure that the top left entry in the
array is non-zero. This requires n + 1 operations.
(II) Next, we need to multiply the new first row by a constant in order to make the top left pivot
entry equal to 1. This requires n + 1 operations, and the array now looks like
\[ \left( \begin{array}{cccc|c} 1 & a_{12} & \dots & a_{1n} & b_1 \\ a_{21} & a_{22} & \dots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} & b_n \end{array} \right). \]
Note that we are abusing notation somewhat, as the entry a_{12} here, for example, may well be different
from the entry a_{12} in the augmented matrix (7).
(III) For each row i = 2, ..., n, we now multiply the first row by -a_{i1} and then add to row i. This
requires 2(n - 1)(n + 1) operations, and the array now looks like
\[ \left( \begin{array}{cccc|c} 1 & a_{12} & \dots & a_{1n} & b_1 \\ 0 & a_{22} & \dots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & a_{n2} & \dots & a_{nn} & b_n \end{array} \right). \tag{8} \]
(IV) In summary, to proceed from the form (7) to the form (8), the number of operations required is
at most 2(n + 1) + 2(n - 1)(n + 1) = 2n(n + 1).
(V) Our next task is to convert the smaller array
\[ \left( \begin{array}{ccc|c} a_{22} & \dots & a_{2n} & b_2 \\ \vdots & & \vdots & \vdots \\ a_{n2} & \dots & a_{nn} & b_n \end{array} \right) \]
to an array that looks like
\[ \left( \begin{array}{cccc|c} 1 & a_{23} & \dots & a_{2n} & b_2 \\ 0 & a_{33} & \dots & a_{3n} & b_3 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & a_{n3} & \dots & a_{nn} & b_n \end{array} \right). \]
These have one row and one column fewer than the arrays (7) and (8), and the number of operations
required is at most 2m(m + 1), where m = n - 1. We continue in this way systematically to reach row
echelon form, and conclude that the number of operations required to convert the augmented matrix (7)
to row echelon form is at most
\[ \sum_{m=1}^{n} 2m(m+1) \approx \frac{2}{3}n^3. \]
The next step is to convert the row echelon form to reduced row echelon form. This is simpler, as
many entries are now zero. It can be shown that the number of operations required is bounded by
something like 2n^2, indeed by something like n^2 if one analyzes the problem more carefully. In any
case, these estimates are insignificant compared to the estimate (2/3)n^3 earlier.
We therefore conclude that the number of operations required to solve the system Ax = b by reducing
the augmented matrix to reduced row echelon form is bounded by something like (2/3)n^3 when n is large.
Another way of solving the system Ax = b is to first find the inverse matrix A^{-1}. This may involve
converting the array
\[ \left( \begin{array}{ccc|ccc} a_{11} & \dots & a_{1n} & 1 & & \\ \vdots & & \vdots & & \ddots & \\ a_{n1} & \dots & a_{nn} & & & 1 \end{array} \right) \]
to reduced row echelon form by elementary row operations. It can be shown that the number of operations
required is something like 2n^3, so this is less efficient than our first method.
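As a quick numerical check (ours) of the leading term, the exact count from (V) can be compared against (2/3)n^3; the ratio tends to 1 as n grows.

    for n in [10, 100, 1000]:
        exact = sum(2 * m * (m + 1) for m in range(1, n + 1))
        print(n, exact, (2 / 3) * n**3, exact / ((2 / 3) * n**3))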
2.12. Matrix Factorization
In some situations, we may need to solve systems of linear equations of the form Ax = b, with the same
coefficient matrix A but for many different vectors b. If A is an invertible square matrix, then we can
find its inverse A^{-1} and then compute A^{-1}b for each vector b. However, the matrix A may not be a
square matrix, and we may have to convert the augmented matrix to reduced row echelon form.
In this section, we describe a way for solving this problem in a more efficient way. To describe this,
we first need a definition.
Definition. A rectangular array of numbers is said to be in quasi row echelon form if the following
conditions are satisfied:
(1) The left-most non-zero entry of any non-zero row is called a pivot entry. It is not necessary for its
value to be equal to 1.
(2) All zero rows are grouped together at the bottom of the array.
(3) The pivot entry of a non-zero row occurring lower in the array is to the right of the pivot entry of
a non-zero row occurring higher in the array.
In other words, the array looks like row echelon form in shape, except that the pivot entries do not
have to be equal to 1.
We consider first of all a special case.
PROPOSITION 2U. Suppose that an m × n matrix A can be converted to quasi row echelon form by
elementary row operations but without interchanging any two rows. Then A = LU, where L is an m × m
lower triangular matrix with diagonal entries all equal to 1 and U is a quasi row echelon form of A.
Sketch of Proof. Recall that applying an elementary row operation to an m × n matrix corresponds
to multiplying the matrix on the left by an elementary m × m matrix. On the other hand, if we are
aiming for quasi row echelon form and not row echelon form, then there is no need to multiply any row
of the array by a non-zero constant. Hence the only elementary row operation we need to perform is
to add a multiple of one row to another row. In fact, it is sufficient even to restrict this to adding a
multiple of a row higher in the array to another row lower in the array, and it is easy to see that the
corresponding elementary matrix is lower triangular, with diagonal entries all equal to 1. Let us call
such elementary matrices unit lower triangular. If an m × n matrix A can be reduced in this way to
quasi row echelon form U, then
\[ U = E_k \cdots E_2 E_1 A, \]
where the elementary matrices E_1, E_2, ..., E_k are all unit lower triangular. Let L = (E_k \cdots E_2 E_1)^{-1}.
Then A = LU. It can be shown that products and inverses of unit lower triangular matrices are also
unit lower triangular. Hence L is a unit lower triangular matrix as required.
If Ax = b and A = LU, then L(Ux) = b. Writing y = Ux, we have
\[ Ly = b \quad \text{and} \quad Ux = y. \]
It follows that the problem of solving the system Ax = b corresponds to first solving the system Ly = b
and then solving the system Ux = y. Both of these systems are easy to solve since both L and U have
many zero entries. It remains to find L and U.
If we reduce the matrix A to quasi row echelon form by only performing the elementary row operation
of adding a multiple of a row higher in the array to another row lower in the array, then U can be
taken as the quasi row echelon form resulting from this. It remains to find L. However, note that
L = (E_k \cdots E_2 E_1)^{-1}, where U = E_k \cdots E_2 E_1 A, and so
\[ I = E_k \cdots E_2 E_1 L. \]
This means that the very elementary row operations that convert A to U will convert L to I. We
therefore wish to create a matrix L such that this is satisfied. It is simplest to illustrate the technique
by an example.
Example 2.12.1. Consider the matrix
\[ A = \begin{pmatrix} 2 & -1 & 2 & 2 & 3 \\ 4 & 1 & 6 & 5 & 8 \\ -2 & 10 & 4 & 8 & 5 \\ -2 & 13 & 6 & 16 & 5 \end{pmatrix}. \]
The entry 2 in row 1 and column 1 is a pivot entry, and column 1 is a pivot column. Adding -2 times
row 1 to row 2, adding 1 times row 1 to row 3, and adding 1 times row 1 to row 4, we obtain
\[ \begin{pmatrix} 2 & -1 & 2 & 2 & 3 \\ 0 & 3 & 2 & 1 & 2 \\ 0 & 9 & 6 & 10 & 8 \\ 0 & 12 & 8 & 18 & 8 \end{pmatrix}. \]
Note that the same three elementary row operations convert (entries not yet determined shown as ∗)
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ -1 & \ast & 1 & 0 \\ -1 & \ast & \ast & 1 \end{pmatrix} \quad \text{to} \quad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & \ast & 1 & 0 \\ 0 & \ast & \ast & 1 \end{pmatrix}. \]
Next, the entry 3 in row 2 and column 2 is a pivot entry, and column 2 is a pivot column. Adding -3
times row 2 to row 3, and adding -4 times row 2 to row 4, we obtain
\[ \begin{pmatrix} 2 & -1 & 2 & 2 & 3 \\ 0 & 3 & 2 & 1 & 2 \\ 0 & 0 & 0 & 7 & 2 \\ 0 & 0 & 0 & 14 & 0 \end{pmatrix}. \]
Note that the same two elementary row operations convert
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 4 & \ast & 1 \end{pmatrix} \quad \text{to} \quad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \ast & 1 \end{pmatrix}. \]
Next, the entry 7 in row 3 and column 4 is a pivot entry, and column 4 is a pivot column. Adding -2
times row 3 to row 4, we obtain the quasi row echelon form
\[ U = \begin{pmatrix} 2 & -1 & 2 & 2 & 3 \\ 0 & 3 & 2 & 1 & 2 \\ 0 & 0 & 0 & 7 & 2 \\ 0 & 0 & 0 & 0 & -4 \end{pmatrix}, \]
where the entry -4 in row 4 and column 5 is a pivot entry, and column 5 is a pivot column. Note that
the same elementary row operation converts
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 2 & 1 \end{pmatrix} \quad \text{to} \quad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]
Now observe that if we take
\[ L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ -1 & 3 & 1 & 0 \\ -1 & 4 & 2 & 1 \end{pmatrix}, \]
then L can be converted to I_4 by the same elementary operations that convert A to U.
The strategy is now clear. Every time we find a new pivot, we note its value and the entries below it.
The lower triangular entries of L are formed by these columns with each column divided by the value of
the pivot entry in that column.
Example 2.12.2. Let us examine our last example again. The pivot columns at the time of establishing
the pivot entries are respectively
\[ \begin{pmatrix} 2 \\ 4 \\ -2 \\ -2 \end{pmatrix}, \quad \begin{pmatrix} \ast \\ 3 \\ 9 \\ 12 \end{pmatrix}, \quad \begin{pmatrix} \ast \\ \ast \\ 7 \\ 14 \end{pmatrix}, \quad \begin{pmatrix} \ast \\ \ast \\ \ast \\ -4 \end{pmatrix}. \]
Dividing them respectively by the pivot entries 2, 3, 7 and -4, we obtain respectively the columns
\[ \begin{pmatrix} 1 \\ 2 \\ -1 \\ -1 \end{pmatrix}, \quad \begin{pmatrix} \ast \\ 1 \\ 3 \\ 4 \end{pmatrix}, \quad \begin{pmatrix} \ast \\ \ast \\ 1 \\ 2 \end{pmatrix}, \quad \begin{pmatrix} \ast \\ \ast \\ \ast \\ 1 \end{pmatrix}. \]
Note that the lower triangular entries of the matrix
\[ L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ -1 & 3 & 1 & 0 \\ -1 & 4 & 2 & 1 \end{pmatrix} \]
correspond precisely to the entries in these columns.
Linear Algebra c W W L Chen, 1982, 2008
LU FACTORIZATION ALGORITHM.
(1) Reduce the matrix A to quasi row echelon form by only performing the elementary row operation of adding a multiple of a row higher in the array to another row lower in the array. Let U be the quasi row echelon form obtained.
(2) Record any new pivot column at the time of its first recognition, and modify it by replacing any entry above the pivot entry by zero and dividing every other entry by the value of the pivot entry.
(3) Let L denote the square matrix obtained by letting the columns be the pivot columns as modified in step (2).
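The algorithm above translates almost line by line into code. The following Python sketch covers the simplest case, in which A is square and every pivot lies on the diagonal, so that no row interchanges are needed; handling the general quasi row echelon case, where pivot columns may be skipped, is a routine extension.

```python
import numpy as np

def lu_factorize(A):
    # Record each pivot column, divided by its pivot, in L (steps (2) and (3)),
    # while reducing a copy of A to upper triangular form U (step (1)).
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]        # entry of the modified pivot column
            U[i, k:] -= L[i, k] * U[k, k:]     # add a multiple of row k to row i
    return L, U
```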
Example 2.12.3. We wish to solve the system of linear equations Ax = b, where
$$A = \begin{pmatrix} 3 & -1 & 2 & -4 & 1 \\ -3 & 3 & -5 & 5 & -2 \\ 6 & -4 & 11 & -10 & 6 \\ -6 & 8 & -21 & 13 & -9 \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} 1 \\ -2 \\ 9 \\ -15 \end{pmatrix}.$$
Let us first apply LU factorization to the matrix A. The first pivot column is column 1, with modified version
$$\begin{pmatrix} 1 \\ -1 \\ 2 \\ -2 \end{pmatrix}.$$
Adding row 1 to row 2, adding −2 times row 1 to row 3, and adding 2 times row 1 to row 4, we obtain
$$\begin{pmatrix} 3 & -1 & 2 & -4 & 1 \\ 0 & 2 & -3 & 1 & -1 \\ 0 & -2 & 7 & -2 & 4 \\ 0 & 6 & -17 & 5 & -7 \end{pmatrix}.$$
The second pivot column is column 2, with modified version
$$\begin{pmatrix} 0 \\ 1 \\ -1 \\ 3 \end{pmatrix}.$$
Adding row 2 to row 3, and adding −3 times row 2 to row 4, we obtain
$$\begin{pmatrix} 3 & -1 & 2 & -4 & 1 \\ 0 & 2 & -3 & 1 & -1 \\ 0 & 0 & 4 & -1 & 3 \\ 0 & 0 & -8 & 2 & -4 \end{pmatrix}.$$
The third pivot column is column 3, with modified version
$$\begin{pmatrix} 0 \\ 0 \\ 1 \\ -2 \end{pmatrix}.$$
Adding 2 times row 3 to row 4, we obtain the quasi row echelon form
$$\begin{pmatrix} 3 & -1 & 2 & -4 & 1 \\ 0 & 2 & -3 & 1 & -1 \\ 0 & 0 & 4 & -1 & 3 \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix}.$$
The last pivot column is column 5, with modified version
$$\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$
It follows that
$$L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 2 & -1 & 1 & 0 \\ -2 & 3 & -2 & 1 \end{pmatrix} \quad\text{and}\quad U = \begin{pmatrix} 3 & -1 & 2 & -4 & 1 \\ 0 & 2 & -3 & 1 & -1 \\ 0 & 0 & 4 & -1 & 3 \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix}.$$
We now consider the system Ly = b, with augmented matrix
$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 1 \\ -1 & 1 & 0 & 0 & -2 \\ 2 & -1 & 1 & 0 & 9 \\ -2 & 3 & -2 & 1 & -15 \end{array}\right).$$
Using row 1, we obtain y_1 = 1. Using row 2, we obtain y_2 − y_1 = −2, so that y_2 = −1. Using row 3, we obtain y_3 + 2y_1 − y_2 = 9, so that y_3 = 6. Using row 4, we obtain y_4 − 2y_1 + 3y_2 − 2y_3 = −15, so that y_4 = 2. Hence
$$y = \begin{pmatrix} 1 \\ -1 \\ 6 \\ 2 \end{pmatrix}.$$
We next consider the system Ux = y, with augmented matrix
$$\left(\begin{array}{ccccc|c} 3 & -1 & 2 & -4 & 1 & 1 \\ 0 & 2 & -3 & 1 & -1 & -1 \\ 0 & 0 & 4 & -1 & 3 & 6 \\ 0 & 0 & 0 & 0 & 2 & 2 \end{array}\right).$$
Here the free variable is x_4. Let x_4 = t. Using row 4, we obtain 2x_5 = 2, so that x_5 = 1. Using row 3, we obtain 4x_3 = 6 + x_4 − 3x_5 = 3 + t, so that x_3 = 3/4 + t/4. Using row 2, we obtain
$$2x_2 = -1 + 3x_3 - x_4 + x_5 = \frac{9}{4} - \frac{1}{4}t,$$
so that x_2 = 9/8 − t/8. Using row 1, we obtain 3x_1 = 1 + x_2 − 2x_3 + 4x_4 − x_5 = 27t/8 − 3/8, so that x_1 = 9t/8 − 1/8. Hence
$$(x_1, x_2, x_3, x_4, x_5) = \left( \frac{9t-1}{8}, \frac{9-t}{8}, \frac{3+t}{4}, t, 1 \right), \quad\text{where } t \in \mathbb{R}.$$
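As a check on the arithmetic (and on the signs as reconstructed in this example), the factorization and one member of the solution family, say t = 0, can be verified numerically:

```python
import numpy as np

A = np.array([[ 3, -1,   2,  -4,  1],
              [-3,  3,  -5,   5, -2],
              [ 6, -4,  11, -10,  6],
              [-6,  8, -21,  13, -9]], dtype=float)
L = np.array([[ 1,  0,  0, 0],
              [-1,  1,  0, 0],
              [ 2, -1,  1, 0],
              [-2,  3, -2, 1]], dtype=float)
U = np.array([[3, -1,  2, -4,  1],
              [0,  2, -3,  1, -1],
              [0,  0,  4, -1,  3],
              [0,  0,  0,  0,  2]], dtype=float)
print(np.allclose(L @ U, A))            # True: A = LU

b = np.array([1, -2, 9, -15], dtype=float)
x = np.array([-1/8, 9/8, 3/4, 0, 1])    # the solution with t = 0
print(np.allclose(A @ x, b))            # True
```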
Remarks. (1) In practical situations, interchanging rows is usually necessary to convert a matrix A to quasi row echelon form. The technique here can be modified to produce a matrix L which is not unit lower triangular, but which can be made unit lower triangular by interchanging rows.
(2) Computing an LU factorization of an n × n matrix takes approximately 2n³/3 operations. Solving the systems Ly = b and Ux = y requires approximately 2n² operations.
(3) LU factorization is particularly efficient when the matrix A has many zero entries, in which case the matrices L and U may also have many zero entries.
2.13. Application to Games of Strategy
Consider a game with two players. Player R, usually known as the row player, has m possible moves, denoted by i = 1, 2, 3, . . . , m, while player C, usually known as the column player, has n possible moves, denoted by j = 1, 2, 3, . . . , n. For every i = 1, 2, 3, . . . , m and j = 1, 2, 3, . . . , n, let a_{ij} denote the payoff that player C has to make to player R if player R makes move i and player C makes move j. These numbers give rise to the payoff matrix
$$A = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \ldots & a_{mn} \end{pmatrix}.$$
The entries can be positive, negative or zero.
Suppose that for every i = 1, 2, 3, . . . , m, player R makes move i with probability p_i, and that for every j = 1, 2, 3, . . . , n, player C makes move j with probability q_j. Then
$$p_1 + \ldots + p_m = 1 \quad\text{and}\quad q_1 + \ldots + q_n = 1.$$
Assume that the players make moves independently of each other. Then for every i = 1, 2, 3, . . . , m and j = 1, 2, 3, . . . , n, the number p_i q_j represents the probability that player R makes move i and player C makes move j. Then the double sum
$$E_A(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} p_i q_j$$
represents the expected payoff that player C has to make to player R.
The matrices
$$\mathbf{p} = (\, p_1 \; \ldots \; p_m \,) \quad\text{and}\quad \mathbf{q} = \begin{pmatrix} q_1 \\ \vdots \\ q_n \end{pmatrix}$$
are known as the strategies of player R and player C respectively. Clearly the expected payoff
$$E_A(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} p_i q_j = (\, p_1 \; \ldots \; p_m \,) \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \ldots & a_{mn} \end{pmatrix} \begin{pmatrix} q_1 \\ \vdots \\ q_n \end{pmatrix} = \mathbf{p} A \mathbf{q}.$$
Here we have slightly abused notation. The right hand side is a 1 × 1 matrix!
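In code, the expected payoff is exactly this triple product. A one-line Python sketch, with p and q passed as probability vectors; the function name is illustrative:

```python
import numpy as np

def expected_payoff(p, A, q):
    # E_A(p, q) = pAq; p and q are the players' probability vectors.
    return float(np.asarray(p) @ np.asarray(A) @ np.asarray(q))
```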
We now consider the following problem: Suppose that A is fixed. Is it possible for player R to choose a strategy p to try to maximize the expected payoff E_A(p, q)? Is it possible for player C to choose a strategy q to try to minimize the expected payoff E_A(p, q)?
FUNDAMENTAL THEOREM OF ZERO SUM GAMES. There exist strategies p* and q* such that
$$E_A(\mathbf{p}^*, \mathbf{q}) \geq E_A(\mathbf{p}^*, \mathbf{q}^*) \geq E_A(\mathbf{p}, \mathbf{q}^*)$$
for every strategy p of player R and every strategy q of player C.
Remark. The strategy p* is known as an optimal strategy for player R, and the strategy q* is known as an optimal strategy for player C. The quantity E_A(p*, q*) is known as the value of the game. Optimal strategies are not necessarily unique. However, if p′ and q′ are another pair of optimal strategies, then E_A(p*, q*) = E_A(p′, q′).
Zero sum games which are strictly determined are very easy to analyze. Here the payoff matrix A contains saddle points. An entry a_{ij} in the payoff matrix A is called a saddle point if it is a least entry in its row and a greatest entry in its column. In this case, the strategies
$$\mathbf{p}^* = (\, 0 \; \ldots \; 0 \; 1 \; 0 \; \ldots \; 0 \,) \quad\text{and}\quad \mathbf{q}^* = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
where the 1s occur in position i in p* and position j in q*, are optimal strategies, so that the value of the game is a_{ij}.
Remark. It is very easy to show that different saddle points in the payoff matrix have the same value.
Example 2.13.1. In some sports mad school, the teachers require 100 students to each choose between rowing (R) and cricket (C). However, the students cannot make up their mind, and will only decide when the identities of the rowing coach and cricket coach are known. There are 3 possible rowing coaches and 4 possible cricket coaches the school can hire. The number of students who will choose rowing ahead of cricket in each scenario is as follows, where R1, R2 and R3 denote the 3 possible rowing coaches, and C1, C2, C3 and C4 denote the 4 possible cricket coaches:

       C1   C2   C3   C4
  R1   75   50   45   60
  R2   20   60   30   55
  R3   45   70   35   30

[For example, if coaches R2 and C1 are hired, then 20 students will choose rowing, and so 80 students will choose cricket.] We first reset the problem by subtracting 50 from each entry and create a payoff matrix
$$A = \begin{pmatrix} 25 & 0 & -5 & 10 \\ -30 & 10 & -20 & 5 \\ -5 & 20 & -15 & -20 \end{pmatrix}.$$
[For example, the top left entry denotes that if each sport starts with 50 students, then 25 is the number cricket concedes to rowing.] Here the entry −5 in row 1 and column 3 is a saddle point, so the optimal strategy for rowing is to use coach R1 and the optimal strategy for cricket is to use coach C3.
In general, saddle points may not exist, so that the problem is not strictly determined. Then these optimization problems are solved by linear programming techniques which we do not discuss here. However, in the case of 2 × 2 payoff matrices
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$
which do not contain saddle points, we can write p_2 = 1 − p_1 and q_2 = 1 − q_1. Then
$$E_A(\mathbf{p}, \mathbf{q}) = a_{11}p_1q_1 + a_{12}p_1(1-q_1) + a_{21}(1-p_1)q_1 + a_{22}(1-p_1)(1-q_1)$$
$$= ((a_{11} - a_{12} - a_{21} + a_{22})p_1 - (a_{22} - a_{21}))q_1 + (a_{12} - a_{22})p_1 + a_{22}.$$
Let
$$p_1 = p_1^* = \frac{a_{22} - a_{21}}{a_{11} - a_{12} - a_{21} + a_{22}}.$$
Then
$$E_A(\mathbf{p}^*, \mathbf{q}) = \frac{(a_{12} - a_{22})(a_{22} - a_{21})}{a_{11} - a_{12} - a_{21} + a_{22}} + a_{22} = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} - a_{12} - a_{21} + a_{22}},$$
which is independent of q. Similarly, if
$$q_1 = q_1^* = \frac{a_{22} - a_{12}}{a_{11} - a_{12} - a_{21} + a_{22}},$$
then
$$E_A(\mathbf{p}, \mathbf{q}^*) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} - a_{12} - a_{21} + a_{22}},$$
which is independent of p. Hence
$$E_A(\mathbf{p}^*, \mathbf{q}) = E_A(\mathbf{p}^*, \mathbf{q}^*) = E_A(\mathbf{p}, \mathbf{q}^*) \quad\text{for all strategies } \mathbf{p} \text{ and } \mathbf{q}.$$
Note that
$$\mathbf{p}^* = \left( \frac{a_{22} - a_{21}}{a_{11} - a_{12} - a_{21} + a_{22}} \;\; \frac{a_{11} - a_{12}}{a_{11} - a_{12} - a_{21} + a_{22}} \right) \tag{9}$$
and
$$\mathbf{q}^* = \begin{pmatrix} \dfrac{a_{22} - a_{12}}{a_{11} - a_{12} - a_{21} + a_{22}} \\[2mm] \dfrac{a_{11} - a_{21}}{a_{11} - a_{12} - a_{21} + a_{22}} \end{pmatrix}, \tag{10}$$
with value
$$E_A(\mathbf{p}^*, \mathbf{q}^*) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} - a_{12} - a_{21} + a_{22}}.$$
Problems for Chapter 2
1. Consider the four matrices
$$A = \begin{pmatrix} 2 & 5 \\ 1 & 4 \\ 2 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 7 & 2 & 9 \\ 9 & 2 & 7 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 0 & 4 \\ 2 & 1 & 3 \\ 1 & 1 & 5 \\ 3 & 2 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 1 & 0 & 7 \\ 2 & 1 & 2 \\ 1 & 3 & 0 \end{pmatrix}.$$
Calculate all possible products.

2. In each of the following cases, determine whether the products AB and BA are both defined; if so, determine also whether AB and BA have the same number of rows and the same number of columns; if so, determine also whether AB = BA:
a) $A = \begin{pmatrix} 0 & 3 \\ 4 & 5 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix}$
b) $A = \begin{pmatrix} 1 & 1 & 5 \\ 3 & 0 & 4 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 1 \\ 3 & 6 \\ 1 & 5 \end{pmatrix}$
c) $A = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 4 \\ 12 & 1 \end{pmatrix}$
d) $A = \begin{pmatrix} 3 & 1 & 4 \\ 2 & 0 & 5 \\ 1 & 2 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

3. Evaluate A², where $A = \begin{pmatrix} 2 & 5 \\ 3 & 1 \end{pmatrix}$, and find α, β, γ ∈ R, not all zero, such that the matrix αI + βA + γA² is the zero matrix.

4. a) Let $A = \begin{pmatrix} 6 & 4 \\ -9 & -6 \end{pmatrix}$. Show that A² is the zero matrix.
b) Find all 2 × 2 matrices $B = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$ such that B² is the zero matrix.

5. Prove that if A and B are matrices such that I − AB is invertible, then the inverse of I − BA is given by the formula (I − BA)^{-1} = I + B(I − AB)^{-1}A.
[Hint: Write C = (I − AB)^{-1}. Then show that (I − BA)(I + BCA) = I.]

6. For each of the matrices below, use elementary row operations to find its inverse, if the inverse exists:
a) $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$
b) $\begin{pmatrix} 1 & 2 & 2 \\ 1 & 5 & 3 \\ 2 & 6 & 1 \end{pmatrix}$
c) $\begin{pmatrix} 1 & 5 & 2 \\ 1 & 1 & 7 \\ 0 & 3 & 4 \end{pmatrix}$
d) $\begin{pmatrix} 2 & 3 & 4 \\ 3 & 4 & 2 \\ 2 & 3 & 3 \end{pmatrix}$
e) $\begin{pmatrix} 1 & a & b+c \\ 1 & b & a+c \\ 1 & c & a+b \end{pmatrix}$
7. a) Using elementary row operations, show that the inverse of
$$\begin{pmatrix} 2 & 5 & 8 & 5 \\ 1 & 2 & 3 & 1 \\ 2 & 4 & 7 & 2 \\ 1 & 3 & 5 & 3 \end{pmatrix} \quad\text{is}\quad \begin{pmatrix} 3 & -2 & 1 & -5 \\ -2 & 5 & -2 & 3 \\ 0 & -2 & 1 & 0 \\ 1 & -1 & 0 & -1 \end{pmatrix}.$$
b) Without performing any further elementary row operations, use part (a) to solve the system of linear equations
$$\begin{aligned} 2x_1 + 5x_2 + 8x_3 + 5x_4 &= 0, \\ x_1 + 2x_2 + 3x_3 + x_4 &= 1, \\ 2x_1 + 4x_2 + 7x_3 + 2x_4 &= 0, \\ x_1 + 3x_2 + 5x_3 + 3x_4 &= 1. \end{aligned}$$

8. Consider the matrix
$$A = \begin{pmatrix} 1 & 0 & 3 & 1 \\ 1 & 1 & 5 & 5 \\ 2 & 1 & 9 & 8 \\ 2 & 0 & 6 & 3 \end{pmatrix}.$$
a) Use elementary row operations to find the inverse of A.
b) Without performing any further elementary row operations, use your solution in part (a) to solve the system of linear equations
$$\begin{aligned} x_1 + 3x_3 + x_4 &= 1, \\ x_1 + x_2 + 5x_3 + 5x_4 &= 0, \\ 2x_1 + x_2 + 9x_3 + 8x_4 &= 0, \\ 2x_1 + 6x_3 + 3x_4 &= 0. \end{aligned}$$

9. In each of the following, solve the production equation x = Cx + d:
a) $C = \begin{pmatrix} 0.1 & 0.5 \\ 0.6 & 0.2 \end{pmatrix}$ and $d = \begin{pmatrix} 50000 \\ 30000 \end{pmatrix}$
b) $C = \begin{pmatrix} 0 & 0.6 \\ 0.5 & 0.2 \end{pmatrix}$ and $d = \begin{pmatrix} 36000 \\ 22000 \end{pmatrix}$
c) $C = \begin{pmatrix} 0.2 & 0.2 & 0 \\ 0.1 & 0 & 0.2 \\ 0.3 & 0.1 & 0.3 \end{pmatrix}$ and $d = \begin{pmatrix} 4000000 \\ 8000000 \\ 6000000 \end{pmatrix}$

10. Consider three industries A, B and C. For industry A to manufacture $1 worth of its product, it needs to purchase 25c worth of product from each of industries B and C. For industry B to manufacture $1 worth of its product, it needs to purchase 65c worth of product from industry A and 5c worth of product from industry C, as well as use 5c worth of its own product. For industry C to manufacture $1 worth of its product, it needs to purchase 55c worth of product from industry A and 10c worth of product from industry B. In a particular week, industry A receives $500000 worth of outside order, industry B receives $250000 worth of outside order, but industry C receives no outside order. What is the production level required to satisfy all the demands precisely?

11. Suppose that C is an n × n consumption matrix with all column sums less than 1. Suppose further that x′ is the production vector that satisfies an outside demand d′, and that x′′ is the production vector that satisfies an outside demand d′′. Show that x′ + x′′ is the production vector that satisfies an outside demand d′ + d′′.
12. Suppose that C is an n × n consumption matrix with all column sums less than 1. Suppose further that the demand vector d has 1 for its top entry and 0 for all other entries. Describe the production vector x in terms of the columns of the matrix (I − C)^{-1}, and give an interpretation of your observation.

13. Consider a pentagon in R² with vertices (1, 1), (3, 1), (4, 2), (2, 4) and (1, 3). For each of the following transformations on the plane, find the 3 × 3 matrix that describes the transformation with respect to homogeneous coordinates, and use it to find the image of the pentagon:
a) reflection across the x_2-axis
b) reflection across the line x_1 = x_2
c) anticlockwise rotation by 90°
d) translation by the fixed vector (3, 2)
e) shear in the x_2-direction with factor 2
f) dilation by factor 2
g) expansion in x_1-direction by factor 2
h) reflection across the x_2-axis, followed by anticlockwise rotation by 90°
i) translation by the fixed vector (3, 2), followed by reflection across the line x_1 = x_2
j) shear in the x_2-direction with factor 2, followed by dilation by factor 2, followed by expansion in x_1-direction by factor 2

14. In homogeneous coordinates, a 3 × 3 matrix that describes a transformation on the plane is of the form
$$A^* = \begin{pmatrix} a_{11} & a_{12} & h_1 \\ a_{21} & a_{22} & h_2 \\ 0 & 0 & 1 \end{pmatrix}.$$
Show that this transformation can be described by a matrix transformation on R² followed by a translation in R².

15. Consider the matrices
$$A^*_1 = \begin{pmatrix} 1 & 0 & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad A^*_2 = \begin{pmatrix} \sec\theta & -\tan\theta & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
where θ ∈ (0, π/2) is fixed.
a) Show that A*_1 represents a shear in the x_2-direction followed by a compression in the x_2-direction.
b) Show that A*_2 represents a shear in the x_1-direction followed by an expansion in the x_1-direction.
c) What transformation on the plane does the matrix A*_2 A*_1 describe?
[Remark: This technique is often used in computer graphics to speed up calculations.]

16. Consider the matrices
$$A^*_1 = \begin{pmatrix} 1 & -\tan\theta & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad A^*_2 = \begin{pmatrix} 1 & 0 & 0 \\ \sin 2\theta & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
where θ ∈ R is fixed.
a) What transformation on the plane does the matrix A*_1 describe?
b) What transformation on the plane does the matrix A*_2 describe?
c) What transformation on the plane does the matrix A*_1 A*_2 A*_1 describe?
[Remark: This technique is often used to reduce the number of multiplication operations.]

17. Show that the products and inverses of 3 × 3 unit lower triangular matrices are also unit lower triangular.
18. For each of the following matrices A and b, find an LU factorization of the matrix A and use it to solve the system Ax = b:
a) $A = \begin{pmatrix} 2 & 1 & 2 \\ 4 & 6 & 5 \\ 4 & 6 & 8 \end{pmatrix}$ and $b = \begin{pmatrix} 6 \\ 21 \\ 24 \end{pmatrix}$
b) $A = \begin{pmatrix} 3 & 1 & 3 \\ 9 & 4 & 10 \\ 6 & 1 & 5 \end{pmatrix}$ and $b = \begin{pmatrix} 5 \\ 18 \\ 9 \end{pmatrix}$
c) $A = \begin{pmatrix} 2 & 1 & 2 & 1 \\ 4 & 3 & 5 & 4 \\ 4 & 3 & 5 & 7 \end{pmatrix}$ and $b = \begin{pmatrix} 1 \\ 9 \\ 18 \end{pmatrix}$
d) $A = \begin{pmatrix} 3 & 1 & 1 & 5 \\ 9 & 3 & 4 & 19 \\ 6 & 2 & 1 & 0 \end{pmatrix}$ and $b = \begin{pmatrix} 10 \\ 35 \\ 7 \end{pmatrix}$
e) $A = \begin{pmatrix} 2 & 3 & 1 & 2 \\ 6 & 10 & 5 & 4 \\ 4 & 7 & 6 & 1 \\ 4 & 2 & 10 & 19 \end{pmatrix}$ and $b = \begin{pmatrix} 1 \\ 1 \\ 6 \\ 28 \end{pmatrix}$
f) $A = \begin{pmatrix} 2 & 2 & 1 & 2 & 2 \\ 4 & 3 & 0 & 7 & 5 \\ 4 & 7 & 5 & 3 & 2 \\ 6 & 8 & 19 & 8 & 18 \end{pmatrix}$ and $b = \begin{pmatrix} 4 \\ 12 \\ 14 \\ 48 \end{pmatrix}$

19. Consider a payoff matrix
$$A = \begin{pmatrix} 4 & 1 & 6 & 4 \\ 6 & 2 & 0 & 8 \\ 3 & 8 & 7 & 5 \end{pmatrix}.$$
a) What is the expected payoff if p = ( 1/3  0  2/3 ) and q = ( 1/4  1/4  1/4  1/4 )ᵀ?
b) Suppose that player R adopts the strategy p = ( 1/3  0  2/3 ). What strategy should player C adopt?
c) Suppose that player C adopts the strategy q = ( 1/4  1/4  1/4  1/4 )ᵀ. What strategy should player R adopt?

20. Construct a simple example to show that optimal strategies are not unique.

21. Show that the entries in the matrices in (9) and (10) are in the range [0, 1].
Chapter 3
DETERMINANTS
3.1. Introduction
In the last chapter, we have related the question of the invertibility of a square matrix to a question of solutions of systems of linear equations. In some sense, this is unsatisfactory, since it is not simple to find an answer to either of these questions without a lot of work. In this chapter, we shall relate these two questions to the question of the determinant of the matrix in question. As we shall see later, the task is reduced to checking whether this determinant is zero or non-zero. So what is the determinant?
Let us start with 1 × 1 matrices, of the form
$$A = (\, a \,).$$
Note here that I_1 = ( 1 ). If a ≠ 0, then clearly the matrix A is invertible, with inverse matrix
$$A^{-1} = (\, a^{-1} \,).$$
On the other hand, if a = 0, then clearly no matrix B can satisfy AB = BA = I_1, so that the matrix A is not invertible. We therefore conclude that the value a is a good "determinant" to determine whether the 1 × 1 matrix A is invertible, since the matrix A is invertible if and only if a ≠ 0.
Let us then agree on the following definition.
Definition. Suppose that A = ( a ) is a 1 × 1 matrix. We write det(A) = a, and call this the determinant of the matrix A.
Next, let us turn to 2 × 2 matrices, of the form
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
We shall use elementary row operations to find out when the matrix A is invertible. So we consider the array
$$(A|I_2) = \left(\begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right), \tag{1}$$
and try to use elementary row operations to reduce the left hand half of the array to I_2. Suppose first of all that a = c = 0. Then the array becomes
$$\left(\begin{array}{cc|cc} 0 & b & 1 & 0 \\ 0 & d & 0 & 1 \end{array}\right),$$
and so it is impossible to reduce the left hand half of the array by elementary row operations to the matrix I_2. Consider next the case a ≠ 0. Multiplying row 2 of the array (1) by a, we obtain
$$\left(\begin{array}{cc|cc} a & b & 1 & 0 \\ ac & ad & 0 & a \end{array}\right).$$
Adding −c times row 1 to row 2, we obtain
$$\left(\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & ad-bc & -c & a \end{array}\right). \tag{2}$$
If D = ad − bc = 0, then this becomes
$$\left(\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & 0 & -c & a \end{array}\right),$$
and so it is impossible to reduce the left hand half of the array by elementary row operations to the matrix I_2. On the other hand, if D = ad − bc ≠ 0, then the array (2) can be reduced by elementary row operations to
$$\left(\begin{array}{cc|cc} 1 & 0 & d/D & -b/D \\ 0 & 1 & -c/D & a/D \end{array}\right),$$
so that
$$A^{-1} = \frac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
Consider finally the case c ≠ 0. Interchanging rows 1 and 2 of the array (1), we obtain
$$\left(\begin{array}{cc|cc} c & d & 0 & 1 \\ a & b & 1 & 0 \end{array}\right).$$
Multiplying row 2 of the array by c, we obtain
$$\left(\begin{array}{cc|cc} c & d & 0 & 1 \\ ac & bc & c & 0 \end{array}\right).$$
Adding −a times row 1 to row 2, we obtain
$$\left(\begin{array}{cc|cc} c & d & 0 & 1 \\ 0 & bc-ad & c & -a \end{array}\right).$$
Multiplying row 2 by −1, we obtain
$$\left(\begin{array}{cc|cc} c & d & 0 & 1 \\ 0 & ad-bc & -c & a \end{array}\right). \tag{3}$$
Again, if D = ad − bc = 0, then this becomes
$$\left(\begin{array}{cc|cc} c & d & 0 & 1 \\ 0 & 0 & -c & a \end{array}\right),$$
and so it is impossible to reduce the left hand half of the array by elementary row operations to the matrix I_2. On the other hand, if D = ad − bc ≠ 0, then the array (3) can be reduced by elementary row operations to
$$\left(\begin{array}{cc|cc} 1 & 0 & d/D & -b/D \\ 0 & 1 & -c/D & a/D \end{array}\right),$$
so that
$$A^{-1} = \frac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
Finally, note that a = c = 0 is a special case of ad − bc = 0. We therefore conclude that the value ad − bc is a good "determinant" to determine whether the 2 × 2 matrix A is invertible, since the matrix A is invertible if and only if ad − bc ≠ 0.
Let us then agree on the following definition.
Definition. Suppose that
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
is a 2 × 2 matrix. We write det(A) = ad − bc, and call this the determinant of the matrix A.
3.2. Determinants for Square Matrices of Higher Order
If we attempt to repeat the argument for 2 2 matrices to 3 3 matrices, then it is very likely that
we shall end up in a mess with possibly no rm conclusion. Try the argument on 4 4 matrices if you
must. Those who have their feet rmly on the ground will try a dierent approach.
Our approach is inductive in nature. In other words, we shall dene the determinant of 22 matrices in
terms of determinants of 11 matrices, dene the determinant of 33 matrices in terms of determinants
of 2 2 matrices, dene the determinant of 4 4 matrices in terms of determinants of 3 3 matrices,
and so on.
Suppose now that we have defined the determinant of (n − 1) × (n − 1) matrices. Let
$$A = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix} \tag{4}$$
be an n × n matrix. For every i, j = 1, . . . , n, let us delete row i and column j of A to obtain the (n − 1) × (n − 1) matrix
$$A_{ij} = \begin{pmatrix} a_{11} & \ldots & a_{1(j-1)} & * & a_{1(j+1)} & \ldots & a_{1n} \\ \vdots & & \vdots & & \vdots & & \vdots \\ a_{(i-1)1} & \ldots & a_{(i-1)(j-1)} & * & a_{(i-1)(j+1)} & \ldots & a_{(i-1)n} \\ * & \ldots & * & * & * & \ldots & * \\ a_{(i+1)1} & \ldots & a_{(i+1)(j-1)} & * & a_{(i+1)(j+1)} & \ldots & a_{(i+1)n} \\ \vdots & & \vdots & & \vdots & & \vdots \\ a_{n1} & \ldots & a_{n(j-1)} & * & a_{n(j+1)} & \ldots & a_{nn} \end{pmatrix}. \tag{5}$$
Here ∗ denotes that the entry has been deleted.
Definition. The number C_{ij} = (−1)^{i+j} det(A_{ij}) is called the cofactor of the entry a_{ij} of A. In other words, the cofactor of the entry a_{ij} is obtained from A by first deleting the row and the column containing the entry a_{ij}, then calculating the determinant of the resulting (n − 1) × (n − 1) matrix, and finally multiplying by a sign (−1)^{i+j}.
Note that the entries of A in row i are given by
$$(\, a_{i1} \; \ldots \; a_{in} \,).$$
Definition. By the cofactor expansion of A by row i, we mean the expression
$$\sum_{j=1}^{n} a_{ij} C_{ij} = a_{i1}C_{i1} + \ldots + a_{in}C_{in}. \tag{6}$$
Note that the entries of A in column j are given by
$$\begin{pmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{pmatrix}.$$
Definition. By the cofactor expansion of A by column j, we mean the expression
$$\sum_{i=1}^{n} a_{ij} C_{ij} = a_{1j}C_{1j} + \ldots + a_{nj}C_{nj}. \tag{7}$$
We shall state without proof the following important result. The interested reader is referred to Section 3.8 for further discussion.
PROPOSITION 3A. Suppose that A is an n × n matrix given by (4). Then the expressions (6) and (7) are all equal and independent of the row or column chosen.
Definition. Suppose that A is an n × n matrix given by (4). We call the common value in (6) and (7) the determinant of the matrix A, denoted by det(A).
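The recursive definition can be turned directly into code. The following Python sketch expands along row 1 throughout, which Proposition 3A tells us is as good a choice as any other; it is meant for illustration rather than efficiency, since it performs on the order of n! operations:

```python
def det(A):
    # Determinant by cofactor expansion along row 1; A is a list of lists.
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in A[1:]]  # delete row 1 and column j+1
        total += (-1) ** j * A[0][j] * det(minor)       # (-1)**j is the sign (-1)^(1+(j+1))
    return total
```

For instance, det([[2, 3, 5], [1, 4, 2], [2, 1, 5]]) returns −2, agreeing with Example 3.2.1 below.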
Let us check whether this agrees with our earlier definition of the determinant of a 2 × 2 matrix. Writing
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},$$
we have
$$C_{11} = a_{22}, \quad C_{12} = -a_{21}, \quad C_{21} = -a_{12}, \quad C_{22} = a_{11}.$$
It follows that
$$\begin{aligned} \text{by row 1:} \quad & a_{11}C_{11} + a_{12}C_{12} = a_{11}a_{22} - a_{12}a_{21}, \\ \text{by row 2:} \quad & a_{21}C_{21} + a_{22}C_{22} = -a_{21}a_{12} + a_{22}a_{11}, \\ \text{by column 1:} \quad & a_{11}C_{11} + a_{21}C_{21} = a_{11}a_{22} - a_{21}a_{12}, \\ \text{by column 2:} \quad & a_{12}C_{12} + a_{22}C_{22} = -a_{12}a_{21} + a_{22}a_{11}. \end{aligned}$$
The four values are clearly equal, and of the form ad − bc as before.
Example 3.2.1. Consider the matrix
$$A = \begin{pmatrix} 2 & 3 & 5 \\ 1 & 4 & 2 \\ 2 & 1 & 5 \end{pmatrix}.$$
Let us use cofactor expansion by row 1. Then
$$C_{11} = (-1)^{1+1}\det\begin{pmatrix} 4 & 2 \\ 1 & 5 \end{pmatrix} = (-1)^2(20 - 2) = 18,$$
$$C_{12} = (-1)^{1+2}\det\begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix} = (-1)^3(5 - 4) = -1,$$
$$C_{13} = (-1)^{1+3}\det\begin{pmatrix} 1 & 4 \\ 2 & 1 \end{pmatrix} = (-1)^4(1 - 8) = -7,$$
so that
$$\det(A) = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} = 36 - 3 - 35 = -2.$$
Alternatively, let us use cofactor expansion by column 2. Then
$$C_{12} = (-1)^{1+2}\det\begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix} = (-1)^3(5 - 4) = -1,$$
$$C_{22} = (-1)^{2+2}\det\begin{pmatrix} 2 & 5 \\ 2 & 5 \end{pmatrix} = (-1)^4(10 - 10) = 0,$$
$$C_{32} = (-1)^{3+2}\det\begin{pmatrix} 2 & 5 \\ 1 & 2 \end{pmatrix} = (-1)^5(4 - 5) = 1,$$
so that
$$\det(A) = a_{12}C_{12} + a_{22}C_{22} + a_{32}C_{32} = -3 + 0 + 1 = -2.$$
When using cofactor expansion, we should choose a row or column with as few non-zero entries as
possible in order to minimize the calculations.
Example 3.2.2. Consider the matrix
$$A = \begin{pmatrix} 2 & 3 & 0 & 5 \\ 1 & 4 & 0 & 2 \\ 5 & 4 & 8 & 5 \\ 2 & 1 & 0 & 5 \end{pmatrix}.$$
Here it is convenient to use cofactor expansion by column 3, since then
$$\det(A) = a_{13}C_{13} + a_{23}C_{23} + a_{33}C_{33} + a_{43}C_{43} = 8C_{33} = 8(-1)^{3+3}\det\begin{pmatrix} 2 & 3 & 5 \\ 1 & 4 & 2 \\ 2 & 1 & 5 \end{pmatrix} = -16,$$
in view of Example 3.2.1.
3.3. Some Simple Observations
In this section, we shall describe two simple observations which follow immediately from the definition of the determinant by cofactor expansion.
PROPOSITION 3B. Suppose that a square matrix A has a zero row or has a zero column. Then det(A) = 0.
Proof. We simply use cofactor expansion by the zero row or zero column. ∎
Definition. Consider an n × n matrix
$$A = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix}.$$
If a_{ij} = 0 whenever i > j, then A is called an upper triangular matrix. If a_{ij} = 0 whenever i < j, then A is called a lower triangular matrix. We also say that A is a triangular matrix if it is upper triangular or lower triangular.
Example 3.3.1. The matrix
$$\begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{pmatrix}$$
is upper triangular.
Example 3.3.2. A diagonal matrix is both upper triangular and lower triangular.
PROPOSITION 3C. Suppose that the n × n matrix
$$A = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix}$$
is triangular. Then det(A) = a_{11}a_{22} · · · a_{nn}, the product of the diagonal entries.
Proof. Let us assume that A is upper triangular; for the case when A is lower triangular, change the term "left-most column" to the term "top row" in the proof. Using cofactor expansion by the left-most column at each step, we see that
$$\det(A) = a_{11}\det\begin{pmatrix} a_{22} & \ldots & a_{2n} \\ \vdots & & \vdots \\ a_{n2} & \ldots & a_{nn} \end{pmatrix} = a_{11}a_{22}\det\begin{pmatrix} a_{33} & \ldots & a_{3n} \\ \vdots & & \vdots \\ a_{n3} & \ldots & a_{nn} \end{pmatrix} = \ldots = a_{11}a_{22}\cdots a_{nn}$$
as required. ∎
3.4. Elementary Row Operations
We now study the effect of elementary row operations on determinants. Recall that the elementary row operations that we consider are: (1) interchanging two rows; (2) adding a multiple of one row to another row; and (3) multiplying one row by a non-zero constant.
PROPOSITION 3D. (ELEMENTARY ROW OPERATIONS) Suppose that A is an n × n matrix.
(a) Suppose that the matrix B is obtained from the matrix A by interchanging two rows of A. Then det(B) = −det(A).
(b) Suppose that the matrix B is obtained from the matrix A by adding a multiple of one row of A to another row. Then det(B) = det(A).
(c) Suppose that the matrix B is obtained from the matrix A by multiplying one row of A by a non-zero constant c. Then det(B) = c det(A).
Sketch of Proof. (a) The proof is by induction on n. It is easily checked that the result holds when n = 2. When n > 2, we use cofactor expansion by a third row, say row i. Then
$$\det(B) = \sum_{j=1}^{n} a_{ij}(-1)^{i+j}\det(B_{ij}).$$
Note that the (n − 1) × (n − 1) matrices B_{ij} are obtained from the matrices A_{ij} by interchanging two rows of A_{ij}, so that det(B_{ij}) = −det(A_{ij}). It follows that
$$\det(B) = -\sum_{j=1}^{n} a_{ij}(-1)^{i+j}\det(A_{ij}) = -\det(A)$$
as required.
(b) Again, the proof is by induction on n. It is easily checked that the result holds when n = 2. When n > 2, we use cofactor expansion by a third row, say row i. Then
$$\det(B) = \sum_{j=1}^{n} a_{ij}(-1)^{i+j}\det(B_{ij}).$$
Note that the (n − 1) × (n − 1) matrices B_{ij} are obtained from the matrices A_{ij} by adding a multiple of one row of A_{ij} to another row, so that det(B_{ij}) = det(A_{ij}). It follows that
$$\det(B) = \sum_{j=1}^{n} a_{ij}(-1)^{i+j}\det(A_{ij}) = \det(A)$$
as required.
(c) This is simpler. Suppose that the matrix B is obtained from the matrix A by multiplying row i of A by a non-zero constant c. Then
$$\det(B) = \sum_{j=1}^{n} ca_{ij}(-1)^{i+j}\det(B_{ij}).$$
Note now that B_{ij} = A_{ij}, since row i has been removed respectively from B and A. It follows that
$$\det(B) = \sum_{j=1}^{n} ca_{ij}(-1)^{i+j}\det(A_{ij}) = c\det(A)$$
as required. ∎
In fact, the above operations can also be carried out on the columns of A. More precisely, we have
the following result.
PROPOSITION 3E. (ELEMENTARY COLUMN OPERATIONS) Suppose that A is an n × n matrix.
(a) Suppose that the matrix B is obtained from the matrix A by interchanging two columns of A. Then det(B) = −det(A).
(b) Suppose that the matrix B is obtained from the matrix A by adding a multiple of one column of A to another column. Then det(B) = det(A).
(c) Suppose that the matrix B is obtained from the matrix A by multiplying one column of A by a non-zero constant c. Then det(B) = c det(A).
Elementary row and column operations can be combined with cofactor expansion to calculate the
determinant of a given matrix. We shall illustrate this point by the following examples.
Example 3.4.1. Consider the matrix
$$A = \begin{pmatrix} 2 & 3 & 2 & 5 \\ 1 & 4 & 1 & 2 \\ 5 & 4 & 4 & 5 \\ 2 & 2 & 0 & 4 \end{pmatrix}.$$
Adding −1 times column 3 to column 1, we have
$$\det(A) = \det\begin{pmatrix} 0 & 3 & 2 & 5 \\ 0 & 4 & 1 & 2 \\ 1 & 4 & 4 & 5 \\ 2 & 2 & 0 & 4 \end{pmatrix}.$$
Adding −1/2 times row 4 to row 3, we have
$$\det(A) = \det\begin{pmatrix} 0 & 3 & 2 & 5 \\ 0 & 4 & 1 & 2 \\ 0 & 3 & 4 & 3 \\ 2 & 2 & 0 & 4 \end{pmatrix}.$$
Using cofactor expansion by column 1, we have
$$\det(A) = 2(-1)^{4+1}\det\begin{pmatrix} 3 & 2 & 5 \\ 4 & 1 & 2 \\ 3 & 4 & 3 \end{pmatrix} = -2\det\begin{pmatrix} 3 & 2 & 5 \\ 4 & 1 & 2 \\ 3 & 4 & 3 \end{pmatrix}.$$
Adding −1 times row 1 to row 3, we have
$$\det(A) = -2\det\begin{pmatrix} 3 & 2 & 5 \\ 4 & 1 & 2 \\ 0 & 2 & -2 \end{pmatrix}.$$
Adding 1 times column 2 to column 3, we have
$$\det(A) = -2\det\begin{pmatrix} 3 & 2 & 7 \\ 4 & 1 & 3 \\ 0 & 2 & 0 \end{pmatrix}.$$
Using cofactor expansion by row 3, we have
$$\det(A) = -2 \cdot 2(-1)^{3+2}\det\begin{pmatrix} 3 & 7 \\ 4 & 3 \end{pmatrix} = 4\det\begin{pmatrix} 3 & 7 \\ 4 & 3 \end{pmatrix}.$$
Using the formula for the determinant of 2 × 2 matrices, we conclude that det(A) = 4(9 − 28) = −76.
Let us start again and try a different way. Dividing row 4 by 2, we have
$$\det(A) = 2\det\begin{pmatrix} 2 & 3 & 2 & 5 \\ 1 & 4 & 1 & 2 \\ 5 & 4 & 4 & 5 \\ 1 & 1 & 0 & 2 \end{pmatrix}.$$
Adding −1 times row 4 to row 2, we have
$$\det(A) = 2\det\begin{pmatrix} 2 & 3 & 2 & 5 \\ 0 & 3 & 1 & 0 \\ 5 & 4 & 4 & 5 \\ 1 & 1 & 0 & 2 \end{pmatrix}.$$
Adding −3 times column 3 to column 2, we have
$$\det(A) = 2\det\begin{pmatrix} 2 & -3 & 2 & 5 \\ 0 & 0 & 1 & 0 \\ 5 & -8 & 4 & 5 \\ 1 & 1 & 0 & 2 \end{pmatrix}.$$
Using cofactor expansion by row 2, we have
$$\det(A) = 2 \cdot 1(-1)^{2+3}\det\begin{pmatrix} 2 & -3 & 5 \\ 5 & -8 & 5 \\ 1 & 1 & 2 \end{pmatrix} = -2\det\begin{pmatrix} 2 & -3 & 5 \\ 5 & -8 & 5 \\ 1 & 1 & 2 \end{pmatrix}.$$
Adding −2 times row 3 to row 1, we have
$$\det(A) = -2\det\begin{pmatrix} 0 & -5 & 1 \\ 5 & -8 & 5 \\ 1 & 1 & 2 \end{pmatrix}.$$
Adding −5 times row 3 to row 2, we have
$$\det(A) = -2\det\begin{pmatrix} 0 & -5 & 1 \\ 0 & -13 & -5 \\ 1 & 1 & 2 \end{pmatrix}.$$
Using cofactor expansion by column 1, we have
$$\det(A) = -2 \cdot 1(-1)^{3+1}\det\begin{pmatrix} -5 & 1 \\ -13 & -5 \end{pmatrix} = -2\det\begin{pmatrix} -5 & 1 \\ -13 & -5 \end{pmatrix}.$$
Using the formula for the determinant of 2 × 2 matrices, we conclude that det(A) = −2(25 + 13) = −76.
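The strategy of this section, reducing towards triangular form while tracking the effect of each elementary operation, is also how determinants are computed in practice. A Python sketch, using partial pivoting so that a row interchange (and hence a sign change, by Proposition 3D(a)) is made whenever convenient:

```python
import numpy as np

def det_by_row_reduction(A):
    # Reduce to upper triangular form, tracking how each elementary
    # row operation changes the determinant (Proposition 3D).
    A = np.array(A, dtype=float)
    n = A.shape[0]
    sign = 1.0
    for k in range(n):
        pivot = int(np.argmax(np.abs(A[k:, k]))) + k
        if A[pivot, k] == 0:
            return 0.0                       # zero column: determinant is 0
        if pivot != k:
            A[[k, pivot]] = A[[pivot, k]]    # interchange: determinant changes sign
            sign = -sign
        for i in range(k + 1, n):
            A[i] -= (A[i, k] / A[k, k]) * A[k]   # adding a multiple: determinant unchanged
    return sign * A.diagonal().prod()
```

Applied to the matrix of Example 3.4.1, this returns −76.0.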
Example 3.4.2. Consider the matrix
$$A = \begin{pmatrix} 2 & 1 & 0 & 1 & 3 \\ 2 & 3 & 1 & 2 & 5 \\ 4 & 7 & 2 & 3 & 7 \\ 1 & 0 & 1 & 1 & 3 \\ 2 & 1 & 0 & 2 & 0 \end{pmatrix}.$$
Here we have the least number of non-zero entries in column 3, so let us work to get more zeros into this column. Adding −1 times row 4 to row 2, we have
$$\det(A) = \det\begin{pmatrix} 2 & 1 & 0 & 1 & 3 \\ 1 & 3 & 0 & 1 & 2 \\ 4 & 7 & 2 & 3 & 7 \\ 1 & 0 & 1 & 1 & 3 \\ 2 & 1 & 0 & 2 & 0 \end{pmatrix}.$$
Adding −2 times row 4 to row 3, we have
$$\det(A) = \det\begin{pmatrix} 2 & 1 & 0 & 1 & 3 \\ 1 & 3 & 0 & 1 & 2 \\ 2 & 7 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 3 \\ 2 & 1 & 0 & 2 & 0 \end{pmatrix}.$$
Using cofactor expansion by column 3, we have
$$\det(A) = 1(-1)^{4+3}\det\begin{pmatrix} 2 & 1 & 1 & 3 \\ 1 & 3 & 1 & 2 \\ 2 & 7 & 1 & 1 \\ 2 & 1 & 2 & 0 \end{pmatrix} = -\det\begin{pmatrix} 2 & 1 & 1 & 3 \\ 1 & 3 & 1 & 2 \\ 2 & 7 & 1 & 1 \\ 2 & 1 & 2 & 0 \end{pmatrix}.$$
Adding −1 times column 3 to column 1, we have
$$\det(A) = -\det\begin{pmatrix} 1 & 1 & 1 & 3 \\ 0 & 3 & 1 & 2 \\ 1 & 7 & 1 & 1 \\ 0 & 1 & 2 & 0 \end{pmatrix}.$$
Adding −1 times row 1 to row 3, we have
$$\det(A) = -\det\begin{pmatrix} 1 & 1 & 1 & 3 \\ 0 & 3 & 1 & 2 \\ 0 & 6 & 0 & -2 \\ 0 & 1 & 2 & 0 \end{pmatrix}.$$
Using cofactor expansion by column 1, we have
$$\det(A) = -1(-1)^{1+1}\det\begin{pmatrix} 3 & 1 & 2 \\ 6 & 0 & -2 \\ 1 & 2 & 0 \end{pmatrix} = -\det\begin{pmatrix} 3 & 1 & 2 \\ 6 & 0 & -2 \\ 1 & 2 & 0 \end{pmatrix}.$$
Adding 1 times row 1 to row 2, we have
$$\det(A) = -\det\begin{pmatrix} 3 & 1 & 2 \\ 9 & 1 & 0 \\ 1 & 2 & 0 \end{pmatrix}.$$
Using cofactor expansion by column 3, we have
$$\det(A) = -2(-1)^{1+3}\det\begin{pmatrix} 9 & 1 \\ 1 & 2 \end{pmatrix} = -2\det\begin{pmatrix} 9 & 1 \\ 1 & 2 \end{pmatrix}.$$
Using the formula for the determinant of 2 × 2 matrices, we conclude that det(A) = −2(18 − 1) = −34.
Example 3.4.3. Consider the matrix
$$A = \begin{pmatrix} 1 & 0 & 2 & 4 & 1 & 0 \\ 2 & 4 & 5 & 7 & 6 & 2 \\ 4 & 6 & 1 & 9 & 2 & 1 \\ 3 & 5 & 0 & 1 & 2 & 5 \\ 2 & 4 & 5 & 3 & 6 & 2 \\ 1 & 0 & 2 & 5 & 1 & 0 \end{pmatrix}.$$
Here note that rows 1 and 6 are almost identical. Adding −1 times row 1 to row 6, we have
$$\det(A) = \det\begin{pmatrix} 1 & 0 & 2 & 4 & 1 & 0 \\ 2 & 4 & 5 & 7 & 6 & 2 \\ 4 & 6 & 1 & 9 & 2 & 1 \\ 3 & 5 & 0 & 1 & 2 & 5 \\ 2 & 4 & 5 & 3 & 6 & 2 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}.$$
Adding −1 times row 5 to row 2, we have
$$\det(A) = \det\begin{pmatrix} 1 & 0 & 2 & 4 & 1 & 0 \\ 0 & 0 & 0 & 4 & 0 & 0 \\ 4 & 6 & 1 & 9 & 2 & 1 \\ 3 & 5 & 0 & 1 & 2 & 5 \\ 2 & 4 & 5 & 3 & 6 & 2 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}.$$
Adding −4 times row 6 to row 2, we have
$$\det(A) = \det\begin{pmatrix} 1 & 0 & 2 & 4 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 4 & 6 & 1 & 9 & 2 & 1 \\ 3 & 5 & 0 & 1 & 2 & 5 \\ 2 & 4 & 5 & 3 & 6 & 2 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}.$$
It follows from Proposition 3B that det(A) = 0.
3.5. Further Properties of Determinants
Definition. Consider the n × n matrix
$$A = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix}.$$
By the transpose Aᵗ of A, we mean the matrix
$$A^t = \begin{pmatrix} a_{11} & \ldots & a_{n1} \\ \vdots & & \vdots \\ a_{1n} & \ldots & a_{nn} \end{pmatrix}$$
obtained from A by transposing rows and columns.
Example 3.5.1. Consider the matrix
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}. \quad\text{Then}\quad A^t = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}.$$
Recall that determinants of 2 × 2 matrices depend on determinants of 1 × 1 matrices; in turn, determinants of 3 × 3 matrices depend on determinants of 2 × 2 matrices, and so on. It follows that determinants of n × n matrices ultimately depend on determinants of 1 × 1 matrices. Note now that transposing a 1 × 1 matrix does not affect its determinant (why?). The result below follows in view of Proposition 3A.
PROPOSITION 3F. For every n × n matrix A, we have det(Aᵗ) = det(A).
Example 3.5.2. We have
$$\det\begin{pmatrix} 2 & 2 & 4 & 1 & 2 \\ 1 & 3 & 7 & 0 & 1 \\ 0 & 1 & 2 & 1 & 0 \\ 1 & 2 & 3 & 1 & 2 \\ 3 & 5 & 7 & 3 & 0 \end{pmatrix} = \det\begin{pmatrix} 2 & 1 & 0 & 1 & 3 \\ 2 & 3 & 1 & 2 & 5 \\ 4 & 7 & 2 & 3 & 7 \\ 1 & 0 & 1 & 1 & 3 \\ 2 & 1 & 0 & 2 & 0 \end{pmatrix} = -34.$$
Next, we shall study the determinant of a product. In Section 3.8, we shall sketch a proof of the following important result.
PROPOSITION 3G. For any two n × n matrices A and B, we have det(AB) = det(A) det(B).
PROPOSITION 3H. Suppose that the n × n matrix A is invertible. Then
$$\det(A^{-1}) = \frac{1}{\det(A)}.$$
Proof. In view of Propositions 3G and 3C, we have det(A) det(A^{-1}) = det(I_n) = 1. The result follows immediately. ∎
Finally, the main reason for studying determinants, as outlined in the introduction, is summarized by the following result.
PROPOSITION 3J. Suppose that A is an n × n matrix. Then A is invertible if and only if det(A) ≠ 0.
Proof. Suppose that A is invertible. Then det(A) ≠ 0 follows immediately from Proposition 3H.
Suppose now that det(A) ≠ 0. Let us now reduce A by elementary row operations to reduced row echelon form B. Then there exists a finite sequence E_1, . . . , E_k of elementary n × n matrices such that
$$B = E_k \cdots E_1 A.$$
It follows from Proposition 3G that
$$\det(B) = \det(E_k) \cdots \det(E_1) \det(A).$$
Recall that all elementary matrices are invertible and so have non-zero determinants. It follows that det(B) ≠ 0, so that B has no zero rows by Proposition 3B. Since B is an n × n matrix in reduced row echelon form, it must be I_n. We therefore conclude that A is row equivalent to I_n. It now follows from Proposition 2N(c) that A is invertible. ∎
Combining Propositions 2Q and 3J, we have the following result.
PROPOSITION 3K. In the notation of Proposition 2N, the following statements are equivalent:
(a) The matrix A is invertible.
(b) The system Ax = 0 of linear equations has only the trivial solution.
(c) The matrices A and I_n are row equivalent.
(d) The system Ax = b of linear equations is soluble for every n × 1 matrix b.
(e) The determinant det(A) ≠ 0.
3.6. Application to Curves and Surfaces
A special case of Proposition 3K states that a homogeneous system of n linear equations in n variables has a non-trivial solution if and only if the determinant of the coefficient matrix is equal to zero. In this section, we shall use this to solve some problems in geometry. We illustrate our ideas by a few simple examples.
Example 3.6.1. Suppose that we wish to determine the equation of the unique line on the xy-plane that passes through two distinct given points (x_1, y_1) and (x_2, y_2). The equation of a line on the xy-plane is of the form ax + by + c = 0. Since the two points lie on the line, we must have ax_1 + by_1 + c = 0 and ax_2 + by_2 + c = 0. Hence
$$\begin{aligned} xa + yb + c &= 0, \\ x_1a + y_1b + c &= 0, \\ x_2a + y_2b + c &= 0. \end{aligned}$$
Written in matrix notation, we have
$$\begin{pmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$
Clearly there is a non-trivial solution (a, b, c) to this system of linear equations, and so we must have
$$\det\begin{pmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{pmatrix} = 0,$$
the equation of the line required.
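Expanding this 3 × 3 determinant by row 1 gives the coefficients of the line explicitly, which the following Python sketch does; the function name is illustrative only:

```python
def line_through(p1, p2):
    # Coefficients (a, b, c) of ax + by + c = 0 through two distinct points,
    # read off from cofactor expansion of the determinant by row 1.
    (x1, y1), (x2, y2) = p1, p2
    a = y1 - y2
    b = x2 - x1
    c = x1 * y2 - x2 * y1
    return a, b, c
```

For instance, line_through((1, 2), (3, 4)) returns (−2, 2, −2), i.e. the line x − y + 1 = 0 after dividing by −2.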
Example 3.6.2. Suppose that we wish to determine the equation of the unique circle on the xy-plane that passes through three distinct given points (x_1, y_1), (x_2, y_2) and (x_3, y_3), not all lying on a straight line. The equation of a circle on the xy-plane is of the form a(x² + y²) + bx + cy + d = 0. Since the three points lie on the circle, we must have a(x_1² + y_1²) + bx_1 + cy_1 + d = 0, a(x_2² + y_2²) + bx_2 + cy_2 + d = 0, and a(x_3² + y_3²) + bx_3 + cy_3 + d = 0. Hence
$$\begin{aligned} (x^2+y^2)a + xb + yc + d &= 0, \\ (x_1^2+y_1^2)a + x_1b + y_1c + d &= 0, \\ (x_2^2+y_2^2)a + x_2b + y_2c + d &= 0, \\ (x_3^2+y_3^2)a + x_3b + y_3c + d &= 0. \end{aligned}$$
Written in matrix notation, we have
$$\begin{pmatrix} x^2+y^2 & x & y & 1 \\ x_1^2+y_1^2 & x_1 & y_1 & 1 \\ x_2^2+y_2^2 & x_2 & y_2 & 1 \\ x_3^2+y_3^2 & x_3 & y_3 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$
Clearly there is a non-trivial solution (a, b, c, d) to this system of linear equations, and so we must have
$$\det\begin{pmatrix} x^2+y^2 & x & y & 1 \\ x_1^2+y_1^2 & x_1 & y_1 & 1 \\ x_2^2+y_2^2 & x_2 & y_2 & 1 \\ x_3^2+y_3^2 & x_3 & y_3 & 1 \end{pmatrix} = 0,$$
the equation of the circle required.
Example 3.6.3. Suppose that we wish to determine the equation of the unique plane in 3-space that passes through three distinct given points (x_1, y_1, z_1), (x_2, y_2, z_2) and (x_3, y_3, z_3), not all lying on a straight line. The equation of a plane in 3-space is of the form ax + by + cz + d = 0. Since the three points lie on the plane, we must have ax_1 + by_1 + cz_1 + d = 0, ax_2 + by_2 + cz_2 + d = 0, and ax_3 + by_3 + cz_3 + d = 0. Hence
$$\begin{aligned} xa + yb + zc + d &= 0, \\ x_1a + y_1b + z_1c + d &= 0, \\ x_2a + y_2b + z_2c + d &= 0, \\ x_3a + y_3b + z_3c + d &= 0. \end{aligned}$$
Written in matrix notation, we have
$$\begin{pmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$
Clearly there is a non-trivial solution (a, b, c, d) to this system of linear equations, and so we must have
$$\det\begin{pmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{pmatrix} = 0,$$
the equation of the plane required.
Example 3.6.4. Suppose that we wish to determine the equation of the unique sphere in 3-space that passes through four distinct given points (x_1, y_1, z_1), (x_2, y_2, z_2), (x_3, y_3, z_3) and (x_4, y_4, z_4), not all lying on a plane. The equation of a sphere in 3-space is of the form a(x² + y² + z²) + bx + cy + dz + e = 0. Since the four points lie on the sphere, we must have
$$\begin{aligned} a(x_1^2+y_1^2+z_1^2) + bx_1 + cy_1 + dz_1 + e &= 0, \\ a(x_2^2+y_2^2+z_2^2) + bx_2 + cy_2 + dz_2 + e &= 0, \\ a(x_3^2+y_3^2+z_3^2) + bx_3 + cy_3 + dz_3 + e &= 0, \\ a(x_4^2+y_4^2+z_4^2) + bx_4 + cy_4 + dz_4 + e &= 0. \end{aligned}$$
Hence
$$\begin{aligned} (x^2+y^2+z^2)a + xb + yc + zd + e &= 0, \\ (x_1^2+y_1^2+z_1^2)a + x_1b + y_1c + z_1d + e &= 0, \\ (x_2^2+y_2^2+z_2^2)a + x_2b + y_2c + z_2d + e &= 0, \\ (x_3^2+y_3^2+z_3^2)a + x_3b + y_3c + z_3d + e &= 0, \\ (x_4^2+y_4^2+z_4^2)a + x_4b + y_4c + z_4d + e &= 0. \end{aligned}$$
Written in matrix notation, we have
$$\begin{pmatrix} x^2+y^2+z^2 & x & y & z & 1 \\ x_1^2+y_1^2+z_1^2 & x_1 & y_1 & z_1 & 1 \\ x_2^2+y_2^2+z_2^2 & x_2 & y_2 & z_2 & 1 \\ x_3^2+y_3^2+z_3^2 & x_3 & y_3 & z_3 & 1 \\ x_4^2+y_4^2+z_4^2 & x_4 & y_4 & z_4 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \\ e \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$
Clearly there is a non-trivial solution (a, b, c, d, e) to this system of linear equations, and so we must have
$$\det\begin{pmatrix} x^2+y^2+z^2 & x & y & z & 1 \\ x_1^2+y_1^2+z_1^2 & x_1 & y_1 & z_1 & 1 \\ x_2^2+y_2^2+z_2^2 & x_2 & y_2 & z_2 & 1 \\ x_3^2+y_3^2+z_3^2 & x_3 & y_3 & z_3 & 1 \\ x_4^2+y_4^2+z_4^2 & x_4 & y_4 & z_4 & 1 \end{pmatrix} = 0,$$
the equation of the sphere required.
3.7. Some Useful Formulas
In this section, we shall discuss two very useful formulas which involve determinants only. The first one enables us to find the inverse of a matrix, while the second one enables us to solve a system of linear equations. The interested reader is referred to Section 3.8 for proofs.
Recall first of all that for any n × n matrix
$$A = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix},$$
the number C_{ij} = (−1)^{i+j} det(A_{ij}) is called the cofactor of the entry a_{ij}, and the (n − 1) × (n − 1) matrix A_{ij} is obtained from A by deleting row i and column j, as in (5).
Definition. The n × n matrix
$$\operatorname{adj}(A) = \begin{pmatrix} C_{11} & \ldots & C_{n1} \\ \vdots & & \vdots \\ C_{1n} & \ldots & C_{nn} \end{pmatrix}$$
is called the adjoint of the matrix A.
Remark. Note that adj(A) is obtained from the matrix A first by replacing each entry of A by its cofactor and then by transposing the resulting matrix.
PROPOSITION 3L. Suppose that the n × n matrix A is invertible. Then
$$A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A).$$
Example 3.7.1. Consider the matrix
$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ -2 & 0 & 3 \end{pmatrix}.$$
Then
$$\operatorname{adj}(A) = \begin{pmatrix} \det\begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix} & -\det\begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix} & \det\begin{pmatrix} 1 & 0 \\ 1 & 2 \end{pmatrix} \\[2mm] -\det\begin{pmatrix} 0 & 2 \\ -2 & 3 \end{pmatrix} & \det\begin{pmatrix} 1 & 0 \\ -2 & 3 \end{pmatrix} & -\det\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} \\[2mm] \det\begin{pmatrix} 0 & 1 \\ -2 & 0 \end{pmatrix} & -\det\begin{pmatrix} 1 & 1 \\ -2 & 0 \end{pmatrix} & \det\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} 3 & -3 & 2 \\ -4 & 3 & -2 \\ 2 & -2 & 1 \end{pmatrix}.$$
On the other hand, adding −1 times column 1 to column 2 and then using cofactor expansion on row 1, we have
$$\det(A) = \det\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ -2 & 0 & 3 \end{pmatrix} = \det\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 2 \\ -2 & 2 & 3 \end{pmatrix} = \det\begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix} = -1.$$
It follows that
$$A^{-1} = \begin{pmatrix} -3 & 3 & -2 \\ 4 & -3 & 2 \\ -2 & 2 & -1 \end{pmatrix}.$$
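A direct implementation of the adjoint, and hence of Proposition 3L, takes only a few lines of Python with NumPy; here np.linalg.det is borrowed for the minors:

```python
import numpy as np

def adjoint(A):
    # Matrix of cofactors, transposed (the adjoint of Proposition 3L).
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

A = np.array([[1, 1, 0], [0, 1, 2], [-2, 0, 3]])
print(adjoint(A) / np.linalg.det(A))   # reproduces the inverse computed above
```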
Next, we turn our attention to systems of n linear equations in n unknowns, of the form
$$\begin{aligned} a_{11}x_1 + \ldots + a_{1n}x_n &= b_1, \\ &\;\;\vdots \\ a_{n1}x_1 + \ldots + a_{nn}x_n &= b_n, \end{aligned}$$
represented in matrix notation in the form Ax = b, where
$$A = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \tag{8}$$
represent the coefficients and
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \tag{9}$$
represents the variables.
For every j = 1, . . . , n, write
$$A_j(b) = \begin{pmatrix} a_{11} & \ldots & a_{1(j-1)} & b_1 & a_{1(j+1)} & \ldots & a_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \ldots & a_{n(j-1)} & b_n & a_{n(j+1)} & \ldots & a_{nn} \end{pmatrix}; \tag{10}$$
in other words, we replace column j of the matrix A by the column b.
PROPOSITION 3M. (CRAMER'S RULE) Suppose that the matrix A is invertible. Then the unique solution of the system Ax = b, where A, x and b are given by (8) and (9), is given by
$$x_1 = \frac{\det(A_1(b))}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_n(b))}{\det(A)},$$
where the matrices A_1(b), . . . , A_n(b) are defined by (10).
Example 3.7.2. Consider the system Ax = b, where
$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ -2 & 0 & 3 \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} -1 \\ 2 \\ 3 \end{pmatrix}.$$
Recall that det(A) = −1. By Cramer's rule, we have
$$x_1 = \frac{\det\begin{pmatrix} -1 & 1 & 0 \\ 2 & 1 & 2 \\ 3 & 0 & 3 \end{pmatrix}}{\det(A)} = 3, \quad x_2 = \frac{\det\begin{pmatrix} 1 & -1 & 0 \\ 0 & 2 & 2 \\ -2 & 3 & 3 \end{pmatrix}}{\det(A)} = -4, \quad x_3 = \frac{\det\begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 2 \\ -2 & 0 & 3 \end{pmatrix}}{\det(A)} = 3.$$
Let us check our calculations. Recall from Example 3.7.1 that
$$A^{-1} = \begin{pmatrix} -3 & 3 & -2 \\ 4 & -3 & 2 \\ -2 & 2 & -1 \end{pmatrix}.$$
We therefore have
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -3 & 3 & -2 \\ 4 & -3 & 2 \\ -2 & 2 & -1 \end{pmatrix} \begin{pmatrix} -1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 3 \\ -4 \\ 3 \end{pmatrix}.$$
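Cramer's rule is equally mechanical in code. A Python sketch, assuming A is invertible:

```python
import numpy as np

def cramer(A, b):
    # Solve Ax = b by Cramer's rule (Proposition 3M).
    A = np.asarray(A, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b                 # replace column j by b, as in (10)
        x[j] = np.linalg.det(Aj) / d
    return x

print(cramer([[1, 1, 0], [0, 1, 2], [-2, 0, 3]], [-1, 2, 3]))   # [ 3. -4.  3.]
```

For large n this is far slower than LU factorization, but it makes the dependence of each x_j on the data completely explicit.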
3.8. Further Discussion
In this section, we shall first discuss a definition of the determinant in terms of permutations. In order to do so, we need to make a digression and discuss first the rudiments of permutations on non-empty finite sets.
Definition. Let X be a non-empty finite set. A permutation σ on X is a function σ : X → X which is one-to-one and onto. If x ∈ X, we denote by xσ the image of x under the permutation σ.
It is not difficult to see that if σ : X → X and τ : X → X are both permutations on X, then στ : X → X, defined by x(στ) = (xσ)τ for every x ∈ X so that σ is followed by τ, is also a permutation on X.
Remark. Note that we use the notation xσ instead of our usual notation σ(x) to denote the image of x under σ. Note also that we write στ to denote the composition τ ∘ σ. We shall do this only for permutations. The reasons will become a little clearer later in the discussion.
Since the set X is non-empty and finite, we may assume, without loss of generality, that it is {1, 2, . . . , n}, where n ∈ N. We now let S_n denote the set of all permutations on the set {1, 2, . . . , n}. In other words, S_n denotes the collection of all functions from {1, 2, . . . , n} to {1, 2, . . . , n} that are both one-to-one and onto.
PROPOSITION 3N. For every n ∈ N, the set S_n has n! elements.
Proof. There are n choices for 1σ. For each such choice, there are (n − 1) choices left for 2σ. And so on. ∎
To represent particular elements of S_n, there are various notations. For example, we can use the notation
$$\begin{pmatrix} 1 & 2 & \ldots & n \\ 1\sigma & 2\sigma & \ldots & n\sigma \end{pmatrix}$$
to denote the permutation σ.
Example 3.8.1. In S_4,
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 1 & 3 \end{pmatrix}$$
denotes the permutation σ, where 1σ = 2, 2σ = 4, 3σ = 1 and 4σ = 3. On the other hand, the reader can easily check that
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 1 & 3 \end{pmatrix}\begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 4 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 3 & 4 \end{pmatrix}.$$
A more convenient way is to use the cycle notation. The permutations
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 1 & 3 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 4 & 1 \end{pmatrix}$$
can be represented respectively by the cycles (1 2 4 3) and (1 3 4). Here the cycle (1 2 4 3) gives the information 1σ = 2, 2σ = 4, 4σ = 3 and 3σ = 1. Note also that in the latter case, since the image of 2 is 2, it is not necessary to include this in the cycle. Furthermore, the information
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 1 & 3 \end{pmatrix}\begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 4 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 3 & 4 \end{pmatrix}$$
can be represented in cycle notation by (1 2 4 3)(1 3 4) = (1 2). We also say that the cycles (1 2 4 3), (1 3 4) and (1 2) have lengths 4, 3 and 2 respectively.
Example 3.8.2. In S_6, the permutation
$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 4 & 1 & 3 & 6 & 5 \end{pmatrix}$$
can be represented in cycle notation as (1 2 4 3)(5 6).
Example 3.8.3. In S_4 or S_6, we have (1 2 4 3) = (1 2)(1 4)(1 3).
The last example motivates the following important idea.
Definition. Suppose that n ∈ N. A permutation in S_n that interchanges two numbers among the elements of {1, 2, . . . , n} and leaves all the others unchanged is called a transposition.
Remark. It is obvious that a transposition can be represented by a 2-cycle, and is its own inverse.
Definition. Two cycles (x_1 x_2 . . . x_k) and (y_1 y_2 . . . y_l) in S_n are said to be disjoint if the elements x_1, . . . , x_k, y_1, . . . , y_l are all different.
The interested reader may try to prove the following result.
PROPOSITION 3P. Suppose that n ∈ N.
(a) Every permutation in S_n can be written as a product of disjoint cycles.
(b) For every subset {x_1, x_2, . . . , x_k} of the set {1, 2, . . . , n}, where the elements x_1, x_2, . . . , x_k are distinct, the cycle (x_1 x_2 . . . x_k) satisfies
$$(x_1\, x_2\, \ldots\, x_k) = (x_1\, x_2)(x_1\, x_3) \ldots (x_1\, x_k);$$
in other words, every cycle can be written as a product of transpositions.
(c) Consequently, every permutation in S_n can be written as a product of transpositions.
Example 3.8.4. In S_9, the permutation
$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 3 & 2 & 5 & 1 & 7 & 8 & 4 & 9 & 6 \end{pmatrix}$$
can be written in cycle notation as (1 3 5 7 4)(6 8 9). By Proposition 3P(b), we have
$$(1\, 3\, 5\, 7\, 4) = (1\, 3)(1\, 5)(1\, 7)(1\, 4) \quad\text{and}\quad (6\, 8\, 9) = (6\, 8)(6\, 9).$$
Hence the permutation can be represented by (1 3)(1 5)(1 7)(1 4)(6 8)(6 9).
Definition. Suppose that n ∈ N. Then a permutation in S_n is said to be even if it is representable as the product of an even number of transpositions and odd if it is representable as the product of an odd number of transpositions. Furthermore, we write
$$\epsilon(\sigma) = \begin{cases} +1 & \text{if } \sigma \text{ is even,} \\ -1 & \text{if } \sigma \text{ is odd.} \end{cases}$$
Remark. It can be shown that no permutation can be simultaneously odd and even.
We are now in a position to define the determinant of a matrix. Suppose that
$$A = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix} \tag{11}$$
is an n × n matrix.
Definition. By an elementary product from the matrix A, we mean the product of n entries of A, no two of which are from the same row or same column.
It follows that any such elementary product must be of the form
$$a_{1(1\sigma)} a_{2(2\sigma)} \cdots a_{n(n\sigma)},$$
where σ is a permutation in S_n.
Definition. By the determinant of an n × n matrix A of the form (11), we mean the sum
$$\det(A) = \sum_{\sigma \in S_n} \epsilon(\sigma)\, a_{1(1\sigma)} a_{2(2\sigma)} \cdots a_{n(n\sigma)}, \tag{12}$$
where the summation is over all the n! permutations σ in S_n.
It can be shown that the determinant defined in this way is the same as that defined earlier by row or column expansions. Indeed, one can use (12) to establish Proposition 3A. The very interested reader may wish to make an attempt. Here we confine our study to the special cases when n = 2 and n = 3. In the two examples below, we use e to denote the identity permutation.
Example 3.8.5. Suppose that n = 2. We have the following:

  elementary product   permutation   sign   contribution
  a_{11}a_{22}         e             +1     +a_{11}a_{22}
  a_{12}a_{21}         (1 2)         −1     −a_{12}a_{21}

Hence det(A) = a_{11}a_{22} − a_{12}a_{21} as shown before.
Example 3.8.6. Suppose that n = 3. We have the following:

  elementary product      permutation   sign   contribution
  a_{11}a_{22}a_{33}      e             +1     +a_{11}a_{22}a_{33}
  a_{12}a_{23}a_{31}      (1 2 3)       +1     +a_{12}a_{23}a_{31}
  a_{13}a_{21}a_{32}      (1 3 2)       +1     +a_{13}a_{21}a_{32}
  a_{13}a_{22}a_{31}      (1 3)         −1     −a_{13}a_{22}a_{31}
  a_{11}a_{23}a_{32}      (2 3)         −1     −a_{11}a_{23}a_{32}
  a_{12}a_{21}a_{33}      (1 2)         −1     −a_{12}a_{21}a_{33}

Hence det(A) = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} − a_{13}a_{22}a_{31} − a_{11}a_{23}a_{32} − a_{12}a_{21}a_{33}. We have the picture below:
[Diagram: the matrix with its first two columns repeated on the right,

  a_{11} a_{12} a_{13} | a_{11} a_{12}
  a_{21} a_{22} a_{23} | a_{21} a_{22}
  a_{31} a_{32} a_{33} | a_{31} a_{32}

where the three diagonals running down to the right give the products taken with a + sign, and the three diagonals running up to the right give the products taken with a − sign.]
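The sum (12) can also be evaluated literally in code, one signed elementary product per permutation. A Python sketch, again for illustration rather than efficiency:

```python
from itertools import permutations
from math import prod

def sign(p):
    # +1 for an even permutation, -1 for an odd one, counted by inversions.
    inv = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inv % 2 else 1

def det_by_permutations(A):
    # Direct evaluation of the sum (12): n! signed elementary products.
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(det_by_permutations([[2, 3, 5], [1, 4, 2], [2, 1, 5]]))   # -2, as in Example 3.2.1
```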
Next, we discuss briefly how one may prove Proposition 3G concerning the determinant of the product of two matrices. The idea is to use elementary matrices. Corresponding to Proposition 3D, we can easily establish the following result.
PROPOSITION 3Q. Suppose that E is an elementary matrix.
(a) If E arises from interchanging two rows of I_n, then det(E) = −1.
(b) If E arises from adding one row of I_n to another row, then det(E) = 1.
(c) If E arises from multiplying one row of I_n by a non-zero constant c, then det(E) = c.
Combining Propositions 3D and 3Q, we can establish the following intermediate result.
PROPOSITION 3R. Suppose that E is an n × n elementary matrix. Then for any n × n matrix B, we have det(EB) = det(E) det(B).
Proof of Proposition 3G. Let us reduce A by elementary row operations to reduced row echelon form A′. Then there exists a finite sequence G_1, . . . , G_k of elementary matrices such that A′ = G_k · · · G_1 A. Since elementary matrices are invertible with elementary inverse matrices, it follows that there exists a finite sequence E_1, . . . , E_k of elementary matrices such that
$$A = E_1 \cdots E_k A'. \tag{13}$$
Suppose first of all that det(A) = 0. Then it follows from (13) that the matrix A′ must have a zero row. Hence A′B must have a zero row, and so det(A′B) = 0. But AB = E_1 · · · E_k(A′B), so it follows from Proposition 3R that det(AB) = 0. Suppose next that det(A) ≠ 0. Then A′ = I_n, and so it follows from (13) that AB = E_1 · · · E_k B. The result now follows on applying Proposition 3R. ∎
We complete this chapter by establishing the two formulas discussed in Section 3.7.
Proof of Proposition 3L. It suffices to show that
$$A\operatorname{adj}(A) = \det(A)I_n, \tag{14}$$
as this clearly implies
$$A\left(\frac{1}{\det(A)}\operatorname{adj}(A)\right) = I_n,$$
giving the result. To show (14), note that
$$A\operatorname{adj}(A) = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix} \begin{pmatrix} C_{11} & \ldots & C_{n1} \\ \vdots & & \vdots \\ C_{1n} & \ldots & C_{nn} \end{pmatrix}. \tag{15}$$
Suppose that the right hand side of (15) is equal to
$$B = \begin{pmatrix} b_{11} & \ldots & b_{1n} \\ \vdots & & \vdots \\ b_{n1} & \ldots & b_{nn} \end{pmatrix}.$$
Then for every i, j = 1, . . . , n, we have
$$b_{ij} = a_{i1}C_{j1} + \ldots + a_{in}C_{jn}. \tag{16}$$
It follows that when i = j, we have
$$b_{ii} = a_{i1}C_{i1} + \ldots + a_{in}C_{in} = \det(A).$$
On the other hand, if i ≠ j, then (16) is equal to the determinant of the matrix obtained from A by replacing row j by row i. This matrix has therefore two identical rows, and so the determinant is 0 (why?). The identity (14) follows immediately. ∎
Proof of Proposition 3M. Since A is invertible, it follows from Proposition 3L that
$$A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A).$$
By Proposition 2P, the unique solution of the system Ax = b is given by
$$x = A^{-1}b = \frac{1}{\det(A)}\operatorname{adj}(A)b.$$
Written in full, this becomes
$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \frac{1}{\det(A)}\begin{pmatrix} C_{11} & \ldots & C_{n1} \\ \vdots & & \vdots \\ C_{1n} & \ldots & C_{nn} \end{pmatrix}\begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = \frac{1}{\det(A)}\begin{pmatrix} b_1C_{11} + \ldots + b_nC_{n1} \\ \vdots \\ b_1C_{1n} + \ldots + b_nC_{nn} \end{pmatrix}.$$
Hence, for every j = 1, . . . , n, we have
$$x_j = \frac{b_1C_{1j} + \ldots + b_nC_{nj}}{\det(A)}.$$
To complete the proof, it remains to show that
$$b_1C_{1j} + \ldots + b_nC_{nj} = \det(A_j(b)).$$
Note, on using cofactor expansion by column j of A_j(b), that deleting row i and the entry b_i in column j of A_j(b) leaves precisely the matrix A_{ij}, so that
$$\det(A_j(b)) = \sum_{i=1}^{n} b_i(-1)^{i+j}\det(A_{ij}) = \sum_{i=1}^{n} b_iC_{ij}$$
as required. ∎
Problems for Chapter 3
1. Compute the determinant of each of the matrices in Problem 2.6.
2. Find the determinant of each of the following matrices:
$$P = \begin{pmatrix} 1 & 3 & 2 \\ 8 & 4 & 0 \\ 2 & 1 & 2 \end{pmatrix}, \quad Q = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \quad R = \begin{pmatrix} a & a^2 & a^3 \\ b & b^2 & b^3 \\ c & c^2 & c^3 \end{pmatrix}.$$

3. Find the determinant of the matrix
$$\begin{pmatrix} 3 & 4 & 5 & 2 \\ 1 & 0 & 1 & 0 \\ 2 & 3 & 6 & 3 \\ 7 & 2 & 9 & 4 \end{pmatrix}.$$

4. By using suitable elementary row and column operations as well as row and column expansions, show that
$$\det\begin{pmatrix} 2 & 3 & 7 & 1 & 3 \\ 2 & 3 & 7 & 1 & 5 \\ 2 & 3 & 6 & 1 & 9 \\ 4 & 6 & 2 & 3 & 4 \\ 5 & 8 & 7 & 4 & 5 \end{pmatrix} = 2.$$
[Remark: Note that rows 1 and 2 of the matrix are almost identical.]

5. By using suitable elementary row and column operations as well as row and column expansions, show that
$$\det\begin{pmatrix} 2 & 1 & 5 & 1 & 3 \\ 2 & 1 & 5 & 1 & 2 \\ 4 & 3 & 2 & 1 & 1 \\ 4 & 3 & 2 & 0 & 1 \\ 2 & 1 & 6 & \heartsuit & 7 \end{pmatrix} = 2.$$
[Remark: The entry ♥ is not a misprint!]

6. If A and B are square matrices of the same size and det A = 2 and det B = 3, find det(A²B⁻¹).

7. a) Compute the Vandermonde determinants
$$\det\begin{pmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{pmatrix} \quad\text{and}\quad \det\begin{pmatrix} 1 & a & a^2 & a^3 \\ 1 & b & b^2 & b^3 \\ 1 & c & c^2 & c^3 \\ 1 & d & d^2 & d^3 \end{pmatrix}.$$
b) Establish a formula for the Vandermonde determinant
$$\det\begin{pmatrix} 1 & a_1 & a_1^2 & \ldots & a_1^{n-1} \\ 1 & a_2 & a_2^2 & \ldots & a_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_n & a_n^2 & \ldots & a_n^{n-1} \end{pmatrix}.$$

8. Compute the determinant
$$\det\begin{pmatrix} a & b & c \\ a+x & b+x & c+x \\ a+y & b+y & c+y \end{pmatrix}.$$

9. For each of the matrices below, compute its adjoint and use Proposition 3L to calculate its inverse:
a) $\begin{pmatrix} 1 & 1 & 3 \\ 2 & 2 & 1 \\ 0 & 1 & 0 \end{pmatrix}$
b) $\begin{pmatrix} 3 & 5 & 4 \\ 2 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}$

10. Use Cramer's rule to solve the system of linear equations
$$\begin{aligned} 2x_1 + x_2 + x_3 &= 4, \\ x_1 + 2x_3 &= 2, \\ 3x_1 + x_2 + 3x_3 &= 2. \end{aligned}$$
Chapter 4
VECTORS
4.1. Introduction
A vector is an object which has magnitude and direction.
Example 4.1.1. We may be travelling north-east at 50 kph. In this case, the direction of the velocity is north-east and the magnitude of the velocity is 50 kph. We can describe our velocity in kph as
$$\left( \frac{50}{\sqrt{2}}, \frac{50}{\sqrt{2}} \right),$$
where the first coordinate describes the speed with which we are moving east and the second coordinate describes the speed with which we are moving north.
Example 4.1.2. An object in the sky may be 100 metres away in the south-east direction, 45 degrees upwards. In this case, the direction of its position is south-east and 45 degrees upwards, and the magnitude of its distance is 100 metres. We can describe the position of the object in metres as
$$\left( 50, -50, \frac{100}{\sqrt{2}} \right),$$
where the first coordinate describes the distance east, the second coordinate describes the distance north and the third coordinate describes the distance up.
The purpose of this chapter is to study some relationship between algebra and geometry. We shall
rst study some algebra which is motivated by geometric considerations. We then use the algebra later
to better understand some problems in geometry.
Chapter 4 : Vectors page 1 of 24
4.2. Vectors in R²

A vector on the plane R² can be described as an ordered pair u = (u_1, u_2), where u_1, u_2 ∈ R.

Definition. Two vectors u = (u_1, u_2) and v = (v_1, v_2) in R² are said to be equal, denoted by u = v, if u_1 = v_1 and u_2 = v_2.

Definition. For any two vectors u = (u_1, u_2) and v = (v_1, v_2) in R², we define their sum to be

u + v = (u_1, u_2) + (v_1, v_2) = (u_1 + v_1, u_2 + v_2).

Geometrically, if we represent the two vectors u and v by →AB and →BC respectively, then the sum u + v is represented by →AC as shown in the diagram below:

[Diagram: points A, B, C, with u along →AB, v along →BC and u + v along →AC.]
The next diagram demonstrates geometrically that u + v = v + u:

[Diagram: parallelogram ABCD, with u along →AB and →DC, v along →AD and →BC, and u + v along the diagonal →AC.]
PROPOSITION 4A. (VECTOR ADDITION)
(a) For every u, v ∈ R², we have u + v ∈ R².
(b) For every u, v, w ∈ R², we have u + (v + w) = (u + v) + w.
(c) For every u ∈ R², we have u + 0 = u, where 0 = (0, 0) ∈ R².
(d) For every u ∈ R², there exists v ∈ R² such that u + v = 0.
(e) For every u, v ∈ R², we have u + v = v + u.

Proof. Write u = (u_1, u_2), v = (v_1, v_2) and w = (w_1, w_2), where u_1, u_2, v_1, v_2, w_1, w_2 ∈ R. To check part (a), simply note that u_1 + v_1, u_2 + v_2 ∈ R. To check part (b), note that

u + (v + w) = (u_1, u_2) + (v_1 + w_1, v_2 + w_2) = (u_1 + (v_1 + w_1), u_2 + (v_2 + w_2))
            = ((u_1 + v_1) + w_1, (u_2 + v_2) + w_2) = (u_1 + v_1, u_2 + v_2) + (w_1, w_2)
            = (u + v) + w.

Part (c) is trivial. Next, if v = (−u_1, −u_2), then u + v = 0, giving part (d). To check part (e), note that u + v = (u_1 + v_1, u_2 + v_2) = (v_1 + u_1, v_2 + u_2) = v + u. ∎
Definition. For any vector u = (u_1, u_2) in R² and any scalar c ∈ R, we define the scalar multiple to be

cu = c(u_1, u_2) = (cu_1, cu_2).

Example 4.2.1. Suppose that u = (2, 1). Then 2u = (4, 2). Geometrically, if we represent the two vectors u and 2u by →OA and →OB respectively, then we have the diagram below:

[Diagram: collinear points O, A, B, with u along →OA and 2u along →OB.]
PROPOSITION 4B. (SCALAR MULTIPLICATION)
(a) For every c ∈ R and u ∈ R², we have cu ∈ R².
(b) For every c ∈ R and u, v ∈ R², we have c(u + v) = cu + cv.
(c) For every a, b ∈ R and u ∈ R², we have (a + b)u = au + bu.
(d) For every a, b ∈ R and u ∈ R², we have (ab)u = a(bu).
(e) For every u ∈ R², we have 1u = u.

Proof. Write u = (u_1, u_2) and v = (v_1, v_2), where u_1, u_2, v_1, v_2 ∈ R. To check part (a), simply note that cu_1, cu_2 ∈ R. To check part (b), note that

c(u + v) = c(u_1 + v_1, u_2 + v_2) = (c(u_1 + v_1), c(u_2 + v_2))
         = (cu_1 + cv_1, cu_2 + cv_2) = (cu_1, cu_2) + (cv_1, cv_2) = cu + cv.

To check part (c), note that

(a + b)u = ((a + b)u_1, (a + b)u_2) = (au_1 + bu_1, au_2 + bu_2)
         = (au_1, au_2) + (bu_1, bu_2) = au + bu.

To check part (d), note that

(ab)u = ((ab)u_1, (ab)u_2) = (a(bu_1), a(bu_2)) = a(bu_1, bu_2) = a(bu).

Finally, to check part (e), note that 1u = (1u_1, 1u_2) = (u_1, u_2) = u. ∎

Definition. For any vector u = (u_1, u_2) in R², we define the norm of u to be the non-negative real number

‖u‖ = √(u_1² + u_2²).

Remarks. (1) The norm of a vector is simply its magnitude or length. The definition follows from the famous theorem of Pythagoras.

(2) Suppose that P(u_1, u_2) and Q(v_1, v_2) are two points on the plane R². To calculate the distance d(P, Q) between the two points, we can first find a vector from P to Q. This is given by (v_1 − u_1, v_2 − u_2). The distance d(P, Q) is then the norm of this vector, so that

d(P, Q) = √((v_1 − u_1)² + (v_2 − u_2)²).
(3) It is not difficult to see that for any vector u ∈ R² and any scalar c ∈ R, we have ‖cu‖ = |c|‖u‖.

Definition. Any vector u ∈ R² satisfying ‖u‖ = 1 is called a unit vector.

Example 4.2.2. The vector (3, 4) has norm 5.

Example 4.2.3. The distance between the points (6, 3) and (9, 7) is √((9 − 6)² + (7 − 3)²) = 5.

Example 4.2.4. The vectors (1, 0) and (0, 1) are unit vectors in R².

Example 4.2.5. The unit vector in the direction of the vector (1, 1) is (1/√2, 1/√2).

Example 4.2.6. In fact, all unit vectors in R² are of the form (cos θ, sin θ), where θ ∈ R.

Quite often, we may want to find the angle between two vectors. The scalar product of the two vectors then comes in handy. We shall define the scalar product in two ways, one in terms of the angle between the two vectors and the other not in terms of this angle, and show that the two definitions are in fact equivalent.

Definition. Suppose that u = (u_1, u_2) and v = (v_1, v_2) are vectors in R², and that θ ∈ [0, π] represents the angle between them. We define the scalar product u · v of u and v by

u · v = ‖u‖‖v‖ cos θ if u ≠ 0 and v ≠ 0, and u · v = 0 if u = 0 or v = 0.     (1)

Alternatively, we write

u · v = u_1v_1 + u_2v_2.     (2)
The definitions (1) and (2) are clearly equivalent if u = 0 or v = 0. On the other hand, we have the following result.

PROPOSITION 4C. Suppose that u = (u_1, u_2) and v = (v_1, v_2) are non-zero vectors in R², and that θ ∈ [0, π] represents the angle between them. Then

‖u‖‖v‖ cos θ = u_1v_1 + u_2v_2.

Proof. Geometrically, if we represent the two vectors u and v by →OA and →OB respectively, then the difference v − u is represented by →AB as shown in the diagram below:

[Diagram: triangle OAB, with u along →OA, v along →OB, v − u along →AB, and θ the angle at O.]
By the Law of Cosines, we have

AB² = OA² + OB² − 2 · OA · OB · cos θ;
in other words, we have

‖v − u‖² = ‖u‖² + ‖v‖² − 2‖u‖‖v‖ cos θ,

so that

‖u‖‖v‖ cos θ = ½(‖u‖² + ‖v‖² − ‖v − u‖²)
             = ½(u_1² + u_2² + v_1² + v_2² − (v_1 − u_1)² − (v_2 − u_2)²)
             = u_1v_1 + u_2v_2,

as required. ∎
Remarks. (1) We say that two non-zero vectors in R² are orthogonal if the angle between them is π/2. It follows immediately from the definition of the scalar product that two non-zero vectors u, v ∈ R² are orthogonal if and only if u · v = 0.

(2) We can calculate the scalar product of any two non-zero vectors u, v ∈ R² by the formula (2) and then use the formula (1) to calculate the angle θ between u and v.
Example 4.2.7. Suppose that u = (√3, 1) and v = (√3, 3). Then by the formula (2), we have u · v = 3 + 3 = 6. Note now that

‖u‖ = 2 and ‖v‖ = 2√3.

It follows from the formula (1) that

cos θ = (u · v)/(‖u‖‖v‖) = 6/(4√3) = √3/2,

so that θ = π/6.

Example 4.2.8. Suppose that u = (√3, 1) and v = (√3, −3). Then by the formula (2), we have u · v = 3 − 3 = 0. It follows that u and v are orthogonal.
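To see formulas (1) and (2) working together numerically, here is a minimal sketch (not part of the original notes, assuming the NumPy library; the function name angle_between is our own) that recovers the angle of Example 4.2.7:

import numpy as np

def angle_between(u, v):
    # cos(theta) = (u . v) / (||u|| ||v||), formula (2) combined with formula (1)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

u = np.array([np.sqrt(3.0), 1.0])
v = np.array([np.sqrt(3.0), 3.0])
print(np.isclose(angle_between(u, v), np.pi / 6))  # True: the angle is pi/6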
PROPOSITION 4D. (SCALAR PRODUCT) Suppose that u, v, w ∈ R² and c ∈ R. Then
(a) u · v = v · u;
(b) u · (v + w) = (u · v) + (u · w);
(c) c(u · v) = (cu) · v = u · (cv);
(d) u · u ≥ 0; and
(e) u · u = 0 if and only if u = 0.

Proof. Write u = (u_1, u_2), v = (v_1, v_2) and w = (w_1, w_2), where u_1, u_2, v_1, v_2, w_1, w_2 ∈ R. Part (a) is trivial. To check part (b), note that

u · (v + w) = u_1(v_1 + w_1) + u_2(v_2 + w_2) = (u_1v_1 + u_2v_2) + (u_1w_1 + u_2w_2) = u · v + u · w.

Part (c) is rather simple. To check parts (d) and (e), note that u · u = u_1² + u_2² ≥ 0, and that equality holds precisely when u_1 = u_2 = 0. ∎
Consider the diagram below:
[Diagram (3): points O, A, P, Q, R, with a along →OA, u along →OP, w along →OQ on the line OA, and v along →OR perpendicular to OA.]
Here we represent the two vectors a and u by →OA and →OP respectively. If we project the vector u on to the line OA, then the image of the projection is the vector w, represented by →OQ. On the other hand, if we project the vector u on to a line perpendicular to the line OA, then the image of the projection is the vector v, represented by →OR.

Definition. In the notation of the diagram (3), the vector w is called the orthogonal projection of the vector u on the vector a, and denoted by w = proj_a u.

PROPOSITION 4E. (ORTHOGONAL PROJECTION) Suppose that u, a ∈ R². Then

proj_a u = ((u · a)/‖a‖²) a.

Remark. Note that the component of u orthogonal to a, represented by →OR in the diagram (3), is

u − proj_a u = u − ((u · a)/‖a‖²) a.

Proof of Proposition 4E. Note that w = ka for some k ∈ R. It clearly suffices to prove that

k = (u · a)/‖a‖².

It is easy to see that the vectors u − w and a are orthogonal. It follows that the scalar product (u − w) · a = 0. In other words, (u − ka) · a = 0. Hence

k = (u · a)/(a · a) = (u · a)/‖a‖²,

as required. ∎
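As a small numerical illustration (this sketch is not part of the original notes; it assumes NumPy, and the helper name proj is our own), the following lines compute proj_a u and check that the residual u − proj_a u is orthogonal to a:

import numpy as np

def proj(u, a):
    # Proposition 4E: proj_a u = ((u . a) / ||a||^2) a
    return (np.dot(u, a) / np.dot(a, a)) * a

u = np.array([3.0, 4.0])
a = np.array([1.0, 2.0])
w = proj(u, a)
print(w)                                   # [2.2 4.4]
print(np.isclose(np.dot(u - w, a), 0.0))   # True: u - w is orthogonal to a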
To end this section, we shall apply our knowledge gained so far to find a formula that gives the perpendicular distance of a point (x_0, y_0) from a line ax + by + c = 0. Consider the diagram below:

[Diagram: the line ax + by + c = 0 through O and Q, the point (x_0, y_0) at P, the normal n = (a, b) along →OQ, u along →OP, and D the perpendicular distance from P to the line.]
Suppose that (x_1, y_1) is any arbitrary point O on the line ax + by + c = 0. For any other point (x, y) on the line ax + by + c = 0, the vector (x − x_1, y − y_1) is parallel to the line. On the other hand,

(a, b) · (x − x_1, y − y_1) = (ax + by) − (ax_1 + by_1) = −c + c = 0,

so that the vector n = (a, b), in the direction →OQ, is perpendicular to the line ax + by + c = 0. Suppose next that the point (x_0, y_0) is represented by the point P in the diagram. Then the vector u = (x_0 − x_1, y_0 − y_1) is represented by →OP, and →OQ represents the orthogonal projection proj_n u of u on the vector n. Clearly the perpendicular distance D of the point (x_0, y_0) from the line ax + by + c = 0 satisfies

D = ‖proj_n u‖ = ‖((u · n)/‖n‖²) n‖ = |(x_0 − x_1, y_0 − y_1) · (a, b)| / √(a² + b²)
  = |ax_0 + by_0 − ax_1 − by_1| / √(a² + b²) = |ax_0 + by_0 + c| / √(a² + b²).

We have proved the following result.

PROPOSITION 4F. The perpendicular distance D of a point (x_0, y_0) from a line ax + by + c = 0 is given by

D = |ax_0 + by_0 + c| / √(a² + b²).

Example 4.2.9. The perpendicular distance D of the point (5, 7) from the line 2x − 3y + 5 = 0 is given by

D = |10 − 21 + 5| / √(4 + 9) = 6/√13.
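The formula of Proposition 4F is easy to machine-check; the short sketch below (not part of the original notes; the function name line_distance is our own) reproduces Example 4.2.9:

import math

def line_distance(a, b, c, x0, y0):
    # Proposition 4F: D = |a*x0 + b*y0 + c| / sqrt(a^2 + b^2)
    return abs(a * x0 + b * y0 + c) / math.sqrt(a * a + b * b)

# Example 4.2.9: point (5, 7) and the line 2x - 3y + 5 = 0
print(math.isclose(line_distance(2, -3, 5, 5, 7), 6 / math.sqrt(13)))  # True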
4.3. Vectors in R³

In this section, we consider the same problems as in Section 4.2, but in 3-space R³. Any reader who feels confident may skip this section.

A vector in 3-space R³ can be described as an ordered triple u = (u_1, u_2, u_3), where u_1, u_2, u_3 ∈ R.
Definition. Two vectors u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) in R³ are said to be equal, denoted by u = v, if u_1 = v_1, u_2 = v_2 and u_3 = v_3.

Definition. For any two vectors u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) in R³, we define their sum to be

u + v = (u_1, u_2, u_3) + (v_1, v_2, v_3) = (u_1 + v_1, u_2 + v_2, u_3 + v_3).

Definition. For any vector u = (u_1, u_2, u_3) in R³ and any scalar c ∈ R, we define the scalar multiple to be

cu = c(u_1, u_2, u_3) = (cu_1, cu_2, cu_3).

The following two results are the analogues of Propositions 4A and 4B. The proofs are essentially similar.
PROPOSITION 4A. (VECTOR ADDITION)
(a) For every u, v ∈ R³, we have u + v ∈ R³.
(b) For every u, v, w ∈ R³, we have u + (v + w) = (u + v) + w.
(c) For every u ∈ R³, we have u + 0 = u, where 0 = (0, 0, 0) ∈ R³.
(d) For every u ∈ R³, there exists v ∈ R³ such that u + v = 0.
(e) For every u, v ∈ R³, we have u + v = v + u.

PROPOSITION 4B. (SCALAR MULTIPLICATION)
(a) For every c ∈ R and u ∈ R³, we have cu ∈ R³.
(b) For every c ∈ R and u, v ∈ R³, we have c(u + v) = cu + cv.
(c) For every a, b ∈ R and u ∈ R³, we have (a + b)u = au + bu.
(d) For every a, b ∈ R and u ∈ R³, we have (ab)u = a(bu).
(e) For every u ∈ R³, we have 1u = u.

Definition. For any vector u = (u_1, u_2, u_3) in R³, we define the norm of u to be the non-negative real number

‖u‖ = √(u_1² + u_2² + u_3²).
Remarks. (1) Suppose that P(u_1, u_2, u_3) and Q(v_1, v_2, v_3) are two points in R³. To calculate the distance d(P, Q) between the two points, we can first find a vector from P to Q. This is given by (v_1 − u_1, v_2 − u_2, v_3 − u_3). The distance d(P, Q) is then the norm of this vector, so that

d(P, Q) = √((v_1 − u_1)² + (v_2 − u_2)² + (v_3 − u_3)²).

(2) It is not difficult to see that for any vector u ∈ R³ and any scalar c ∈ R, we have ‖cu‖ = |c|‖u‖.

Definition. Any vector u ∈ R³ satisfying ‖u‖ = 1 is called a unit vector.

Example 4.3.1. The vector (3, 4, 12) has norm 13.

Example 4.3.2. The distance between the points (6, 3, 12) and (9, 7, 0) is 13.

Example 4.3.3. The vectors (1, 0, 0) and (0, 1, 0) are unit vectors in R³.

Example 4.3.4. The unit vector in the direction of the vector (1, 0, 1) is (1/√2, 0, 1/√2).

The theory of scalar products can be extended to R³ in the natural way.
Definition. Suppose that u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) are vectors in R³, and that θ ∈ [0, π] represents the angle between them. We define the scalar product u · v of u and v by

u · v = ‖u‖‖v‖ cos θ if u ≠ 0 and v ≠ 0, and u · v = 0 if u = 0 or v = 0.     (4)

Alternatively, we write

u · v = u_1v_1 + u_2v_2 + u_3v_3.     (5)

The definitions (4) and (5) are clearly equivalent if u = 0 or v = 0. On the other hand, we have the following analogue of Proposition 4C. The proof is similar.
PROPOSITION 4C. Suppose that u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) are non-zero vectors in R³, and that θ ∈ [0, π] represents the angle between them. Then

‖u‖‖v‖ cos θ = u_1v_1 + u_2v_2 + u_3v_3.

Remarks. (1) We say that two non-zero vectors in R³ are orthogonal if the angle between them is π/2. It follows immediately from the definition of the scalar product that two non-zero vectors u, v ∈ R³ are orthogonal if and only if u · v = 0.

(2) We can calculate the scalar product of any two non-zero vectors u, v ∈ R³ by the formula (5) and then use the formula (4) to calculate the angle θ between u and v.

Example 4.3.5. Suppose that u = (2, 0, 0) and v = (1, 1, √2). Then by the formula (5), we have u · v = 2. Note now that ‖u‖ = 2 and ‖v‖ = 2. It follows from the formula (4) that

cos θ = (u · v)/(‖u‖‖v‖) = 2/4 = 1/2,

so that θ = π/3.

Example 4.3.6. Suppose that u = (2, 3, 5) and v = (1, 1, −1). Then by the formula (5), we have u · v = 0. It follows that u and v are orthogonal.

The following result is the analogue of Proposition 4D. The proof is similar.

PROPOSITION 4D. (SCALAR PRODUCT) Suppose that u, v, w ∈ R³ and c ∈ R. Then
(a) u · v = v · u;
(b) u · (v + w) = (u · v) + (u · w);
(c) c(u · v) = (cu) · v = u · (cv);
(d) u · u ≥ 0; and
(e) u · u = 0 if and only if u = 0.

Suppose now that a and u are two vectors in R³. Then since two vectors are always coplanar, we can draw the following diagram which represents the plane they lie on:
[Diagram (6): as in diagram (3), points O, A, P, Q, R in the plane containing a and u, with w along →OQ on the line OA and v along →OR perpendicular to OA.]

Note that this diagram is essentially the same as the diagram (3), the only difference being that while the diagram (3) shows the whole of R², the diagram (6) only shows part of R³. As before, we represent the two vectors a and u by →OA and →OP respectively. If we project the vector u on to the line OA, then the image of the projection is the vector w, represented by →OQ. On the other hand, if we project the vector u on to a line perpendicular to the line OA, then the image of the projection is the vector v, represented by →OR.

Definition. In the notation of the diagram (6), the vector w is called the orthogonal projection of the vector u on the vector a, and denoted by w = proj_a u.
The following result is the analogue of Proposition 4E. The proof is similar.

PROPOSITION 4E. (ORTHOGONAL PROJECTION) Suppose that u, a ∈ R³. Then

proj_a u = ((u · a)/‖a‖²) a.

Remark. Note that the component of u orthogonal to a, represented by →OR in the diagram (6), is

u − proj_a u = u − ((u · a)/‖a‖²) a.
4.4. Vector Products

In this section, we shall discuss a product of vectors unique to R³. The idea of vector products has wide applications in geometry, physics and engineering, and is motivated by the wish to find a vector that is perpendicular to two given vectors.

We shall use the right hand rule. In other words, if we hold the thumb on the right hand upwards and close the remaining four fingers, then the fingers point from the x-direction towards the y-direction, while the thumb points towards the z-direction. Alternatively, if we imagine Columbus had never lived and that the earth were flat, then taking the x-direction as east and the y-direction as north, the z-direction is upwards!

We shall frequently use the three vectors i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1) in R³.
Definition. Suppose that u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) are two vectors in R³. Then the vector product u × v is defined by the determinant

u × v = \det \begin{pmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix}.

Remarks. (1) Note that

i × j = −(j × i) = k,
j × k = −(k × j) = i,
k × i = −(i × k) = j.
(2) Using cofactor expansion by row 1, we have

u × v = \det \begin{pmatrix} u_2 & u_3 \\ v_2 & v_3 \end{pmatrix} i − \det \begin{pmatrix} u_1 & u_3 \\ v_1 & v_3 \end{pmatrix} j + \det \begin{pmatrix} u_1 & u_2 \\ v_1 & v_2 \end{pmatrix} k
      = \left( \det \begin{pmatrix} u_2 & u_3 \\ v_2 & v_3 \end{pmatrix}, −\det \begin{pmatrix} u_1 & u_3 \\ v_1 & v_3 \end{pmatrix}, \det \begin{pmatrix} u_1 & u_2 \\ v_1 & v_2 \end{pmatrix} \right)
      = (u_2v_3 − u_3v_2, u_3v_1 − u_1v_3, u_1v_2 − u_2v_1).

We shall first of all show that the vector product u × v is orthogonal to both u and v.
PROPOSITION 4G. Suppose that u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) are two vectors in R³. Then
(a) u · (u × v) = 0; and
(b) v · (u × v) = 0.

Proof. Note first of all that

u · (u × v) = (u_1, u_2, u_3) · \left( \det \begin{pmatrix} u_2 & u_3 \\ v_2 & v_3 \end{pmatrix}, −\det \begin{pmatrix} u_1 & u_3 \\ v_1 & v_3 \end{pmatrix}, \det \begin{pmatrix} u_1 & u_2 \\ v_1 & v_2 \end{pmatrix} \right)
            = u_1 \det \begin{pmatrix} u_2 & u_3 \\ v_2 & v_3 \end{pmatrix} − u_2 \det \begin{pmatrix} u_1 & u_3 \\ v_1 & v_3 \end{pmatrix} + u_3 \det \begin{pmatrix} u_1 & u_2 \\ v_1 & v_2 \end{pmatrix}
            = \det \begin{pmatrix} u_1 & u_2 & u_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix},

in view of cofactor expansion by row 1. On the other hand, clearly

\det \begin{pmatrix} u_1 & u_2 & u_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix} = 0.

This proves part (a). The proof of part (b) is similar. ∎
Example 4.4.1. Suppose that u = (1, 1, 2) and v = (3, 0, 2). Then

u × v = \det \begin{pmatrix} i & j & k \\ 1 & 1 & 2 \\ 3 & 0 & 2 \end{pmatrix} = \left( \det \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}, −\det \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}, \det \begin{pmatrix} 1 & 1 \\ 3 & 0 \end{pmatrix} \right) = (2, 4, −3).

Note that (1, 1, 2) · (2, 4, −3) = 0 and (3, 0, 2) · (2, 4, −3) = 0.
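Example 4.4.1 is easy to reproduce numerically; the sketch below (not part of the original notes, assuming NumPy) computes u × v and verifies Proposition 4G for this pair:

import numpy as np

u = np.array([1.0, 1.0, 2.0])
v = np.array([3.0, 0.0, 2.0])

w = np.cross(u, v)   # (u2*v3 - u3*v2, u3*v1 - u1*v3, u1*v2 - u2*v1)
print(w)             # [ 2.  4. -3.]
print(np.dot(u, w), np.dot(v, w))  # 0.0 0.0, as Proposition 4G predicts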
PROPOSITION 4H. (VECTOR PRODUCT) Suppose that u, v, w ∈ R³ and c ∈ R. Then
(a) u × v = −(v × u);
(b) u × (v + w) = (u × v) + (u × w);
(c) (u + v) × w = (u × w) + (v × w);
(d) c(u × v) = (cu) × v = u × (cv);
(e) u × 0 = 0; and
(f) u × u = 0.
Proof. Write u = (u_1, u_2, u_3), v = (v_1, v_2, v_3) and w = (w_1, w_2, w_3). To check part (a), note that

\det \begin{pmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix} = −\det \begin{pmatrix} i & j & k \\ v_1 & v_2 & v_3 \\ u_1 & u_2 & u_3 \end{pmatrix}.

To check part (b), note that

\det \begin{pmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 + w_1 & v_2 + w_2 & v_3 + w_3 \end{pmatrix} = \det \begin{pmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix} + \det \begin{pmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ w_1 & w_2 & w_3 \end{pmatrix}.

Part (c) is similar. To check part (d), note that

c \det \begin{pmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix} = \det \begin{pmatrix} i & j & k \\ cu_1 & cu_2 & cu_3 \\ v_1 & v_2 & v_3 \end{pmatrix} = \det \begin{pmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ cv_1 & cv_2 & cv_3 \end{pmatrix}.

To check parts (e) and (f), note that

u × 0 = \det \begin{pmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ 0 & 0 & 0 \end{pmatrix} = 0 \quad\text{and}\quad u × u = \det \begin{pmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ u_1 & u_2 & u_3 \end{pmatrix} = 0,

as required. ∎
Next, we shall discuss an application of the vector product to the evaluation of the area of a parallelogram. To do this, we shall first establish the following result.

PROPOSITION 4J. Suppose that u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) are non-zero vectors in R³, and that θ ∈ [0, π] represents the angle between them. Then
(a) ‖u × v‖² = ‖u‖²‖v‖² − (u · v)²; and
(b) ‖u × v‖ = ‖u‖‖v‖ sin θ.
Proof. Note that

‖u × v‖² = (u_2v_3 − u_3v_2)² + (u_3v_1 − u_1v_3)² + (u_1v_2 − u_2v_1)²     (7)

and

‖u‖²‖v‖² − (u · v)² = (u_1² + u_2² + u_3²)(v_1² + v_2² + v_3²) − (u_1v_1 + u_2v_2 + u_3v_3)².     (8)

Part (a) follows on expanding the right hand sides of (7) and (8) and checking that they are equal. To prove part (b), recall that

u · v = ‖u‖‖v‖ cos θ.

Combining with part (a), we obtain

‖u × v‖² = ‖u‖²‖v‖² − ‖u‖²‖v‖² cos² θ = ‖u‖²‖v‖² sin² θ.

Part (b) follows. ∎

Consider now a parallelogram with vertices O, A, B, C. Suppose that u and v are represented by →OA and →OC respectively. If we imagine the side OA to represent the base of the parallelogram, so that the base has length ‖u‖, then the height of the parallelogram is given by ‖v‖ sin θ, as shown in the diagram below:

[Diagram: parallelogram OABC, with u along →OA, v along →OC, θ the angle at O, and height ‖v‖ sin θ.]
It follows from Proposition 4J that the area of the parallelogram is given by ‖u × v‖. We have proved the following result.

PROPOSITION 4K. Suppose that u, v ∈ R³. Then the parallelogram with u and v as two of its sides has area ‖u × v‖.
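For a quick numerical check of Proposition 4K (this sketch is not part of the original notes; it assumes NumPy and reuses the vectors of Example 4.4.1), the area of the parallelogram spanned by two vectors is just the norm of their cross product:

import numpy as np

u = np.array([1.0, 1.0, 2.0])
v = np.array([3.0, 0.0, 2.0])

area = np.linalg.norm(np.cross(u, v))   # ||u x v||, Proposition 4K
print(area)                             # sqrt(2^2 + 4^2 + 3^2) = sqrt(29)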
We conclude this section by making a remark on the vector product u × v of two vectors in R³. Recall that the vector product is perpendicular to both u and v. Furthermore, it can be shown that the direction of u × v satisfies the right hand rule, in the sense that if we hold the thumb on the right hand outwards and close the remaining four fingers, then the thumb points towards the u × v-direction when the fingers point from the u-direction towards the v-direction. Also, we showed in Proposition 4J that the magnitude of u × v depends only on the norms of u and v and the angle between the two vectors. It follows that the vector product is unchanged as long as we keep a right hand coordinate system. This is an important consideration in physics and engineering, where we may use different coordinate systems on the same problem.
4.5. Scalar Triple Products

Suppose that u, v, w ∈ R³ do not all lie on the same plane. Consider the parallelepiped with u, v, w as three of its edges. We are interested in calculating the volume of this parallelepiped. Suppose that u, v and w are represented by →OA, →OB and →OC respectively. Consider the diagram below:

[Diagram: parallelepiped with edges u along →OA, v along →OB and w along →OC; the vector v × w is normal to the base containing v and w, and P is the foot of the perpendicular from A to that normal.]
By Proposition 4K, the base of this parallelepiped, with O, B, C as three of the vertices, has area ‖v × w‖. Next, note that if OP is perpendicular to the base of the parallelepiped, then →OP is in the direction of v × w. If PA is perpendicular to OP, then the height of the parallelepiped is equal to the norm of the orthogonal projection of u on v × w. In other words, the parallelepiped has height

‖proj_{v×w} u‖ = ‖((u · (v × w))/‖v × w‖²)(v × w)‖ = |u · (v × w)| / ‖v × w‖.

Hence the volume of the parallelepiped is given by

V = |u · (v × w)|.

We have proved the following result.

PROPOSITION 4L. Suppose that u, v, w ∈ R³. Then the parallelepiped with u, v and w as three of its edges has volume |u · (v × w)|.

Definition. Suppose that u, v, w ∈ R³. Then u · (v × w) is called the scalar triple product of u, v and w.

Remarks. (1) It follows immediately from Proposition 4L that three vectors in R³ are coplanar if and only if their scalar triple product is zero.
(2) Note that

u · (v × w) = (u_1, u_2, u_3) · \left( \det \begin{pmatrix} v_2 & v_3 \\ w_2 & w_3 \end{pmatrix}, −\det \begin{pmatrix} v_1 & v_3 \\ w_1 & w_3 \end{pmatrix}, \det \begin{pmatrix} v_1 & v_2 \\ w_1 & w_2 \end{pmatrix} \right)
            = u_1 \det \begin{pmatrix} v_2 & v_3 \\ w_2 & w_3 \end{pmatrix} − u_2 \det \begin{pmatrix} v_1 & v_3 \\ w_1 & w_3 \end{pmatrix} + u_3 \det \begin{pmatrix} v_1 & v_2 \\ w_1 & w_2 \end{pmatrix}
            = \det \begin{pmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{pmatrix},     (9)

in view of cofactor expansion by row 1.
(3) It follows from the identity (9) that

u · (v × w) = v · (w × u) = w · (u × v).

Note that each of the determinants can be obtained from the other two by twice interchanging two rows.

Example 4.5.1. Suppose that u = (1, 0, 1), v = (2, 1, 3) and w = (0, 1, 1). Then

u · (v × w) = \det \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 3 \\ 0 & 1 & 1 \end{pmatrix} = 0,

so that u, v and w are coplanar.

Example 4.5.2. The volume of the parallelepiped with u = (1, 0, 1), v = (2, 1, 4) and w = (0, 1, 1) as three of its edges is given by

|u · (v × w)| = \left| \det \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 4 \\ 0 & 1 & 1 \end{pmatrix} \right| = |−1| = 1.
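Both examples can be checked in a couple of lines; this sketch (not part of the original notes, assuming NumPy; the helper name triple is our own) evaluates the scalar triple product as u · (v × w), which by the identity (9) equals the corresponding 3 × 3 determinant:

import numpy as np

def triple(u, v, w):
    # scalar triple product u . (v x w) = det of the matrix with rows u, v, w
    return np.dot(u, np.cross(v, w))

print(triple([1, 0, 1], [2, 1, 3], [0, 1, 1]))       # 0: coplanar (Example 4.5.1)
print(abs(triple([1, 0, 1], [2, 1, 4], [0, 1, 1])))  # 1: volume (Example 4.5.2)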
4.6. Application to Geometry in R³

In this section, we shall study lines and planes in R³ by using our results on vectors in R³.
Consider first of all a plane in R³. Suppose that (x_1, y_1, z_1) ∈ R³ is a given point on this plane. Suppose further that n = (a, b, c) is a vector perpendicular to this plane. Then for any arbitrary point (x, y, z) ∈ R³ on this plane, the vector

(x, y, z) − (x_1, y_1, z_1) = (x − x_1, y − y_1, z − z_1)

joins one point on the plane to another point on the plane, and so must be parallel to the plane and hence perpendicular to n = (a, b, c). It follows that the scalar product

(a, b, c) · (x − x_1, y − y_1, z − z_1) = 0,

and so

a(x − x_1) + b(y − y_1) + c(z − z_1) = 0.     (10)

If we write d = −(ax_1 + by_1 + cz_1), then (10) can be rewritten in the form

ax + by + cz + d = 0.     (11)

Equation (10) is usually called the point-normal form of the equation of a plane, while equation (11) is usually known as the general form of the equation of a plane.

Example 4.6.1. Consider the plane through the point (2, −5, 7) and perpendicular to the vector (3, 5, −4). Here (a, b, c) = (3, 5, −4) and (x_1, y_1, z_1) = (2, −5, 7). The equation of the plane is given in point-normal form by 3(x − 2) + 5(y + 5) − 4(z − 7) = 0, and in general form by 3x + 5y − 4z + 47 = 0. Here d = −6 + 25 + 28 = 47.
Example 4.6.2. Consider the plane through the points (1, 1, 1), (2, 2, 0) and (4, −6, 2). Then the vectors

(2, 2, 0) − (1, 1, 1) = (1, 1, −1) and (4, −6, 2) − (1, 1, 1) = (3, −7, 1)

join the point (1, 1, 1) to the points (2, 2, 0) and (4, −6, 2) respectively and are therefore parallel to the plane. It follows that the vector product

(1, 1, −1) × (3, −7, 1) = (−6, −4, −10)

is perpendicular to the plane. The equation of the plane is then given by −6(x − 1) − 4(y − 1) − 10(z − 1) = 0, or 3x + 2y + 5z − 10 = 0.
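The construction of Example 4.6.2 translates directly into code; this sketch (not part of the original notes, assuming NumPy; the helper name plane_through is our own) computes a normal via the cross product and then the constant d:

import numpy as np

def plane_through(p1, p2, p3):
    # normal n = (p2 - p1) x (p3 - p1); the plane is n . ((x,y,z) - p1) = 0
    p1, p2, p3 = map(np.asarray, (p1, p2, p3))
    n = np.cross(p2 - p1, p3 - p1)
    d = -np.dot(n, p1)
    return n, d          # plane: n[0]x + n[1]y + n[2]z + d = 0

n, d = plane_through((1, 1, 1), (2, 2, 0), (4, -6, 2))
print(n, d)              # [ -6  -4 -10 ] 20, i.e. 3x + 2y + 5z - 10 = 0 after dividing by -2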
Consider next a line in R³. Suppose that (x_1, y_1, z_1) ∈ R³ is a given point on this line. Suppose further that n = (a, b, c) is a vector parallel to this line. Then for any arbitrary point (x, y, z) ∈ R³ on this line, the vector

(x, y, z) − (x_1, y_1, z_1) = (x − x_1, y − y_1, z − z_1)

joins one point on the line to another point on the line, and so must be parallel to n = (a, b, c). It follows that there is some number λ ∈ R such that

(x − x_1, y − y_1, z − z_1) = λ(a, b, c),

so that

x = x_1 + aλ,
y = y_1 + bλ,     (12)
z = z_1 + cλ,

where λ is called a parameter. Suppose further that a, b, c are all non-zero. Then, eliminating the parameter λ, we obtain

(x − x_1)/a = (y − y_1)/b = (z − z_1)/c.     (13)

Equations (12) are usually called the parametric form of the equations of a line, while equations (13) are usually known as the symmetric form of the equations of a line.

Example 4.6.3. Consider the line through the point (2, −5, 7) and parallel to the vector (3, 5, −4). Here (a, b, c) = (3, 5, −4) and (x_1, y_1, z_1) = (2, −5, 7). The equations of the line are given in parametric form by

x = 2 + 3λ,
y = −5 + 5λ,
z = 7 − 4λ,

and in symmetric form by

(x − 2)/3 = (y + 5)/5 = (z − 7)/(−4).
Example 4.6.4. Consider the line through the points (3, 0, 5) and (7, 0, 8). Then a vector in the direction of the line is given by

(7, 0, 8) − (3, 0, 5) = (4, 0, 3).

The equations of the line are then given in parametric form by

x = 3 + 4λ,
y = 0,
z = 5 + 3λ,

and in symmetric form by

(x − 3)/4 = (z − 5)/3 and y = 0.
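As a small sketch (not part of the original notes, assuming NumPy), the parametric form (12) makes it easy to sample points on the line of Example 4.6.4 and confirm that they satisfy the symmetric form (13):

import numpy as np

p = np.array([3.0, 0.0, 5.0])     # point on the line
d = np.array([4.0, 0.0, 3.0])     # direction vector

for lam in (0.0, 1.0, 2.5):
    x, y, z = p + lam * d         # equations (12)
    # symmetric form: (x - 3)/4 should equal (z - 5)/3, with y = 0
    print((x - 3) / 4, (z - 5) / 3, y)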
Consider the plane through three fixed points (x_1, y_1, z_1), (x_2, y_2, z_2) and (x_3, y_3, z_3), not lying on the same line. Let (x, y, z) be a point on the plane. Then the vectors

(x, y, z) − (x_1, y_1, z_1) = (x − x_1, y − y_1, z − z_1),
(x, y, z) − (x_2, y_2, z_2) = (x − x_2, y − y_2, z − z_2),
(x, y, z) − (x_3, y_3, z_3) = (x − x_3, y − y_3, z − z_3),

each joining one point on the plane to another point on the plane, are all parallel to the plane. Using the vector product, we see that the vector

(x − x_2, y − y_2, z − z_2) × (x − x_3, y − y_3, z − z_3)

is perpendicular to the plane, and so perpendicular to the vector (x − x_1, y − y_1, z − z_1). It follows that the scalar triple product

(x − x_1, y − y_1, z − z_1) · ((x − x_2, y − y_2, z − z_2) × (x − x_3, y − y_3, z − z_3)) = 0;

in other words,

\det \begin{pmatrix} x − x_1 & y − y_1 & z − z_1 \\ x − x_2 & y − y_2 & z − z_2 \\ x − x_3 & y − y_3 & z − z_3 \end{pmatrix} = 0.

This is another technique to find the equation of a plane through three fixed points.
Example 4.6.5. We return to the plane in Example 4.6.2, through the three points (1, 1, 1), (2, 2, 0) and (4, −6, 2). The equation is given by

\det \begin{pmatrix} x − 1 & y − 1 & z − 1 \\ x − 2 & y − 2 & z − 0 \\ x − 4 & y + 6 & z − 2 \end{pmatrix} = 0.

The determinant on the left hand side is equal to −6x − 4y − 10z + 20. Hence the equation of the plane is given by −6x − 4y − 10z + 20 = 0, or 3x + 2y + 5z − 10 = 0.
We observe that the calculation for the determinant above is not very pleasant. However, the technique can be improved in the following way by making less reference to the unknown point (x, y, z). Note that the vectors

(x, y, z) − (x_1, y_1, z_1) = (x − x_1, y − y_1, z − z_1),
(x_2, y_2, z_2) − (x_1, y_1, z_1) = (x_2 − x_1, y_2 − y_1, z_2 − z_1),
(x_3, y_3, z_3) − (x_1, y_1, z_1) = (x_3 − x_1, y_3 − y_1, z_3 − z_1),
each joining one point on the plane to another point on the plane, are all parallel to the plane. Using the vector product, we see that the vector

(x_2 − x_1, y_2 − y_1, z_2 − z_1) × (x_3 − x_1, y_3 − y_1, z_3 − z_1)

is perpendicular to the plane, and so perpendicular to the vector (x − x_1, y − y_1, z − z_1). It follows that the scalar triple product

(x − x_1, y − y_1, z − z_1) · ((x_2 − x_1, y_2 − y_1, z_2 − z_1) × (x_3 − x_1, y_3 − y_1, z_3 − z_1)) = 0;

in other words,

\det \begin{pmatrix} x − x_1 & y − y_1 & z − z_1 \\ x_2 − x_1 & y_2 − y_1 & z_2 − z_1 \\ x_3 − x_1 & y_3 − y_1 & z_3 − z_1 \end{pmatrix} = 0.
Example 4.6.6. We return to the plane in Examples 4.6.2 and 4.6.5, through the three points (1, 1, 1), (2, 2, 0) and (4, −6, 2). The equation is given by

\det \begin{pmatrix} x − 1 & y − 1 & z − 1 \\ 2 − 1 & 2 − 1 & 0 − 1 \\ 4 − 1 & −6 − 1 & 2 − 1 \end{pmatrix} = 0.

The determinant on the left hand side is equal to

\det \begin{pmatrix} x − 1 & y − 1 & z − 1 \\ 1 & 1 & −1 \\ 3 & −7 & 1 \end{pmatrix} = −6(x − 1) − 4(y − 1) − 10(z − 1) = −6x − 4y − 10z + 20.

Hence the equation of the plane is given by −6x − 4y − 10z + 20 = 0, or 3x + 2y + 5z − 10 = 0.
We next consider the problem of dividing a line segment in a given ratio. Suppose that x_1 and x_2 are two given points in R³.

We wish to divide the line segment joining x_1 and x_2 internally in the ratio λ_1 : λ_2, where λ_1 and λ_2 are positive real numbers. In other words, we wish to find the point x on the line segment joining x_1 and x_2 such that

‖x − x_1‖ / ‖x − x_2‖ = λ_1 / λ_2,

as shown in the diagram below:

[Diagram: collinear points x_1, x, x_2, with the lengths ‖x − x_1‖ and ‖x − x_2‖ marked.]
Since x − x_1 and x_2 − x are both in the same direction as x_2 − x_1, we must have

λ_2(x − x_1) = λ_1(x_2 − x), or x = (λ_1x_2 + λ_2x_1)/(λ_1 + λ_2).

We wish next to find the point x on the line joining x_1 and x_2, but not between x_1 and x_2, such that

‖x − x_1‖ / ‖x − x_2‖ = λ_1 / λ_2,
where λ_1 and λ_2 are positive real numbers, as shown in the diagrams below for the cases λ_1 < λ_2 and λ_1 > λ_2 respectively:

[Diagrams: collinear points x, x_1, x_2 for the case λ_1 < λ_2, and x_1, x_2, x for the case λ_1 > λ_2.]
Since x − x_1 and x − x_2 are in the same direction as each other, we must have

λ_2(x − x_1) = λ_1(x − x_2), or x = (λ_1x_2 − λ_2x_1)/(λ_1 − λ_2).

Example 4.6.7. Let x_1 = (1, 2, 3) and x_2 = (7, 11, 6). The point

x = (2x_2 + x_1)/(2 + 1) = (2(7, 11, 6) + (1, 2, 3))/3 = (5, 8, 5)

divides the line segment joining (1, 2, 3) and (7, 11, 6) internally in the ratio 2 : 1, whereas the point

x = (4x_2 − 2x_1)/(4 − 2) = (4(7, 11, 6) − 2(1, 2, 3))/2 = (13, 20, 9)

satisfies

‖x − x_1‖ / ‖x − x_2‖ = 4/2.
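A short sketch (not part of the original notes, assuming NumPy; the function names are our own) implementing the internal and external division formulas reproduces Example 4.6.7:

import numpy as np

def divide_internal(x1, x2, l1, l2):
    # x = (l1*x2 + l2*x1) / (l1 + l2)
    return (l1 * np.asarray(x2) + l2 * np.asarray(x1)) / (l1 + l2)

def divide_external(x1, x2, l1, l2):
    # x = (l1*x2 - l2*x1) / (l1 - l2), valid for l1 != l2
    return (l1 * np.asarray(x2) - l2 * np.asarray(x1)) / (l1 - l2)

print(divide_internal((1, 2, 3), (7, 11, 6), 2, 1))  # [5. 8. 5.]
print(divide_external((1, 2, 3), (7, 11, 6), 4, 2))  # [13. 20.  9.]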
Finally we turn our attention to the question of finding the distance of a plane from a given point. We shall prove the following analogue of Proposition 4F.

PROPOSITION 4F. The perpendicular distance D of a plane ax + by + cz + d = 0 from a point (x_0, y_0, z_0) is given by

D = |ax_0 + by_0 + cz_0 + d| / √(a² + b² + c²).

Proof. Consider the following diagram:

[Diagram: the plane ax + by + cz + d = 0 containing O and Q, the point (x_0, y_0, z_0) at P, the normal n = (a, b, c) along →OQ, u along →OP, and D the perpendicular distance from P to the plane.]
Suppose that (x_1, y_1, z_1) is any arbitrary point O on the plane ax + by + cz + d = 0. For any other point (x, y, z) on the plane ax + by + cz + d = 0, the vector (x − x_1, y − y_1, z − z_1) is parallel to the plane. On the other hand,

(a, b, c) · (x − x_1, y − y_1, z − z_1) = (ax + by + cz) − (ax_1 + by_1 + cz_1) = −d + d = 0,

so that the vector n = (a, b, c), in the direction →OQ, is perpendicular to the plane ax + by + cz + d = 0. Suppose next that the point (x_0, y_0, z_0) is represented by the point P in the diagram. Then the vector u = (x_0 − x_1, y_0 − y_1, z_0 − z_1) is represented by →OP, and →OQ represents the orthogonal projection proj_n u of u on the vector n. Clearly the perpendicular distance D of the point (x_0, y_0, z_0) from the plane ax + by + cz + d = 0 satisfies

D = ‖proj_n u‖ = ‖((u · n)/‖n‖²) n‖ = |(x_0 − x_1, y_0 − y_1, z_0 − z_1) · (a, b, c)| / √(a² + b² + c²)
  = |ax_0 + by_0 + cz_0 − ax_1 − by_1 − cz_1| / √(a² + b² + c²) = |ax_0 + by_0 + cz_0 + d| / √(a² + b² + c²),

as required. ∎

A special case of Proposition 4F is when (x_0, y_0, z_0) = (0, 0, 0) is the origin. This shows that the perpendicular distance of the plane ax + by + cz + d = 0 from the origin is

|d| / √(a² + b² + c²).
Example 4.6.8. Consider the plane 3x + 5y − 4z + 37 = 0. The distance of the point (1, 2, 3) from the plane is

|3 + 10 − 12 + 37| / √(9 + 25 + 16) = 38/√50 = 19√2/5.

The distance of the origin from the plane is

|37| / √(9 + 25 + 16) = 37/√50.

Example 4.6.9. Consider also the plane 3x + 5y − 4z − 1 = 0. Note that this plane is also perpendicular to the vector (3, 5, −4) and is therefore parallel to the plane 3x + 5y − 4z + 37 = 0. It is therefore reasonable to find the perpendicular distance between these two parallel planes. Note that the perpendicular distance between the two planes is equal to the perpendicular distance of any point on 3x + 5y − 4z − 1 = 0 from the plane 3x + 5y − 4z + 37 = 0. Note now that (1, 2, 3) lies on the plane 3x + 5y − 4z − 1 = 0. It follows from Example 4.6.8 that the distance between the two planes is 19√2/5.
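Proposition 4F and the two examples above can be checked mechanically; this sketch (not part of the original notes; the helper name plane_distance is our own) does so in plain Python:

import math

def plane_distance(a, b, c, d, x0, y0, z0):
    # Proposition 4F: D = |a*x0 + b*y0 + c*z0 + d| / sqrt(a^2 + b^2 + c^2)
    return abs(a * x0 + b * y0 + c * z0 + d) / math.sqrt(a * a + b * b + c * c)

# Example 4.6.8: point (1, 2, 3) and the plane 3x + 5y - 4z + 37 = 0;
# by Example 4.6.9 this is also the distance between the two parallel planes.
print(math.isclose(plane_distance(3, 5, -4, 37, 1, 2, 3), 19 * math.sqrt(2) / 5))  # True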
4.7. Application to Mechanics

Let u = (u_x, u_y) denote a vector in R², where the components u_x and u_y are functions of an independent variable t. Then the derivative of u with respect to t is given by

du/dt = (du_x/dt, du_y/dt).
Example 4.7.1. When discussing planar particle motion, we often let r = (x, y) denote the position of a particle at time t. Then the components x and y are functions of t. The derivative

v = dr/dt = (dx/dt, dy/dt)

represents the velocity of the particle, and its derivative

a = dv/dt = (d²x/dt², d²y/dt²)

represents the acceleration of the particle. We often write r = ‖r‖, v = ‖v‖ and a = ‖a‖.

Suppose that w = (w_x, w_y) is another vector in R². Then it is not difficult to see that

d/dt (u · w) = u · dw/dt + du/dt · w.     (14)
Example 4.7.2. Consider a particle moving at constant speed along a circular path centred at the origin. Then r = ‖r‖ is constant. More precisely, the position vector r = (x, y) satisfies x² + y² = c_1, where c_1 is a positive constant, so that

r · r = (x, y) · (x, y) = c_1.     (15)

On the other hand, v = ‖v‖ is constant. More precisely, the velocity vector

v = (dx/dt, dy/dt)

satisfies

(dx/dt)² + (dy/dt)² = c_2,

where c_2 is a positive constant, so that

v · v = (dx/dt, dy/dt) · (dx/dt, dy/dt) = c_2.     (16)

Differentiating (15) and (16) with respect to t, and using the identity (14), we obtain respectively

r · v = 0 and v · a = 0.     (17)

Using the properties of the scalar product, we see that the equations in (17) show that the vector v is perpendicular to both vectors r and a, and so a must be in the same direction as or the opposite direction to r. Next, differentiating the first equation in (17), we obtain

r · a + v · v = 0, or r · a = −v² < 0.

Let θ denote the angle between a and r. Then θ = 0° or θ = 180°. Since

r · a = ra cos θ,

it follows that cos θ < 0, and so θ = 180°. We also obtain ra = v², so that a = v²/r. This is a vector proof that for circular motion at constant speed, the acceleration is towards the centre of the circle and of magnitude v²/r.
Let u = (u_x, u_y, u_z) denote a vector in R³, where the components u_x, u_y and u_z are functions of an independent variable t. Then the derivative of u with respect to t is given by

du/dt = (du_x/dt, du_y/dt, du_z/dt).
Suppose that w = (w_x, w_y, w_z) is another vector in R³. Then it is not difficult to see that

d/dt (u · w) = u · dw/dt + du/dt · w.     (18)
Example 4.7.3. When discussing particle motion in 3-dimensional space, we often let r = (x, y, z) denote the position of a particle at time t. Then the components x, y and z are functions of t. The derivative

v = dr/dt = (dx/dt, dy/dt, dz/dt) = (ẋ, ẏ, ż)

represents the velocity of the particle, and its derivative

a = dv/dt = (d²x/dt², d²y/dt², d²z/dt²) = (ẍ, ÿ, z̈)

represents the acceleration of the particle.
Example 4.7.4. For a particle of mass m, the kinetic energy is given by

T = ½m(ẋ² + ẏ² + ż²) = ½m(ẋ, ẏ, ż) · (ẋ, ẏ, ż) = ½mv · v.

Using the identity (18), we have

dT/dt = ma · v = F · v,

where F = ma denotes the force. On the other hand, suppose that the potential energy is given by V. Using knowledge on functions of several real variables, we can show that

dV/dt = (∂V/∂x)(dx/dt) + (∂V/∂y)(dy/dt) + (∂V/∂z)(dz/dt) = (∂V/∂x, ∂V/∂y, ∂V/∂z) · v = ∇V · v,

where

∇V = (∂V/∂x, ∂V/∂y, ∂V/∂z)

is called the gradient of V. The law of conservation of energy says that T + V is constant, so that

dT/dt + dV/dt = (F + ∇V) · v = 0

holds for all vectors v, so that F(r) = −∇V(r) for all vectors r.
Example 4.7.5. If a force acts on a moving particle, then the work done is defined as the product of the distance moved and the magnitude of the force in the direction of motion. Suppose that a force F acts on a particle with displacement r. Then the component of the force in the direction of the motion is given by F · u, where

u = r/‖r‖

is a unit vector in the direction of the vector r. It follows that the work done is given by

‖r‖ (F · r/‖r‖) = F · r.
For instance, we see that the work done in moving a particle along a vector r = (3, 2, 4) with applied force F = (2, 1, 1) is F · r = (2, 1, 1) · (3, 2, 4) = 12.

Example 4.7.6. We can also resolve a force into components. Consider a weight of mass m hanging from the ceiling on a rope as shown in the picture below:

[Picture: a weight of mass m suspended from the ceiling by a rope.]
Linear Algebra c W W L Chen, 1982, 2006
For instance, we see that the work done in moving a particle along a vector r = (3, 2, 4) with applied
force F = (2, 1, 1) is F r = (2, 1, 1) (3, 2, 4) = 12.
Example 4.7.6. We can also resolve a force into components. Consider a weight of mass m hanging
from the ceiling on a rope as shown in the picture below:
m
Here the rope makes an angle of 60

with the vertical. We wish to nd the tension T on the rope. To


nd this, note that the tension on the rope is a force, and we have the following picture of forces:
T
1
60

T
2
magnitude mg
The force T
1
has magnitude T
1
= T. Let z be a unit vector pointing vertically upwards. Using scalar
products, we see that the component of the force T
1
in the vertical direction is
T
1
z = T
1
z cos 60

=
1
2
T.
Similarly, the force T
2
has magnitude T
2
= T, and the component of it in the vertical direction is
T
2
z = T
2
z cos 60

=
1
2
T.
Since the weight is stationary, he total force upwards on it is
1
2
T +
1
2
T mg = 0. Hence T = mg.
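The final step is a single linear equation in T, and as a sketch (assuming sympy) it can be solved symbolically with m and g left as parameters:

    import sympy as sp

    T, m, g = sp.symbols('T m g', positive=True)
    # Vertical equilibrium of the stationary weight: (1/2)T + (1/2)T - m*g = 0.
    print(sp.solve(sp.Eq(sp.Rational(1, 2) * T + sp.Rational(1, 2) * T, m * g), T))  # [g*m]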
Problems for Chapter 4

1. For each of the following pairs of vectors in R^2, calculate u + 3v, u − v, u · v, and find the angle between u and v:
a) u = (1, 1) and v = (5, 0)
b) u = (1, 2) and v = (2, 1)

2. For each of the following pairs of vectors in R^2, calculate 2u − 5v, u − 2v, u · v and the angle between u and v (to the nearest degree):
a) u = (1, 3) and v = (2, 1)
b) u = (2, 0) and v = (1, 2)

3. For the two vectors u = (2, 3) and v = (5, 1) in the 2-dimensional euclidean space R^2, determine each of the following:
a) u − v
b) |u|
c) u · (u − v)
d) the angle between u and u − v

4. For each of the following pairs of vectors in R^3, calculate u + 3v, u · v, u × v, find the angle between u and v, and find a unit vector perpendicular to both u and v:
a) u = (1, 1, 1) and v = (5, 0, 5)
b) u = (1, 2, 3) and v = (3, 2, 1)

5. Find vectors v and w such that v is parallel to (1, 2, 3), v + w = (7, 3, 5) and w is orthogonal to (1, 2, 3).

6. Let ABCD be a quadrilateral. Show that the quadrilateral obtained by joining the midpoints of adjacent sides of ABCD is a parallelogram.
[Hint: Let a, b, c and d be vectors representing the four sides of ABCD.]

7. Suppose that u, v and w are vectors in R^3 such that the scalar triple product u · (v × w) ≠ 0. Let

u′ = (v × w)/(u · (v × w)), v′ = (w × u)/(u · (v × w)), w′ = (u × v)/(u · (v × w)).

a) Show that u′ · u = 1.
b) Show that u′ · v = u′ · w = 0.
c) Use the properties of the scalar triple product to find v′ · v and w′ · w, as well as v′ · u, v′ · w, w′ · u and w′ · v.

8. Suppose that u, v, w, u′, v′ and w′ are vectors in R^3 such that u′ · u = v′ · v = w′ · w = 1 and u′ · v = u′ · w = v′ · u = v′ · w = w′ · u = w′ · v = 0. Show that if u · (v × w) ≠ 0, then

u′ = (v × w)/(u · (v × w)), v′ = (w × u)/(u · (v × w)), w′ = (u × v)/(u · (v × w)).

9. Suppose that u, v and w are vectors in R^3.
a) Show that u × (v × w) = (u · w)v − (u · v)w.
b) Deduce that (u × v) × w = (u · w)v − (v · w)u.

10. Consider the three points P(2, 3, 1), Q(4, 2, 5) and R(1, 6, 3).
a) Find the equation of the line through P and Q.
b) Find the equation of the plane perpendicular to the line in part (a) and passing through R.
c) Find the distance between R and the line in part (a).
d) Find the area of the parallelogram with the three points as vertices.
e) Find the equation of the plane through the three points.
f) Find the distance of the origin (0, 0, 0) from the plane in part (e).
g) Are the planes in parts (b) and (e) perpendicular? Justify your assertion.
11. Consider the points (1, 2, 3), (0, 2, 4) and (2, 1, 3) in R^3.
a) Find the area of a parallelogram with these points as three of its vertices.
b) Find the perpendicular distance between (1, 2, 3) and the line passing through (0, 2, 4) and (2, 1, 3).

12. Consider the points (1, 2, 3), (0, 2, 4) and (2, 1, 3) in R^3.
a) Find a vector perpendicular to the plane containing these points.
b) Find the equation of this plane and its perpendicular distance from the origin.
c) Find the equation of the line perpendicular to this plane and passing through the point (3, 6, 9).

13. Find the equation of the plane through the points (1, 2, 3), (2, 3, 4) and (3, 1, 2).

14. Find the equation of the plane through the points (2, 1, 1), (3, 2, 1) and (1, 3, 2).

15. Find the volume of a parallelepiped with the points (1, 2, 3), (0, 2, 4), (2, 1, 3) and (3, 6, 9) as four of its vertices.

16. Consider a weight of mass m hanging from the ceiling supported by two ropes as shown in the picture below:
[Figure: a weight of mass m hanging from the ceiling, supported by two ropes]

Here the rope on the left makes an angle of 45° with the vertical, while the rope on the right makes an angle of 60° with the vertical. Find the tension on the two ropes.
LINEAR ALGEBRA
W W L CHEN
c W W L Chen, 1994, 2008.
This chapter is available free to all individuals, on the understanding that it is not to be used for nancial gain,
and may be downloaded and/or photocopied, with or without permission from the author.
However, this document may not be kept on any information storage and retrieval system without permission
from the author, unless such system is not accessible to any individuals other than its owners.
Chapter 5
INTRODUCTION TO
VECTOR SPACES
5.1. Real Vector Spaces
Before we give any formal definition of a vector space, we shall consider a few concrete examples of such an abstract object. We first study two examples from the theory of vectors which we first discussed in Chapter 4.
Example 5.1.1. Consider the set R^2 of all vectors of the form u = (u_1, u_2), where u_1, u_2 ∈ R. Consider vector addition and also multiplication of vectors by real numbers. It is easy to check that we have the following properties:
(1.1) For every u, v ∈ R^2, we have u + v ∈ R^2.
(1.2) For every u, v, w ∈ R^2, we have u + (v + w) = (u + v) + w.
(1.3) For every u ∈ R^2, we have u + 0 = 0 + u = u.
(1.4) For every u ∈ R^2, we have u + (−u) = 0.
(1.5) For every u, v ∈ R^2, we have u + v = v + u.
(2.1) For every c ∈ R and u ∈ R^2, we have cu ∈ R^2.
(2.2) For every c ∈ R and u, v ∈ R^2, we have c(u + v) = cu + cv.
(2.3) For every a, b ∈ R and u ∈ R^2, we have (a + b)u = au + bu.
(2.4) For every a, b ∈ R and u ∈ R^2, we have (ab)u = a(bu).
(2.5) For every u ∈ R^2, we have 1u = u.

Example 5.1.2. Consider the set R^3 of all vectors of the form u = (u_1, u_2, u_3), where u_1, u_2, u_3 ∈ R. Consider vector addition and also multiplication of vectors by real numbers. It is easy to check that we have properties analogous to (1.1)-(1.5) and (2.1)-(2.5) in the previous example, with reference to R^2 being replaced by R^3.

We next turn to an example from the theory of matrices which we first discussed in Chapter 2.
Example 5.1.3. Consider the set M_{2,2}(R) of all 2 × 2 matrices with entries in R. Consider matrix addition and also multiplication of matrices by real numbers. Denote by O the 2 × 2 null matrix. It is easy to check that we have the following properties:
(1.1) For every P, Q ∈ M_{2,2}(R), we have P + Q ∈ M_{2,2}(R).
(1.2) For every P, Q, R ∈ M_{2,2}(R), we have P + (Q + R) = (P + Q) + R.
(1.3) For every P ∈ M_{2,2}(R), we have P + O = O + P = P.
(1.4) For every P ∈ M_{2,2}(R), we have P + (−P) = O.
(1.5) For every P, Q ∈ M_{2,2}(R), we have P + Q = Q + P.
(2.1) For every c ∈ R and P ∈ M_{2,2}(R), we have cP ∈ M_{2,2}(R).
(2.2) For every c ∈ R and P, Q ∈ M_{2,2}(R), we have c(P + Q) = cP + cQ.
(2.3) For every a, b ∈ R and P ∈ M_{2,2}(R), we have (a + b)P = aP + bP.
(2.4) For every a, b ∈ R and P ∈ M_{2,2}(R), we have (ab)P = a(bP).
(2.5) For every P ∈ M_{2,2}(R), we have 1P = P.

We also turn to an example from the theory of functions.

Example 5.1.4. Consider the set A of all functions of the form f : R → R. For any two functions f, g ∈ A, define the function f + g : R → R by writing (f + g)(x) = f(x) + g(x) for every x ∈ R. For every function f ∈ A and every number c ∈ R, define the function cf : R → R by writing (cf)(x) = cf(x) for every x ∈ R. Denote by λ : R → R the function where λ(x) = 0 for every x ∈ R. Then it is easy to check that we have the following properties:
(1.1) For every f, g ∈ A, we have f + g ∈ A.
(1.2) For every f, g, h ∈ A, we have f + (g + h) = (f + g) + h.
(1.3) For every f ∈ A, we have f + λ = λ + f = f.
(1.4) For every f ∈ A, we have f + (−f) = λ.
(1.5) For every f, g ∈ A, we have f + g = g + f.
(2.1) For every c ∈ R and f ∈ A, we have cf ∈ A.
(2.2) For every c ∈ R and f, g ∈ A, we have c(f + g) = cf + cg.
(2.3) For every a, b ∈ R and f ∈ A, we have (a + b)f = af + bf.
(2.4) For every a, b ∈ R and f ∈ A, we have (ab)f = a(bf).
(2.5) For every f ∈ A, we have 1f = f.

There are many more examples of sets where properties analogous to (1.1)-(1.5) and (2.1)-(2.5) in the four examples above hold. This apparent similarity leads us to consider an abstract object which will incorporate all these individual cases as examples. We say that these examples are all vector spaces over R.

Definition. A vector space V over R, or a real vector space V, is a set of objects, known as vectors, together with vector addition + and multiplication of vectors by elements of R, and satisfying the following properties:
(VA1) For every u, v ∈ V, we have u + v ∈ V.
(VA2) For every u, v, w ∈ V, we have u + (v + w) = (u + v) + w.
(VA3) There exists an element 0 ∈ V such that for every u ∈ V, we have u + 0 = 0 + u = u.
(VA4) For every u ∈ V, there exists −u ∈ V such that u + (−u) = 0.
(VA5) For every u, v ∈ V, we have u + v = v + u.
(SM1) For every c ∈ R and u ∈ V, we have cu ∈ V.
(SM2) For every c ∈ R and u, v ∈ V, we have c(u + v) = cu + cv.
(SM3) For every a, b ∈ R and u ∈ V, we have (a + b)u = au + bu.
(SM4) For every a, b ∈ R and u ∈ V, we have (ab)u = a(bu).
(SM5) For every u ∈ V, we have 1u = u.

Remark. The elements a, b, c ∈ R discussed in (SM1)-(SM5) are known as scalars. Multiplication of vectors by elements of R is sometimes known as scalar multiplication.
Example 5.1.5. Let n ∈ N. Consider the set R^n of all vectors of the form u = (u_1, ..., u_n), where u_1, ..., u_n ∈ R. For any two vectors u = (u_1, ..., u_n) and v = (v_1, ..., v_n) in R^n and any number c ∈ R, write

u + v = (u_1 + v_1, ..., u_n + v_n) and cu = (cu_1, ..., cu_n).

To check (VA1), simply note that u_1 + v_1, ..., u_n + v_n ∈ R. To check (VA2), note that if w = (w_1, ..., w_n), then

u + (v + w) = (u_1, ..., u_n) + (v_1 + w_1, ..., v_n + w_n) = (u_1 + (v_1 + w_1), ..., u_n + (v_n + w_n))
= ((u_1 + v_1) + w_1, ..., (u_n + v_n) + w_n) = (u_1 + v_1, ..., u_n + v_n) + (w_1, ..., w_n)
= (u + v) + w.

If we take 0 to be the zero vector (0, ..., 0), then u + 0 = 0 + u = u, giving (VA3). Next, writing −u = (−u_1, ..., −u_n), we have u + (−u) = 0, giving (VA4). To check (VA5), note that

u + v = (u_1 + v_1, ..., u_n + v_n) = (v_1 + u_1, ..., v_n + u_n) = v + u.

To check (SM1), simply note that cu_1, ..., cu_n ∈ R. To check (SM2), note that

c(u + v) = c(u_1 + v_1, ..., u_n + v_n) = (c(u_1 + v_1), ..., c(u_n + v_n)) = (cu_1 + cv_1, ..., cu_n + cv_n)
= (cu_1, ..., cu_n) + (cv_1, ..., cv_n) = cu + cv.

To check (SM3), note that

(a + b)u = ((a + b)u_1, ..., (a + b)u_n) = (au_1 + bu_1, ..., au_n + bu_n) = (au_1, ..., au_n) + (bu_1, ..., bu_n) = au + bu.

To check (SM4), note that

(ab)u = ((ab)u_1, ..., (ab)u_n) = (a(bu_1), ..., a(bu_n)) = a(bu_1, ..., bu_n) = a(bu).

Finally, to check (SM5), note that

1u = (1u_1, ..., 1u_n) = (u_1, ..., u_n) = u.

It follows that R^n is a vector space over R. This is known as the n-dimensional euclidean space.
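Such axiom verifications must of course be carried out algebraically, as above, but they are easy to spot-check numerically for random vectors. A minimal sketch, assuming numpy:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    u, v, w = rng.standard_normal((3, n))   # three random vectors in R^n
    a, b = rng.standard_normal(2)           # two random scalars

    assert np.allclose(u + (v + w), (u + v) + w)     # (VA2)
    assert np.allclose(u + v, v + u)                 # (VA5)
    assert np.allclose((a + b) * u, a * u + b * u)   # (SM3)
    assert np.allclose((a * b) * u, a * (b * u))     # (SM4)
    assert np.allclose(1 * u, u)                     # (SM5)
    print("axioms spot-checked")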
Example 5.1.6. Let k ∈ N. Consider the set P_k of all polynomials of the form

p(x) = p_0 + p_1 x + ... + p_k x^k, where p_0, p_1, ..., p_k ∈ R.

In other words, P_k is the set of all polynomials of degree at most k and with coefficients in R. For any two polynomials p(x) = p_0 + p_1 x + ... + p_k x^k and q(x) = q_0 + q_1 x + ... + q_k x^k in P_k and for any number c ∈ R, write

p(x) + q(x) = (p_0 + q_0) + (p_1 + q_1)x + ... + (p_k + q_k)x^k and cp(x) = cp_0 + cp_1 x + ... + cp_k x^k.

To check (VA1), simply note that p_0 + q_0, ..., p_k + q_k ∈ R. To check (VA2), note that if we write r(x) = r_0 + r_1 x + ... + r_k x^k, then we have

p(x) + (q(x) + r(x)) = (p_0 + p_1 x + ... + p_k x^k) + ((q_0 + r_0) + (q_1 + r_1)x + ... + (q_k + r_k)x^k)
= (p_0 + (q_0 + r_0)) + (p_1 + (q_1 + r_1))x + ... + (p_k + (q_k + r_k))x^k
= ((p_0 + q_0) + r_0) + ((p_1 + q_1) + r_1)x + ... + ((p_k + q_k) + r_k)x^k
= ((p_0 + q_0) + (p_1 + q_1)x + ... + (p_k + q_k)x^k) + (r_0 + r_1 x + ... + r_k x^k)
= (p(x) + q(x)) + r(x).

If we take 0 to be the zero polynomial 0 + 0x + ... + 0x^k, then p(x) + 0 = 0 + p(x) = p(x), giving (VA3). Next, writing −p(x) = −p_0 − p_1 x − ... − p_k x^k, we have p(x) + (−p(x)) = 0, giving (VA4). To check (VA5), note that

p(x) + q(x) = (p_0 + q_0) + (p_1 + q_1)x + ... + (p_k + q_k)x^k = (q_0 + p_0) + (q_1 + p_1)x + ... + (q_k + p_k)x^k = q(x) + p(x).

To check (SM1), simply note that cp_0, ..., cp_k ∈ R. To check (SM2), note that

c(p(x) + q(x)) = c((p_0 + q_0) + (p_1 + q_1)x + ... + (p_k + q_k)x^k)
= c(p_0 + q_0) + c(p_1 + q_1)x + ... + c(p_k + q_k)x^k
= (cp_0 + cq_0) + (cp_1 + cq_1)x + ... + (cp_k + cq_k)x^k
= (cp_0 + cp_1 x + ... + cp_k x^k) + (cq_0 + cq_1 x + ... + cq_k x^k)
= cp(x) + cq(x).

To check (SM3), note that

(a + b)p(x) = (a + b)p_0 + (a + b)p_1 x + ... + (a + b)p_k x^k
= (ap_0 + bp_0) + (ap_1 + bp_1)x + ... + (ap_k + bp_k)x^k
= (ap_0 + ap_1 x + ... + ap_k x^k) + (bp_0 + bp_1 x + ... + bp_k x^k)
= ap(x) + bp(x).

To check (SM4), note that

(ab)p(x) = (ab)p_0 + (ab)p_1 x + ... + (ab)p_k x^k = a(bp_0) + a(bp_1)x + ... + a(bp_k)x^k = a(bp_0 + bp_1 x + ... + bp_k x^k) = a(bp(x)).

Finally, to check (SM5), note that

1p(x) = 1p_0 + 1p_1 x + ... + 1p_k x^k = p_0 + p_1 x + ... + p_k x^k = p(x).

It follows that P_k is a vector space over R. Note also that the vectors are the polynomials.
There are a few simple properties of vector spaces that we can deduce easily from the definition.
PROPOSITION 5A. Suppose that V is a vector space over R, and that u ∈ V and c ∈ R.
(a) We have 0u = 0.
(b) We have c0 = 0.
(c) We have (−1)u = −u.
(d) If cu = 0, then c = 0 or u = 0.

Proof. (a) By (SM1), we have 0u ∈ V. Hence

0u + 0u = (0 + 0)u (by (SM3)),
= 0u (since 0 ∈ R).

It follows that

0u = 0u + 0 (by (VA3)),
= 0u + (0u + (−(0u))) (by (VA4)),
= (0u + 0u) + (−(0u)) (by (VA2)),
= 0u + (−(0u)) (from above),
= 0 (by (VA4)).

(b) By (SM1), we have c0 ∈ V. Hence

c0 + c0 = c(0 + 0) (by (SM2)),
= c0 (by (VA3)).

It follows that

c0 = c0 + 0 (by (VA3)),
= c0 + (c0 + (−(c0))) (by (VA4)),
= (c0 + c0) + (−(c0)) (by (VA2)),
= c0 + (−(c0)) (from above),
= 0 (by (VA4)).

(c) We have

(−1)u = (−1)u + 0 (by (VA3)),
= (−1)u + (u + (−u)) (by (VA4)),
= ((−1)u + u) + (−u) (by (VA2)),
= ((−1)u + 1u) + (−u) (by (SM5)),
= ((−1) + 1)u + (−u) (by (SM3)),
= 0u + (−u) (since 1 ∈ R),
= 0 + (−u) (from (a)),
= −u (by (VA3)).

(d) Suppose that cu = 0 and c ≠ 0. Then c^(−1) ∈ R and

u = 1u (by (SM5)),
= (c^(−1)c)u (since c ∈ R \ {0}),
= c^(−1)(cu) (by (SM4)),
= c^(−1)0 (assumption),
= 0 (from (b)),

as required.
5.2. Subspaces

Example 5.2.1. Consider the vector space R^2 of all points (x, y), where x, y ∈ R. Let L be a line through the origin 0 = (0, 0). Suppose that L is represented by the equation αx + βy = 0; in other words,

L = {(x, y) ∈ R^2 : αx + βy = 0}.

Note first of all that 0 = (0, 0) ∈ L, so that (VA3) and (VA4) clearly hold in L. Also (VA2) and (VA5) clearly hold in L. To check (VA1), note that if (x, y), (u, v) ∈ L, then αx + βy = 0 and αu + βv = 0, so that α(x + u) + β(y + v) = 0, whence (x, y) + (u, v) = (x + u, y + v) ∈ L. Next, note that (SM2)-(SM5) clearly hold in L. To check (SM1), note that if (x, y) ∈ L, then αx + βy = 0, so that α(cx) + β(cy) = 0, whence c(x, y) = (cx, cy) ∈ L. It follows that L forms a vector space over R. In fact, we have shown that every line in R^2 through the origin is a vector space over R.
Definition. Suppose that V is a vector space over R, and that W is a subset of V. Then we say that W is a subspace of V if W forms a vector space over R under the vector addition and scalar multiplication defined in V.

Example 5.2.2. We have just shown in Example 5.2.1 that every line in R^2 through the origin is a subspace of R^2. On the other hand, if we work through the example again, then it is clear that we have really only checked conditions (VA1) and (SM1) for L, and that 0 = (0, 0) ∈ L.

PROPOSITION 5B. Suppose that V is a vector space over R, and that W is a non-empty subset of V. Then W is a subspace of V if the following conditions are satisfied:
(SP1) For every u, v ∈ W, we have u + v ∈ W.
(SP2) For every c ∈ R and u ∈ W, we have cu ∈ W.

Proof. To show that W is a vector space over R, it remains to check that W satisfies (VA2)-(VA5) and (SM2)-(SM5). To check (VA3) and (VA4) for W, it clearly suffices to check that 0 ∈ W. Since W is non-empty, there exists u ∈ W. Then it follows from (SP2) and Proposition 5A(a) that 0 = 0u ∈ W. The remaining conditions (VA2), (VA5) and (SM2)-(SM5) hold for all vectors in V, and hence also for all vectors in W.
Example 5.2.3. Consider the vector space R^3 of all points (x, y, z), where x, y, z ∈ R. Let P be a plane through the origin 0 = (0, 0, 0). Suppose that P is represented by the equation αx + βy + γz = 0; in other words,

P = {(x, y, z) ∈ R^3 : αx + βy + γz = 0}.

To check (SP1), note that if (x, y, z), (u, v, w) ∈ P, then αx + βy + γz = 0 and αu + βv + γw = 0, so that α(x + u) + β(y + v) + γ(z + w) = 0, whence (x, y, z) + (u, v, w) = (x + u, y + v, z + w) ∈ P. To check (SP2), note that if (x, y, z) ∈ P, then αx + βy + γz = 0, so that α(cx) + β(cy) + γ(cz) = 0, whence c(x, y, z) = (cx, cy, cz) ∈ P. It follows that P is a subspace of R^3. Next, let L be a line through the origin 0 = (0, 0, 0). Suppose that (α, β, γ) ∈ R^3 is a non-zero point on L. Then we can write

L = {t(α, β, γ) : t ∈ R}.

Suppose that u = t(α, β, γ) ∈ L and v = s(α, β, γ) ∈ L, and that c ∈ R. Then

u + v = t(α, β, γ) + s(α, β, γ) = (t + s)(α, β, γ) ∈ L,

giving (SP1). Also, cu = c(t(α, β, γ)) = (ct)(α, β, γ) ∈ L, giving (SP2). It follows that L is a subspace of R^3. Finally, it is not difficult to see that both {0} and R^3 are subspaces of R^3.

Example 5.2.4. Note that R^2 is not a subspace of R^3. First of all, R^2 is not a subset of R^3. Note also that vector addition and scalar multiplication are different in R^2 and R^3.
Example 5.2.5. Suppose that A is an m × n matrix and 0 is the m × 1 zero column matrix. Consider the system Ax = 0 of m homogeneous linear equations in the n unknowns x_1, ..., x_n, where

x = (x_1, ..., x_n)^t

is interpreted as an element of the vector space R^n, with usual vector addition and scalar multiplication. Let S denote the set of all solutions of the system. Suppose that x, y ∈ S and c ∈ R. Then

A(x + y) = Ax + Ay = 0 + 0 = 0,

giving (SP1). Also, A(cx) = c(Ax) = c0 = 0, giving (SP2). It follows that S is a subspace of R^n. To summarize, the space of solutions of a system of m homogeneous linear equations in n unknowns is a subspace of R^n.
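The two closure computations above are easy to reproduce numerically. The sketch below assumes numpy and scipy are available, and uses an arbitrary illustrative matrix A; it checks (SP1) and (SP2) for sample solutions of Ax = 0.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1., 2., 3.], [2., 4., 6.]])   # an arbitrary illustrative matrix
    N = null_space(A)                            # columns form a basis of the solution space
    x = N @ np.array([1.0, -2.0])                # two sample solutions of Ax = 0
    y = N @ np.array([0.5, 3.0])

    print(np.allclose(A @ (x + y), 0))    # (SP1): the sum is again a solution
    print(np.allclose(A @ (7.0 * x), 0))  # (SP2): scalar multiples are again solutions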
Example 5.2.6. As a special case of Example 5.2.5, note that if we take two non-parallel planes in R^3 through the origin 0 = (0, 0, 0), then the intersection of these two planes is clearly a line through the origin. However, each plane is a homogeneous equation in the three unknowns x, y, z ∈ R. It follows that the intersection of the two planes is the collection of all solutions (x, y, z) ∈ R^3 of the system formed by the two homogeneous equations in the three unknowns x, y, z representing these two planes. We have already shown in Example 5.2.3 that the line representing all these solutions is a subspace of R^3.
Example 5.2.7. We showed in Example 5.1.3 that the set M_{2,2}(R) of all 2 × 2 matrices with entries in R forms a vector space over R. Consider the subset

W = { [a_11 a_12; a_21 0] : a_11, a_12, a_21 ∈ R }

of M_{2,2}(R). Since

[a_11 a_12; a_21 0] + [b_11 b_12; b_21 0] = [a_11 + b_11 a_12 + b_12; a_21 + b_21 0]

and

c [a_11 a_12; a_21 0] = [ca_11 ca_12; ca_21 0],

it follows that (SP1) and (SP2) are satisfied. Hence W is a subspace of M_{2,2}(R).
Example 5.2.8. We showed in Example 5.1.4 that the set A of all functions of the form f : R → R forms a vector space over R. Let C^0 denote the set of all functions of the form f : R → R which are continuous at x = 2, and let C^1 denote the set of all functions of the form f : R → R which are differentiable at x = 2. Then it follows from the arithmetic of limits and the arithmetic of derivatives that C^0 and C^1 are both subspaces of A. Furthermore, C^1 is a subspace of C^0 (why?). On the other hand, let k ∈ N. Recall from Example 5.1.6 the vector space P_k of all polynomials of the form

p(x) = p_0 + p_1 x + ... + p_k x^k, where p_0, p_1, ..., p_k ∈ R.

In other words, P_k is the set of all polynomials of degree at most k and with coefficients in R. Clearly P_k is a subspace of C^1.
5.3. Linear Combination

In this section and the next two, we shall study ways of describing the vectors in a vector space V. Our ultimate goal is to be able to determine a subset B of vectors in V and describe every element of V in terms of elements of B in a unique way. The first step in this direction is summarized below.

Definition. Suppose that v_1, ..., v_r are vectors in a vector space V over R. By a linear combination of the vectors v_1, ..., v_r, we mean an expression of the type

c_1 v_1 + ... + c_r v_r,

where c_1, ..., c_r ∈ R.

Example 5.3.1. In R^2, every vector (x, y) is a linear combination of the two vectors i = (1, 0) and j = (0, 1), for clearly (x, y) = xi + yj.

Example 5.3.2. In R^3, every vector (x, y, z) is a linear combination of the three vectors i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1), for clearly (x, y, z) = xi + yj + zk.
Example 5.3.3. In R^4, the vector (1, 4, −2, 6) is a linear combination of the two vectors (1, 2, 0, 4) and (1, 1, 1, 3), for we have (1, 4, −2, 6) = 3(1, 2, 0, 4) − 2(1, 1, 1, 3). On the other hand, the vector (2, 6, 0, 9) is not a linear combination of the two vectors (1, 2, 0, 4) and (1, 1, 1, 3), for

(2, 6, 0, 9) = c_1(1, 2, 0, 4) + c_2(1, 1, 1, 3)

would lead to the system of four equations

c_1 + c_2 = 2,
2c_1 + c_2 = 6,
c_2 = 0,
4c_1 + 3c_2 = 9.

It is easily checked that this system has no solutions.
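Deciding whether a given vector is such a linear combination is precisely the question of whether a linear system is soluble, so the check can be mechanised. A sketch, assuming sympy, reproducing both halves of this example:

    import sympy as sp

    v1 = sp.Matrix([1, 2, 0, 4])
    v2 = sp.Matrix([1, 1, 1, 3])
    c1, c2 = sp.symbols('c1 c2')

    def in_span(target):
        # Solve c1*v1 + c2*v2 = target; an empty solution list means "not in the span".
        eqs = c1 * v1 + c2 * v2 - sp.Matrix(target)
        return sp.solve(list(eqs), [c1, c2], dict=True)

    print(in_span([1, 4, -2, 6]))   # [{c1: 3, c2: -2}]
    print(in_span([2, 6, 0, 9]))    # [] : no solution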
Example 5.3.4. In the vector space A of all functions of the form f : R → R described in Example 5.1.4, the function cos 2x is a linear combination of the three functions cos^2 x, cosh^2 x and sinh^2 x. It is not too difficult to check that

cos 2x = 2 cos^2 x + sinh^2 x − cosh^2 x,

noting that cos 2x = 2 cos^2 x − 1 and cosh^2 x − sinh^2 x = 1.

We observe that in Example 5.3.1, every vector in R^2 is a linear combination of the two vectors i and j. Similarly, in Example 5.3.2, every vector in R^3 is a linear combination of the three vectors i, j and k. On the other hand, we observe that in Example 5.3.3, not every vector in R^4 is a linear combination of the two vectors (1, 2, 0, 4) and (1, 1, 1, 3).

Let us therefore investigate the collection of all vectors in a vector space that can be represented as linear combinations of a given set of vectors in V.
Definition. Suppose that v_1, ..., v_r are vectors in a vector space V over R. The set

span{v_1, ..., v_r} = {c_1 v_1 + ... + c_r v_r : c_1, ..., c_r ∈ R}

is called the span of the vectors v_1, ..., v_r. We also say that the vectors v_1, ..., v_r span V if

span{v_1, ..., v_r} = V;

in other words, if every vector in V can be expressed as a linear combination of the vectors v_1, ..., v_r.

Example 5.3.5. The two vectors i = (1, 0) and j = (0, 1) span R^2.

Example 5.3.6. The three vectors i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1) span R^3.

Example 5.3.7. The two vectors (1, 2, 0, 4) and (1, 1, 1, 3) do not span R^4.
PROPOSITION 5C. Suppose that v_1, ..., v_r are vectors in a vector space V over R.
(a) Then span{v_1, ..., v_r} is a subspace of V.
(b) Suppose further that W is a subspace of V and v_1, ..., v_r ∈ W. Then span{v_1, ..., v_r} ⊆ W.

Proof. (a) Suppose that u, w ∈ span{v_1, ..., v_r} and c ∈ R. There exist a_1, ..., a_r, b_1, ..., b_r ∈ R such that

u = a_1 v_1 + ... + a_r v_r and w = b_1 v_1 + ... + b_r v_r.

Then

u + w = (a_1 v_1 + ... + a_r v_r) + (b_1 v_1 + ... + b_r v_r) = (a_1 + b_1)v_1 + ... + (a_r + b_r)v_r ∈ span{v_1, ..., v_r}

and

cu = c(a_1 v_1 + ... + a_r v_r) = (ca_1)v_1 + ... + (ca_r)v_r ∈ span{v_1, ..., v_r}.

It follows from Proposition 5B that span{v_1, ..., v_r} is a subspace of V.

(b) Suppose that c_1, ..., c_r ∈ R and u = c_1 v_1 + ... + c_r v_r ∈ span{v_1, ..., v_r}. If v_1, ..., v_r ∈ W, then it follows from (SM1) for W that c_1 v_1, ..., c_r v_r ∈ W. It then follows from (VA1) for W that u = c_1 v_1 + ... + c_r v_r ∈ W.

Example 5.3.8. In R^2, any non-zero vector v spans the subspace {cv : c ∈ R}. This is clearly a line through the origin. Also, try to draw a picture to convince yourself that any two non-zero vectors that are not on the same line span R^2.

Example 5.3.9. In R^3, try to draw pictures to convince yourself that any non-zero vector spans a subspace which is a line through the origin; any two non-zero vectors that are not on the same line span a subspace which is a plane through the origin; and any three non-zero vectors that do not lie on the same plane span R^3.
5.4. Linear Independence

We first study two simple examples.

Example 5.4.1. Consider the three vectors v_1 = (1, 2, 3), v_2 = (3, 2, 1) and v_3 = (3, 3, 3) in R^3. Then

span{v_1, v_2, v_3} = {c_1(1, 2, 3) + c_2(3, 2, 1) + c_3(3, 3, 3) : c_1, c_2, c_3 ∈ R}
= {(c_1 + 3c_2 + 3c_3, 2c_1 + 2c_2 + 3c_3, 3c_1 + c_2 + 3c_3) : c_1, c_2, c_3 ∈ R}.

Write (x, y, z) = (c_1 + 3c_2 + 3c_3, 2c_1 + 2c_2 + 3c_3, 3c_1 + c_2 + 3c_3). Then it is not difficult to see that

(x, y, z)^t = [1 3 3; 2 2 3; 3 1 3](c_1, c_2, c_3)^t,

and so (do not worry if you cannot understand why we take this next step)

(1 −2 1)(x, y, z)^t = (1 −2 1)[1 3 3; 2 2 3; 3 1 3](c_1, c_2, c_3)^t = (0 0 0)(c_1, c_2, c_3)^t = (0),

so that x − 2y + z = 0. It follows that span{v_1, v_2, v_3} is a plane through the origin and not R^3. Note, in fact, that 3v_1 + 3v_2 − 4v_3 = 0. Note also that

det [1 3 3; 2 2 3; 3 1 3] = 0.
Example 5.4.2. Consider the three vectors v_1 = (1, 1, 0), v_2 = (5, 1, 3) and v_3 = (2, 7, −4) in R^3. Then

span{v_1, v_2, v_3} = {c_1(1, 1, 0) + c_2(5, 1, 3) + c_3(2, 7, −4) : c_1, c_2, c_3 ∈ R}
= {(c_1 + 5c_2 + 2c_3, c_1 + c_2 + 7c_3, 3c_2 − 4c_3) : c_1, c_2, c_3 ∈ R}.

Write (x, y, z) = (c_1 + 5c_2 + 2c_3, c_1 + c_2 + 7c_3, 3c_2 − 4c_3). Then it is not difficult to see that

(x, y, z)^t = [1 5 2; 1 1 7; 0 3 −4](c_1, c_2, c_3)^t,

so that

[−25 26 33; 4 −4 −5; 3 −3 −4](x, y, z)^t = [−25 26 33; 4 −4 −5; 3 −3 −4][1 5 2; 1 1 7; 0 3 −4](c_1, c_2, c_3)^t
= [1 0 0; 0 1 0; 0 0 1](c_1, c_2, c_3)^t = (c_1, c_2, c_3)^t.

It follows that for every (x, y, z) ∈ R^3, we can find c_1, c_2, c_3 ∈ R such that (x, y, z) = c_1 v_1 + c_2 v_2 + c_3 v_3. Hence span{v_1, v_2, v_3} = R^3. Note that

det [1 5 2; 1 1 7; 0 3 −4] ≠ 0,

and that the only solution for

(0, 0, 0) = c_1 v_1 + c_2 v_2 + c_3 v_3

is c_1 = c_2 = c_3 = 0.
Definition. Suppose that v_1, ..., v_r are vectors in a vector space V over R.
(LD) We say that v_1, ..., v_r are linearly dependent if there exist c_1, ..., c_r ∈ R, not all zero, such that c_1 v_1 + ... + c_r v_r = 0.
(LI) We say that v_1, ..., v_r are linearly independent if they are not linearly dependent; in other words, if the only solution of c_1 v_1 + ... + c_r v_r = 0 in c_1, ..., c_r ∈ R is given by c_1 = ... = c_r = 0.
Example 5.4.3. Let us return to Example 5.4.1 and consider again the three vectors v_1 = (1, 2, 3), v_2 = (3, 2, 1) and v_3 = (3, 3, 3) in R^3. Consider the equation c_1 v_1 + c_2 v_2 + c_3 v_3 = 0. This can be rewritten in matrix form as

[1 3 3; 2 2 3; 3 1 3](c_1, c_2, c_3)^t = (0, 0, 0)^t.

Since

det [1 3 3; 2 2 3; 3 1 3] = 0,

the system has non-trivial solutions; for example, (c_1, c_2, c_3) = (3, 3, −4), so that 3v_1 + 3v_2 − 4v_3 = 0. Hence v_1, v_2, v_3 are linearly dependent.
Example 5.4.4. Let us return to Example 5.4.2 and consider again the three vectors v_1 = (1, 1, 0), v_2 = (5, 1, 3) and v_3 = (2, 7, −4) in R^3. Consider the equation c_1 v_1 + c_2 v_2 + c_3 v_3 = 0. This can be rewritten in matrix form as

[1 5 2; 1 1 7; 0 3 −4](c_1, c_2, c_3)^t = (0, 0, 0)^t.

Since

det [1 5 2; 1 1 7; 0 3 −4] ≠ 0,

the only solution is c_1 = c_2 = c_3 = 0. Hence v_1, v_2, v_3 are linearly independent.
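The determinant tests in Examples 5.4.3 and 5.4.4 can be reproduced by machine. A sketch, assuming sympy:

    import sympy as sp

    # Columns are the vectors v1, v2, v3 of Examples 5.4.3 and 5.4.4 respectively.
    A1 = sp.Matrix([[1, 3, 3], [2, 2, 3], [3, 1, 3]])
    A2 = sp.Matrix([[1, 5, 2], [1, 1, 7], [0, 3, -4]])

    print(A1.det())        # 0 : linearly dependent
    print(A1.nullspace())  # a non-trivial relation, proportional to (3, 3, -4)
    print(A2.det())        # 1 : non-zero, so linearly independent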
Example 5.4.5. In the vector space A of all functions of the form f : R → R described in Example 5.1.4, the functions x, x^2 and sin x are linearly independent. To see this, note that for every c_1, c_2, c_3 ∈ R, the linear combination c_1 x + c_2 x^2 + c_3 sin x is never identically zero unless c_1 = c_2 = c_3 = 0.

Example 5.4.6. In R^n, the vectors e_1, ..., e_n, where

e_j = (0, ..., 0, 1, 0, ..., 0), with j − 1 zeros before the 1 and n − j zeros after it, for every j = 1, ..., n,

are linearly independent (why?).
We observe in Examples 5.4.3-5.4.4 that the determination of whether a collection of vectors in R^3 are linearly dependent is based on whether a system of homogeneous linear equations has non-trivial solutions. The same idea can be used to prove the following result concerning R^n.

PROPOSITION 5D. Suppose that v_1, ..., v_r are vectors in the vector space R^n. If r > n, then v_1, ..., v_r are linearly dependent.

Proof. For every j = 1, ..., r, write

v_j = (a_1j, ..., a_nj).

Then the equation c_1 v_1 + ... + c_r v_r = 0 can be rewritten in matrix form as

[a_11 ... a_1r; ... ; a_n1 ... a_nr](c_1, ..., c_r)^t = (0, ..., 0)^t.

If r > n, then there are more variables than equations. It follows that there must be non-trivial solutions c_1, ..., c_r ∈ R. Hence v_1, ..., v_r are linearly dependent.
Remarks. (1) Consider two vectors v_1 = (a_11, a_21) and v_2 = (a_12, a_22) in R^2. To study linear independence, we consider the equation c_1 v_1 + c_2 v_2 = 0, which can be written in matrix form as

[a_11 a_12; a_21 a_22](c_1, c_2)^t = (0, 0)^t.

The vectors v_1 and v_2 are linearly independent precisely when

det [a_11 a_12; a_21 a_22] ≠ 0.

This can be interpreted geometrically in the following way: The area of the parallelogram formed by the two vectors v_1 and v_2 is in fact equal to the absolute value of the determinant of the matrix formed with v_1 and v_2 as the columns; in other words,

|det [a_11 a_12; a_21 a_22]|.

It follows that the two vectors are linearly dependent precisely when the parallelogram has zero area; in other words, when the two vectors lie on the same line. On the other hand, if the parallelogram has positive area, then the two vectors are linearly independent.

(2) Consider three vectors v_1 = (a_11, a_21, a_31), v_2 = (a_12, a_22, a_32) and v_3 = (a_13, a_23, a_33) in R^3. To study linear independence, we consider the equation c_1 v_1 + c_2 v_2 + c_3 v_3 = 0, which can be written in matrix form as

[a_11 a_12 a_13; a_21 a_22 a_23; a_31 a_32 a_33](c_1, c_2, c_3)^t = (0, 0, 0)^t.

The vectors v_1, v_2 and v_3 are linearly independent precisely when

det [a_11 a_12 a_13; a_21 a_22 a_23; a_31 a_32 a_33] ≠ 0.

This can be interpreted geometrically in the following way: The volume of the parallelepiped formed by the three vectors v_1, v_2 and v_3 is in fact equal to the absolute value of the determinant of the matrix formed with v_1, v_2 and v_3 as the columns; in other words,

|det [a_11 a_12 a_13; a_21 a_22 a_23; a_31 a_32 a_33]|.

It follows that the three vectors are linearly dependent precisely when the parallelepiped has zero volume; in other words, when the three vectors lie on the same plane. On the other hand, if the parallelepiped has positive volume, then the three vectors are linearly independent.

(3) What is the geometric interpretation of two linearly independent vectors in R^3? Well, note that if v_1 and v_2 are non-zero and linearly dependent, then there exist c_1, c_2 ∈ R, not both zero, such that c_1 v_1 + c_2 v_2 = 0. This forces the two vectors to be multiples of each other, so that they lie on the same line, whence the parallelogram they form has zero area. It follows that if two vectors in R^3 form a parallelogram with positive area, then they are linearly independent.
5.5. Basis and Dimension

In this section, we complete the task of describing uniquely every element of a vector space V in terms of the elements of a suitable subset B. To motivate the ideas, we first consider an example.

Example 5.5.1. Let us consider the three vectors v_1 = (1, 1, 0), v_2 = (5, 1, 3) and v_3 = (2, 7, −4) in R^3, as in Examples 5.4.2 and 5.4.4. We have already shown that span{v_1, v_2, v_3} = R^3, and that the vectors v_1, v_2, v_3 are linearly independent. Furthermore, we have shown that for every u = (x, y, z) ∈ R^3, we can write u = c_1 v_1 + c_2 v_2 + c_3 v_3, where c_1, c_2, c_3 ∈ R are determined uniquely by

(c_1, c_2, c_3)^t = [−25 26 33; 4 −4 −5; 3 −3 −4](x, y, z)^t.
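Equivalently, the coordinates c_1, c_2, c_3 can be obtained by solving the linear system directly rather than quoting the inverse matrix. A sketch, assuming sympy:

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    B = sp.Matrix([[1, 5, 2], [1, 1, 7], [0, 3, -4]])   # columns v1, v2, v3
    c = B.solve(sp.Matrix([x, y, z]))                   # coordinates of (x, y, z)
    print(sp.simplify(c))   # (-25x + 26y + 33z, 4x - 4y - 5z, 3x - 3y - 4z)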
Definition. Suppose that v_1, ..., v_r are vectors in a vector space V over R. We say that {v_1, ..., v_r} is a basis for V if the following two conditions are satisfied:
(B1) We have span{v_1, ..., v_r} = V.
(B2) The vectors v_1, ..., v_r are linearly independent.

Example 5.5.2. Consider two vectors v_1 = (a_11, a_21) and v_2 = (a_12, a_22) in R^2. Suppose that

det [a_11 a_12; a_21 a_22] ≠ 0;

in other words, suppose that the parallelogram formed by the two vectors has non-zero area. Then it follows from Remark (1) in Section 5.4 that v_1 and v_2 are linearly independent. Furthermore, for every u = (x, y) ∈ R^2, there exist c_1, c_2 ∈ R such that u = c_1 v_1 + c_2 v_2. Indeed, c_1 and c_2 are determined as the unique solution of the system

[a_11 a_12; a_21 a_22](c_1, c_2)^t = (x, y)^t.

Hence span{v_1, v_2} = R^2. It follows that {v_1, v_2} is a basis for R^2.
Example 5.5.3. Consider three vectors of the type v_1 = (a_11, a_21, a_31), v_2 = (a_12, a_22, a_32) and v_3 = (a_13, a_23, a_33) in R^3. Suppose that

det [a_11 a_12 a_13; a_21 a_22 a_23; a_31 a_32 a_33] ≠ 0;

in other words, suppose that the parallelepiped formed by the three vectors has non-zero volume. Then it follows from Remark (2) in Section 5.4 that v_1, v_2 and v_3 are linearly independent. Furthermore, for every u = (x, y, z) ∈ R^3, there exist c_1, c_2, c_3 ∈ R such that u = c_1 v_1 + c_2 v_2 + c_3 v_3. Indeed, c_1, c_2 and c_3 are determined as the unique solution of the system

[a_11 a_12 a_13; a_21 a_22 a_23; a_31 a_32 a_33](c_1, c_2, c_3)^t = (x, y, z)^t.

Hence span{v_1, v_2, v_3} = R^3. It follows that {v_1, v_2, v_3} is a basis for R^3.

Example 5.5.4. In R^n, the vectors e_1, ..., e_n, where

e_j = (0, ..., 0, 1, 0, ..., 0), with j − 1 zeros before the 1 and n − j zeros after it, for every j = 1, ..., n,

are linearly independent and span R^n. Hence {e_1, ..., e_n} is a basis for R^n. This is known as the standard basis for R^n.

Example 5.5.5. In the vector space M_{2,2}(R) of all 2 × 2 matrices with entries in R as discussed in Example 5.1.3, the set

{ [1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1] }

is a basis.

Example 5.5.6. In the vector space P_k of polynomials of degree at most k and with coefficients in R as discussed in Example 5.1.6, the set {1, x, x^2, ..., x^k} is a basis.
PROPOSITION 5E. Suppose that {v_1, ..., v_r} is a basis for a vector space V over R. Then every element u ∈ V can be expressed uniquely in the form

u = c_1 v_1 + ... + c_r v_r, where c_1, ..., c_r ∈ R.

Proof. Since u ∈ V = span{v_1, ..., v_r}, there exist c_1, ..., c_r ∈ R such that u = c_1 v_1 + ... + c_r v_r. Suppose now that b_1, ..., b_r ∈ R such that

c_1 v_1 + ... + c_r v_r = b_1 v_1 + ... + b_r v_r.

Then

(c_1 − b_1)v_1 + ... + (c_r − b_r)v_r = 0.

Since v_1, ..., v_r are linearly independent, it follows that c_1 − b_1 = ... = c_r − b_r = 0. Hence c_1, ..., c_r are uniquely determined.

We have shown earlier that a vector space can have many bases. For example, any collection of three vectors not on the same plane is a basis for R^3. In the following discussion, we attempt to find out some properties of bases. However, we shall restrict our discussion to the following simple case.

Definition. A vector space V over R is said to be finite-dimensional if it has a basis containing only finitely many elements.

Example 5.5.7. The vector spaces R^n, M_{2,2}(R) and P_k that we have discussed earlier are all finite-dimensional.

Recall that in R^n, the standard basis has exactly n elements. On the other hand, it follows from Proposition 5D that any basis for R^n cannot contain more than n elements. However, can a basis for R^n contain fewer than n elements?

We shall answer this question by showing that all bases for a given vector space have the same number of elements. As a first step, we establish the following generalization of Proposition 5D.
PROPOSITION 5F. Suppose that {v_1, ..., v_n} is a basis for a vector space V over R. Suppose further that r > n, and that the vectors u_1, ..., u_r ∈ V. Then the vectors u_1, ..., u_r are linearly dependent.

Proof. Since {v_1, ..., v_n} is a basis for the vector space V, we can write

u_1 = a_11 v_1 + ... + a_n1 v_n,
...
u_r = a_1r v_1 + ... + a_nr v_n,

where a_ij ∈ R for every i = 1, ..., n and j = 1, ..., r. Let c_1, ..., c_r ∈ R. Since v_1, ..., v_n are linearly independent, it follows that if

c_1 u_1 + ... + c_r u_r = c_1(a_11 v_1 + ... + a_n1 v_n) + ... + c_r(a_1r v_1 + ... + a_nr v_n)
= (a_11 c_1 + ... + a_1r c_r)v_1 + ... + (a_n1 c_1 + ... + a_nr c_r)v_n = 0,

then a_11 c_1 + ... + a_1r c_r = ... = a_n1 c_1 + ... + a_nr c_r = 0; in other words, we have the homogeneous system

[a_11 ... a_1r; ... ; a_n1 ... a_nr](c_1, ..., c_r)^t = (0, ..., 0)^t.

If r > n, then there are more variables than equations. It follows that there must be non-trivial solutions c_1, ..., c_r ∈ R. Hence u_1, ..., u_r are linearly dependent.

PROPOSITION 5G. Suppose that V is a finite-dimensional vector space over R. Then any two bases for V have the same number of elements.

Proof. Note simply that by Proposition 5F, the vectors in the basis with more elements must be linearly dependent, and so cannot form a basis.

We are now in a position to make the following definition.

Definition. Suppose that V is a finite-dimensional vector space over R. Then we say that V is of dimension n if a basis for V contains exactly n elements.

Example 5.5.8. The vector space R^n has dimension n.

Example 5.5.9. The vector space M_{2,2}(R) of all 2 × 2 matrices with entries in R, as discussed in Example 5.1.3, has dimension 4.

Example 5.5.10. The vector space P_k of all polynomials of degree at most k and with coefficients in R, as discussed in Example 5.1.6, has dimension k + 1.
Example 5.5.11. Recall Example 5.2.5, where we showed that the set of solutions to a system of m homogeneous linear equations in n unknowns is a subspace of R^n. Consider now the homogeneous system

[1 3 −5 1 5; 1 4 −7 3 −2; 1 5 −9 5 −9; 0 3 −6 2 −1](x_1, x_2, x_3, x_4, x_5)^t = (0, 0, 0, 0)^t.

The solutions can be described in the form

x = c_1(−1, 2, 1, 0, 0)^t + c_2(−1, −3, 0, 5, 1)^t,

where c_1, c_2 ∈ R (the reader must check this). It can be checked that (−1, 2, 1, 0, 0) and (−1, −3, 0, 5, 1) are linearly independent and so form a basis for the space of solutions of the system. It follows that the space of solutions of the system has dimension 2.
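The reader may also check this basis by machine; sympy's exact nullspace routine returns the same two vectors. A sketch:

    import sympy as sp

    A = sp.Matrix([[1, 3, -5, 1, 5],
                   [1, 4, -7, 3, -2],
                   [1, 5, -9, 5, -9],
                   [0, 3, -6, 2, -1]])
    basis = A.nullspace()     # a basis for the space of solutions of Ax = 0
    for v in basis:
        print(v.T)            # (-1, 2, 1, 0, 0) and (-1, -3, 0, 5, 1)
    print(len(basis))         # 2, the dimension of the solution space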
Suppose that V is an n-dimensional vector space over R. Then any basis for V consists of exactly n linearly independent vectors in V. Suppose now that we have a set of n linearly independent vectors in V. Will this form a basis for V?

We have already answered this question in the affirmative in the cases when the vector space is R^2 or R^3. To seek an answer to the general case, we first establish the following result.
PROPOSITION 5H. Suppose that V is a finite-dimensional vector space over R. Then any finite set of linearly independent vectors in V can be expanded, if necessary, to a basis for V.

Proof. Let S = {v_1, ..., v_k} be a finite set of linearly independent vectors in V. If S spans V, then the proof is complete. If S does not span V, then there exists v_{k+1} ∈ V that is not a linear combination of the elements of S. The set T = {v_1, ..., v_k, v_{k+1}} is a finite set of linearly independent vectors in V; for otherwise, there exist c_1, ..., c_k, c_{k+1}, not all zero, such that

c_1 v_1 + ... + c_k v_k + c_{k+1} v_{k+1} = 0.

If c_{k+1} = 0, then c_1 v_1 + ... + c_k v_k = 0, contradicting the assumption that S is a finite set of linearly independent vectors in V. If c_{k+1} ≠ 0, then

v_{k+1} = −(c_1/c_{k+1})v_1 − ... − (c_k/c_{k+1})v_k,

contradicting the assumption that v_{k+1} is not a linear combination of the elements of S. We now study the finite set T of linearly independent vectors in V. If T spans V, then the proof is complete. If T does not span V, then we repeat the argument. Note that the number of vectors in a linearly independent expansion of S cannot exceed the dimension of V, in view of Proposition 5F. So eventually some linearly independent expansion of S will span V.

PROPOSITION 5J. Suppose that V is an n-dimensional vector space over R. Then any set of n linearly independent vectors in V is a basis for V.

Proof. Let S be a set of n linearly independent vectors in V. By Proposition 5H, S can be expanded, if necessary, to a basis for V. By Proposition 5F, any expansion of S will result in a linearly dependent set of vectors in V. It follows that S is already a basis for V.
Example 5.5.12. Consider the three vectors v_1 = (1, 2, 3), v_2 = (3, 2, 1) and v_3 = (3, 3, 3) in R^3, as in Examples 5.4.1 and 5.4.3. We showed that these three vectors are linearly dependent, and span the plane x − 2y + z = 0. Note that

v_3 = (3/4)v_1 + (3/4)v_2,

and that v_1 and v_2 are linearly independent. Consider now the vector v_4 = (0, 0, 1). Note that v_4 does not lie on the plane x − 2y + z = 0, so that {v_1, v_2, v_4} form a linearly independent set. It follows that {v_1, v_2, v_4} is a basis for R^3.
Problems for Chapter 5

1. Determine whether each of the following subsets of R^3 is a subspace of R^3:
a) {(x, y, z) ∈ R^3 : x = 0}
b) {(x, y, z) ∈ R^3 : x + y = 0}
c) {(x, y, z) ∈ R^3 : xz = 0}
d) {(x, y, z) ∈ R^3 : y ≥ 0}
e) {(x, y, z) ∈ R^3 : x = y = z}

2. For each of the following collections of vectors, determine whether the first vector is a linear combination of the remaining ones:
a) (1, 2, 3); (1, 0, 1), (2, 1, 0) in R^3
b) x^3 + 2x^2 + 3x + 1; x^3, x^2 + 3x, x^2 + 1 in P_4
c) (1, 3, 5, 7); (1, 0, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1) in R^4

3. For each of the following collections of vectors, determine whether the vectors are linearly independent:
a) (1, 2, 3), (1, 0, 1), (2, 1, 0) in R^3
b) (1, 2), (3, 5), (1, 3) in R^2
c) (2, 5, 3, 6), (1, 0, 0, 1), (4, 0, 9, 6) in R^4
d) x^2 + 1, x + 1, x^2 + x in P_3

4. Find the volume of the parallelepiped in R^3 formed by the vectors (1, 2, 3), (1, 0, 1) and (3, 0, 2).

5. Let S be the set of all functions y that satisfy the differential equation

2(d^2y/dx^2) − 3(dy/dx) + y = 0.

Show that S is a subspace of the vector space A described in Example 5.1.4.

6. For each of the sets in Problem 1 which is a subspace of R^3, find a basis for the subspace, and then extend it to a basis for R^3.
LINEAR ALGEBRA
W W L CHEN
c W W L Chen, 1994, 2008.
This chapter is available free to all individuals, on the understanding that it is not to be used for nancial gain,
and may be downloaded and/or photocopied, with or without permission from the author.
However, this document may not be kept on any information storage and retrieval system without permission
from the author, unless such system is not accessible to any individuals other than its owners.
Chapter 6
VECTOR SPACES
ASSOCIATED WITH MATRICES
6.1. Introduction

Consider an m × n matrix

A = [a_11 ... a_1n; ... ; a_m1 ... a_mn], (1)

with entries in R. Then the rows of A can be described as vectors in R^n as

r_1 = (a_11, ..., a_1n), ..., r_m = (a_m1, ..., a_mn), (2)

while the columns of A can be described as vectors in R^m as

c_1 = (a_11, ..., a_m1)^t, ..., c_n = (a_1n, ..., a_mn)^t. (3)

For simplicity, we sometimes write

A = [r_1; ... ; r_m] and A = (c_1 ... c_n).

We also consider the system of homogeneous equations Ax = 0.
In this chapter, we shall be concerned with three vector spaces that arise from the matrix A.

Definition. Suppose that A is an m × n matrix of the form (1), with entries in R.
(RS) The subspace span{r_1, ..., r_m} of R^n, where r_1, ..., r_m are given by (2) and are the rows of the matrix A, is called the row space of A.
(CS) The subspace span{c_1, ..., c_n} of R^m, where c_1, ..., c_n are given by (3) and are the columns of the matrix A, is called the column space of A.
(NS) The solution space of the system of homogeneous linear equations Ax = 0 is called the nullspace of A.

Remarks. (1) To see that span{r_1, ..., r_m} is a subspace of R^n and that span{c_1, ..., c_n} is a subspace of R^m, recall Proposition 5C.
(2) To see that the nullspace of A is a subspace of R^n, recall Example 5.2.5.
6.2. Row Spaces

Our aim in this section is to find a basis for the row space of a given matrix A with entries in R. This task is made considerably easier by the following result.

PROPOSITION 6A. Suppose that the matrix B can be obtained from the matrix A by elementary row operations. Then the row space of B is identical to the row space of A.

Proof. Clearly the rows of B are linear combinations of the rows of A, so that any linear combination of the rows of B is a linear combination of the rows of A. Hence the row space of B is contained in the row space of A. On the other hand, the rows of A are linear combinations of the rows of B, so a similar argument shows that the row space of A is contained in the row space of B.

To find a basis for the row space of A, we can now reduce A to row echelon form, and consider the non-zero rows that result from this reduction. It is easily seen that these non-zero rows are linearly independent.

Example 6.2.1. Let

A = [1 3 −5 1 5; 1 4 −7 3 −2; 1 5 −9 5 −9; 0 3 −6 2 −1].

Then

r_1 = (1, 3, −5, 1, 5),
r_2 = (1, 4, −7, 3, −2),
r_3 = (1, 5, −9, 5, −9),
r_4 = (0, 3, −6, 2, −1).

Also the matrix A can be reduced to row echelon form as

[1 3 −5 1 5; 0 1 −2 2 −7; 0 0 0 1 −5; 0 0 0 0 0].

It follows that

v_1 = (1, 3, −5, 1, 5), v_2 = (0, 1, −2, 2, −7), v_3 = (0, 0, 0, 1, −5)

form a basis for the row space of A.

Remark. Naturally, it is not necessary that the first non-zero entry of a basis element has to be 1.
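The reduction can be reproduced by machine. Note that sympy computes the fully reduced row echelon form, whose non-zero rows again form a basis for the row space, though not necessarily the same basis as above. A sketch:

    import sympy as sp

    A = sp.Matrix([[1, 3, -5, 1, 5],
                   [1, 4, -7, 3, -2],
                   [1, 5, -9, 5, -9],
                   [0, 3, -6, 2, -1]])
    R, pivots = A.rref()   # reduced row echelon form and pivot column indices
    print(R)               # the non-zero rows form a basis for the row space
    print(pivots)          # (0, 1, 3)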
6.3. Column Spaces

Our aim in this section is to find a basis for the column space of a given matrix A with entries in R. Naturally, we can consider the transpose A^t of A, and use the technique in Section 6.2 to find a basis for the row space of A^t. This basis naturally gives rise to a basis for the column space of A.

Example 6.3.1. Let

A = [1 3 −5 1 5; 1 4 −7 3 −2; 1 5 −9 5 −9; 0 3 −6 2 −1].

Then

A^t = [1 1 1 0; 3 4 5 3; −5 −7 −9 −6; 1 3 5 2; 5 −2 −9 −1].

The matrix A^t can be reduced to row echelon form as

[1 1 1 0; 0 1 2 2; 0 0 0 1; 0 0 0 0; 0 0 0 0].

It follows that

w_1 = (1, 1, 1, 0), w_2 = (0, 1, 2, 2), w_3 = (0, 0, 0, 1)

form a basis for the row space of A^t, and so a basis for the column space of A.
Alternatively, we may pursue the following argument, which shows that elementary row operations do not affect the linear dependence relations among the columns of a matrix.

PROPOSITION 6B. Suppose that the matrix B can be obtained from the matrix A by elementary row operations. Then any collection of columns of A are linearly independent if and only if the corresponding collection of columns of B are linearly independent.

Proof. Let A′ be a matrix made up of a collection of columns of A, and let B′ be the matrix made up of the corresponding collection of columns of B. Consider the two systems of homogeneous linear equations

A′x = 0 and B′x = 0.

Since B′ can be obtained from the matrix A′ by elementary row operations, the two systems have the same solution set. On the other hand, the columns of A′ are linearly independent precisely when the system A′x = 0 has only the trivial solution, precisely when the system B′x = 0 has only the trivial solution, precisely when the columns of B′ are linearly independent.

To find a basis for the column space of A, we can now reduce A to row echelon form, and consider the pivot columns that result from this reduction. It is easily seen that these pivot columns are linearly independent, and that any non-pivot column is a linear combination of the pivot columns.
Example 6.3.2. Let

A = [1 3 −5 1 5; 1 4 −7 3 −2; 1 5 −9 5 −9; 0 3 −6 2 −1].

Then A can be reduced to row echelon form as

[1 3 −5 1 5; 0 1 −2 2 −7; 0 0 0 1 −5; 0 0 0 0 0].

It follows that the pivot columns of A are the first, second and fourth columns. Hence

u_1 = (1, 1, 1, 0), u_2 = (3, 4, 5, 3), u_3 = (1, 3, 5, 2)

form a basis for the column space of A.
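The pivot columns can be read off programmatically from the same reduction. A sketch, assuming sympy:

    import sympy as sp

    A = sp.Matrix([[1, 3, -5, 1, 5],
                   [1, 4, -7, 3, -2],
                   [1, 5, -9, 5, -9],
                   [0, 3, -6, 2, -1]])
    _, pivots = A.rref()
    print(pivots)             # (0, 1, 3): the first, second and fourth columns
    for j in pivots:
        print(A.col(j).T)     # a basis for the column space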
6.4. Rank of a Matrix

For the matrix

A = [1 3 −5 1 5; 1 4 −7 3 −2; 1 5 −9 5 −9; 0 3 −6 2 −1],

we have shown that the row space has dimension 3, and so does the column space. In fact, we have the following important result.

PROPOSITION 6C. For any matrix A with entries in R, the dimension of the row space is the same as the dimension of the column space.

Proof. For any matrix A, we can reduce A to row echelon form. Then the dimension of the row space of A is equal to the number of non-zero rows in the row echelon form. On the other hand, the dimension of the column space of A is equal to the number of pivot columns in the row echelon form. However, the number of non-zero rows in the row echelon form is the same as the number of pivot columns.

Definition. The rank of a matrix A, denoted by rank(A), is equal to the common value of the dimension of its row space and the dimension of its column space.

Example 6.4.1. The matrix

A = [1 3 −5 1 5; 1 4 −7 3 −2; 1 5 −9 5 −9; 0 3 −6 2 −1]

has rank 3.
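Proposition 6C can be spot-checked for this matrix in one line each. A sketch, assuming sympy:

    import sympy as sp

    A = sp.Matrix([[1, 3, -5, 1, 5],
                   [1, 4, -7, 3, -2],
                   [1, 5, -9, 5, -9],
                   [0, 3, -6, 2, -1]])
    print(A.rank())     # 3
    print(A.T.rank())   # 3: row rank equals column rank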
6.5. Nullspaces

Example 6.5.1. Consider the matrix

A = [1 3 −5 1 5; 1 4 −7 3 −2; 1 5 −9 5 −9; 0 3 −6 2 −1].

We showed in Example 5.5.11 that the space of solutions of Ax = 0 has dimension 2. In other words, the nullspace of A has dimension 2. Note that in this particular case, the dimension of the nullspace of A and the dimension of the column space of A have a sum of 5, the number of columns of A.

Recall now that the nullspace of A is a subspace of R^n, where n is the number of columns of the matrix A.

PROPOSITION 6D. For any matrix A with entries in R, the sum of the dimension of its column space and the dimension of its nullspace is equal to the number of columns of A.

Sketch of Proof. We consider the system of homogeneous linear equations Ax = 0, and reduce A to row echelon form. The number of leading variables is now equal to the dimension of the row space of A, and so equal to the dimension of the column space of A. On the other hand, the number of free variables is equal to the dimension of the space of solutions, which is the nullspace. Note now that the total number of variables is equal to the number of columns of A.

Remark. Proposition 6D is sometimes known as the Rank-nullity theorem, where the nullity of a matrix is the dimension of its nullspace.

We conclude this section by stating the following result for square matrices.

PROPOSITION 6E. Suppose that A is an n × n matrix with entries in R. Then the following statements are equivalent:
(a) A can be reduced to I_n by elementary row operations.
(b) A is invertible.
(c) det A ≠ 0.
(d) The system Ax = 0 has only the trivial solution.
(e) The system Ax = b is soluble for every b ∈ R^n.
(f) The rows of A are linearly independent.
(g) The columns of A are linearly independent.
(h) A has rank n.
6.6. Solution of Non-Homogeneous Systems
Consider now a non-homogeneous system of equations
Ax = b, (4)
where
A =

a
11
. . . a
1n
.
.
.
.
.
.
a
m1
. . . a
mn

and b =

b
1
.
.
.
b
m

, (5)
with entries in R.
Chapter 6 : Vector Spaces Associated with Matrices page 5 of 9
Linear Algebra c W W L Chen, 1994, 2008
Our aim here is to determine whether a given system (4) has a solution without making any attempt
to actually nd any solution.
Note first of all that
[ a11 . . . a1n ] [ x1 ]   [ a11x1 + . . . + a1nxn ]      [ a11 ]              [ a1n ]
[  .         .  ] [  . ] = [           .           ] = x1 [  .  ] + . . . + xn [  .  ] .
[ am1 . . . amn ] [ xn ]   [ am1x1 + . . . + amnxn ]      [ am1 ]              [ amn ]
It follows that Ax can be described by
Ax = x1c1 + . . . + xncn,
where c1, . . . , cn are defined by (3) and are the columns of A. In other words, Ax is a linear combination of the columns of A. It follows that if the system (4) has a solution, then b must be a linear combination of the columns of A. This means that b must belong to the column space of A, so that the two matrices
A = [ a11 . . . a1n ]              [ a11 . . . a1n  b1 ]
    [  .         .  ]  and (A|b) = [  .         .   .  ]          (6)
    [ am1 . . . amn ]              [ am1 . . . amn  bm ]
must have the same (column) rank.
On the other hand, if the two matrices A and (A|b) have the same rank, then b must be a linear combination of the columns of A, so that
b = x1c1 + . . . + xncn
for some x1, . . . , xn ∈ R. This gives a solution of the system (4).
We have just proved the following result.
PROPOSITION 6F. For any matrix A with entries in R, the non-homogeneous system of equations Ax = b has a solution if and only if the matrices A and (A|b) have the same rank. Here A and (A|b) are defined by (5) and (6).
Example 6.6.1. Consider the system Ax = b, where
A = [ 1 3 5 1  5 ]          [ 1 ]
    [ 1 4 7 3 -2 ]  and b = [ 2 ]
    [ 1 5 9 5 -9 ]          [ 3 ]
    [ 0 3 6 2 -1 ]          [ 3 ].
We have already shown that rank(A) = 3. Now
(A|b) = [ 1 3 5 1  5  1 ]
        [ 1 4 7 3 -2  2 ]
        [ 1 5 9 5 -9  3 ]
        [ 0 3 6 2 -1  3 ]
can be reduced to row echelon form as
[ 1 3 5 1  5  1 ]
[ 0 1 2 2 -7  1 ]
[ 0 0 0 1 -5  0 ]
[ 0 0 0 0  0  0 ],
so that rank(A|b) = 3. It follows that the system has a solution.
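Proposition 6F lends itself to a direct numerical test; in the sketch below the ranks of A and (A|b) are compared with numpy (an illustration only, with our choice of library):

import numpy as np

A = np.array([[1, 3, 5, 1,  5],
              [1, 4, 7, 3, -2],
              [1, 5, 9, 5, -9],
              [0, 3, 6, 2, -1]])
b = np.array([[1], [2], [3], [3]])

rank_A  = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))
print(rank_A == rank_Ab)   # True: the system Ax = b has a solution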
Example 6.6.2. Consider the system Ax = b, where
A = [ 1 3 5 1  5 ]          [ 1 ]
    [ 1 4 7 3 -2 ]  and b = [ 2 ]
    [ 1 5 9 5 -9 ]          [ 4 ]
    [ 0 3 6 2 -1 ]          [ 3 ].
We have already shown that rank(A) = 3. Now
(A|b) = [ 1 3 5 1  5  1 ]
        [ 1 4 7 3 -2  2 ]
        [ 1 5 9 5 -9  4 ]
        [ 0 3 6 2 -1  3 ]
can be reduced to row echelon form as
[ 1 3 5 1  5  1 ]
[ 0 1 2 2 -7  1 ]
[ 0 0 0 1 -5  0 ]
[ 0 0 0 0  0  1 ],
so that rank(A|b) = 4. It follows that the system has no solution.
Remark. The matrix (A|b) is sometimes known as the augmented matrix.
We conclude this chapter by describing the set of all solutions of a non-homogeneous system of equations.
PROPOSITION 6G. Suppose that A is a matrix with entries in R. Suppose further that x0 is a solution of the non-homogeneous system of equations Ax = b, and that {v1, . . . , vr} is a basis for the nullspace of A. Then every solution of the system Ax = b can be written in the form
x = x0 + c1v1 + . . . + crvr, where c1, . . . , cr ∈ R.          (7)
On the other hand, every vector of the form (7) is a solution to the system Ax = b.
Proof. Let x be any solution of the system Ax = b. Since Ax0 = b, it follows that A(x - x0) = 0. Hence there exist c1, . . . , cr ∈ R such that
x - x0 = c1v1 + . . . + crvr,
giving (7). On the other hand, it follows from (7) that
Ax = A(x0 + c1v1 + . . . + crvr) = Ax0 + c1Av1 + . . . + crAvr = b + 0 + . . . + 0 = b.
Hence every vector of the form (7) is a solution to the system Ax = b.
Example 6.6.3. Consider the system Ax = b, where
A = [ 1 3 5 1  5 ]          [ 1 ]
    [ 1 4 7 3 -2 ]  and b = [ 2 ]
    [ 1 5 9 5 -9 ]          [ 3 ]
    [ 0 3 6 2 -1 ]          [ 3 ].
We have already shown in Example 5.5.11 that v1 = (1, -2, 1, 0, 0) and v2 = (-1, -3, 0, 5, 1) form a basis for the nullspace of A. On the other hand, x0 = (-4, 0, -1, 5, 1) is a solution of the non-homogeneous system. It follows that the solutions of the non-homogeneous system are given by
x = (-4, 0, -1, 5, 1) + c1(1, -2, 1, 0, 0) + c2(-1, -3, 0, 5, 1), where c1, c2 ∈ R.
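The structure of the solution set in Proposition 6G can also be checked numerically; the following sketch (our own illustration in Python) verifies that x0 plus any combination of the nullspace basis vectors solves the system:

import numpy as np

A  = np.array([[1, 3, 5, 1,  5],
               [1, 4, 7, 3, -2],
               [1, 5, 9, 5, -9],
               [0, 3, 6, 2, -1]])
b  = np.array([1, 2, 3, 3])
x0 = np.array([-4, 0, -1, 5, 1])
v1 = np.array([1, -2, 1, 0, 0])
v2 = np.array([-1, -3, 0, 5, 1])

# x0 solves Ax = b, and adding any nullspace combination preserves this.
for c1, c2 in [(0, 0), (1, -2), (3.5, 7)]:
    x = x0 + c1*v1 + c2*v2
    assert np.allclose(A @ x, b)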
Example 6.6.4. Consider the non-homogeneous system x - 2y + z = 2 in R³. Note that this system has only one equation. The corresponding homogeneous system is given by x - 2y + z = 0, and this represents a plane through the origin. It is easily seen that (1, 1, 1) and (2, 1, 0) form a basis for the solution space of x - 2y + z = 0. On the other hand, note that (1, 0, 1) is a solution of x - 2y + z = 2. It follows that the solutions of x - 2y + z = 2 are of the form
(x, y, z) = (1, 0, 1) + c1(1, 1, 1) + c2(2, 1, 0), where c1, c2 ∈ R.
Try to draw a picture for this problem.
Problems for Chapter 6
1. For each of the following matrices, find a basis for the row space and a basis for the column space by first reducing the matrix to row echelon form:
a) [ 5 9 3 ]     b) [ 1 2 4 1 5 ]     c) [ 1  1 2 ]     d) [ 1 2  3 4 ]
   [ 3 5 6 ]        [ 1 2 3 1 3 ]        [ 1  3 8 ]        [ 0 1  1 5 ]
   [ 1 5 3 ]        [ 1 2 0 4 3 ]        [ 4  3 7 ]        [ 3 4 11 2 ]
                                         [ 1 12 3 ]
2. For each of the following matrices, determine whether the non-homogeneous system of linear equations Ax = b has a solution:
a) A = [ 5 9 3 ] and b = [ 4 ]
       [ 3 5 6 ]         [ 2 ]
       [ 1 5 3 ]         [ 6 ]
b) A = [ 1 2 4 1 5 ] and b = [ 3 ]
       [ 1 2 3 1 3 ]         [ 5 ]
       [ 1 2 0 4 3 ]         [ 7 ]
c) A = [ 1  1 2 ] and b = [ 0 ]
       [ 1  3 8 ]         [ 1 ]
       [ 4  3 7 ]         [ 2 ]
       [ 1 12 3 ]         [ 4 ]
d) A = [ 1 2  3 4 ] and b = [ 1 ]
       [ 0 1  1 5 ]         [ 0 ]
       [ 3 4 11 2 ]         [ 1 ]
Chapter 7
EIGENVALUES AND EIGENVECTORS
7.1. Introduction
Example 7.1.1. Consider a function f : R² → R², defined for every (x, y) ∈ R² by f(x, y) = (s, t), where
[ s ]   [ 3 3 ] [ x ]
[ t ] = [ 1 5 ] [ y ] .
Note that
[ 3 3 ] [  3 ]   [  6 ]     [  3 ]          [ 3 3 ] [ 1 ]   [ 6 ]     [ 1 ]
[ 1 5 ] [ -1 ] = [ -2 ] = 2 [ -1 ]    and   [ 1 5 ] [ 1 ] = [ 6 ] = 6 [ 1 ] .
On the other hand, note that
v1 = [  3 ]    and    v2 = [ 1 ]
     [ -1 ]                [ 1 ]
form a basis for R². It follows that every u ∈ R² can be written uniquely in the form u = c1v1 + c2v2, where c1, c2 ∈ R, so that
Au = A(c1v1 + c2v2) = c1Av1 + c2Av2 = 2c1v1 + 6c2v2.
Note that in this case, the function f : R² → R² can be described easily in terms of the two special vectors v1 and v2 and the two special numbers 2 and 6. Let us now examine how these special vectors and numbers arise. We hope to find numbers λ ∈ R and non-zero vectors v ∈ R² such that
[ 3 3 ] v = λv.
[ 1 5 ]
Since
λv = λ [ 1 0 ] v = [ λ 0 ] v,
       [ 0 1 ]     [ 0 λ ]
we must have
( [ 3 3 ] - [ λ 0 ] ) v = 0.
  [ 1 5 ]   [ 0 λ ]
In other words, we must have
[ 3-λ   3  ] v = 0.          (1)
[  1   5-λ ]
In order to have non-zero v ∈ R², we must therefore ensure that
det [ 3-λ   3  ] = 0.
    [  1   5-λ ]
Hence (3 - λ)(5 - λ) - 3 = 0, with roots λ1 = 2 and λ2 = 6. Substituting λ = 2 into (1), we obtain
[ 1 3 ] v = 0, with root v1 = [  3 ] .
[ 1 3 ]                       [ -1 ]
Substituting λ = 6 into (1), we obtain
[ -3  3 ] v = 0, with root v2 = [ 1 ] .
[  1 -1 ]                       [ 1 ]
Definition. Suppose that
A = [ a11 . . . a1n ]
    [  .         .  ]          (2)
    [ an1 . . . ann ]
is an n × n matrix with entries in R. Suppose further that there exist a number λ ∈ R and a non-zero vector v ∈ Rⁿ such that Av = λv. Then we say that λ is an eigenvalue of the matrix A, and that v is an eigenvector corresponding to the eigenvalue λ.
Suppose that λ is an eigenvalue of the n × n matrix A, and that v is an eigenvector corresponding to the eigenvalue λ. Then Av = λv = λIv, where I is the n × n identity matrix, so that (A - λI)v = 0. Since v ∈ Rⁿ is non-zero, it follows that we must have
det(A - λI) = 0.          (3)
In other words, we must have
det [ a11-λ   a12    . . .   a1n  ]
    [  a21   a22-λ   . . .   a2n  ]
    [   .                     .   ] = 0.
    [  an1    an2    . . .  ann-λ ]
Note that (3) is a polynomial equation. Solving this equation (3) gives the eigenvalues of the matrix A. On the other hand, for any eigenvalue λ of the matrix A, the set
{v ∈ Rⁿ : (A - λI)v = 0}          (4)
is the nullspace of the matrix A - λI, a subspace of Rⁿ.
Definition. The polynomial det(A - λI) is called the characteristic polynomial of the matrix A. For any root λ of (3), the space (4) is called the eigenspace corresponding to the eigenvalue λ.
Example 7.1.2. The matrix
[ 3 3 ]
[ 1 5 ]
has characteristic polynomial (3 - λ)(5 - λ) - 3 = 0; in other words, λ² - 8λ + 12 = 0. Hence the eigenvalues are λ1 = 2 and λ2 = 6, with corresponding eigenvectors
v1 = [  3 ]    and    v2 = [ 1 ]
     [ -1 ]                [ 1 ]
respectively. The eigenspace corresponding to the eigenvalue 2 is
{ v ∈ R² : [ 1 3 ] v = 0 } = { c [  3 ] : c ∈ R } .
           [ 1 3 ]               [ -1 ]
The eigenspace corresponding to the eigenvalue 6 is
{ v ∈ R² : [ -3  3 ] v = 0 } = { c [ 1 ] : c ∈ R } .
           [  1 -1 ]               [ 1 ]
Example 7.1.3. Consider the matrix
A = [ -1   6 -12 ]
    [  0 -13  30 ]
    [  0  -9  20 ].
To find the eigenvalues of A, we need to find the roots of
det [ -1-λ    6     -12  ]
    [   0   -13-λ    30  ] = 0;
    [   0    -9    20-λ  ]
in other words, (λ + 1)(λ - 2)(λ - 5) = 0. The eigenvalues are therefore λ1 = -1, λ2 = 2 and λ3 = 5.
An eigenvector corresponding to the eigenvalue -1 is a solution of the system
(A + I)v = [ 0   6 -12 ] v = 0, with root v1 = [ 1 ] .
           [ 0 -12  30 ]                       [ 0 ]
           [ 0  -9  21 ]                       [ 0 ]
An eigenvector corresponding to the eigenvalue 2 is a solution of the system
(A - 2I)v = [ -3   6 -12 ] v = 0, with root v2 = [ 0 ] .
            [  0 -15  30 ]                       [ 2 ]
            [  0  -9  18 ]                       [ 1 ]
An eigenvector corresponding to the eigenvalue 5 is a solution of the system
(A - 5I)v = [ -6   6 -12 ] v = 0, with root v3 = [ -1 ] .
            [  0 -18  30 ]                       [  5 ]
            [  0  -9  15 ]                       [  3 ]
Note that the three eigenspaces are all lines through the origin. Note also that the eigenvectors v1, v2 and v3 are linearly independent, and so form a basis for R³.
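One can confirm these eigenvalues numerically; the sketch below uses numpy's eig routine (the tool and the sorting of the output are our choices):

import numpy as np

A = np.array([[-1,   6, -12],
              [ 0, -13,  30],
              [ 0,  -9,  20]])

evals, evecs = np.linalg.eig(A)
print(np.sort(evals.real))   # approximately [-1.  2.  5.]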
Example 7.1.4. Consider the matrix
A = [  17 -10  -5 ]
    [  45 -28 -15 ]
    [ -30  20  12 ].
To find the eigenvalues of A, we need to find the roots of
det [ 17-λ   -10     -5  ]
    [  45   -28-λ   -15  ] = 0;
    [ -30    20    12-λ  ]
in other words, (λ + 3)(λ - 2)² = 0. The eigenvalues are therefore λ1 = -3 and λ2 = 2. An eigenvector corresponding to the eigenvalue -3 is a solution of the system
(A + 3I)v = [  20 -10  -5 ] v = 0, with root v1 = [  1 ] .
            [  45 -25 -15 ]                       [  3 ]
            [ -30  20  15 ]                       [ -2 ]
An eigenvector corresponding to the eigenvalue 2 is a solution of the system
(A - 2I)v = [  15 -10  -5 ] v = 0, with roots v2 = [ 1 ]  and v3 = [ 2 ] .
            [  45 -30 -15 ]                        [ 0 ]           [ 3 ]
            [ -30  20  10 ]                        [ 3 ]           [ 0 ]
Note that the eigenspace corresponding to the eigenvalue -3 is a line through the origin, while the eigenspace corresponding to the eigenvalue 2 is a plane through the origin. Note also that the eigenvectors v1, v2 and v3 are linearly independent, and so form a basis for R³.
Example 7.1.5. Consider the matrix
A = [ 2 -1 0 ]
    [ 1  0 0 ]
    [ 0  0 3 ].
To find the eigenvalues of A, we need to find the roots of
det [ 2-λ  -1    0  ]
    [  1   -λ    0  ] = 0;
    [  0    0   3-λ ]
in other words, (λ - 3)(λ - 1)² = 0. The eigenvalues are therefore λ1 = 3 and λ2 = 1. An eigenvector corresponding to the eigenvalue 3 is a solution of the system
(A - 3I)v = [ -1 -1  0 ] v = 0, with root v1 = [ 0 ] .
            [  1 -3  0 ]                       [ 0 ]
            [  0  0  0 ]                       [ 1 ]
An eigenvector corresponding to the eigenvalue 1 is a solution of the system
(A - I)v = [ 1 -1 0 ] v = 0, with root v2 = [ 1 ] .
           [ 1 -1 0 ]                       [ 1 ]
           [ 0  0 2 ]                       [ 0 ]
Note that the eigenspace corresponding to the eigenvalue 3 is a line through the origin. On the other hand, the matrix
[ 1 -1 0 ]
[ 1 -1 0 ]
[ 0  0 2 ]
has rank 2, and so the eigenspace corresponding to the eigenvalue 1 is of dimension 1 and so is also a line through the origin. We can therefore only find two linearly independent eigenvectors, so that R³ does not have a basis consisting of linearly independent eigenvectors of the matrix A.
Example 7.1.6. Consider the matrix
A = [ 3 -3 2 ]
    [ 1 -1 2 ]
    [ 1 -3 4 ].
To find the eigenvalues of A, we need to find the roots of
det [ 3-λ   -3    2  ]
    [  1   -1-λ   2  ] = 0;
    [  1    -3   4-λ ]
in other words, (λ - 2)³ = 0. The eigenvalue is therefore λ = 2. An eigenvector corresponding to the eigenvalue 2 is a solution of the system
(A - 2I)v = [ 1 -3 2 ] v = 0, with roots v1 = [  2 ]  and v2 = [ 3 ] .
            [ 1 -3 2 ]                        [  0 ]           [ 1 ]
            [ 1 -3 2 ]                        [ -1 ]           [ 0 ]
Note now that the matrix
[ 1 -3 2 ]
[ 1 -3 2 ]
[ 1 -3 2 ]
has rank 1, and so the eigenspace corresponding to the eigenvalue 2 is of dimension 2 and so is a plane through the origin. We can therefore only find two linearly independent eigenvectors, so that R³ does not have a basis consisting of linearly independent eigenvectors of the matrix A.
Example 7.1.7. Suppose that λ is an eigenvalue of a matrix A, with corresponding eigenvector v. Then
A²v = A(Av) = A(λv) = λ(Av) = λ(λv) = λ²v.
Hence λ² is an eigenvalue of the matrix A², with corresponding eigenvector v. In fact, it can be proved by induction that for every natural number k ∈ N, λᵏ is an eigenvalue of the matrix Aᵏ, with corresponding eigenvector v.
Example 7.1.8. Consider the matrix
A = [ 1 5 4 ]
    [ 0 2 6 ]
    [ 0 0 3 ].
To find the eigenvalues of A, we need to find the roots of
det [ 1-λ   5    4  ]
    [  0   2-λ   6  ] = 0;
    [  0    0   3-λ ]
in other words, (λ - 1)(λ - 2)(λ - 3) = 0. It follows that the eigenvalues of the matrix A are given by the entries on the diagonal. In fact, this is true for all triangular matrices.
7.2. The Diagonalization Problem
Example 7.2.1. Let us return to Examples 7.1.1 and 7.1.2, and consider again the matrix
A = [ 3 3 ]
    [ 1 5 ].
We have already shown that the matrix A has eigenvalues λ1 = 2 and λ2 = 6, with corresponding eigenvectors
v1 = [  3 ]    and    v2 = [ 1 ]
     [ -1 ]                [ 1 ]
respectively. Since the eigenvectors form a basis for R², every u ∈ R² can be written uniquely in the form
u = c1v1 + c2v2, where c1, c2 ∈ R,          (5)
and
Au = 2c1v1 + 6c2v2.          (6)
Write
c = [ c1 ] ,   u = [ x ] ,   Au = [ s ] .
    [ c2 ]         [ y ]          [ t ]
Then (5) and (6) can be rewritten as
[ x ]   [  3 1 ] [ c1 ]
[ y ] = [ -1 1 ] [ c2 ]          (7)
and
[ s ]   [  3 1 ] [ 2c1 ]   [  3 1 ] [ 2 0 ] [ c1 ]
[ t ] = [ -1 1 ] [ 6c2 ] = [ -1 1 ] [ 0 6 ] [ c2 ]          (8)
respectively. If we write
P = [  3 1 ]    and    D = [ 2 0 ] ,
    [ -1 1 ]               [ 0 6 ]
then (7) and (8) become u = Pc and Au = PDc respectively, so that APc = PDc. Note that c ∈ R² is arbitrary. This implies that (AP - PD)c = 0 for every c ∈ R². Hence we must have AP = PD. Since P is invertible, we conclude that
P⁻¹AP = D.
Note here that
P = ( v1 v2 )    and    D = [ λ1  0 ] .
                            [ 0  λ2 ]
Note also the crucial point that the eigenvectors of A form a basis for R².
We now consider the problem in general.
PROPOSITION 7A. Suppose that A is an n × n matrix, with entries in R. Suppose further that A has eigenvalues λ1, . . . , λn ∈ R, not necessarily distinct, with corresponding eigenvectors v1, . . . , vn ∈ Rⁿ, and that v1, . . . , vn are linearly independent. Then
P⁻¹AP = D,
where
P = ( v1 . . . vn )    and    D = [ λ1        ]
                                  [    . . .  ] .
                                  [        λn ]
Proof. Since v1, . . . , vn are linearly independent, they form a basis for Rⁿ, so that every u ∈ Rⁿ can be written uniquely in the form
u = c1v1 + . . . + cnvn, where c1, . . . , cn ∈ R,          (9)
and
Au = A(c1v1 + . . . + cnvn) = c1Av1 + . . . + cnAvn = λ1c1v1 + . . . + λncnvn.          (10)
Writing
c = [ c1 ]
    [  . ] ,
    [ cn ]
we see that (9) and (10) can be rewritten as
u = Pc    and    Au = P [ λ1c1 ]  = PDc
                        [   .  ]
                        [ λncn ]
respectively, so that
APc = PDc.
Note that c ∈ Rⁿ is arbitrary. This implies that (AP - PD)c = 0 for every c ∈ Rⁿ. Hence we must have AP = PD. Since the columns of P are linearly independent, it follows that P is invertible. Hence P⁻¹AP = D as required. □
Example 7.2.2. Consider the matrix
A = [ -1   6 -12 ]
    [  0 -13  30 ]
    [  0  -9  20 ],
as in Example 7.1.3. We have P⁻¹AP = D, where
P = [ 1 0 -1 ]            [ -1 0 0 ]
    [ 0 2  5 ]  and  D =  [  0 2 0 ] .
    [ 0 1  3 ]            [  0 0 5 ]
Example 7.2.3. Consider the matrix
A = [  17 -10  -5 ]
    [  45 -28 -15 ]
    [ -30  20  12 ],
as in Example 7.1.4. We have P⁻¹AP = D, where
P = [  1 1 2 ]            [ -3 0 0 ]
    [  3 0 3 ]  and  D =  [  0 2 0 ] .
    [ -2 3 0 ]            [  0 0 2 ]
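The conclusion P⁻¹AP = D is easy to verify numerically; in the following Python sketch (an illustration of ours, not part of the notes) the product is computed directly:

import numpy as np

A = np.array([[ 17, -10,  -5],
              [ 45, -28, -15],
              [-30,  20,  12]])
P = np.array([[ 1, 1, 2],
              [ 3, 0, 3],
              [-2, 3, 0]])

D = np.linalg.inv(P) @ A @ P
print(np.round(D))   # the diagonal matrix diag(-3, 2, 2)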
Definition. Suppose that A is an n × n matrix, with entries in R. We say that A is diagonalizable if there exists an invertible matrix P, with entries in R, such that P⁻¹AP is a diagonal matrix, with entries in R.
It follows from Proposition 7A that an n × n matrix A with entries in R is diagonalizable if its eigenvectors form a basis for Rⁿ. In the opposite direction, we establish the following result.
PROPOSITION 7B. Suppose that A is an n × n matrix, with entries in R. Suppose further that A is diagonalizable. Then A has n linearly independent eigenvectors in Rⁿ.
Proof. Suppose that A is diagonalizable. Then there exists an invertible matrix P, with entries in R, such that D = P⁻¹AP is a diagonal matrix, with entries in R. Denote by v1, . . . , vn the columns of P; in other words, write
P = ( v1 . . . vn ).
Also write
D = [ λ1        ]
    [    . . .  ] .
    [        λn ]
Clearly we have AP = PD. It follows that
( Av1 . . . Avn ) = A( v1 . . . vn ) = ( v1 . . . vn ) [ λ1        ]  = ( λ1v1 . . . λnvn ).
                                                       [    . . .  ]
                                                       [        λn ]
Equating columns, we obtain
Av1 = λ1v1, . . . , Avn = λnvn.
It follows that A has eigenvalues λ1, . . . , λn ∈ R, with corresponding eigenvectors v1, . . . , vn ∈ Rⁿ. Since P is invertible and v1, . . . , vn are the columns of P, it follows that the eigenvectors v1, . . . , vn are linearly independent. □
In view of Propositions 7A and 7B, the question of diagonalizing a matrix A with entries in R is
reduced to one of linear independence of its eigenvectors.
PROPOSITION 7C. Suppose that A is an n × n matrix, with entries in R. Suppose further that A has distinct eigenvalues λ1, . . . , λn ∈ R, with corresponding eigenvectors v1, . . . , vn ∈ Rⁿ. Then v1, . . . , vn are linearly independent.
Proof. Suppose that v1, . . . , vn are linearly dependent. Then there exist c1, . . . , cn ∈ R, not all zero, such that
c1v1 + . . . + cnvn = 0.          (11)
Then
A(c1v1 + . . . + cnvn) = c1Av1 + . . . + cnAvn = λ1c1v1 + . . . + λncnvn = 0.          (12)
Since v1, . . . , vn are all eigenvectors and hence non-zero, it follows that at least two numbers among c1, . . . , cn are non-zero, so that c1, . . . , c_{n-1} are not all zero. Multiplying (11) by λn and subtracting from (12), we obtain
(λ1 - λn)c1v1 + . . . + (λ_{n-1} - λn)c_{n-1}v_{n-1} = 0.
Note that since λ1, . . . , λn are distinct, the numbers λ1 - λn, . . . , λ_{n-1} - λn are all non-zero. It follows that v1, . . . , v_{n-1} are linearly dependent. To summarize, we can eliminate one eigenvector and the remaining ones are still linearly dependent. Repeating this argument a finite number of times, we arrive at a linearly dependent set of one eigenvector, clearly an absurdity. □
We now summarize our discussion in this section.
DIAGONALIZATION PROCESS. Suppose that A is an n × n matrix with entries in R.
(1) Determine whether the n roots of the characteristic polynomial det(A - λI) are real.
(2) If not, then A is not diagonalizable. If so, then find the eigenvectors corresponding to these eigenvalues. Determine whether we can find n linearly independent eigenvectors.
(3) If not, then A is not diagonalizable. If so, then write
P = ( v1 . . . vn )    and    D = [ λ1        ]
                                  [    . . .  ] ,
                                  [        λn ]
where λ1, . . . , λn ∈ R are the eigenvalues of A and where v1, . . . , vn ∈ Rⁿ are respectively their corresponding eigenvectors. Then P⁻¹AP = D.
7.3. Some Remarks
In all the examples we have discussed, we have chosen matrices A such that the characteristic polynomial det(A - λI) has only real roots. However, there are matrices A where the characteristic polynomial has non-real roots. If we permit λ1, . . . , λn to take values in C and permit eigenvectors to have entries in C, then we may be able to diagonalize the matrix A, using matrices P and D with entries in C. The details are similar.
Example 7.3.1. Consider the matrix
A = [  1  5 ]
    [ -1 -1 ].
To find the eigenvalues of A, we need to find the roots of
det [ 1-λ    5   ]
    [ -1   -1-λ  ] = 0;
in other words, λ² + 4 = 0. Clearly there are no real roots, so the matrix A has no eigenvalues in R. Try to show, however, that the matrix A can be diagonalized to the matrix
D = [ 2i   0 ]
    [ 0  -2i ].
We also state without proof the following useful result which will guarantee many examples where the characteristic polynomial has only real roots.
PROPOSITION 7D. Suppose that A is an n × n matrix, with entries in R. Suppose further that A is symmetric. Then the characteristic polynomial det(A - λI) has only real roots.
We conclude this section by discussing an application of diagonalization. We illustrate this by an
example.
Example 7.3.2. Consider the matrix
A = [  17 -10  -5 ]
    [  45 -28 -15 ]
    [ -30  20  12 ],
as in Example 7.2.3. Suppose that we wish to calculate A^98. Note that P⁻¹AP = D, where
P = [  1 1 2 ]            [ -3 0 0 ]
    [  3 0 3 ]  and  D =  [  0 2 0 ] .
    [ -2 3 0 ]            [  0 0 2 ]
It follows that A = PDP⁻¹, so that
A^98 = (PDP⁻¹) . . . (PDP⁻¹)    (98 factors)
     = PD^98 P⁻¹ = P [ 3^98   0     0   ] P⁻¹.
                     [  0    2^98   0   ]
                     [  0     0    2^98 ]
This is much simpler than calculating A^98 directly.
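As a numerical check of this idea (with our choice of library and of floating-point comparison), one can compare PD^98 P⁻¹ against repeated multiplication:

import numpy as np

A = np.array([[ 17., -10.,  -5.],
              [ 45., -28., -15.],
              [-30.,  20.,  12.]])
P = np.array([[ 1., 1., 2.],
              [ 3., 0., 3.],
              [-2., 3., 0.]])
D98 = np.diag([(-3.0)**98, 2.0**98, 2.0**98])

A98 = P @ D98 @ np.linalg.inv(P)
# matrix_power multiplies repeatedly; the two answers agree.
assert np.allclose(A98, np.linalg.matrix_power(A, 98))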
7.4. An Application to Genetics
In this section, we discuss very briefly the problem of autosomal inheritance. Here we consider a set of two genes designated by G and g. Each member of the population inherits one from each parent, resulting in possible genotypes GG, Gg and gg. Furthermore, the gene G dominates the gene g, so that in the case of human eye colours, for example, people with genotype GG or Gg have brown eyes while people with genotype gg have blue eyes. It is also believed that each member of the population has equal probability of inheriting one or the other gene from each parent. The table below gives these probabilities in detail. Here the genotypes of the parents are listed on top, and the genotypes of the offspring are listed on the left.
        GG×GG   GG×Gg   GG×gg   Gg×Gg   Gg×gg   gg×gg
GG        1      1/2      0      1/4      0       0
Gg        0      1/2      1      1/2     1/2      0
gg        0       0       0      1/4     1/2      1
Example 7.4.1. Suppose that a plant breeder has a large population consisting of all three genotypes. At regular intervals, each plant he owns is fertilized with a plant known to have genotype GG, and is then disposed of and replaced by one of its offspring. We would like to study the distribution of the three genotypes after n rounds of fertilization and replacements, where n is an arbitrary positive integer. Suppose that GG(n), Gg(n) and gg(n) denote the proportion of each genotype after n rounds of fertilization and replacements, and that GG(0), Gg(0) and gg(0) denote the initial proportions. Then clearly we have
GG(n) + Gg(n) + gg(n) = 1 for every n = 0, 1, 2, . . . .
On the other hand, the left hand half of the table above shows that for every n = 1, 2, 3, . . . , we have
GG(n) = GG(n-1) + (1/2)Gg(n-1),
Gg(n) = (1/2)Gg(n-1) + gg(n-1),
and
gg(n) = 0,
so that
[ GG(n) ]   [ 1 1/2 0 ] [ GG(n-1) ]
[ Gg(n) ] = [ 0 1/2 1 ] [ Gg(n-1) ] .
[ gg(n) ]   [ 0  0  0 ] [ gg(n-1) ]
It follows that
[ GG(n) ]        [ GG(0) ]
[ Gg(n) ] = Aⁿ   [ Gg(0) ]    for every n = 1, 2, 3, . . . ,
[ gg(n) ]        [ gg(0) ]
where the matrix
A = [ 1 1/2 0 ]
    [ 0 1/2 1 ]
    [ 0  0  0 ]
has eigenvalues λ1 = 1, λ2 = 0, λ3 = 1/2, with respective eigenvectors
v1 = [ 1 ] ,   v2 = [  1 ] ,   v3 = [  1 ] .
     [ 0 ]          [ -2 ]          [ -1 ]
     [ 0 ]          [  1 ]          [  0 ]
We therefore write
P = [ 1  1  1 ]            [ 1 0  0  ]                [ 1  1  1 ]
    [ 0 -2 -1 ]  and  D =  [ 0 0  0  ] , with P⁻¹ =   [ 0  0  1 ] .
    [ 0  1  0 ]            [ 0 0 1/2 ]                [ 0 -1 -2 ]
Then P⁻¹AP = D, so that A = PDP⁻¹, and so
Aⁿ = PDⁿP⁻¹ = [ 1  1  1 ] [ 1 0    0   ] [ 1  1  1 ]   [ 1  1 - 1/2^n   1 - 1/2^(n-1) ]
              [ 0 -2 -1 ] [ 0 0    0   ] [ 0  0  1 ] = [ 0    1/2^n       1/2^(n-1)   ] .
              [ 0  1  0 ] [ 0 0 1/2^n  ] [ 0 -1 -2 ]   [ 0      0             0       ]
It follows that
[ GG(n) ]   [ 1  1 - 1/2^n   1 - 1/2^(n-1) ] [ GG(0) ]
[ Gg(n) ] = [ 0    1/2^n       1/2^(n-1)   ] [ Gg(0) ]
[ gg(n) ]   [ 0      0             0       ] [ gg(0) ]

            [ GG(0) + Gg(0) + gg(0) - Gg(0)/2^n - gg(0)/2^(n-1) ]   [ 1 - Gg(0)/2^n - gg(0)/2^(n-1) ]   [ 1 ]
          = [ Gg(0)/2^n + gg(0)/2^(n-1)                          ] = [ Gg(0)/2^n + gg(0)/2^(n-1)     ] → [ 0 ]
            [ 0                                                  ]   [ 0                              ]   [ 0 ]
as n → ∞. This means that nearly the whole crop will have genotype GG.
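The convergence can also be watched directly by iterating the matrix A; the initial proportions in the sketch below are hypothetical values chosen by us for illustration:

import numpy as np

A = np.array([[1, 0.5, 0],
              [0, 0.5, 1],
              [0, 0,   0]])

x = np.array([0.25, 0.5, 0.25])   # hypothetical initial proportions
for n in range(20):
    x = A @ x                     # one round of fertilization and replacement
print(x)                          # approaches (1, 0, 0): nearly all GG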
Problems for Chapter 7
1. For each of the following 2 × 2 matrices, find all eigenvalues and describe the eigenspace of the matrix; if possible, diagonalize the matrix:
a) [ 3 4 ]          b) [ 2 1 ]
   [ 2 3 ]             [ 1 0 ]
2. For each of the following 3 × 3 matrices, find all eigenvalues and describe the eigenspace of the matrix; if possible, diagonalize the matrix:
a) [ 2 9 6 ]     b) [ 2 1 1 ]     c) [ 1 1 0 ]
   [ 1 2 0 ]        [ 0 3 2 ]        [ 0 1 1 ]
   [ 3 9 5 ]        [ 1 1 2 ]        [ 0 0 1 ]
3. Consider the matrices A = [ 10  6 3 ]  and  B = [ 0  6 16 ] .
                             [ 26 16 8 ]           [ 0 17 45 ]
                             [ 16 10 5 ]           [ 0  6 16 ]
a) Show that A and B have the same eigenvalues.
b) Reduce A and B to the same diagonal matrix.
c) Explain why there is an invertible matrix R such that R⁻¹AR = B.
4. Find A⁸ and B⁸, where A and B are the two matrices in Problem 3.
5. Suppose that θ ∈ R is not an integer multiple of π. Show that the matrix [ cos θ  -sin θ ]  does
                                                                            [ sin θ   cos θ ]
not have an eigenvector in R².
6. Consider the matrix A = [ cos θ   sin θ ] , where θ ∈ R.
                           [ sin θ  -cos θ ]
a) Show that A has an eigenvector in R² with eigenvalue 1.
b) Show that any vector v ∈ R² perpendicular to the eigenvector in part (a) must satisfy Av = -v.
7. Let a ∈ R be non-zero. Show that the matrix [ 1 a ]  cannot be diagonalized.
                                               [ 0 1 ]
Chapter 8
LINEAR TRANSFORMATIONS
8.1. Euclidean Linear Transformations
By a transformation from Rⁿ into Rᵐ, we mean a function of the type T : Rⁿ → Rᵐ, with domain Rⁿ and codomain Rᵐ. For every vector x ∈ Rⁿ, the vector T(x) ∈ Rᵐ is called the image of x under the transformation T, and the set
R(T) = {T(x) : x ∈ Rⁿ},
of all images under T, is called the range of the transformation T.
Remark. For our convenience later, we have chosen to use R(T) instead of the usual T(Rⁿ) to denote the range of the transformation T.
For every x = (x1, . . . , xn) ∈ Rⁿ, we can write
T(x) = T(x1, . . . , xn) = (y1, . . . , ym).
Here, for every i = 1, . . . , m, we have
yi = Ti(x1, . . . , xn),          (1)
where Ti : Rⁿ → R is a real valued function.
Definition. A transformation T : Rⁿ → Rᵐ is called a linear transformation if there exists a real matrix
A = [ a11 . . . a1n ]
    [  .         .  ]
    [ am1 . . . amn ]
such that for every x = (x1, . . . , xn) ∈ Rⁿ, we have T(x1, . . . , xn) = (y1, . . . , ym), where
y1 = a11x1 + . . . + a1nxn,
 .
ym = am1x1 + . . . + amnxn,
or, in matrix notation,
[ y1 ]   [ a11 . . . a1n ] [ x1 ]
[  . ] = [  .         .  ] [  . ] .          (2)
[ ym ]   [ am1 . . . amn ] [ xn ]
The matrix A is called the standard matrix for the linear transformation T.
Remarks. (1) In other words, a transformation T : Rⁿ → Rᵐ is linear if the equation (1) for every i = 1, . . . , m is linear.
(2) If we write x ∈ Rⁿ and y ∈ Rᵐ as column matrices, then (2) can be written in the form y = Ax, and so the linear transformation T can be interpreted as multiplication of x ∈ Rⁿ by the standard matrix A.
Definition. A linear transformation T : Rⁿ → Rᵐ is said to be a linear operator if n = m. In this case, we say that T is a linear operator on Rⁿ.
Example 8.1.1. The linear transformation T : R⁵ → R³, defined by the equations
y1 = 2x1 + 3x2 + 5x3 + 7x4 - 9x5,
y2 =       3x2 + 4x3       + 2x5,
y3 =  x1       + 3x3 - 2x4,
can be expressed in matrix form as
[ y1 ]   [ 2 3 5  7 -9 ] [ x1 ]
[ y2 ] = [ 0 3 4  0  2 ] [ x2 ]
[ y3 ]   [ 1 0 3 -2  0 ] [ x3 ]
                         [ x4 ]
                         [ x5 ].
If (x1, x2, x3, x4, x5) = (1, 0, 1, 0, 1), then
[ y1 ]   [ 2 3 5  7 -9 ] [ 1 ]   [ -2 ]
[ y2 ] = [ 0 3 4  0  2 ] [ 0 ] = [  6 ] ,
[ y3 ]   [ 1 0 3 -2  0 ] [ 1 ]   [  4 ]
                         [ 0 ]
                         [ 1 ]
so that T(1, 0, 1, 0, 1) = (-2, 6, 4).
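Since T is simply multiplication by the standard matrix, this computation is a one-liner numerically; a small Python sketch (our illustration):

import numpy as np

A = np.array([[2, 3, 5,  7, -9],
              [0, 3, 4,  0,  2],
              [1, 0, 3, -2,  0]])

x = np.array([1, 0, 1, 0, 1])
print(A @ x)   # [-2  6  4], i.e. T(1, 0, 1, 0, 1) = (-2, 6, 4)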
Example 8.1.2. Suppose that A is the zero m × n matrix. The linear transformation T : Rⁿ → Rᵐ, where T(x) = Ax for every x ∈ Rⁿ, is the zero transformation from Rⁿ into Rᵐ. Clearly T(x) = 0 for every x ∈ Rⁿ.
Example 8.1.3. Suppose that I is the identity n × n matrix. The linear operator T : Rⁿ → Rⁿ, where T(x) = Ix for every x ∈ Rⁿ, is the identity operator on Rⁿ. Clearly T(x) = x for every x ∈ Rⁿ.
PROPOSITION 8A. Suppose that T : Rⁿ → Rᵐ is a linear transformation, and that e1, . . . , en is the standard basis for Rⁿ. Then the standard matrix for T is given by
A = ( T(e1) . . . T(en) ),
where T(ej) is a column matrix for every j = 1, . . . , n.
Proof. This follows immediately from (2). □
8.2. Linear Operators on R²
In this section, we consider the special case when n = m = 2, and study linear operators on R². For every x ∈ R², we shall write x = (x1, x2).
Example 8.2.1. Consider reflection across the x2-axis, so that T(x1, x2) = (-x1, x2). Clearly we have
T(e1) = [ -1 ]    and    T(e2) = [ 0 ] ,
        [  0 ]                   [ 1 ]
and so it follows from Proposition 8A that the standard matrix is given by
A = [ -1 0 ]
    [  0 1 ].
It is not difficult to see that the standard matrices for reflection across the x1-axis and across the line x1 = x2 are given respectively by
A = [ 1  0 ]    and    A = [ 0 1 ] .
    [ 0 -1 ]               [ 1 0 ]
Also, the standard matrix for reflection across the origin is given by
A = [ -1  0 ]
    [  0 -1 ].
We give a summary in the table below:
Linear operator               Equations              Standard matrix
Reflection across x2-axis     y1 = -x1, y2 = x2      [ -1 0 ]
                                                     [  0 1 ]
Reflection across x1-axis     y1 = x1, y2 = -x2      [ 1  0 ]
                                                     [ 0 -1 ]
Reflection across x1 = x2     y1 = x2, y2 = x1       [ 0 1 ]
                                                     [ 1 0 ]
Reflection across origin      y1 = -x1, y2 = -x2     [ -1  0 ]
                                                     [  0 -1 ]
Example 8.2.2. For orthogonal projection onto the x1-axis, we have T(x1, x2) = (x1, 0), with standard matrix
A = [ 1 0 ]
    [ 0 0 ].
Similarly, the standard matrix for orthogonal projection onto the x2-axis is given by
A = [ 0 0 ]
    [ 0 1 ].
We give a summary in the table below:
Linear operator                         Equations           Standard matrix
Orthogonal projection onto x1-axis      y1 = x1, y2 = 0     [ 1 0 ]
                                                            [ 0 0 ]
Orthogonal projection onto x2-axis      y1 = 0, y2 = x2     [ 0 0 ]
                                                            [ 0 1 ]
Example 8.2.3. For anticlockwise rotation by an angle θ, we have T(x1, x2) = (y1, y2), where
y1 + iy2 = (x1 + ix2)(cos θ + i sin θ),
and so
[ y1 ]   [ cos θ  -sin θ ] [ x1 ]
[ y2 ] = [ sin θ   cos θ ] [ x2 ] .
It follows that the standard matrix is given by
A = [ cos θ  -sin θ ]
    [ sin θ   cos θ ].
We give a summary in the table below:
Linear operator                      Equations                    Standard matrix
Anticlockwise rotation by angle θ    y1 = x1 cos θ - x2 sin θ     [ cos θ  -sin θ ]
                                     y2 = x1 sin θ + x2 cos θ     [ sin θ   cos θ ]
Example 8.2.4. For contraction or dilation by a non-negative scalar k, we have T(x1, x2) = (kx1, kx2), with standard matrix
A = [ k 0 ]
    [ 0 k ].
The operator is called a contraction if 0 < k < 1 and a dilation if k > 1, and can be extended to negative values of k by noting that for k < 0, we have
[ k 0 ]   [ -1  0 ] [ -k  0 ]
[ 0 k ] = [  0 -1 ] [  0 -k ] .
This describes contraction or dilation by the non-negative scalar -k followed by reflection across the origin. We give a summary in the table below:
Linear operator                        Equations             Standard matrix
Contraction or dilation by factor k    y1 = kx1, y2 = kx2    [ k 0 ]
                                                             [ 0 k ]
Example 8.2.5. For expansion or compression in the x1-direction by a positive factor k, we have T(x1, x2) = (kx1, x2), with standard matrix
A = [ k 0 ]
    [ 0 1 ].
This can be extended to negative values of k by noting that for k < 0, we have
[ k 0 ]   [ -1 0 ] [ -k 0 ]
[ 0 1 ] = [  0 1 ] [  0 1 ] .
This describes expansion or compression in the x1-direction by the positive factor -k followed by reflection across the x2-axis. Similarly, for expansion or compression in the x2-direction by a non-zero factor k, we have the standard matrix
A = [ 1 0 ]
    [ 0 k ].
We give a summary in the table below:
Linear operator                              Equations            Standard matrix
Expansion or compression in x1-direction     y1 = kx1, y2 = x2    [ k 0 ]
                                                                  [ 0 1 ]
Expansion or compression in x2-direction     y1 = x1, y2 = kx2    [ 1 0 ]
                                                                  [ 0 k ]
Example 8.2.6. For shears in the x1-direction with factor k, we have T(x1, x2) = (x1 + kx2, x2), with standard matrix
A = [ 1 k ]
    [ 0 1 ].
For the case k = 1, we have the following.
(Diagram: the image of the unit square under the shear with k = 1.)
For the case k = -1, we have the following.
(Diagram: the image of the unit square under the shear with k = -1.)
Similarly, for shears in the x2-direction with factor k, we have standard matrix
A = [ 1 0 ]
    [ k 1 ].
We give a summary in the table below:
Linear operator          Equations                   Standard matrix
Shear in x1-direction    y1 = x1 + kx2, y2 = x2      [ 1 k ]
                                                     [ 0 1 ]
Shear in x2-direction    y1 = x1, y2 = kx1 + x2      [ 1 0 ]
                                                     [ k 1 ]
Example 8.2.7. Consider a linear operator T : R² → R² which consists of a reflection across the x2-axis, followed by a shear in the x1-direction with factor 3 and then reflection across the x1-axis. To find the standard matrix, consider the effect of T on the standard basis e1, e2 of R². Note that
e1 = [ 1 ]  →  [ -1 ]  →  [ -1 ]  →  [ -1 ] = T(e1),
     [ 0 ]     [  0 ]     [  0 ]     [  0 ]
e2 = [ 0 ]  →  [ 0 ]  →  [ 3 ]  →  [  3 ] = T(e2),
     [ 1 ]     [ 1 ]     [ 1 ]     [ -1 ]
so it follows from Proposition 8A that the standard matrix for T is
A = [ -1  3 ]
    [  0 -1 ].
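The same standard matrix can be obtained by multiplying the three standard matrices involved, taking care that the operator applied first appears on the right; a short Python sketch of this (our illustration):

import numpy as np

R2 = np.array([[-1, 0], [0, 1]])   # reflection across the x2-axis
S  = np.array([[1, 3], [0, 1]])    # shear in the x1-direction, factor 3
R1 = np.array([[1, 0], [0, -1]])   # reflection across the x1-axis

# The composition acts right-to-left: first R2, then S, then R1.
A = R1 @ S @ R2
print(A)   # [[-1  3] [ 0 -1]]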
Let us summarize the above and consider a few special cases. We have the following table of invertible linear operators with k ≠ 0. Clearly, if A is the standard matrix for an invertible linear operator T, then the inverse matrix A⁻¹ is the standard matrix for the inverse linear operator T⁻¹.
Linear operator T             Standard matrix A    Inverse matrix A⁻¹    Linear operator T⁻¹
Reflection across             [ 0 1 ]              [ 0 1 ]               Reflection across
line x1 = x2                  [ 1 0 ]              [ 1 0 ]               line x1 = x2
Expansion or compression      [ k 0 ]              [ k⁻¹ 0 ]             Expansion or compression
in x1-direction               [ 0 1 ]              [ 0   1 ]             in x1-direction
Expansion or compression      [ 1 0 ]              [ 1 0   ]             Expansion or compression
in x2-direction               [ 0 k ]              [ 0 k⁻¹ ]             in x2-direction
Shear                         [ 1 k ]              [ 1 -k ]              Shear
in x1-direction               [ 0 1 ]              [ 0  1 ]              in x1-direction
Shear                         [ 1 0 ]              [  1 0 ]              Shear
in x2-direction               [ k 1 ]              [ -k 1 ]              in x2-direction
Next, let us consider the question of elementary row operations on 2 × 2 matrices. It is not difficult to see that an elementary row operation performed on a 2 × 2 matrix A has the effect of multiplying the matrix A by some elementary matrix E to give the product EA. We have the following table.
Elementary row operation                       Elementary matrix E
Interchanging the two rows                     [ 0 1 ]
                                               [ 1 0 ]
Multiplying row 1 by non-zero factor k         [ k 0 ]
                                               [ 0 1 ]
Multiplying row 2 by non-zero factor k         [ 1 0 ]
                                               [ 0 k ]
Adding k times row 2 to row 1                  [ 1 k ]
                                               [ 0 1 ]
Adding k times row 1 to row 2                  [ 1 0 ]
                                               [ k 1 ]
Now, we know that any invertible matrix A can be reduced to the identity matrix by a finite number of elementary row operations. In other words, there exist a finite number of elementary matrices E1, . . . , Es of the types above with various non-zero values of k such that
Es . . . E1A = I,
so that
A = E1⁻¹ . . . Es⁻¹.
We have proved the following result.
PROPOSITION 8B. Suppose that the linear operator T : R² → R² has standard matrix A, where A is invertible. Then T is the product of a succession of finitely many reflections, expansions, compressions and shears.
In fact, we can prove the following result concerning images of straight lines.
PROPOSITION 8C. Suppose that the linear operator T : R² → R² has standard matrix A, where A is invertible. Then
(a) the image under T of a straight line is a straight line;
(b) the image under T of a straight line through the origin is a straight line through the origin; and
(c) the images under T of parallel straight lines are parallel straight lines.
Proof. Suppose that T(x1, x2) = (y1, y2). Since A is invertible, we have x = A⁻¹y, where
x = [ x1 ]    and    y = [ y1 ] .
    [ x2 ]               [ y2 ]
The equation of a straight line is given by αx1 + βx2 = γ or, in matrix form, by
( α β ) [ x1 ] = ( γ ).
        [ x2 ]
Hence
( α β ) A⁻¹ [ y1 ] = ( γ ).
            [ y2 ]
Let
( α′ β′ ) = ( α β ) A⁻¹.
Then
( α′ β′ ) [ y1 ] = ( γ ).
          [ y2 ]
In other words, the image under T of the straight line αx1 + βx2 = γ is α′y1 + β′y2 = γ, clearly another straight line. This proves (a). To prove (b), note that straight lines through the origin correspond to γ = 0. To prove (c), note that parallel straight lines correspond to different values of γ for the same values of α and β. □
8.3. Elementary Properties of Euclidean Linear Transformations
In this section, we establish a number of simple properties of euclidean linear transformations.
PROPOSITION 8D. Suppose that T1 : Rⁿ → Rᵐ and T2 : Rᵐ → Rᵏ are linear transformations. Then T = T2 ∘ T1 : Rⁿ → Rᵏ is also a linear transformation.
Proof. Since T1 and T2 are linear transformations, they have standard matrices A1 and A2 respectively. In other words, we have T1(x) = A1x for every x ∈ Rⁿ and T2(y) = A2y for every y ∈ Rᵐ. It follows that T(x) = T2(T1(x)) = A2A1x for every x ∈ Rⁿ, so that T has standard matrix A2A1. □
Example 8.3.1. Suppose that T1 : R² → R² is anticlockwise rotation by π/2 and T2 : R² → R² is orthogonal projection onto the x1-axis. Then the respective standard matrices are
A1 = [ 0 -1 ]    and    A2 = [ 1 0 ] .
     [ 1  0 ]                [ 0 0 ]
It follows that the standard matrices for T2 ∘ T1 and T1 ∘ T2 are respectively
A2A1 = [ 0 -1 ]    and    A1A2 = [ 0 0 ] .
       [ 0  0 ]                  [ 1 0 ]
Hence T2 ∘ T1 and T1 ∘ T2 are not equal.
Example 8.3.2. Suppose that T1 : R² → R² is anticlockwise rotation by α and T2 : R² → R² is anticlockwise rotation by β. Then the respective standard matrices are
A1 = [ cos α  -sin α ]    and    A2 = [ cos β  -sin β ] .
     [ sin α   cos α ]                [ sin β   cos β ]
It follows that the standard matrix for T2 ∘ T1 is
A2A1 = [ cos β cos α - sin β sin α   -cos β sin α - sin β cos α ]   [ cos(α+β)  -sin(α+β) ]
       [ sin β cos α + cos β sin α    cos β cos α - sin β sin α ] = [ sin(α+β)   cos(α+β) ] .
Hence T2 ∘ T1 is anticlockwise rotation by α + β.
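This identity is easy to test numerically for arbitrary angles; the angles in the following sketch are arbitrary values of our choosing:

import numpy as np

def rot(t):
    # standard matrix of anticlockwise rotation by angle t
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

a, b = 0.7, 1.9   # arbitrary test angles
assert np.allclose(rot(b) @ rot(a), rot(a + b))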
Example 8.3.3. The reader should check that in R², reflection across the x1-axis followed by reflection across the x2-axis gives reflection across the origin.
Linear transformations that map distinct vectors to distinct vectors are of special importance.
Definition. A linear transformation T : Rⁿ → Rᵐ is said to be one-to-one if for every x′, x″ ∈ Rⁿ, we have x′ = x″ whenever T(x′) = T(x″).
Example 8.3.4. If we consider linear operators T : R² → R², then T is one-to-one precisely when the standard matrix A is invertible. To see this, suppose first of all that A is invertible. If T(x′) = T(x″), then Ax′ = Ax″. Multiplying on the left by A⁻¹, we obtain x′ = x″. Suppose next that A is not invertible. Then there exists x ∈ R² such that x ≠ 0 and Ax = 0. On the other hand, we clearly have A0 = 0. It follows that T(x) = T(0), so that T is not one-to-one.
PROPOSITION 8E. Suppose that the linear operator T : Rⁿ → Rⁿ has standard matrix A. Then the following statements are equivalent:
(a) The matrix A is invertible.
(b) The linear operator T is one-to-one.
(c) The range of T is Rⁿ; in other words, R(T) = Rⁿ.
Proof. ((a)⇒(b)) Suppose that T(x′) = T(x″). Then Ax′ = Ax″. Multiplying on the left by A⁻¹ gives x′ = x″.
((b)⇒(a)) Suppose that T is one-to-one. Then the system Ax = 0 has unique solution x = 0 in Rⁿ. It follows that A can be reduced by elementary row operations to the identity matrix I, and is therefore invertible.
((a)⇒(c)) For any y ∈ Rⁿ, clearly x = A⁻¹y satisfies Ax = y, so that T(x) = y.
((c)⇒(a)) Suppose that e1, . . . , en is the standard basis for Rⁿ. Let x1, . . . , xn ∈ Rⁿ be chosen to satisfy T(xj) = ej, so that Axj = ej, for every j = 1, . . . , n. Write
C = ( x1 . . . xn ).
Then AC = I, so that A is invertible. □
Definition. Suppose that the linear operator T : Rⁿ → Rⁿ has standard matrix A, where A is invertible. Then the linear operator T⁻¹ : Rⁿ → Rⁿ, defined by T⁻¹(x) = A⁻¹x for every x ∈ Rⁿ, is called the inverse of the linear operator T.
Remark. Clearly T⁻¹(T(x)) = x and T(T⁻¹(x)) = x for every x ∈ Rⁿ.
Example 8.3.5. Consider the linear operator T : R² → R², defined by T(x) = Ax for every x ∈ R², where
A = [ 1 1 ]
    [ 1 2 ].
Clearly A is invertible, and
A⁻¹ = [  2 -1 ]
      [ -1  1 ].
Hence the inverse linear operator is T⁻¹ : R² → R², defined by T⁻¹(x) = A⁻¹x for every x ∈ R².
Example 8.3.6. Suppose that T : R² → R² is anticlockwise rotation by angle θ. The reader should check that T⁻¹ : R² → R² is anticlockwise rotation by angle 2π - θ.
Next, we study the linearity properties of euclidean linear transformations which we shall use later to
discuss linear transformations in arbitrary real vector spaces.
Chapter 8 : Linear Transformations page 9 of 35
Linear Algebra c _ W W L Chen, 1997, 2008
PROPOSITION 8F. A transformation T : Rⁿ → Rᵐ is linear if and only if the following two conditions are satisfied:
(a) For every u, v ∈ Rⁿ, we have T(u + v) = T(u) + T(v).
(b) For every u ∈ Rⁿ and c ∈ R, we have T(cu) = cT(u).
Proof. Suppose first of all that T : Rⁿ → Rᵐ is a linear transformation. Let A be the standard matrix for T. Then for every u, v ∈ Rⁿ and c ∈ R, we have
T(u + v) = A(u + v) = Au + Av = T(u) + T(v)
and
T(cu) = A(cu) = c(Au) = cT(u).
Suppose now that (a) and (b) hold. To show that T is linear, we need to find a matrix A such that T(x) = Ax for every x ∈ Rⁿ. Suppose that e1, . . . , en is the standard basis for Rⁿ. As suggested by Proposition 8A, we write
A = ( T(e1) . . . T(en) ),
where T(ej) is a column matrix for every j = 1, . . . , n. For any vector
x = [ x1 ]
    [  . ]
    [ xn ]
in Rⁿ, we have
Ax = ( T(e1) . . . T(en) ) [ x1 ]  = x1T(e1) + . . . + xnT(en).
                           [  . ]
                           [ xn ]
Using (b) on each summand and then using (a) inductively, we obtain
Ax = T(x1e1) + . . . + T(xnen) = T(x1e1 + . . . + xnen) = T(x)
as required. □
To conclude our study of euclidean linear transformations, we briefly mention the problem of eigenvalues and eigenvectors of euclidean linear operators.
Definition. Suppose that T : Rⁿ → Rⁿ is a linear operator. Then any real number λ ∈ R is called an eigenvalue of T if there exists a non-zero vector x ∈ Rⁿ such that T(x) = λx. This non-zero vector x ∈ Rⁿ is called an eigenvector of T corresponding to the eigenvalue λ.
Remark. Note that the equation T(x) = λx is equivalent to the equation Ax = λx. It follows that there is no distinction between eigenvalues and eigenvectors of T and those of the standard matrix A. We therefore do not need to discuss this problem any further.
8.4. General Linear Transformations
Suppose that V and W are real vector spaces. To define a linear transformation from V into W, we are motivated by Proposition 8F which describes the linearity properties of euclidean linear transformations.
By a transformation from V into W, we mean a function of the type T : V → W, with domain V and codomain W. For every vector u ∈ V, the vector T(u) ∈ W is called the image of u under the transformation T.
Definition. A transformation T : V → W from a real vector space V into a real vector space W is called a linear transformation if the following two conditions are satisfied:
(LT1) For every u, v ∈ V, we have T(u + v) = T(u) + T(v).
(LT2) For every u ∈ V and c ∈ R, we have T(cu) = cT(u).
Definition. A linear transformation T : V → V from a real vector space V into itself is called a linear operator on V.
Example 8.4.1. Suppose that V and W are two real vector spaces. The transformation T : V → W, where T(u) = 0 for every u ∈ V, is clearly linear, and is called the zero transformation from V to W.
Example 8.4.2. Suppose that V is a real vector space. The transformation I : V → V, where I(u) = u for every u ∈ V, is clearly linear, and is called the identity operator on V.
Example 8.4.3. Suppose that V is a real vector space, and that k ∈ R is fixed. The transformation T : V → V, where T(u) = ku for every u ∈ V, is clearly linear. This operator is called a dilation if k > 1 and a contraction if 0 < k < 1.
Example 8.4.4. Suppose that V is a finite dimensional vector space, with basis w1, . . . , wn. Define a transformation T : V → Rⁿ as follows. For every u ∈ V, there exists a unique vector (β1, . . . , βn) ∈ Rⁿ such that u = β1w1 + . . . + βnwn. We let T(u) = (β1, . . . , βn). In other words, the transformation T gives the coordinates of any vector u ∈ V with respect to the given basis w1, . . . , wn. Suppose now that v = γ1w1 + . . . + γnwn is another vector in V. Then u + v = (β1 + γ1)w1 + . . . + (βn + γn)wn, so that
T(u + v) = (β1 + γ1, . . . , βn + γn) = (β1, . . . , βn) + (γ1, . . . , γn) = T(u) + T(v).
Also, if c ∈ R, then cu = cβ1w1 + . . . + cβnwn, so that
T(cu) = (cβ1, . . . , cβn) = c(β1, . . . , βn) = cT(u).
Hence T is a linear transformation. We shall return to this in greater detail in the next section.
Example 8.4.5. Suppose that Pn denotes the vector space of all polynomials with real coefficients and degree at most n. Define a transformation T : Pn → Pn as follows. For every polynomial
p = p0 + p1x + . . . + pnxⁿ
in Pn, we let
T(p) = pn + p_{n-1}x + . . . + p0xⁿ.
Suppose now that q = q0 + q1x + . . . + qnxⁿ is another polynomial in Pn. Then
p + q = (p0 + q0) + (p1 + q1)x + . . . + (pn + qn)xⁿ,
so that
T(p + q) = (pn + qn) + (p_{n-1} + q_{n-1})x + . . . + (p0 + q0)xⁿ
         = (pn + p_{n-1}x + . . . + p0xⁿ) + (qn + q_{n-1}x + . . . + q0xⁿ) = T(p) + T(q).
Also, for any c ∈ R, we have cp = cp0 + cp1x + . . . + cpnxⁿ, so that
T(cp) = cpn + cp_{n-1}x + . . . + cp0xⁿ = c(pn + p_{n-1}x + . . . + p0xⁿ) = cT(p).
Hence T is a linear transformation.
Example 8.4.6. Let V denote the vector space of all real valued functions differentiable everywhere in R, and let W denote the vector space of all real valued functions defined on R. Consider the transformation T : V → W, where T(f) = f′ for every f ∈ V. It is easy to check from properties of derivatives that T is a linear transformation.
Example 8.4.7. Let V denote the vector space of all real valued functions that are Riemann integrable over the interval [0, 1]. Consider the transformation T : V → R, where
T(f) = ∫₀¹ f(x) dx
for every f ∈ V. It is easy to check from properties of the Riemann integral that T is a linear transformation.
Consider a linear transformation T : V → W from a finite dimensional real vector space V into a real vector space W. Suppose that v1, . . . , vn is a basis of V. Then every u ∈ V can be written uniquely in the form u = β1v1 + . . . + βnvn, where β1, . . . , βn ∈ R. It follows that
T(u) = T(β1v1 + . . . + βnvn) = T(β1v1) + . . . + T(βnvn) = β1T(v1) + . . . + βnT(vn).
We have therefore proved the following generalization of Proposition 8A.
PROPOSITION 8G. Suppose that T : V → W is a linear transformation from a finite dimensional real vector space V into a real vector space W. Suppose further that v1, . . . , vn is a basis of V. Then T is completely determined by T(v1), . . . , T(vn).
Example 8.4.8. Consider a linear transformation T : P2 → R, where T(1) = 1, T(x) = 2 and T(x²) = 3. Since 1, x, x² is a basis of P2, this linear transformation is completely determined. In particular, we have, for example,
T(5 - 3x + 2x²) = 5T(1) - 3T(x) + 2T(x²) = 5 - 6 + 6 = 5.
Example 8.4.9. Consider a linear transformation T : R⁴ → R, where T(1, 0, 0, 0) = 1, T(1, 1, 0, 0) = 2, T(1, 1, 1, 0) = 3 and T(1, 1, 1, 1) = 4. Since (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1) is a basis of R⁴, this linear transformation is completely determined. In particular, we have, for example,
T(6, 4, 3, 1) = T(2(1, 0, 0, 0) + (1, 1, 0, 0) + 2(1, 1, 1, 0) + (1, 1, 1, 1))
             = 2T(1, 0, 0, 0) + T(1, 1, 0, 0) + 2T(1, 1, 1, 0) + T(1, 1, 1, 1) = 14.
We also have the following generalization of Proposition 8D.
PROPOSITION 8H. Suppose that V, W, U are real vector spaces. Suppose further that T1 : V → W and T2 : W → U are linear transformations. Then T = T2 ∘ T1 : V → U is also a linear transformation.
Proof. Suppose that u, v ∈ V. Then
T(u + v) = T2(T1(u + v)) = T2(T1(u) + T1(v)) = T2(T1(u)) + T2(T1(v)) = T(u) + T(v).
Also, if c ∈ R, then
T(cu) = T2(T1(cu)) = T2(cT1(u)) = cT2(T1(u)) = cT(u).
Hence T is a linear transformation. □
8.5. Change of Basis
Suppose that V is a real vector space, with basis B = {u1, . . . , un}. Then every vector u ∈ V can be written uniquely as a linear combination
u = β1u1 + . . . + βnun, where β1, . . . , βn ∈ R.          (3)
It follows that the vector u can be identified with the vector (β1, . . . , βn) ∈ Rⁿ.
Definition. Suppose that u ∈ V and (3) holds. Then the matrix
[u]_B = [ β1 ]
        [  . ]
        [ βn ]
is called the coordinate matrix of u relative to the basis B = {u1, . . . , un}.
Example 8.5.1. The vectors
u1 = (1, 2, 1, 0), u2 = (3, 3, 3, 0), u3 = (2, -10, 0, 0), u4 = (-2, 1, -6, 2)
are linearly independent in R⁴, and so B = {u1, u2, u3, u4} is a basis of R⁴. It follows that for any u = (x, y, z, w) ∈ R⁴, we can write
u = β1u1 + β2u2 + β3u3 + β4u4.
In matrix notation, this becomes
[ x ]   [ 1 3   2 -2 ] [ β1 ]
[ y ] = [ 2 3 -10  1 ] [ β2 ]
[ z ]   [ 1 3   0 -6 ] [ β3 ]
[ w ]   [ 0 0   0  2 ] [ β4 ],
so that
[u]_B = [ β1 ]   [ 1 3   2 -2 ]⁻¹ [ x ]
        [ β2 ] = [ 2 3 -10  1 ]   [ y ]
        [ β3 ]   [ 1 3   0 -6 ]   [ z ]
        [ β4 ]   [ 0 0   0  2 ]   [ w ].
Remark. Consider a function φ : V → Rⁿ, where φ(u) = [u]_B for every u ∈ V. It is not difficult to see that this function gives rise to a one-to-one correspondence between the elements of V and the elements of Rⁿ. Furthermore, note that
[u + v]_B = [u]_B + [v]_B    and    [cu]_B = c[u]_B,
so that φ(u + v) = φ(u) + φ(v) and φ(cu) = cφ(u) for every u, v ∈ V and c ∈ R. Thus φ is a linear transformation, and preserves much of the structure of V. We also say that V is isomorphic to Rⁿ. In practice, once we have made this identification between vectors and their coordinate matrices, then we can basically forget about the basis B and imagine that we are working in Rⁿ with the standard basis.
Clearly, if we change from one basis B = {u1, . . . , un} to another basis C = {v1, . . . , vn} of V, then we also need to find a way of calculating [u]_C in terms of [u]_B for every vector u ∈ V. To do this, note that each of the vectors v1, . . . , vn can be written uniquely as a linear combination of the vectors u1, . . . , un. Suppose that for i = 1, . . . , n, we have
vi = a1iu1 + . . . + aniun, where a1i, . . . , ani ∈ R,
so that
[vi]_B = [ a1i ]
         [  .  ]
         [ ani ].
For every u ∈ V, we can write
u = β1u1 + . . . + βnun = γ1v1 + . . . + γnvn, where β1, . . . , βn, γ1, . . . , γn ∈ R,
so that
[u]_B = [ β1 ]              [ γ1 ]
        [  . ] and [u]_C =  [  . ] .
        [ βn ]              [ γn ]
Clearly
u = γ1v1 + . . . + γnvn = γ1(a11u1 + . . . + an1un) + . . . + γn(a1nu1 + . . . + annun)
  = (γ1a11 + . . . + γna1n)u1 + . . . + (γ1an1 + . . . + γnann)un = β1u1 + . . . + βnun.
Hence
β1 = γ1a11 + . . . + γna1n,
 .
βn = γ1an1 + . . . + γnann.
Written in matrix notation, we have
[ β1 ]   [ a11 . . . a1n ] [ γ1 ]
[  . ] = [  .         .  ] [  . ] .
[ βn ]   [ an1 . . . ann ] [ γn ]
We have proved the following result.
PROPOSITION 8J. Suppose that B = {u1, . . . , un} and C = {v1, . . . , vn} are two bases of a real vector space V. Then for every u ∈ V, we have
[u]_B = P[u]_C,
where the columns of the matrix
P = ( [v1]_B . . . [vn]_B )
are precisely the coordinate matrices of the elements of C relative to the basis B.
Remark. Strictly speaking, Proposition 8J gives [u]_B in terms of [u]_C. However, note that the matrix P is invertible (why?), so that [u]_C = P⁻¹[u]_B.
Definition. The matrix P in Proposition 8J is sometimes called the transition matrix from the basis C to the basis B.
Example 8.5.2. We know that with
u1 = (1, 2, 1, 0), u2 = (3, 3, 3, 0), u3 = (2, -10, 0, 0), u4 = (-2, 1, -6, 2),
and with
v1 = (1, 2, 1, 0), v2 = (1, -1, 1, 0), v3 = (1, 0, -1, 0), v4 = (0, 0, 0, 2),
both B = {u1, u2, u3, u4} and C = {v1, v2, v3, v4} are bases of R⁴. It is easy to check that
v1 = u1,
v2 = -2u1 + u2,
v3 = 11u1 - 4u2 + u3,
v4 = -27u1 + 11u2 - 2u3 + u4,
so that
P = ( [v1]_B [v2]_B [v3]_B [v4]_B ) = [ 1 -2 11 -27 ]
                                      [ 0  1 -4  11 ]
                                      [ 0  0  1  -2 ]
                                      [ 0  0  0   1 ].
Hence [u]_B = P[u]_C for every u ∈ R⁴. It is also easy to check that
u1 = v1,
u2 = 2v1 + v2,
u3 = -3v1 + 4v2 + v3,
u4 = -v1 - 3v2 + 2v3 + v4,
so that
Q = ( [u1]_C [u2]_C [u3]_C [u4]_C ) = [ 1 2 -3 -1 ]
                                      [ 0 1  4 -3 ]
                                      [ 0 0  1  2 ]
                                      [ 0 0  0  1 ].
Hence [u]_C = Q[u]_B for every u ∈ R⁴. Note that PQ = I. Now let u = (6, -1, 2, 2). We can check that u = v1 + 3v2 + 2v3 + v4, so that
[u]_C = [ 1 ]
        [ 3 ]
        [ 2 ]
        [ 1 ].
Then
[u]_B = [ 1 -2 11 -27 ] [ 1 ]   [ -10 ]
        [ 0  1 -4  11 ] [ 3 ] = [   6 ]
        [ 0  0  1  -2 ] [ 2 ]   [   0 ]
        [ 0  0  0   1 ] [ 1 ]   [   1 ].
Check that u = -10u1 + 6u2 + u4.
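The transition matrix P can also be obtained numerically: since U[vi]_B = vi, where U has columns u1, . . . , u4, each column of P solves a linear system. A Python sketch of this (our illustration):

import numpy as np

# Columns of U are u1,...,u4; columns of V are v1,...,v4.
U = np.array([[1, 3,   2, -2],
              [2, 3, -10,  1],
              [1, 3,   0, -6],
              [0, 0,   0,  2]])
V = np.array([[1,  1,  1, 0],
              [2, -1,  0, 0],
              [1,  1, -1, 0],
              [0,  0,  0, 2]])

# Transition matrix from C to B: solve U P = V column by column.
P = np.linalg.solve(U.astype(float), V.astype(float))
print(np.round(P))   # matches the matrix P of Example 8.5.2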
Example 8.5.3. Consider the vector space P2. It is not too difficult to check that
u1 = 1 + x,   u2 = 1 + x²,   u3 = x + x²
form a basis of P2. Let u = 1 + 4x - x². Then u = β1u1 + β2u2 + β3u3, where
1 + 4x - x² = β1(1 + x) + β2(1 + x²) + β3(x + x²) = (β1 + β2) + (β1 + β3)x + (β2 + β3)x²,
so that β1 + β2 = 1, β1 + β3 = 4 and β2 + β3 = -1. Hence (β1, β2, β3) = (3, -2, 1). If we write B = {u1, u2, u3}, then
[u]_B = [  3 ]
        [ -2 ]
        [  1 ].
On the other hand, it is also not too difficult to check that
v1 = 1,   v2 = 1 + x,   v3 = 1 + x + x²
form a basis of P2. Also u = γ1v1 + γ2v2 + γ3v3, where
1 + 4x - x² = γ1 + γ2(1 + x) + γ3(1 + x + x²) = (γ1 + γ2 + γ3) + (γ2 + γ3)x + γ3x²,
so that γ1 + γ2 + γ3 = 1, γ2 + γ3 = 4 and γ3 = -1. Hence (γ1, γ2, γ3) = (-3, 5, -1). If we write C = {v1, v2, v3}, then
[u]_C = [ -3 ]
        [  5 ]
        [ -1 ].
Next, note that
v1 = (1/2)u1 + (1/2)u2 - (1/2)u3,
v2 = u1,
v3 = (1/2)u1 + (1/2)u2 + (1/2)u3.
Hence
P = ( [v1]_B [v2]_B [v3]_B ) = [  1/2 1 1/2 ]
                               [  1/2 0 1/2 ]
                               [ -1/2 0 1/2 ].
To verify that [u]_B = P[u]_C, note that
[  3 ]   [  1/2 1 1/2 ] [ -3 ]
[ -2 ] = [  1/2 0 1/2 ] [  5 ] .
[  1 ]   [ -1/2 0 1/2 ] [ -1 ]
8.6. Kernel and Range
Consider first of all a euclidean linear transformation T : Rⁿ → Rᵐ. Suppose that A is the standard matrix for T. Then the range of the transformation T is given by
R(T) = {T(x) : x ∈ Rⁿ} = {Ax : x ∈ Rⁿ}.
It follows that R(T) is the set of all linear combinations of the columns of the matrix A, and is therefore the column space of A. On the other hand, the set
{x ∈ Rⁿ : Ax = 0}
is the nullspace of A.
Recall that the sum of the dimension of the nullspace of A and the dimension of the column space of A is equal to the number of columns of A. This is known as the Rank-nullity theorem. The purpose of this section is to extend this result to the setting of linear transformations. To do this, we need the following generalization of the idea of the nullspace and the column space.
Definition. Suppose that T : V → W is a linear transformation from a real vector space V into a real vector space W. Then the set
ker(T) = {u ∈ V : T(u) = 0}
is called the kernel of T, and the set
R(T) = {T(u) : u ∈ V}
is called the range of T.
Example 8.6.1. For a euclidean linear transformation T with standard matrix A, we have shown that ker(T) is the nullspace of A, while R(T) is the column space of A.
Example 8.6.2. Suppose that T : V → W is the zero transformation. Clearly we have ker(T) = V and R(T) = {0}.
Example 8.6.3. Suppose that T : V → V is the identity operator on V. Clearly we have ker(T) = {0} and R(T) = V.
Example 8.6.4. Suppose that T : R² → R² is orthogonal projection onto the x1-axis. Then ker(T) is the x2-axis, while R(T) is the x1-axis.
Example 8.6.5. Suppose that T : Rⁿ → Rⁿ is one-to-one. Then ker(T) = {0} and R(T) = Rⁿ, in view of Proposition 8E.
Example 8.6.6. Consider the linear transformation T : V → W, where V denotes the vector space of all real valued functions differentiable everywhere in R, where W denotes the space of all real valued functions defined in R, and where T(f) = f′ for every f ∈ V. Then ker(T) is the set of all differentiable functions with derivative 0, and so is the set of all constant functions in R.
Example 8.6.7. Consider the linear transformation T : V → R, where V denotes the vector space of all real valued functions Riemann integrable over the interval [0, 1], and where
T(f) = ∫₀¹ f(x) dx
for every f ∈ V. Then ker(T) is the set of all Riemann integrable functions in [0, 1] with zero mean, while R(T) = R.
PROPOSITION 8K. Suppose that T : V W is a linear transformation from a real vector space V
into a real vector space W. Then ker(T) is a subspace of V , while R(T) is a subspace of W.
Chapter 8 : Linear Transformations page 17 of 35
Linear Algebra c _ W W L Chen, 1997, 2008
Proof. Since T(0) = 0, it follows that 0 ker(T) V and 0 R(T) W. For any u, v ker(T), we
have
T(u +v) = T(u) +T(v) = 0 +0 = 0,
so that u +v ker(T). Suppose further that c R. Then
T(cu) = cT(u) = c0 = 0,
so that cu ker(T). Hence ker(T) is a subspace of V . Suppose next that w, z R(T). Then there exist
u, v V such that T(u) = w and T(v) = z. Hence
T(u +v) = T(u) +T(v) = w+z,
so that w+z R(T). Suppose further that c R. Then
T(cu) = cT(u) = cw,
so that cw R(T). Hence R(T) is a subspace of W. _
To complete this section, we prove the following generalization of the Rank-nullity theorem.
PROPOSITION 8L. Suppose that T : V → W is a linear transformation from an n-dimensional real vector space V into a real vector space W. Then

dim ker(T) + dim R(T) = n.

Proof. Suppose first of all that dim ker(T) = n. Then ker(T) = V, and so R(T) = {0}, and the result follows immediately. Suppose next that dim ker(T) = 0, so that ker(T) = {0}. If {v_1, ..., v_n} is a basis of V, then it follows that T(v_1), ..., T(v_n) are linearly independent in W, for otherwise there exist c_1, ..., c_n ∈ R, not all zero, such that

c_1 T(v_1) + ... + c_n T(v_n) = 0,

so that T(c_1 v_1 + ... + c_n v_n) = 0, a contradiction since c_1 v_1 + ... + c_n v_n ≠ 0. On the other hand, elements of R(T) are linear combinations of T(v_1), ..., T(v_n). Hence dim R(T) = n, and the result again follows immediately. We may therefore assume that dim ker(T) = r, where 1 ≤ r < n. Let {v_1, ..., v_r} be a basis of ker(T). This basis can be extended to a basis {v_1, ..., v_r, v_{r+1}, ..., v_n} of V. It suffices to show that

T(v_{r+1}), ..., T(v_n) (4)

is a basis of R(T). Suppose that u ∈ V. Then there exist β_1, ..., β_n ∈ R such that

u = β_1 v_1 + ... + β_r v_r + β_{r+1} v_{r+1} + ... + β_n v_n,

so that

T(u) = β_1 T(v_1) + ... + β_r T(v_r) + β_{r+1} T(v_{r+1}) + ... + β_n T(v_n)
     = β_{r+1} T(v_{r+1}) + ... + β_n T(v_n).

It follows that (4) spans R(T). It remains to prove that its elements are linearly independent. Suppose that c_{r+1}, ..., c_n ∈ R and

c_{r+1} T(v_{r+1}) + ... + c_n T(v_n) = 0. (5)

We need to show that

c_{r+1} = ... = c_n = 0. (6)

By linearity, it follows from (5) that T(c_{r+1} v_{r+1} + ... + c_n v_n) = 0, so that

c_{r+1} v_{r+1} + ... + c_n v_n ∈ ker(T).

Hence there exist c_1, ..., c_r ∈ R such that

c_{r+1} v_{r+1} + ... + c_n v_n = c_1 v_1 + ... + c_r v_r,

so that

c_1 v_1 + ... + c_r v_r - c_{r+1} v_{r+1} - ... - c_n v_n = 0.

Since {v_1, ..., v_n} is a basis of V, it follows that c_1 = ... = c_r = c_{r+1} = ... = c_n = 0, so that (6) holds. This completes the proof. ∎
Remark. We sometimes say that dim R(T) and dim ker(T) are respectively the rank and the nullity of the linear transformation T.
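As a quick numerical illustration of Proposition 8L in the euclidean setting (where T(x) = Ax, so that ker(T) is the nullspace of A and R(T) is the column space of A), one can compare rank and nullity directly. A minimal sketch in Python with numpy; the matrix below is our own choice, purely for illustration:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])          # a 3x3 matrix of rank 2
n = A.shape[1]                      # n = dimension of the domain R^n
rank = np.linalg.matrix_rank(A)     # dim R(T)
nullity = n - rank                  # dim ker(T), by Proposition 8L
print(rank, nullity, rank + nullity == n)   # 2 1 True
```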
8.7. Inverse Linear Transformations
In this section, we generalize some of the ideas first discussed in Section 8.3.

Definition. A linear transformation T : V → W from a real vector space V into a real vector space W is said to be one-to-one if for every u′, u′′ ∈ V, we have u′ = u′′ whenever T(u′) = T(u′′).

The result below follows immediately from our definition.

PROPOSITION 8M. Suppose that T : V → W is a linear transformation from a real vector space V into a real vector space W. Then T is one-to-one if and only if ker(T) = {0}.

Proof. (⇒) Clearly 0 ∈ ker(T). Suppose that ker(T) ≠ {0}. Then there exists a non-zero v ∈ ker(T). It follows that T(v) = T(0), and so T is not one-to-one.
(⇐) Suppose that ker(T) = {0}. Given any u′, u′′ ∈ V, we have

T(u′) - T(u′′) = T(u′ - u′′) = 0

if and only if u′ - u′′ = 0; in other words, if and only if u′ = u′′. ∎

We have the following generalization of Proposition 8E.

PROPOSITION 8N. Suppose that T : V → V is a linear operator on a finite-dimensional real vector space V. Then the following statements are equivalent:
(a) The linear operator T is one-to-one.
(b) We have ker(T) = {0}.
(c) The range of T is V; in other words, R(T) = V.

Proof. The equivalence of (a) and (b) is established by Proposition 8M. The equivalence of (b) and (c) follows from Proposition 8L. ∎
Suppose that T : V → W is a one-to-one linear transformation from a real vector space V into a real vector space W. Then for every w ∈ R(T), there exists exactly one u ∈ V such that T(u) = w. We can therefore define a transformation T^{-1} : R(T) → V by writing T^{-1}(w) = u, where u ∈ V is the unique vector satisfying T(u) = w.

PROPOSITION 8P. Suppose that T : V → W is a one-to-one linear transformation from a real vector space V into a real vector space W. Then T^{-1} : R(T) → V is a linear transformation.

Proof. Suppose that w, z ∈ R(T). Then there exist u, v ∈ V such that T^{-1}(w) = u and T^{-1}(z) = v. It follows that T(u) = w and T(v) = z, so that T(u + v) = T(u) + T(v) = w + z, whence

T^{-1}(w + z) = u + v = T^{-1}(w) + T^{-1}(z).

Suppose further that c ∈ R. Then T(cu) = cw, so that

T^{-1}(cw) = cu = cT^{-1}(w).

This completes the proof. ∎

We also have the following result concerning compositions of linear transformations and which requires no further proof, in view of our knowledge concerning inverse functions.

PROPOSITION 8Q. Suppose that V, W, U are real vector spaces. Suppose further that T_1 : V → W and T_2 : W → U are one-to-one linear transformations. Then
(a) the linear transformation T_2 ∘ T_1 : V → U is one-to-one; and
(b) (T_2 ∘ T_1)^{-1} = T_1^{-1} ∘ T_2^{-1}.
8.8. Matrices of General Linear Transformations
Suppose that T : V → W is a linear transformation from a real vector space V to a real vector space W. Suppose further that the vector spaces V and W are finite dimensional, with dim V = n and dim W = m. We shall show that if we make use of a basis B of V and a basis C of W, then it is possible to describe T indirectly in terms of some matrix A. The main idea is to make use of coordinate matrices relative to the bases B and C.

Let us recall some discussion in Section 8.5. Suppose that B = {v_1, ..., v_n} is a basis of V. Then every vector v ∈ V can be written uniquely as a linear combination

v = β_1 v_1 + ... + β_n v_n, where β_1, ..., β_n ∈ R. (7)

The matrix

[v]_B = \begin{pmatrix} β_1 \\ ⋮ \\ β_n \end{pmatrix} (8)

is the coordinate matrix of v relative to the basis B.

Consider now a transformation φ : V → R^n, where φ(v) = [v]_B for every v ∈ V. The proof of the following result is straightforward.

PROPOSITION 8R. Suppose that the real vector space V has basis B = {v_1, ..., v_n}. Then the transformation φ : V → R^n, where φ(v) = [v]_B satisfies (7) and (8) for every v ∈ V, is a one-to-one linear transformation, with range R(φ) = R^n. Furthermore, the inverse linear transformation φ^{-1} : R^n → V is also one-to-one, with range R(φ^{-1}) = V.
Suppose next that C = {w_1, ..., w_m} is a basis of W. Then we can define a linear transformation ψ : W → R^m, where ψ(w) = [w]_C for every w ∈ W, in a similar way. We now have the following diagram of linear transformations.

[Diagram: T : V → W on the top row and S : R^n → R^m on the bottom row, with the vertical maps φ : V → R^n and ψ : W → R^m, so that S = ψ ∘ T ∘ φ^{-1}.]

Clearly the composition

S = ψ ∘ T ∘ φ^{-1} : R^n → R^m

is a euclidean linear transformation, and can therefore be described in terms of a standard matrix A. Our task is to determine this matrix A in terms of T and the bases B and C.

We know from Proposition 8A that

A = ( S(e_1) ... S(e_n) ),

where {e_1, ..., e_n} is the standard basis for R^n. For every j = 1, ..., n, we have

S(e_j) = (ψ ∘ T ∘ φ^{-1})(e_j) = ψ(T(φ^{-1}(e_j))) = ψ(T(v_j)) = [T(v_j)]_C.

It follows that

A = ( [T(v_1)]_C ... [T(v_n)]_C ). (9)

Definition. The matrix A given by (9) is called the matrix for the linear transformation T with respect to the bases B and C.

We now have the following diagram of linear transformations.

[Diagram: T : V → W on the top row and S : R^n → R^m on the bottom row, with the maps φ : V → R^n and ψ^{-1} : R^m → W, so that T = ψ^{-1} ∘ S ∘ φ.]

Hence we can write T as the composition

T = ψ^{-1} ∘ S ∘ φ : V → W.

For every v ∈ V, we have the following:

v ↦ [v]_B ↦ A[v]_B ↦ ψ^{-1}(A[v]_B),

where the three arrows represent φ, S and ψ^{-1} respectively.
More precisely, if v = β_1 v_1 + ... + β_n v_n, then

[v]_B = \begin{pmatrix} β_1 \\ ⋮ \\ β_n \end{pmatrix} and A[v]_B = A \begin{pmatrix} β_1 \\ ⋮ \\ β_n \end{pmatrix} = \begin{pmatrix} γ_1 \\ ⋮ \\ γ_m \end{pmatrix},

say, and so T(v) = ψ^{-1}(A[v]_B) = γ_1 w_1 + ... + γ_m w_m. We have proved the following result.
PROPOSITION 8S. Suppose that T : V → W is a linear transformation from a real vector space V into a real vector space W. Suppose further that V and W are finite dimensional, with bases B and C respectively, and that A is the matrix for the linear transformation T with respect to the bases B and C. Then for every v ∈ V, we have T(v) = w, where w ∈ W is the unique vector satisfying [w]_C = A[v]_B.

Remark. In the special case when V = W, the linear transformation T : V → V is a linear operator on V. Of course, we may choose a basis B for the domain V of T and a basis C for the codomain V of T. In the case when T is the identity linear operator, we often choose B ≠ C, since this represents a change of basis. In the case when T is not the identity operator, we often choose B = C for the sake of convenience; we then say that A is the matrix for the linear operator T with respect to the basis B.
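Formula (9) translates directly into a procedure: apply T to each basis vector of V and record its coordinate matrix with respect to C. Here is a minimal sketch in Python with numpy; the function names and the toy example are our own, purely for illustration:

```python
import numpy as np

def matrix_for(T, basis_V, coords_C):
    """Return A = ( [T(v_1)]_C ... [T(v_n)]_C ), as in formula (9).

    T        -- the linear transformation
    basis_V  -- the basis B = (v_1, ..., v_n) of V
    coords_C -- a function sending w in W to its coordinate matrix [w]_C
    """
    return np.column_stack([coords_C(T(v)) for v in basis_V])

# Illustration: T(x_1, x_2) = (2x_1 + x_2, x_1 + 3x_2) on R^2, with B and C
# both the standard basis, so that coords_C is the identity map.
T = lambda x: np.array([2 * x[0] + x[1], x[0] + 3 * x[1]])
B = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(matrix_for(T, B, lambda w: w))   # the standard matrix of T
```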
Example 8.8.1. Consider an operator T : P_3 → P_3 on the real vector space P_3 of all polynomials with real coefficients and degree at most 3, where for every polynomial p(x) in P_3, we have T(p(x)) = xp′(x), the product of x with the formal derivative p′(x) of p(x). The reader is invited to check that T is a linear operator. Now consider the basis B = {1, x, x^2, x^3} of P_3. The matrix for T with respect to B is given by

A = ( [T(1)]_B [T(x)]_B [T(x^2)]_B [T(x^3)]_B ) = ( [0]_B [x]_B [2x^2]_B [3x^3]_B ) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.

Suppose that p(x) = 1 + 2x + 4x^2 + 3x^3. Then

[p(x)]_B = \begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix} and A[p(x)]_B = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ 8 \\ 9 \end{pmatrix},

so that T(p(x)) = 2x + 8x^2 + 9x^3. This can be easily verified by noting that

T(p(x)) = xp′(x) = x(2 + 8x + 9x^2) = 2x + 8x^2 + 9x^3.

In general, if p(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3, then

[p(x)]_B = \begin{pmatrix} p_0 \\ p_1 \\ p_2 \\ p_3 \end{pmatrix} and A[p(x)]_B = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} p_0 \\ p_1 \\ p_2 \\ p_3 \end{pmatrix} = \begin{pmatrix} 0 \\ p_1 \\ 2p_2 \\ 3p_3 \end{pmatrix},

so that T(p(x)) = p_1 x + 2p_2 x^2 + 3p_3 x^3. Observe that

T(p(x)) = xp′(x) = x(p_1 + 2p_2 x + 3p_3 x^2) = p_1 x + 2p_2 x^2 + 3p_3 x^3,

verifying our result.
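A short computational rendering of Example 8.8.1, with polynomials in P_3 encoded by their coefficient vectors relative to B = {1, x, x^2, x^3} (a sketch; numpy is our own choice of tool):

```python
import numpy as np

# Matrix of T(p(x)) = x p'(x) with respect to B = {1, x, x^2, x^3}:
# T(x^j) = j x^j, so column j of A has the single entry j in row j.
A = np.diag([0.0, 1.0, 2.0, 3.0])
p = np.array([1.0, 2.0, 4.0, 3.0])   # p(x) = 1 + 2x + 4x^2 + 3x^3
print(A @ p)                         # [0. 2. 8. 9.]: T(p(x)) = 2x + 8x^2 + 9x^3
```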
Example 8.8.2. Consider the linear operator T : R^2 → R^2, given by T(x_1, x_2) = (2x_1 + x_2, x_1 + 3x_2) for every (x_1, x_2) ∈ R^2. Consider also the basis B = {(1, 0), (1, 1)} of R^2. Then the matrix for T with respect to B is given by

A = ( [T(1, 0)]_B [T(1, 1)]_B ) = ( [(2, 1)]_B [(3, 4)]_B ) = \begin{pmatrix} 1 & -1 \\ 1 & 4 \end{pmatrix}.

Suppose that (x_1, x_2) = (3, 2). Then

[(3, 2)]_B = \begin{pmatrix} 1 \\ 2 \end{pmatrix} and A[(3, 2)]_B = \begin{pmatrix} 1 & -1 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} -1 \\ 9 \end{pmatrix},

so that T(3, 2) = -(1, 0) + 9(1, 1) = (8, 9). This can be easily verified directly. In general, we have

[(x_1, x_2)]_B = \begin{pmatrix} x_1 - x_2 \\ x_2 \end{pmatrix} and A[(x_1, x_2)]_B = \begin{pmatrix} 1 & -1 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} x_1 - x_2 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 - 2x_2 \\ x_1 + 3x_2 \end{pmatrix},

so that T(x_1, x_2) = (x_1 - 2x_2)(1, 0) + (x_1 + 3x_2)(1, 1) = (2x_1 + x_2, x_1 + 3x_2).
Example 8.8.3. Suppose that T : R^n → R^m is a linear transformation. Suppose further that B and C are the standard bases for R^n and R^m respectively. Then the matrix for T with respect to B and C is given by

A = ( [T(e_1)]_C ... [T(e_n)]_C ) = ( T(e_1) ... T(e_n) ),

so it follows from Proposition 8A that A is simply the standard matrix for T.

Suppose now that T_1 : V → W and T_2 : W → U are linear transformations, where the real vector spaces V, W, U are finite dimensional, with respective bases B = {v_1, ..., v_n}, C = {w_1, ..., w_m} and D = {u_1, ..., u_k}. We then have the following diagram of linear transformations.
[Diagram: T_1 : V → W and T_2 : W → U on the top row, S_1 : R^n → R^m and S_2 : R^m → R^k on the bottom row, with the vertical maps φ : V → R^n, ψ : W → R^m and σ : U → R^k.]

Here σ : U → R^k, where σ(u) = [u]_D for every u ∈ U, is a linear transformation, and

S_1 = ψ ∘ T_1 ∘ φ^{-1} : R^n → R^m and S_2 = σ ∘ T_2 ∘ ψ^{-1} : R^m → R^k

are euclidean linear transformations. Suppose that A_1 and A_2 are respectively the standard matrices for S_1 and S_2, so that they are respectively the matrix for T_1 with respect to B and C and the matrix for T_2 with respect to C and D. Clearly

S_2 ∘ S_1 = σ ∘ T_2 ∘ T_1 ∘ φ^{-1} : R^n → R^k.

It follows that A_2 A_1 is the standard matrix for S_2 ∘ S_1, and so is the matrix for T_2 ∘ T_1 with respect to the bases B and D. To summarize, we have the following result.
PROPOSITION 8T. Suppose that T_1 : V → W and T_2 : W → U are linear transformations, where the real vector spaces V, W, U are finite dimensional, with bases B, C, D respectively. Suppose further that A_1 is the matrix for the linear transformation T_1 with respect to the bases B and C, and that A_2 is the matrix for the linear transformation T_2 with respect to the bases C and D. Then A_2 A_1 is the matrix for the linear transformation T_2 ∘ T_1 with respect to the bases B and D.
Example 8.8.4. Consider the linear operator T_1 : P_3 → P_3, where for every polynomial p(x) in P_3, we have T_1(p(x)) = xp′(x). We have already shown that the matrix for T_1 with respect to the basis B = {1, x, x^2, x^3} of P_3 is given by

A_1 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.

Consider next the linear operator T_2 : P_3 → P_3, where for every polynomial q(x) = q_0 + q_1 x + q_2 x^2 + q_3 x^3 in P_3, we have

T_2(q(x)) = q(1 + x) = q_0 + q_1(1 + x) + q_2(1 + x)^2 + q_3(1 + x)^3.

We have T_2(1) = 1, T_2(x) = 1 + x, T_2(x^2) = 1 + 2x + x^2 and T_2(x^3) = 1 + 3x + 3x^2 + x^3, so that the matrix for T_2 with respect to B is given by

A_2 = ( [T_2(1)]_B [T_2(x)]_B [T_2(x^2)]_B [T_2(x^3)]_B ) = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.

Consider now the composition T = T_2 ∘ T_1 : P_3 → P_3. Let A denote the matrix for T with respect to B. By Proposition 8T, we have

A = A_2 A_1 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 4 & 9 \\ 0 & 0 & 2 & 9 \\ 0 & 0 & 0 & 3 \end{pmatrix}.

Suppose that p(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3. Then

[p(x)]_B = \begin{pmatrix} p_0 \\ p_1 \\ p_2 \\ p_3 \end{pmatrix} and A[p(x)]_B = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 4 & 9 \\ 0 & 0 & 2 & 9 \\ 0 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} p_0 \\ p_1 \\ p_2 \\ p_3 \end{pmatrix} = \begin{pmatrix} p_1 + 2p_2 + 3p_3 \\ p_1 + 4p_2 + 9p_3 \\ 2p_2 + 9p_3 \\ 3p_3 \end{pmatrix},

so that T(p(x)) = (p_1 + 2p_2 + 3p_3) + (p_1 + 4p_2 + 9p_3)x + (2p_2 + 9p_3)x^2 + 3p_3 x^3. We can check this directly by noting that

T(p(x)) = T_2(T_1(p(x))) = T_2(p_1 x + 2p_2 x^2 + 3p_3 x^3) = p_1(1 + x) + 2p_2(1 + x)^2 + 3p_3(1 + x)^3
= (p_1 + 2p_2 + 3p_3) + (p_1 + 4p_2 + 9p_3)x + (2p_2 + 9p_3)x^2 + 3p_3 x^3.
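The product A_2 A_1 in Example 8.8.4 is easy to confirm numerically; a minimal check (numpy is our own choice of tool):

```python
import numpy as np

A1 = np.diag([0, 1, 2, 3])     # matrix for T_1(p(x)) = x p'(x)
A2 = np.array([[1, 1, 1, 1],
               [0, 1, 2, 3],
               [0, 0, 1, 3],
               [0, 0, 0, 1]])  # matrix for T_2(q(x)) = q(1 + x)
print(A2 @ A1)                 # matrix for T = T_2 o T_1, as displayed above
```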
Example 8.8.5. Consider the linear operator T : R^2 → R^2, given by T(x_1, x_2) = (2x_1 + x_2, x_1 + 3x_2) for every (x_1, x_2) ∈ R^2. We have already shown that the matrix for T with respect to the basis B = {(1, 0), (1, 1)} of R^2 is given by

A = \begin{pmatrix} 1 & -1 \\ 1 & 4 \end{pmatrix}.

Consider the linear operator T^2 = T ∘ T : R^2 → R^2. By Proposition 8T, the matrix for T^2 with respect to B is given by

A^2 = \begin{pmatrix} 1 & -1 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 4 \end{pmatrix} = \begin{pmatrix} 0 & -5 \\ 5 & 15 \end{pmatrix}.

Suppose that (x_1, x_2) ∈ R^2. Then

[(x_1, x_2)]_B = \begin{pmatrix} x_1 - x_2 \\ x_2 \end{pmatrix} and A^2[(x_1, x_2)]_B = \begin{pmatrix} 0 & -5 \\ 5 & 15 \end{pmatrix} \begin{pmatrix} x_1 - x_2 \\ x_2 \end{pmatrix} = \begin{pmatrix} -5x_2 \\ 5x_1 + 10x_2 \end{pmatrix},

so that T^2(x_1, x_2) = -5x_2(1, 0) + (5x_1 + 10x_2)(1, 1) = (5x_1 + 5x_2, 5x_1 + 10x_2). The reader is invited to check this directly.
A simple consequence of Propositions 8N and 8T is the following result concerning inverse linear transformations.

PROPOSITION 8U. Suppose that T : V → V is a linear operator on a finite dimensional real vector space V with basis B. Suppose further that A is the matrix for the linear operator T with respect to the basis B. Then T is one-to-one if and only if A is invertible. Furthermore, if T is one-to-one, then A^{-1} is the matrix for the inverse linear operator T^{-1} : V → V with respect to the basis B.

Proof. Simply note that T is one-to-one if and only if the system Ax = 0 has only the trivial solution x = 0. The last assertion follows easily from Proposition 8T, since if A′ denotes the matrix for the inverse linear operator T^{-1} with respect to B, then we must have A′A = I, the matrix for the identity operator T^{-1} ∘ T with respect to B. ∎
Example 8.8.6. Consider the linear operator T : P_3 → P_3, where for every q(x) = q_0 + q_1 x + q_2 x^2 + q_3 x^3 in P_3, we have

T(q(x)) = q(1 + x) = q_0 + q_1(1 + x) + q_2(1 + x)^2 + q_3(1 + x)^3.

We have already shown that the matrix for T with respect to the basis B = {1, x, x^2, x^3} is given by

A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.

This matrix is invertible, so it follows that T is one-to-one. Furthermore, it can be checked that

A^{-1} = \begin{pmatrix} 1 & -1 & 1 & -1 \\ 0 & 1 & -2 & 3 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.

Suppose that p(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3. Then

[p(x)]_B = \begin{pmatrix} p_0 \\ p_1 \\ p_2 \\ p_3 \end{pmatrix} and A^{-1}[p(x)]_B = \begin{pmatrix} 1 & -1 & 1 & -1 \\ 0 & 1 & -2 & 3 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_0 \\ p_1 \\ p_2 \\ p_3 \end{pmatrix} = \begin{pmatrix} p_0 - p_1 + p_2 - p_3 \\ p_1 - 2p_2 + 3p_3 \\ p_2 - 3p_3 \\ p_3 \end{pmatrix},

so that

T^{-1}(p(x)) = (p_0 - p_1 + p_2 - p_3) + (p_1 - 2p_2 + 3p_3)x + (p_2 - 3p_3)x^2 + p_3 x^3
= p_0 + p_1(x - 1) + p_2(x^2 - 2x + 1) + p_3(x^3 - 3x^2 + 3x - 1)
= p_0 + p_1(x - 1) + p_2(x - 1)^2 + p_3(x - 1)^3 = p(x - 1).
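The inverse displayed in Example 8.8.6 can be confirmed in one line; a quick numerical check (numpy is our own choice of tool):

```python
import numpy as np

A = np.array([[1, 1, 1, 1],
              [0, 1, 2, 3],
              [0, 0, 1, 3],
              [0, 0, 0, 1]])
print(np.linalg.inv(A))   # reproduces the matrix A^(-1) displayed above
```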
8.9. Change of Basis

Suppose that V is a finite dimensional real vector space, with one basis B = {v_1, ..., v_n} and another basis B′ = {u_1, ..., u_n}. Suppose that T : V → V is a linear operator on V. Let A denote the matrix for T with respect to the basis B, and let A′ denote the matrix for T with respect to the basis B′. If v ∈ V and T(v) = w, then

[w]_B = A[v]_B (10)

and

[w]_{B′} = A′[v]_{B′}. (11)

We wish to find the relationship between A′ and A.

Recall Proposition 8J, that if

P = ( [u_1]_B ... [u_n]_B )

denotes the transition matrix from the basis B′ to the basis B, then

[v]_B = P[v]_{B′} and [w]_B = P[w]_{B′}. (12)

Note that the matrix P can also be interpreted as the matrix for the identity operator I : V → V with respect to the bases B′ and B. It is easy to see that the matrix P is invertible, and

P^{-1} = ( [v_1]_{B′} ... [v_n]_{B′} )

denotes the transition matrix from the basis B to the basis B′, and can also be interpreted as the matrix for the identity operator I : V → V with respect to the bases B and B′.

Combining (10) and (12), we conclude that

[w]_{B′} = P^{-1}[w]_B = P^{-1}A[v]_B = P^{-1}AP[v]_{B′}.

Comparing this with (11), we conclude that

P^{-1}AP = A′. (13)

This implies that

A = PA′P^{-1}. (14)

Remark. We can use the notation

A = [T]_B and A′ = [T]_{B′}

to denote that A and A′ are the matrices for T with respect to the basis B and with respect to the basis B′ respectively. We can also write

P = [I]_{B,B′}

to denote that P is the transition matrix from the basis B′ to the basis B, so that

P^{-1} = [I]_{B′,B}.
Then (13) and (14) become respectively

[I]_{B′,B}[T]_B[I]_{B,B′} = [T]_{B′} and [I]_{B,B′}[T]_{B′}[I]_{B′,B} = [T]_B.

We have proved the following result.

PROPOSITION 8V. Suppose that T : V → V is a linear operator on a finite dimensional real vector space V, with bases B = {v_1, ..., v_n} and B′ = {u_1, ..., u_n}. Suppose further that A and A′ are the matrices for T with respect to the basis B and with respect to the basis B′ respectively. Then

P^{-1}AP = A′ and A = PA′P^{-1},

where

P = ( [u_1]_B ... [u_n]_B )

denotes the transition matrix from the basis B′ to the basis B.

Remarks. (1) We have the following picture.

[Diagram: on the top level, T carries v to w and A′ carries [v]_{B′} to [w]_{B′}; on the bottom level, T carries v to w and A carries [v]_B to [w]_B; the identity operator I links the two levels, with P carrying [v]_{B′} to [v]_B and P^{-1} carrying [w]_B to [w]_{B′}.]

(2) The idea can be extended to the case of linear transformations T : V → W from a finite dimensional real vector space into another, with a change of basis in V and a change of basis in W.

Example 8.9.1. Consider the vector space P_3 of all polynomials with real coefficients and degree at most 3, with bases B = {1, x, x^2, x^3} and B′ = {1, 1 + x, 1 + x + x^2, 1 + x + x^2 + x^3}. Consider also the linear operator T : P_3 → P_3, where for every polynomial p(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3, we have T(p(x)) = (p_0 + p_1) + (p_1 + p_2)x + (p_2 + p_3)x^2 + (p_0 + p_3)x^3. Let A denote the matrix for T with respect to the basis B. Then T(1) = 1 + x^3, T(x) = 1 + x, T(x^2) = x + x^2 and T(x^3) = x^2 + x^3, and so

A = ( [T(1)]_B [T(x)]_B [T(x^2)]_B [T(x^3)]_B ) = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \end{pmatrix}.

Next, note that the transition matrix from the basis B′ to the basis B is given by

P = ( [1]_B [1 + x]_B [1 + x + x^2]_B [1 + x + x^2 + x^3]_B ) = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
It can be checked that

P^{-1} = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix},

and so

A′ = P^{-1}AP = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ -1 & -1 & 0 & 0 \\ 1 & 1 & 1 & 2 \end{pmatrix}

is the matrix for T with respect to the basis B′. It follows that

T(1) = 1 - (1 + x + x^2) + (1 + x + x^2 + x^3) = 1 + x^3,
T(1 + x) = 1 + (1 + x) - (1 + x + x^2) + (1 + x + x^2 + x^3) = 2 + x + x^3,
T(1 + x + x^2) = (1 + x) + (1 + x + x^2 + x^3) = 2 + 2x + x^2 + x^3,
T(1 + x + x^2 + x^3) = 2(1 + x + x^2 + x^3) = 2 + 2x + 2x^2 + 2x^3.

These can be verified directly.
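The change of basis computation in Example 8.9.1 is easily confirmed numerically; a short sketch (numpy is our own choice of tool):

```python
import numpy as np

A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]])   # matrix for T with respect to B
P = np.array([[1, 1, 1, 1],
              [0, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 1]])   # transition matrix from B' to B
print(np.linalg.inv(P) @ A @ P)   # the matrix A' computed above
```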
8.10. Eigenvalues and Eigenvectors
Definition. Suppose that T : V → V is a linear operator on a finite dimensional real vector space V. Then any real number λ ∈ R is called an eigenvalue of T if there exists a non-zero vector v ∈ V such that T(v) = λv. This non-zero vector v ∈ V is called an eigenvector of T corresponding to the eigenvalue λ.
The purpose of this section is to show that the problem of eigenvalues and eigenvectors of the linear
operator T can be reduced to the problem of eigenvalues and eigenvectors of the matrix for T with
respect to any basis B of V . The starting point of our argument is the following theorem, the proof of
which is left as an exercise.
PROPOSITION 8W. Suppose that T : V → V is a linear operator on a finite dimensional real vector space V, with bases B and B′. Suppose further that A and A′ are the matrices for T with respect to the basis B and with respect to the basis B′ respectively. Then
(a) det A = det A′;
(b) A and A′ have the same rank;
(c) A and A′ have the same characteristic polynomial;
(d) A and A′ have the same eigenvalues; and
(e) the dimension of the eigenspace of A corresponding to an eigenvalue λ is equal to the dimension of the eigenspace of A′ corresponding to λ.

We also state without proof the following result.
We also state without proof the following result.
PROPOSITION 8X. Suppose that T : V → V is a linear operator on a finite dimensional real vector space V. Suppose further that A is the matrix for T with respect to a basis B of V. Then
(a) the eigenvalues of T are precisely the eigenvalues of A; and
(b) a vector u ∈ V is an eigenvector of T corresponding to an eigenvalue λ if and only if the coordinate matrix [u]_B is an eigenvector of A corresponding to the eigenvalue λ.
Suppose now that A is the matrix for a linear operator T : V → V on a finite dimensional real vector space V with respect to a basis B = {v_1, ..., v_n}. If A can be diagonalized, then there exists an invertible matrix P such that

P^{-1}AP = D

is a diagonal matrix. Furthermore, the columns of P are eigenvectors of A, and so are the coordinate matrices of eigenvectors of T with respect to the basis B. In other words,

P = ( [u_1]_B ... [u_n]_B ),

where B′ = {u_1, ..., u_n} is a basis of V consisting of eigenvectors of T. Furthermore, P is the transition matrix from the basis B′ to the basis B. It follows that the matrix for T with respect to the basis B′ is given by

D = \begin{pmatrix} λ_1 & & \\ & ⋱ & \\ & & λ_n \end{pmatrix},

where λ_1, ..., λ_n are the eigenvalues of T.
Example 8.10.1. Consider the vector space P_2 of all polynomials with real coefficients and degree at most 2, with basis B = {1, x, x^2}. Consider also the linear operator T : P_2 → P_2, where for every polynomial p(x) = p_0 + p_1 x + p_2 x^2, we have T(p(x)) = (5p_0 - 2p_1) + (6p_1 + 2p_2 - 2p_0)x + (2p_1 + 7p_2)x^2. Then T(1) = 5 - 2x, T(x) = -2 + 6x + 2x^2 and T(x^2) = 2x + 7x^2, so that the matrix for T with respect to the basis B is given by

A = ( [T(1)]_B [T(x)]_B [T(x^2)]_B ) = \begin{pmatrix} 5 & -2 & 0 \\ -2 & 6 & 2 \\ 0 & 2 & 7 \end{pmatrix}.

It is a simple exercise to show that the matrix A has eigenvalues 3, 6, 9, with corresponding eigenvectors

x_1 = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix}, x_2 = \begin{pmatrix} 2 \\ -1 \\ 2 \end{pmatrix}, x_3 = \begin{pmatrix} -1 \\ 2 \\ 2 \end{pmatrix},

so that writing

P = \begin{pmatrix} 2 & 2 & -1 \\ 2 & -1 & 2 \\ -1 & 2 & 2 \end{pmatrix},

we have

P^{-1}AP = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 9 \end{pmatrix}.

Now let B′ = {p_1(x), p_2(x), p_3(x)}, where

[p_1(x)]_B = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix}, [p_2(x)]_B = \begin{pmatrix} 2 \\ -1 \\ 2 \end{pmatrix}, [p_3(x)]_B = \begin{pmatrix} -1 \\ 2 \\ 2 \end{pmatrix}.

Then P is the transition matrix from the basis B′ to the basis B, and D is the matrix for T with respect to the basis B′. Clearly p_1(x) = 2 + 2x - x^2, p_2(x) = 2 - x + 2x^2 and p_3(x) = -1 + 2x + 2x^2. Note now that

T(p_1(x)) = T(2 + 2x - x^2) = 6 + 6x - 3x^2 = 3p_1(x),
T(p_2(x)) = T(2 - x + 2x^2) = 12 - 6x + 12x^2 = 6p_2(x),
T(p_3(x)) = T(-1 + 2x + 2x^2) = -9 + 18x + 18x^2 = 9p_3(x).
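The diagonalization in Example 8.10.1 can be confirmed numerically as follows (a sketch; numpy is our own choice of tool):

```python
import numpy as np

A = np.array([[ 5, -2,  0],
              [-2,  6,  2],
              [ 0,  2,  7]])           # matrix for T with respect to B
P = np.array([[ 2,  2, -1],
              [ 2, -1,  2],
              [-1,  2,  2]])           # columns are the eigenvectors x_1, x_2, x_3
print(np.linalg.inv(P) @ A @ P)        # diag(3, 6, 9)
print(np.linalg.eigvalsh(A))           # [3. 6. 9.], since A is symmetric
```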
Problems for Chapter 8
1. Consider the transformation T : R^3 → R^4, given by

T(x_1, x_2, x_3) = (x_1 + x_2 + x_3, x_2 + x_3, 3x_1 + x_2, 2x_2 + x_3)

for every (x_1, x_2, x_3) ∈ R^3.
a) Find the standard matrix A for T.
b) By reducing A to row echelon form, determine the dimension of the kernel of T and the dimension of the range of T.

2. Consider a linear operator T : R^3 → R^3 with standard matrix

A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \\ 1 & 3 & 2 \end{pmatrix}.

Let {e_1, e_2, e_3} denote the standard basis for R^3.
a) Find T(e_j) for every j = 1, 2, 3.
b) Find T(2e_1 + 5e_2 + 3e_3).
c) Is T invertible? Justify your assertion.

3. Consider the linear operator T : R^2 → R^2 with standard matrix

A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.

a) Find the image under T of the line x_1 + 2x_2 = 3.
b) Find the image under T of the circle x_1^2 + x_2^2 = 1.
4. For each of the following, determine whether the given transformation is linear:
a) T : V → R, where V is a real inner product space and T(u) = ‖u‖.
b) T : M_{2,2}(R) → M_{2,3}(R), where B ∈ M_{2,3}(R) is fixed and T(A) = AB.
c) T : M_{3,4}(R) → M_{4,3}(R), where T(A) = A^t.
d) T : P_2 → P_2, where T(p_0 + p_1 x + p_2 x^2) = p_0 + p_1(2 + x) + p_2(2 + x)^2.
e) T : P_2 → P_2, where T(p_0 + p_1 x + p_2 x^2) = p_0 + p_1 x + (p_2 + 1)x^2.
5. Suppose that T : R^3 → R^3 is a linear transformation satisfying the conditions T(1, 0, 0) = (2, 4, 1), T(1, 1, 0) = (3, 0, 2) and T(1, 1, 1) = (1, 4, 6).
a) Evaluate T(5, 3, 2).
b) Find T(x_1, x_2, x_3) for every (x_1, x_2, x_3) ∈ R^3.

6. Suppose that T : R^3 → R^3 is orthogonal projection onto the x_1x_2-plane.
a) Find the standard matrix A for T.
b) Find A^2.
c) Show that T ∘ T = T.

7. Consider the bases B = {u_1, u_2, u_3} and C = {v_1, v_2, v_3} of R^3, where u_1 = (2, 1, 1), u_2 = (2, -1, 1), u_3 = (1, 2, 1), v_1 = (3, 1, -5), v_2 = (1, 1, -3) and v_3 = (-1, 0, 2).
a) Find the transition matrix from the basis C to the basis B.
b) Find the transition matrix from the basis B to the basis C.
c) Show that the matrices in parts (a) and (b) are inverses of each other.
d) Compute the coordinate matrix [u]_C, where u = (-5, 8, -5).
e) Use the transition matrix to compute the coordinate matrix [u]_B.
f) Compute the coordinate matrix [u]_B directly and compare it to your answer in part (e).
8. Consider the bases B = {p_1, p_2} and C = {q_1, q_2} of P_1, where p_1 = 2, p_2 = 3 + 2x, q_1 = 6 + 3x and q_2 = 10 + 2x.
a) Find the transition matrix from the basis C to the basis B.
b) Find the transition matrix from the basis B to the basis C.
c) Show that the matrices in parts (a) and (b) are inverses of each other.
d) Compute the coordinate matrix [p]_C, where p = -4 + x.
e) Use the transition matrix to compute the coordinate matrix [p]_B.
f) Compute the coordinate matrix [p]_B directly and compare it to your answer in part (e).

9. Let V be the real vector space spanned by the functions f_1 = sin x and f_2 = cos x.
a) Show that g_1 = 2 sin x + cos x and g_2 = 3 cos x form a basis of V.
b) Find the transition matrix from the basis C = {g_1, g_2} to the basis B = {f_1, f_2} of V.
c) Compute the coordinate matrix [f]_C, where f = 2 sin x - 5 cos x.
d) Use the transition matrix to compute the coordinate matrix [f]_B.
e) Compute the coordinate matrix [f]_B directly and compare it to your answer in part (d).

10. Let P be the transition matrix from a basis C to another basis B of a real vector space V. Explain why P is invertible.
11. For each of the following linear transformations T, find ker(T) and R(T), and verify the Rank-nullity theorem:
a) T : R^3 → R^3, with standard matrix A = \begin{pmatrix} 1 & -1 & 3 \\ 5 & 6 & -4 \\ 7 & 4 & 2 \end{pmatrix}.
b) T : P_3 → P_2, where T(p(x)) = p′(x), the formal derivative.
c) T : P_1 → R, where T(p(x)) = \int_0^1 p(x) dx.

12. For each of the following, determine whether the linear operator T : R^n → R^n is one-to-one. If so, find also the inverse linear operator T^{-1} : R^n → R^n:
a) T(x_1, x_2, x_3, ..., x_n) = (x_2, x_1, x_3, ..., x_n)
b) T(x_1, x_2, x_3, ..., x_n) = (x_2, x_3, ..., x_n, x_1)
c) T(x_1, x_2, x_3, ..., x_n) = (x_2, x_2, x_3, ..., x_n)

13. Consider the operator T : R^2 → R^2, where T(x_1, x_2) = (x_1 + kx_2, x_2) for every (x_1, x_2) ∈ R^2. Here k ∈ R is fixed.
a) Show that T is a linear operator.
b) Show that T is one-to-one.
c) Find the inverse linear operator T^{-1} : R^2 → R^2.
14. Consider the linear transformation T : P_2 → P_1, where T(p_0 + p_1 x + p_2 x^2) = (p_0 + p_2) + (2p_0 + p_1)x for every polynomial p_0 + p_1 x + p_2 x^2 in P_2.
a) Find the matrix A for T with respect to the bases {1, x, x^2} and {1, x}.
b) Find T(2 + 3x + 4x^2) by using the matrix A.
c) Use the matrix A to recover the formula T(p_0 + p_1 x + p_2 x^2) = (p_0 + p_2) + (2p_0 + p_1)x.

15. Consider the linear operator T : R^2 → R^2, where T(x_1, x_2) = (x_1 - x_2, x_1 + x_2) for every (x_1, x_2) ∈ R^2.
a) Find the matrix A for T with respect to the basis {(1, 1), (1, 0)} of R^2.
b) Use the matrix A to recover the formula T(x_1, x_2) = (x_1 - x_2, x_1 + x_2).
c) Is T one-to-one? If so, use the matrix A to find the inverse linear operator T^{-1} : R^2 → R^2.
16. Consider the real vector space V of all real sequences x = (x_1, x_2, x_3, ...) such that the series \sum_{n=1}^∞ x_n is convergent.
a) Show that the transformation T : V → R, given by

T(x) = \sum_{n=1}^∞ x_n

for every x ∈ V, is a linear transformation.
b) Is the linear transformation T one-to-one? If so, give a proof. If not, find two distinct vectors x, y ∈ V such that T(x) = T(y).

17. Suppose that T_1 : R^2 → R^2 and T_2 : R^2 → R^2 are linear operators such that

T_1(x_1, x_2) = (x_1 + x_2, x_1 - x_2) and T_2(x_1, x_2) = (2x_1 + x_2, x_1 - 2x_2)

for every (x_1, x_2) ∈ R^2.
a) Show that T_1 and T_2 are one-to-one.
b) Find the formulas for T_1^{-1}, T_2^{-1} and (T_2 ∘ T_1)^{-1}.
c) Verify that (T_2 ∘ T_1)^{-1} = T_1^{-1} ∘ T_2^{-1}.

18. Consider the transformation T : P_1 → R^2, where T(p(x)) = (p(0), p(1)) for every polynomial p(x) in P_1.
a) Find T(1 - 2x).
b) Show that T is a linear transformation.
c) Show that T is one-to-one.
d) Find T^{-1}(2, 3), and sketch its graph.

19. Suppose that V and W are finite dimensional real vector spaces with dim V > dim W. Suppose further that T : V → W is a linear transformation. Explain why T cannot be one-to-one.
20. Suppose that

A = \begin{pmatrix} 1 & 3 & -1 \\ 2 & 0 & 5 \\ 6 & -2 & 4 \end{pmatrix}

is the matrix for a linear operator T : P_2 → P_2 with respect to the basis B = {p_1(x), p_2(x), p_3(x)} of P_2, where p_1(x) = 3x + 3x^2, p_2(x) = -1 + 3x + 2x^2 and p_3(x) = 3 + 7x + 2x^2.
a) Find [T(p_1(x))]_B, [T(p_2(x))]_B and [T(p_3(x))]_B.
b) Find T(p_1(x)), T(p_2(x)) and T(p_3(x)).
c) Find a formula for T(p_0 + p_1 x + p_2 x^2).
d) Use the formula in part (c) to compute T(1 + x^2).

21. Suppose that B = {v_1, v_2, v_3, v_4} is a basis for a real vector space V. Suppose that T : V → V is a linear operator, with T(v_1) = v_2, T(v_2) = v_4, T(v_3) = v_1 and T(v_4) = v_3.
a) Find the matrix for T with respect to the basis B.
b) Is T one-to-one? If so, describe its inverse.
22. Let P_k denote the vector space of all polynomials with real coefficients and degree at most k. Consider P_2 with basis B = {1, x, x^2} and P_3 with basis C = {1, x, x^2, x^3}. We define T_1 : P_2 → P_3 and T_2 : P_3 → P_2 as follows. For every polynomial p(x) = a_0 + a_1 x + a_2 x^2 in P_2, we have T_1(p(x)) = xp(x) = a_0 x + a_1 x^2 + a_2 x^3. For every polynomial q(x) in P_3, we have T_2(q(x)) = q′(x), the formal derivative of q(x) with respect to the variable x.
a) Show that T_1 : P_2 → P_3 and T_2 : P_3 → P_2 are linear transformations.
b) Find T_1(1), T_1(x), T_1(x^2), and compute the matrix A_1 for T_1 : P_2 → P_3 with respect to the bases B and C.
c) Find T_2(1), T_2(x), T_2(x^2), T_2(x^3), and compute the matrix A_2 for T_2 : P_3 → P_2 with respect to the bases C and B.
d) Let T = T_2 ∘ T_1. Find T(1), T(x), T(x^2), and compute the matrix A for T : P_2 → P_2 with respect to the basis B. Verify that A = A_2 A_1.

23. Suppose that T : V → V is a linear operator on a real vector space V with basis B. Suppose that for every v ∈ V, we have

[T(v)]_B = \begin{pmatrix} x_1 - x_2 + x_3 \\ x_1 + x_2 \\ x_1 - x_2 \end{pmatrix} and [v]_B = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.

a) Find the matrix for T with respect to the basis B.
b) Is T one-to-one? If so, describe its inverse.

24. For each of the following, let V be the subspace with basis B = {f_1(x), f_2(x), f_3(x)} of the space of all real valued functions defined on R. Let T : V → V be defined by T(f(x)) = f′(x) for every function f(x) in V. Find the matrix for T with respect to the basis B:
a) f_1(x) = 1, f_2(x) = sin x, f_3(x) = cos x
b) f_1(x) = e^{2x}, f_2(x) = xe^{2x}, f_3(x) = x^2 e^{2x}
25. Let P_2 denote the vector space of all polynomials with real coefficients and degree at most 2, with basis B = {1, x, x^2}. Consider the linear operator T : P_2 → P_2, where for every polynomial p(x) = a_0 + a_1 x + a_2 x^2 in P_2, we have T(p(x)) = p(2x + 1) = a_0 + a_1(2x + 1) + a_2(2x + 1)^2.
a) Find T(1), T(x), T(x^2), and compute the matrix A for T with respect to the basis B.
b) Use the matrix A to compute T(3 + x + 2x^2).
c) Check your calculations in part (b) by computing T(3 + x + 2x^2) directly.
d) What is the matrix for T ∘ T : P_2 → P_2 with respect to the basis B?
e) Consider a new basis B′ = {1 + x, 1 + x^2, x + x^2} of P_2. Using a change of basis matrix, compute the matrix for T with respect to the basis B′.
f) Check your answer in part (e) by computing the matrix directly.

26. Consider the linear operator T : P_1 → P_1, where for every polynomial p(x) = p_0 + p_1 x in P_1, we have T(p(x)) = p_0 + p_1(x + 1). Consider also the bases B = {6 + 3x, 10 + 2x} and B′ = {2, 3 + 2x} of P_1.
a) Find the matrix for T with respect to the basis B.
b) Use Proposition 8V to compute the matrix for T with respect to the basis B′.

27. Suppose that V and W are finite dimensional real vector spaces. Suppose further that B and B′ are bases for V, and that C and C′ are bases for W. Show that for any linear transformation T : V → W, we have

[I]_{C′,C}[T]_{C,B}[I]_{B,B′} = [T]_{C′,B′}.
28. Prove Proposition 8W.
29. Prove Proposition 8X.
30. For each of the following linear transformations T : R^3 → R^3, find a basis B of R^3 such that the matrix for T with respect to the basis B is a diagonal matrix:
a) T(x_1, x_2, x_3) = (x_2 + x_3, x_1 + x_3, x_1 + x_2)
b) T(x_1, x_2, x_3) = (4x_1 + x_3, 2x_1 + 3x_2 + 2x_3, x_1 + 4x_3)

31. Consider the linear operator T : P_2 → P_2, where

T(p_0 + p_1 x + p_2 x^2) = (p_0 - 6p_1 + 12p_2) + (13p_1 - 30p_2)x + (9p_1 - 20p_2)x^2.

a) Find the eigenvalues of T.
b) Find a basis B of P_2 such that the matrix for T with respect to B is a diagonal matrix.
LINEAR ALGEBRA
W W L CHEN
© W W L Chen, 1997, 2008.
This chapter is available free to all individuals, on the understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied, with or without permission from the author. However, this document may not be kept on any information storage and retrieval system without permission from the author, unless such system is not accessible to any individuals other than its owners.

Chapter 9
REAL INNER PRODUCT SPACES
9.1. Euclidean Inner Products
In this section, we consider vectors of the form u = (u_1, ..., u_n) in the euclidean space R^n. In particular, we shall generalize the concept of dot product, norm and distance, first developed for R^2 and R^3 in Chapter 4.

Definition. Suppose that u = (u_1, ..., u_n) and v = (v_1, ..., v_n) are vectors in R^n. The euclidean dot product of u and v is defined by

u · v = u_1 v_1 + ... + u_n v_n,

the euclidean norm of u is defined by

‖u‖ = (u · u)^{1/2} = (u_1^2 + ... + u_n^2)^{1/2},

and the euclidean distance between u and v is defined by

d(u, v) = ‖u - v‖ = ((u_1 - v_1)^2 + ... + (u_n - v_n)^2)^{1/2}.

PROPOSITION 9A. Suppose that u, v, w ∈ R^n and c ∈ R. Then
(a) u · v = v · u;
(b) u · (v + w) = (u · v) + (u · w);
(c) c(u · v) = (cu) · v; and
(d) u · u ≥ 0, and u · u = 0 if and only if u = 0.
PROPOSITION 9B. (CAUCHY-SCHWARZ INEQUALITY) Suppose that u, v ∈ R^n. Then

|u · v| ≤ ‖u‖ ‖v‖.

In other words,

|u_1 v_1 + ... + u_n v_n| ≤ (u_1^2 + ... + u_n^2)^{1/2} (v_1^2 + ... + v_n^2)^{1/2}.
PROPOSITION 9C. Suppose that u, v ∈ R^n and c ∈ R. Then
(a) ‖u‖ ≥ 0;
(b) ‖u‖ = 0 if and only if u = 0;
(c) ‖cu‖ = |c| ‖u‖; and
(d) ‖u + v‖ ≤ ‖u‖ + ‖v‖.

PROPOSITION 9D. Suppose that u, v, w ∈ R^n. Then
(a) d(u, v) ≥ 0;
(b) d(u, v) = 0 if and only if u = v;
(c) d(u, v) = d(v, u); and
(d) d(u, v) ≤ d(u, w) + d(w, v).

Remark. Parts (d) of Propositions 9C and 9D are known as the Triangle inequality.
In R^2 and R^3, we say that two non-zero vectors are perpendicular if their dot product is zero. We now generalize this idea to vectors in R^n.

Definition. Two vectors u, v ∈ R^n are said to be orthogonal if u · v = 0.

Example 9.1.1. Suppose that u, v ∈ R^n are orthogonal. Then

‖u + v‖^2 = (u + v) · (u + v) = u · u + 2u · v + v · v = ‖u‖^2 + ‖v‖^2.

This is an extension of Pythagoras's theorem.

Remarks. (1) Suppose that we write u, v ∈ R^n as column matrices. Then

u · v = v^t u,

where we use matrix multiplication on the right hand side.
(2) Matrix multiplication can be described in terms of dot product. Suppose that A is an m × n matrix and B is an n × p matrix. If we let r_1, ..., r_m denote the vectors formed from the rows of A, and let c_1, ..., c_p denote the vectors formed from the columns of B, then

AB = \begin{pmatrix} r_1 · c_1 & ... & r_1 · c_p \\ ⋮ & & ⋮ \\ r_m · c_1 & ... & r_m · c_p \end{pmatrix}.
9.2. Real Inner Products
The purpose of this section and the next is to extend our discussion to define inner products in real vector spaces. We begin by giving a reminder of the basics of real vector spaces, or vector spaces over R.
Definition. A real vector space V is a set of objects, known as vectors, together with vector addition + and multiplication of vectors by elements of R, and satisfying the following properties:
(VA1) For every u, v ∈ V, we have u + v ∈ V.
(VA2) For every u, v, w ∈ V, we have u + (v + w) = (u + v) + w.
(VA3) There exists an element 0 ∈ V such that for every u ∈ V, we have u + 0 = 0 + u = u.
(VA4) For every u ∈ V, there exists -u ∈ V such that u + (-u) = 0.
(VA5) For every u, v ∈ V, we have u + v = v + u.
(SM1) For every c ∈ R and u ∈ V, we have cu ∈ V.
(SM2) For every c ∈ R and u, v ∈ V, we have c(u + v) = cu + cv.
(SM3) For every a, b ∈ R and u ∈ V, we have (a + b)u = au + bu.
(SM4) For every a, b ∈ R and u ∈ V, we have (ab)u = a(bu).
(SM5) For every u ∈ V, we have 1u = u.

Remark. The elements a, b, c ∈ R discussed in (SM1)–(SM5) are known as scalars. Multiplication of vectors by elements of R is sometimes known as scalar multiplication.

Definition. Suppose that V is a real vector space, and that W is a subset of V. Then we say that W is a subspace of V if W forms a real vector space under the vector addition and scalar multiplication defined in V.

Remark. Suppose that V is a real vector space, and that W is a non-empty subset of V. Then W is a subspace of V if the following conditions are satisfied:
(SP1) For every u, v ∈ W, we have u + v ∈ W.
(SP2) For every c ∈ R and u ∈ W, we have cu ∈ W.

The reader may refer to Chapter 5 for more details and examples.
We are now in a position to define an inner product on a real vector space V. The following definition is motivated by Proposition 9A concerning the properties of the euclidean dot product in R^n.

Definition. Suppose that V is a real vector space. By a real inner product on V, we mean a function ⟨·, ·⟩ : V × V → R which satisfies the following conditions:
(IP1) For every u, v ∈ V, we have ⟨u, v⟩ = ⟨v, u⟩.
(IP2) For every u, v, w ∈ V, we have ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.
(IP3) For every u, v ∈ V and c ∈ R, we have ⟨cu, v⟩ = c⟨u, v⟩.
(IP4) For every u ∈ V, we have ⟨u, u⟩ ≥ 0, and ⟨u, u⟩ = 0 if and only if u = 0.

Remarks. (1) The properties (IP1)–(IP4) describe respectively symmetry, additivity, homogeneity and positivity.
(2) We sometimes simply refer to an inner product if we know that V is a real vector space.

Definition. A real vector space with an inner product is called a real inner product space.
Our next definition is a natural extension of the idea of euclidean norm and euclidean distance.

Definition. Suppose that u and v are vectors in a real inner product space V. Then the norm of u is defined by

‖u‖ = ⟨u, u⟩^{1/2},

and the distance between u and v is defined by

d(u, v) = ‖u - v‖.
Example 9.2.1. For u, v ∈ R^n, let ⟨u, v⟩ = u · v, the euclidean dot product discussed in the last section. This satisfies Proposition 9A and hence conditions (IP1)–(IP4). This inner product is known as the euclidean inner product in R^n.

Example 9.2.2. Let w_1, ..., w_n be positive real numbers. For u = (u_1, ..., u_n) and v = (v_1, ..., v_n) in R^n, let

⟨u, v⟩ = w_1 u_1 v_1 + ... + w_n u_n v_n.

It is easy to check that conditions (IP1)–(IP4) are satisfied. This inner product is called a weighted euclidean inner product in R^n, and the positive real numbers w_1, ..., w_n are known as weights. The unit circle with respect to this inner product is given by

{u ∈ R^n : ‖u‖ = 1} = {u ∈ R^n : ⟨u, u⟩ = 1} = {u ∈ R^n : w_1 u_1^2 + ... + w_n u_n^2 = 1}.

Example 9.2.3. Let A be a fixed invertible n × n matrix with real entries. For u, v ∈ R^n, interpreted as column matrices, let

⟨u, v⟩ = Au · Av,

the euclidean dot product of the vectors Au and Av. It can be checked that conditions (IP1)–(IP4) are satisfied. This inner product is called the inner product generated by the matrix A. To check conditions (IP1)–(IP4), it is useful to note that

⟨u, v⟩ = (Av)^t Au = v^t A^t Au.

Example 9.2.4. Consider the vector space M_{2,2}(R) of all 2 × 2 matrices with real entries. For matrices

U = \begin{pmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{pmatrix} and V = \begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix}

in M_{2,2}(R), let

⟨U, V⟩ = u_{11} v_{11} + u_{12} v_{12} + u_{21} v_{21} + u_{22} v_{22}.

It is easy to check that conditions (IP1)–(IP4) are satisfied.

Example 9.2.5. Consider the vector space P_2 of all polynomials with real coefficients and of degree at most 2. For polynomials

p = p(x) = p_0 + p_1 x + p_2 x^2 and q = q(x) = q_0 + q_1 x + q_2 x^2

in P_2, let

⟨p, q⟩ = p_0 q_0 + p_1 q_1 + p_2 q_2.

It can be checked that conditions (IP1)–(IP4) are satisfied.

Example 9.2.6. It is not difficult to show that C[a, b], the collection of all real valued functions continuous in the closed interval [a, b], forms a real vector space. We also know from the theory of real valued functions that functions continuous over a closed interval [a, b] are integrable over [a, b]. For f, g ∈ C[a, b], let

⟨f, g⟩ = \int_a^b f(x)g(x) dx.

It can be checked that conditions (IP1)–(IP4) are satisfied.
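The inner products of Examples 9.2.2 and 9.2.3 are easily realized as small functions; a minimal sketch in Python with numpy (the function names and the test vectors are our own, purely for illustration):

```python
import numpy as np

def weighted_inner(u, v, w):
    """Weighted euclidean inner product <u, v> = w_1 u_1 v_1 + ... + w_n u_n v_n."""
    return float(np.sum(w * u * v))

def generated_inner(u, v, A):
    """Inner product generated by an invertible matrix A: <u, v> = Au . Av."""
    return float((A @ u) @ (A @ v))

u = np.array([1.0, 1.0])
v = np.array([1.0, -1.0])
print(weighted_inner(u, v, np.array([2.0, 1.0])))   # 1.0
print(generated_inner(u, v, np.eye(2)))             # 0.0, the euclidean case
```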
9.3. Angles and Orthogonality

Recall that in R^2 and R^3, we can actually define the euclidean dot product of two vectors u and v by the formula

u · v = ‖u‖ ‖v‖ cos θ, (1)

where θ is the angle between u and v. Indeed, this is the approach taken in Chapter 4, and the Cauchy-Schwarz inequality, as stated in Proposition 9B, follows immediately from (1), since |cos θ| ≤ 1.

The picture is not so clear in the euclidean space R^n when n > 3, although the Cauchy-Schwarz inequality, as given by Proposition 9B, does allow us to recover a formula of the type (1). But then the number θ does not have a geometric interpretation.

We now study the case of a real inner product space. Our first task is to establish a generalized version of Proposition 9B.
PROPOSITION 9E. (CAUCHY-SCHWARZ INEQUALITY) Suppose that u and v are vectors in a real inner product space V. Then

|⟨u, v⟩| ≤ ‖u‖ ‖v‖. (2)

Proof. Our proof here looks like a trick, but it works. Suppose that u and v are vectors in a real inner product space V. If u = 0, then since 0u = 0, it follows that

⟨u, v⟩ = ⟨0, v⟩ = ⟨0u, v⟩ = 0⟨u, v⟩ = 0,

so that (2) is clearly satisfied. We may suppose therefore that u ≠ 0, so that ⟨u, u⟩ ≠ 0. For every real number t, it follows from (IP4) that ⟨tu + v, tu + v⟩ ≥ 0. Hence

0 ≤ ⟨tu + v, tu + v⟩ = t^2⟨u, u⟩ + 2t⟨u, v⟩ + ⟨v, v⟩.

Since ⟨u, u⟩ ≠ 0, the right hand side is a quadratic polynomial in t. Since the inequality holds for every real number t, it follows that the quadratic polynomial

t^2⟨u, u⟩ + 2t⟨u, v⟩ + ⟨v, v⟩

has either repeated roots or no real root, and so the discriminant is non-positive. In other words, we must have

0 ≥ (2⟨u, v⟩)^2 - 4⟨u, u⟩⟨v, v⟩ = 4⟨u, v⟩^2 - 4‖u‖^2 ‖v‖^2.

The inequality (2) follows once again. ∎
Example 9.3.1. Note that Proposition 9B is a special case of Proposition 9E. In fact, Proposition 9B represents the Cauchy-Schwarz inequality for finite sums, that for u_1, ..., u_n, v_1, ..., v_n ∈ R, we have

\left| \sum_{i=1}^n u_i v_i \right| ≤ \left( \sum_{i=1}^n u_i^2 \right)^{1/2} \left( \sum_{i=1}^n v_i^2 \right)^{1/2}.

Example 9.3.2. Applying Proposition 9E to the inner product in the vector space C[a, b] studied in Example 9.2.6, we obtain the Cauchy-Schwarz inequality for integrals, that for f, g ∈ C[a, b], we have

\left| \int_a^b f(x)g(x) dx \right| ≤ \left( \int_a^b f^2(x) dx \right)^{1/2} \left( \int_a^b g^2(x) dx \right)^{1/2}.
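Example 9.3.2 lends itself to a quick numerical sanity check; here with f(x) = x and g(x) = e^x on [0, 1], approximating the integrals by the trapezoidal rule (the choice of functions and of numpy is our own):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10001)
f, g = x, np.exp(x)
lhs = abs(np.trapz(f * g, x))
rhs = np.sqrt(np.trapz(f * f, x)) * np.sqrt(np.trapz(g * g, x))
print(lhs <= rhs, round(lhs, 6), round(rhs, 6))   # True, with lhs < rhs
```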
Next, we investigate norm and distance. We generalize Propositions 9C and 9D.

PROPOSITION 9F. Suppose that u and v are vectors in a real inner product space, and that c ∈ R. Then
(a) ‖u‖ ≥ 0;
(b) ‖u‖ = 0 if and only if u = 0;
(c) ‖cu‖ = |c| ‖u‖; and
(d) ‖u + v‖ ≤ ‖u‖ + ‖v‖.

PROPOSITION 9G. Suppose that u, v and w are vectors in a real inner product space. Then
(a) d(u, v) ≥ 0;
(b) d(u, v) = 0 if and only if u = v;
(c) d(u, v) = d(v, u); and
(d) d(u, v) ≤ d(u, w) + d(w, v).

The proofs are left as exercises.
The Cauchy-Schwarz inequality, as given by Proposition 9E, allows us to recover a formula of the type

⟨u, v⟩ = ‖u‖ ‖v‖ cos θ. (3)

Although the number θ does not have a geometric interpretation, we can nevertheless interpret it as the angle between the two vectors u and v under the inner product ⟨·, ·⟩. Of particular interest is the case when cos θ = 0; in other words, when ⟨u, v⟩ = 0.

Definition. Suppose that u and v are non-zero vectors in a real inner product space V. Then the unique real number θ ∈ [0, π] satisfying (3) is called the angle between u and v with respect to the inner product ⟨·, ·⟩ in V.

Definition. Two vectors u and v in a real inner product space are said to be orthogonal if ⟨u, v⟩ = 0.

Definition. Suppose that W is a subspace of a real inner product space V. A vector u ∈ V is said to be orthogonal to W if ⟨u, w⟩ = 0 for every w ∈ W. The set of all vectors u ∈ V which are orthogonal to W is called the orthogonal complement of W, and denoted by W^⊥; in other words,

W^⊥ = {u ∈ V : ⟨u, w⟩ = 0 for every w ∈ W}.
Example 9.3.3. In R^3, the non-trivial subspaces are lines and planes through the origin. Under the euclidean inner product, two non-zero vectors are orthogonal if and only if they are perpendicular. It follows that if W is a line through the origin, then W^⊥ is the plane through the origin and perpendicular to the line W. Also, if W is a plane through the origin, then W^⊥ is the line through the origin and perpendicular to the plane W.

Example 9.3.4. In R^4, let us consider the two vectors u = (1, 1, 1, 0) and v = (1, 0, 1, 1). Under the euclidean inner product, we have

‖u‖ = ‖v‖ = √3 and ⟨u, v⟩ = 2.

This verifies the Cauchy-Schwarz inequality. On the other hand, if θ ∈ [0, π] represents the angle between u and v with respect to the euclidean inner product, then (3) holds, and we obtain cos θ = 2/3, so that θ = cos^{-1}(2/3).

Example 9.3.5. In R^4, it can be shown that

W = {(w_1, w_2, 0, 0) : w_1, w_2 ∈ R}

is a subspace. Consider now the euclidean inner product, and let

A = {(0, 0, u_3, u_4) : u_3, u_4 ∈ R}.

We shall show that A ⊆ W^⊥ and W^⊥ ⊆ A, so that W^⊥ = A. To show that A ⊆ W^⊥, note that for every (0, 0, u_3, u_4) ∈ A, we have

⟨(0, 0, u_3, u_4), (w_1, w_2, 0, 0)⟩ = (0, 0, u_3, u_4) · (w_1, w_2, 0, 0) = 0

for every (w_1, w_2, 0, 0) ∈ W, so that (0, 0, u_3, u_4) ∈ W^⊥. To show that W^⊥ ⊆ A, note that for every (u_1, u_2, u_3, u_4) ∈ W^⊥, we need to have

⟨(u_1, u_2, u_3, u_4), (w_1, w_2, 0, 0)⟩ = (u_1, u_2, u_3, u_4) · (w_1, w_2, 0, 0) = u_1 w_1 + u_2 w_2 = 0

for every (w_1, w_2, 0, 0) ∈ W. The choice (w_1, w_2, 0, 0) = (1, 0, 0, 0) requires us to have u_1 = 0, while the choice (w_1, w_2, 0, 0) = (0, 1, 0, 0) requires us to have u_2 = 0. Hence we must have u_1 = u_2 = 0, so that (u_1, u_2, u_3, u_4) ∈ A.
Example 9.3.6. Let us consider the inner product on $\mathcal{M}_{2,2}(\mathbb{R})$ discussed in Example 9.2.4. Let
$$U = \begin{pmatrix} 1 & 0 \\ 3 & 4 \end{pmatrix} \quad\text{and}\quad V = \begin{pmatrix} 4 & 2 \\ 0 & -1 \end{pmatrix}.$$
Then $\langle U,V\rangle = 0$, so that the two matrices are orthogonal.
Example 9.3.7. Let us consider the inner product on $P_2$ discussed in Example 9.2.5. Let
$$p = p(x) = 1 + 2x + 3x^2 \quad\text{and}\quad q = q(x) = 4 + x - 2x^2.$$
Then $\langle p,q\rangle = 0$, so that the two polynomials are orthogonal.
Example 9.3.8. Let us consider the inner product on $C[a,b]$ discussed in Example 9.2.6. In particular, let $[a,b] = [0,\pi/2]$. Suppose that
$$f(x) = \sin x - \cos x \quad\text{and}\quad g(x) = \sin x + \cos x.$$
Then
$$\langle f,g\rangle = \int_0^{\pi/2} f(x)g(x)\,dx = \int_0^{\pi/2} (\sin x - \cos x)(\sin x + \cos x)\,dx = \int_0^{\pi/2} (\sin^2 x - \cos^2 x)\,dx = 0,$$
so that the two functions are orthogonal.
Example 9.3.9. Suppose that A is an $m \times n$ matrix with real entries. Recall that if we let $r_1,\dots,r_m$ denote the vectors formed from the rows of A, then the row space of A is given by
$$\{c_1r_1 + \dots + c_mr_m : c_1,\dots,c_m \in \mathbb{R}\},$$
and is a subspace of $\mathbb{R}^n$. On the other hand, the set
$$\{x \in \mathbb{R}^n : Ax = 0\}$$
is called the nullspace of A, and is also a subspace of $\mathbb{R}^n$. Clearly, if x belongs to the nullspace of A, then $r_i \cdot x = 0$ for every $i = 1,\dots,m$. In fact, the row space of A and the nullspace of A are orthogonal complements of each other under the euclidean inner product in $\mathbb{R}^n$. On the other hand, the column space of A is the row space of $A^t$. It follows that the column space of A and the nullspace of $A^t$ are orthogonal complements of each other under the euclidean inner product in $\mathbb{R}^m$.
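As a quick numerical illustration of this example (a sketch only, assuming scipy is available; the matrix A below is an arbitrary choice), scipy can produce an orthonormal basis of the nullspace, and every row of A is then orthogonal to it:

```python
import numpy as np
from scipy.linalg import null_space

# An arbitrary 2x4 matrix; its row space and nullspace both live in R^4.
A = np.array([[1.0, 2.0, 0.0, -1.0],
              [0.0, 1.0, 1.0,  3.0]])

N = null_space(A)   # columns form an orthonormal basis of the nullspace
print(N.shape)      # (4, 2): nullspace dimension = 4 - rank(A) = 2

# Every row of A is orthogonal to every nullspace basis vector.
print(np.allclose(A @ N, 0))   # True
```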
Example 9.3.10. Suppose that u and v are orthogonal vectors in an inner product space. Then
$$\|u+v\|^2 = \langle u+v, u+v\rangle = \langle u,u\rangle + 2\langle u,v\rangle + \langle v,v\rangle = \|u\|^2 + \|v\|^2.$$
This is a generalized version of Pythagoras' theorem.
Remark. We emphasize here that orthogonality depends on the choice of the inner product. Very often, a real vector space has more than one inner product. Vectors orthogonal with respect to one may not be orthogonal with respect to another. For example, the vectors $u = (1,1)$ and $v = (1,-1)$ in $\mathbb{R}^2$ are orthogonal with respect to the euclidean inner product
$$\langle u,v\rangle = u_1v_1 + u_2v_2,$$
but not orthogonal with respect to the weighted euclidean inner product
$$\langle u,v\rangle = 2u_1v_1 + u_2v_2.$$
9.4. Orthogonal and Orthonormal Bases
Suppose that $v_1,\dots,v_r$ are vectors in a real vector space V. We often consider linear combinations of the type $c_1v_1 + \dots + c_rv_r$, where $c_1,\dots,c_r \in \mathbb{R}$. The set
$$\mathrm{span}\{v_1,\dots,v_r\} = \{c_1v_1 + \dots + c_rv_r : c_1,\dots,c_r \in \mathbb{R}\}$$
of all such linear combinations is called the span of the vectors $v_1,\dots,v_r$. We also say that the vectors $v_1,\dots,v_r$ span V if $\mathrm{span}\{v_1,\dots,v_r\} = V$; in other words, if every vector in V can be expressed as a linear combination of the vectors $v_1,\dots,v_r$.
It can be shown that $\mathrm{span}\{v_1,\dots,v_r\}$ is a subspace of V. Suppose further that W is a subspace of V and $v_1,\dots,v_r \in W$. Then $\mathrm{span}\{v_1,\dots,v_r\} \subseteq W$.
On the other hand, the spanning set $\{v_1,\dots,v_r\}$ may contain more vectors than are necessary to describe all the vectors in the span. This leads to the idea of linear independence.
Definition. Suppose that $v_1,\dots,v_r$ are vectors in a real vector space V.
(LD) We say that $v_1,\dots,v_r$ are linearly dependent if there exist $c_1,\dots,c_r \in \mathbb{R}$, not all zero, such that $c_1v_1 + \dots + c_rv_r = 0$.
(LI) We say that $v_1,\dots,v_r$ are linearly independent if they are not linearly dependent; in other words, if the only solution of $c_1v_1 + \dots + c_rv_r = 0$ in $c_1,\dots,c_r \in \mathbb{R}$ is given by $c_1 = \dots = c_r = 0$.
Definition. Suppose that $v_1,\dots,v_r$ are vectors in a real vector space V. We say that $\{v_1,\dots,v_r\}$ is a basis for V if the following two conditions are satisfied:
(B1) We have $\mathrm{span}\{v_1,\dots,v_r\} = V$.
(B2) The vectors $v_1,\dots,v_r$ are linearly independent.
Suppose that $\{v_1,\dots,v_r\}$ is a basis for a real vector space V. Then it can be shown that every element $u \in V$ can be expressed uniquely in the form $u = c_1v_1 + \dots + c_rv_r$, where $c_1,\dots,c_r \in \mathbb{R}$.
We shall restrict our discussion to finite-dimensional real vector spaces. A real vector space V is said to be finite-dimensional if it has a basis containing only finitely many elements. Suppose that $\{v_1,\dots,v_n\}$ is such a basis. Then it can be shown that any collection of more than n vectors in V must be linearly dependent. It follows that any two bases for V must have the same number of elements. This common number is known as the dimension of V.
It can be shown that if V is a finite-dimensional real vector space, then any finite set of linearly independent vectors in V can be expanded, if necessary, to a basis for V. This establishes the existence of a basis for any finite-dimensional vector space. On the other hand, it can be shown that if the dimension of V is equal to n, then any set of n linearly independent vectors in V is a basis for V.
Remark. The above is discussed in far greater detail, including examples and proofs, in Chapter 5.
The purpose of this section is to add the extra ingredient of orthogonality to the above discussion.
Definition. Suppose that V is a finite-dimensional real inner product space. A basis $\{v_1,\dots,v_n\}$ of V is said to be an orthogonal basis of V if $\langle v_i,v_j\rangle = 0$ for every $i,j = 1,\dots,n$ satisfying $i \ne j$. It is said to be an orthonormal basis if it satisfies the extra condition that $\|v_i\| = 1$ for every $i = 1,\dots,n$.
Example 9.4.1. The usual basis $\{v_1,\dots,v_n\}$ in $\mathbb{R}^n$, where
$$v_i = (\underbrace{0,\dots,0}_{i-1},1,\underbrace{0,\dots,0}_{n-i})$$
for every $i = 1,\dots,n$, is an orthonormal basis of $\mathbb{R}^n$ with respect to the euclidean inner product.
Example 9.4.2. The vectors $v_1 = (1,1)$ and $v_2 = (1,-1)$ are linearly independent in $\mathbb{R}^2$ and satisfy
$$\langle v_1,v_2\rangle = v_1 \cdot v_2 = 0.$$
It follows that $\{v_1,v_2\}$ is an orthogonal basis of $\mathbb{R}^2$ with respect to the euclidean inner product. Can you find an orthonormal basis of $\mathbb{R}^2$ by normalizing $v_1$ and $v_2$?
It is theoretically very simple to express any vector as a linear combination of the elements of an
orthogonal or orthonormal basis.
PROPOSITION 9H. Suppose that V is a finite-dimensional real inner product space. If $\{v_1,\dots,v_n\}$ is an orthogonal basis of V, then for every vector $u \in V$, we have
$$u = \frac{\langle u,v_1\rangle}{\|v_1\|^2}v_1 + \dots + \frac{\langle u,v_n\rangle}{\|v_n\|^2}v_n.$$
Furthermore, if $\{v_1,\dots,v_n\}$ is an orthonormal basis of V, then for every vector $u \in V$, we have
$$u = \langle u,v_1\rangle v_1 + \dots + \langle u,v_n\rangle v_n.$$
Proof. Since $\{v_1,\dots,v_n\}$ is a basis of V, there exist unique $c_1,\dots,c_n \in \mathbb{R}$ such that
$$u = c_1v_1 + \dots + c_nv_n.$$
For every $i = 1,\dots,n$, we have
$$\langle u,v_i\rangle = \langle c_1v_1 + \dots + c_nv_n, v_i\rangle = c_1\langle v_1,v_i\rangle + \dots + c_n\langle v_n,v_i\rangle = c_i\langle v_i,v_i\rangle$$
since $\langle v_j,v_i\rangle = 0$ if $j \ne i$. Clearly $v_i \ne 0$, so that $\langle v_i,v_i\rangle \ne 0$, and so
$$c_i = \frac{\langle u,v_i\rangle}{\langle v_i,v_i\rangle}$$
for every $i = 1,\dots,n$. The first assertion follows immediately. For the second assertion, note that $\langle v_i,v_i\rangle = 1$ for every $i = 1,\dots,n$. ∎
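In computational terms, Proposition 9H says that coordinates relative to an orthogonal basis are read off by inner products, with no linear system to solve. A minimal sketch in Python (the orthogonal basis of $\mathbb{R}^3$ and the vector u below are arbitrary choices for illustration):

```python
import numpy as np

# An orthogonal (not orthonormal) basis of R^3 and an arbitrary vector u.
v = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, -1.0, 0.0]),
     np.array([0.0, 0.0, 2.0])]
u = np.array([3.0, -1.0, 5.0])

# c_i = <u, v_i> / ||v_i||^2, exactly as in Proposition 9H.
c = [np.dot(u, vi) / np.dot(vi, vi) for vi in v]
print(c)   # [1.0, 2.0, 2.5]

# Reassembling the linear combination recovers u.
print(np.allclose(sum(ci * vi for ci, vi in zip(c, v)), u))   # True
```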
Collections of vectors that are orthogonal to each other are very useful in the study of vector spaces,
as illustrated by the following important result.
PROPOSITION 9J. Suppose that the non-zero vectors $v_1,\dots,v_r$ in a finite-dimensional real inner product space are pairwise orthogonal. Then they are linearly independent.
Proof. Suppose that $c_1,\dots,c_r \in \mathbb{R}$ and
$$c_1v_1 + \dots + c_rv_r = 0.$$
Then for every $i = 1,\dots,r$, we have
$$0 = \langle 0,v_i\rangle = \langle c_1v_1 + \dots + c_rv_r, v_i\rangle = c_1\langle v_1,v_i\rangle + \dots + c_r\langle v_r,v_i\rangle = c_i\langle v_i,v_i\rangle$$
since $\langle v_j,v_i\rangle = 0$ if $j \ne i$. Clearly $v_i \ne 0$, so that $\langle v_i,v_i\rangle \ne 0$, and so we must have $c_i = 0$ for every $i = 1,\dots,r$. It follows that $c_1 = \dots = c_r = 0$. ∎
Of course, the above is based on the assumption that an orthogonal basis exists. Our next task is
to show that this is indeed the case. Our proof is based on a technique which orthogonalizes any given
basis of a vector space.
PROPOSITION 9K. Every finite-dimensional real inner product space has an orthogonal basis, and hence also an orthonormal basis.
Remark. We shall prove Proposition 9K by using the Gram-Schmidt process. The central idea of this process, in its simplest form, can be described as follows. Suppose that $v_1$ and $u_2$ are two non-zero vectors in an inner product space, not necessarily orthogonal to each other. We shall attempt to remove some scalar multiple $\alpha_1v_1$ from $u_2$ so that $v_2 = u_2 - \alpha_1v_1$ is orthogonal to $v_1$; in other words, we wish to find a suitable real number $\alpha_1$ such that
$$\langle v_1, v_2\rangle = \langle v_1, u_2 - \alpha_1v_1\rangle = 0.$$
The idea is illustrated in the picture below.
[Figure: the vector $u_2$ resolved against $v_1$, with $v_2 = u_2 - \alpha_1v_1$ perpendicular to $v_1$.]
We clearly need $\langle v_1,u_2\rangle - \alpha_1\langle v_1,v_1\rangle = 0$, and
$$\alpha_1 = \frac{\langle v_1,u_2\rangle}{\langle v_1,v_1\rangle} = \frac{\langle v_1,u_2\rangle}{\|v_1\|^2}$$
is a suitable choice, so that
$$v_1 \quad\text{and}\quad v_2 = u_2 - \frac{\langle v_1,u_2\rangle}{\|v_1\|^2}v_1 \tag{4}$$
are now orthogonal. Suppose in general that $v_1,\dots,v_s$ and $u_{s+1}$ are non-zero vectors in an inner product space, where $v_1,\dots,v_s$ are pairwise orthogonal. We shall attempt to remove some linear combination
$\alpha_1v_1 + \dots + \alpha_sv_s$ from $u_{s+1}$ so that $v_{s+1} = u_{s+1} - \alpha_1v_1 - \dots - \alpha_sv_s$ is orthogonal to each of $v_1,\dots,v_s$; in other words, we wish to find suitable real numbers $\alpha_1,\dots,\alpha_s$ such that
$$\langle v_i, v_{s+1}\rangle = \langle v_i, u_{s+1} - \alpha_1v_1 - \dots - \alpha_sv_s\rangle = 0$$
for every $i = 1,\dots,s$. We clearly need
$$\langle v_i,u_{s+1}\rangle - \alpha_1\langle v_i,v_1\rangle - \dots - \alpha_s\langle v_i,v_s\rangle = \langle v_i,u_{s+1}\rangle - \alpha_i\langle v_i,v_i\rangle = 0,$$
and
$$\alpha_i = \frac{\langle v_i,u_{s+1}\rangle}{\langle v_i,v_i\rangle} = \frac{\langle v_i,u_{s+1}\rangle}{\|v_i\|^2}$$
is a suitable choice, so that
$$v_1,\dots,v_s \quad\text{and}\quad v_{s+1} = u_{s+1} - \frac{\langle v_1,u_{s+1}\rangle}{\|v_1\|^2}v_1 - \dots - \frac{\langle v_s,u_{s+1}\rangle}{\|v_s\|^2}v_s \tag{5}$$
are now pairwise orthogonal.
Example 9.4.3. The vectors
$$u_1 = (1,2,1,0), \quad u_2 = (3,3,3,0), \quad u_3 = (-2,10,0,0), \quad u_4 = (-2,1,-6,2)$$
are linearly independent in $\mathbb{R}^4$, since
$$\det\begin{pmatrix} 1 & 3 & -2 & -2 \\ 2 & 3 & 10 & 1 \\ 1 & 3 & 0 & -6 \\ 0 & 0 & 0 & 2 \end{pmatrix} \ne 0.$$
Hence $\{u_1,u_2,u_3,u_4\}$ is a basis of $\mathbb{R}^4$. Let us consider $\mathbb{R}^4$ as a real inner product space with the euclidean inner product, and apply the Gram-Schmidt process to this basis. We have
$$v_1 = u_1 = (1,2,1,0),$$
$$v_2 = u_2 - \frac{\langle v_1,u_2\rangle}{\|v_1\|^2}v_1 = (3,3,3,0) - \frac{\langle (1,2,1,0),(3,3,3,0)\rangle}{\|(1,2,1,0)\|^2}(1,2,1,0)$$
$$= (3,3,3,0) - \frac{12}{6}(1,2,1,0) = (3,3,3,0) + (-2,-4,-2,0) = (1,-1,1,0),$$
$$v_3 = u_3 - \frac{\langle v_1,u_3\rangle}{\|v_1\|^2}v_1 - \frac{\langle v_2,u_3\rangle}{\|v_2\|^2}v_2$$
$$= (-2,10,0,0) - \frac{\langle (1,2,1,0),(-2,10,0,0)\rangle}{\|(1,2,1,0)\|^2}(1,2,1,0) - \frac{\langle (1,-1,1,0),(-2,10,0,0)\rangle}{\|(1,-1,1,0)\|^2}(1,-1,1,0)$$
$$= (-2,10,0,0) - \frac{18}{6}(1,2,1,0) + \frac{12}{3}(1,-1,1,0) = (-2,10,0,0) + (-3,-6,-3,0) + (4,-4,4,0) = (-1,0,1,0),$$
$$v_4 = u_4 - \frac{\langle v_1,u_4\rangle}{\|v_1\|^2}v_1 - \frac{\langle v_2,u_4\rangle}{\|v_2\|^2}v_2 - \frac{\langle v_3,u_4\rangle}{\|v_3\|^2}v_3$$
$$= (-2,1,-6,2) - \frac{\langle (1,2,1,0),(-2,1,-6,2)\rangle}{\|(1,2,1,0)\|^2}(1,2,1,0) - \frac{\langle (1,-1,1,0),(-2,1,-6,2)\rangle}{\|(1,-1,1,0)\|^2}(1,-1,1,0)$$
$$\qquad - \frac{\langle (-1,0,1,0),(-2,1,-6,2)\rangle}{\|(-1,0,1,0)\|^2}(-1,0,1,0)$$
$$= (-2,1,-6,2) + \frac{6}{6}(1,2,1,0) + \frac{9}{3}(1,-1,1,0) + \frac{4}{2}(-1,0,1,0)$$
$$= (-2,1,-6,2) + (1,2,1,0) + (3,-3,3,0) + (-2,0,2,0) = (0,0,0,2).$$
It is easy to verify that the four vectors
$$v_1 = (1,2,1,0), \quad v_2 = (1,-1,1,0), \quad v_3 = (-1,0,1,0), \quad v_4 = (0,0,0,2)$$
are pairwise orthogonal, so that $\{v_1,v_2,v_3,v_4\}$ is an orthogonal basis of $\mathbb{R}^4$. Normalizing each of these four vectors, we obtain the corresponding orthonormal basis
$$\left\{\left(\frac{1}{\sqrt 6},\frac{2}{\sqrt 6},\frac{1}{\sqrt 6},0\right), \left(\frac{1}{\sqrt 3},-\frac{1}{\sqrt 3},\frac{1}{\sqrt 3},0\right), \left(-\frac{1}{\sqrt 2},0,\frac{1}{\sqrt 2},0\right), (0,0,0,1)\right\}.$$
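The computation above is mechanical enough to automate. Below is a minimal Python sketch of the process described by (4) and (5) (the helper name gram_schmidt is our own choice); run on the basis of Example 9.4.3, it reproduces $v_1, v_2, v_3, v_4$ exactly.

```python
import numpy as np

def gram_schmidt(basis):
    """Orthogonalize a list of vectors using equations (4) and (5)."""
    vs = []
    for u in basis:
        v = u.astype(float)
        for w in vs:
            # remove the projection of u on each earlier orthogonal vector
            v = v - (np.dot(w, u) / np.dot(w, w)) * w
        vs.append(v)
    return vs

us = [np.array([1, 2, 1, 0]),
      np.array([3, 3, 3, 0]),
      np.array([-2, 10, 0, 0]),
      np.array([-2, 1, -6, 2])]

vs = gram_schmidt(us)
print(vs)   # v1=(1,2,1,0), v2=(1,-1,1,0), v3=(-1,0,1,0), v4=(0,0,0,2)

# normalizing each vector gives the orthonormal basis
ws = [v / np.linalg.norm(v) for v in vs]
```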
Proof of Proposition 9K. Suppose that the vector space V has dimension n. Then it has a basis of the type $\{u_1,\dots,u_n\}$. We now let $v_1 = u_1$, and define $v_2,\dots,v_n$ inductively by (4) and (5) to obtain a set of pairwise orthogonal vectors $v_1,\dots,v_n$. Clearly none of these n vectors is zero, for if $v_{s+1} = 0$, then it follows from (5) that $v_1,\dots,v_s,u_{s+1}$, and hence $u_1,\dots,u_s,u_{s+1}$, are linearly dependent, clearly a contradiction. It now follows from Proposition 9J that $v_1,\dots,v_n$ are linearly independent, and so must form a basis of V. This proves the first assertion. To prove the second assertion, observe that each of the vectors
$$\frac{v_1}{\|v_1\|}, \dots, \frac{v_n}{\|v_n\|}$$
has norm 1. ∎
Example 9.4.4. Consider the real inner product space $P_2$, where for polynomials
$$p = p(x) = p_0 + p_1x + p_2x^2 \quad\text{and}\quad q = q(x) = q_0 + q_1x + q_2x^2,$$
the inner product is defined by
$$\langle p,q\rangle = p_0q_0 + p_1q_1 + p_2q_2.$$
The polynomials
$$u_1 = 3 + 4x + 5x^2, \quad u_2 = 9 + 12x + 5x^2, \quad u_3 = 1 - 7x + 25x^2$$
are linearly independent in $P_2$, since
$$\det\begin{pmatrix} 3 & 9 & 1 \\ 4 & 12 & -7 \\ 5 & 5 & 25 \end{pmatrix} \ne 0.$$
Hence $\{u_1,u_2,u_3\}$ is a basis of $P_2$. Let us apply the Gram-Schmidt process to this basis. We have
$$v_1 = u_1 = 3 + 4x + 5x^2,$$
$$v_2 = u_2 - \frac{\langle v_1,u_2\rangle}{\|v_1\|^2}v_1 = (9 + 12x + 5x^2) - \frac{\langle 3+4x+5x^2,\, 9+12x+5x^2\rangle}{\|3+4x+5x^2\|^2}(3 + 4x + 5x^2)$$
$$= (9 + 12x + 5x^2) - \frac{100}{50}(3 + 4x + 5x^2) = (9 + 12x + 5x^2) + (-6 - 8x - 10x^2) = 3 + 4x - 5x^2,$$
$$v_3 = u_3 - \frac{\langle v_1,u_3\rangle}{\|v_1\|^2}v_1 - \frac{\langle v_2,u_3\rangle}{\|v_2\|^2}v_2$$
$$= (1 - 7x + 25x^2) - \frac{\langle 3+4x+5x^2,\, 1-7x+25x^2\rangle}{\|3+4x+5x^2\|^2}(3 + 4x + 5x^2) - \frac{\langle 3+4x-5x^2,\, 1-7x+25x^2\rangle}{\|3+4x-5x^2\|^2}(3 + 4x - 5x^2)$$
$$= (1 - 7x + 25x^2) - \frac{100}{50}(3 + 4x + 5x^2) + \frac{150}{50}(3 + 4x - 5x^2)$$
$$= (1 - 7x + 25x^2) + (-6 - 8x - 10x^2) + (9 + 12x - 15x^2) = 4 - 3x + 0x^2.$$
It is easy to verify that the three polynomials
$$v_1 = 3 + 4x + 5x^2, \quad v_2 = 3 + 4x - 5x^2, \quad v_3 = 4 - 3x + 0x^2$$
are pairwise orthogonal, so that $\{v_1,v_2,v_3\}$ is an orthogonal basis of $P_2$. Normalizing each of these three polynomials, we obtain the corresponding orthonormal basis
$$\left\{\frac{3}{\sqrt{50}} + \frac{4}{\sqrt{50}}x + \frac{5}{\sqrt{50}}x^2, \quad \frac{3}{\sqrt{50}} + \frac{4}{\sqrt{50}}x - \frac{5}{\sqrt{50}}x^2, \quad \frac45 - \frac35x + 0x^2\right\}.$$
9.5. Orthogonal Projections
The Gram-Schmidt process is an example of using orthogonal projections. The geometric interpretation of
$$v_2 = u_2 - \frac{\langle v_1,u_2\rangle}{\|v_1\|^2}v_1$$
is that we have removed from $u_2$ its orthogonal projection on $v_1$; in other words, we have removed from $u_2$ the component of $u_2$ which is parallel to $v_1$, so that the remaining part must be perpendicular to $v_1$.
It is natural to consider the following question. Suppose that V is a finite-dimensional real inner product space, and that W is a subspace of V. Given any vector $u \in V$, can we write
$$u = w + p,$$
where $w \in W$ and $p \in W^\perp$? If so, is this expression unique? The following result answers these two questions in the affirmative.
PROPOSITION 9L. Suppose that V is a finite-dimensional real inner product space, and that W is a subspace of V. Suppose further that $\{v_1,\dots,v_r\}$ is an orthogonal basis of W. Then for any vector $u \in V$,
$$w = \frac{\langle u,v_1\rangle}{\|v_1\|^2}v_1 + \dots + \frac{\langle u,v_r\rangle}{\|v_r\|^2}v_r$$
is the unique vector satisfying $w \in W$ and $u - w \in W^\perp$.
Proof. Note that the orthogonal basis $\{v_1,\dots,v_r\}$ of W can be extended to a basis
$$\{v_1,\dots,v_r,u_{r+1},\dots,u_n\}$$
of V which can then be orthogonalized by the Gram-Schmidt process to an orthogonal basis
$$\{v_1,\dots,v_r,v_{r+1},\dots,v_n\}$$
of V. Clearly $v_{r+1},\dots,v_n \in W^\perp$. Suppose now that $u \in V$. Then u can be expressed as a linear combination of $v_1,\dots,v_n$ in a unique way. By Proposition 9H, this unique expression is given by
$$u = \frac{\langle u,v_1\rangle}{\|v_1\|^2}v_1 + \dots + \frac{\langle u,v_n\rangle}{\|v_n\|^2}v_n = w + \frac{\langle u,v_{r+1}\rangle}{\|v_{r+1}\|^2}v_{r+1} + \dots + \frac{\langle u,v_n\rangle}{\|v_n\|^2}v_n.$$
Clearly $u - w \in W^\perp$. ∎
Definition. The vector w in Proposition 9L is called the orthogonal projection of u on the subspace W, and denoted by $\mathrm{proj}_W u$. The vector $p = u - w$ is called the component of u orthogonal to the subspace W.
Example 9.5.1. Recall Example 9.4.3. Consider the subspace $W = \mathrm{span}\{u_1,u_2\}$. Note that $v_1$ and $v_2$ can each be expressed as a linear combination of $u_1$ and $u_2$, and that $u_1$ and $u_2$ can each be expressed as a linear combination of $v_1$ and $v_2$. It follows that $\{v_1,v_2\}$ is an orthogonal basis of W. This basis can be extended to an orthogonal basis $\{v_1,v_2,v_3,v_4\}$ of $\mathbb{R}^4$. It follows that $W^\perp = \mathrm{span}\{v_3,v_4\}$.
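Continuing with the same data, the projection formula of Proposition 9L is a one-line computation once an orthogonal basis of W is known. A minimal sketch (the helper proj and the test vector u are our own choices for illustration):

```python
import numpy as np

def proj(u, orth_basis):
    """Orthogonal projection of u on span(orth_basis), per Proposition 9L."""
    return sum((np.dot(u, v) / np.dot(v, v)) * v for v in orth_basis)

# Orthogonal basis {v1, v2} of W from Examples 9.4.3 and 9.5.1.
v1 = np.array([1.0, 2.0, 1.0, 0.0])
v2 = np.array([1.0, -1.0, 1.0, 0.0])

u = np.array([2.0, 0.0, -1.0, 3.0])   # an arbitrary vector in R^4
w = proj(u, [v1, v2])                 # the component of u in W
p = u - w                             # the component of u in W-perp

# p is orthogonal to every basis vector of W, as Proposition 9L asserts.
print(np.isclose(np.dot(p, v1), 0), np.isclose(np.dot(p, v2), 0))   # True True
```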
Problems for Chapter 9
1. In each of the following, determine whether $\langle\cdot,\cdot\rangle$ is an inner product in the given vector space by checking whether conditions (IP1)-(IP4) hold:
a) $\mathbb{R}^2$; $\langle u,v\rangle = 2u_1v_1 - u_2v_2$
b) $\mathbb{R}^2$; $\langle u,v\rangle = u_1v_1 + 2u_1v_2 + u_2v_2$
c) $\mathbb{R}^3$; $\langle u,v\rangle = u_1^2v_1^2 + u_2^2v_2^2 + u_3^2v_3^2$
2. Consider the vector space $\mathbb{R}^2$. Suppose that $\langle\cdot,\cdot\rangle$ is the inner product generated by the matrix
$$A = \begin{pmatrix} 2 & 1 \\ 2 & 3 \end{pmatrix}.$$
Evaluate each of the following:
a) $\langle (1,2),(2,3)\rangle$  b) $\|(1,2)\|$  c) $d((1,2),(2,3))$
3. Suppose that the vectors u, v, w in an inner product space V satisfy $\langle u,v\rangle = 2$, $\langle v,w\rangle = -3$, $\langle u,w\rangle = 5$, $\|u\| = 1$, $\|v\| = 2$ and $\|w\| = 7$. Evaluate each of the following:
a) $\langle u+v, v+w\rangle$  b) $\langle 2v-w, 3u+2w\rangle$  c) $\langle u-v-2w, 4u+v\rangle$
d) $\|u+v\|$  e) $\|2w-v\|$  f) $\|u-2v+4w\|$
4. Suppose that u and v are two non-zero vectors in the real vector space $\mathbb{R}^2$. Follow the steps below to establish the existence of a real inner product $\langle\cdot,\cdot\rangle$ on $\mathbb{R}^2$ such that $\langle u,v\rangle \ne 0$.
a) Explain, in terms of the euclidean inner product, why we may restrict our discussion to vectors of the form $u = (x,y)$ and $v = (ky,-kx)$, where $x,y,k \in \mathbb{R}$ satisfy $(x,y) \ne (0,0)$ and $k \ne 0$.
b) Explain next why we may further restrict our discussion to vectors of the form $u = (x,y)$ and $v = (y,-x)$, where $x,y \in \mathbb{R}$ satisfy $(x,y) \ne (0,0)$.
c) Let $u = (x,y)$ and $v = (y,-x)$, where $x,y \in \mathbb{R}$ and $(x,y) \ne (0,0)$. Consider the inner product on $\mathbb{R}^2$ generated by the real matrix
$$A = \begin{pmatrix} a & b \\ b & c \end{pmatrix},$$
where $ac \ne b^2$. Show that $\langle u,v\rangle = (a^2 - c^2)xy + b(a+c)(y^2 - x^2)$.
d) Suppose that $x^2 = y^2$. Show that the choice $a > c > b = 0$ will imply $\langle u,v\rangle \ne 0$.
e) Suppose that $x^2 \ne y^2$. Show that the choice $c = a > b > 0$ will imply $\langle u,v\rangle \ne 0$.
5. Consider the real vector space $\mathbb{R}^2$.
a) Find two distinct non-zero vectors $u,v \in \mathbb{R}^2$ such that $\langle u,v\rangle = 0$ for every weighted euclidean inner product on $\mathbb{R}^2$.
b) Find two distinct non-zero vectors $u,v \in \mathbb{R}^2$ such that $\langle u,v\rangle \ne 0$ for any inner product on $\mathbb{R}^2$.
6. For each of the following inner product spaces and subspaces W, find $W^\perp$:
a) $\mathbb{R}^2$ (euclidean inner product); $W = \{(x,y) \in \mathbb{R}^2 : x + 2y = 0\}$.
b) $\mathcal{M}_{2,2}(\mathbb{R})$ (inner product discussed in Section 9.2);
$$W = \left\{\begin{pmatrix} ta & 0 \\ 0 & tb \end{pmatrix} : t \in \mathbb{R}\right\},$$
where a and b are non-zero.
7. Suppose that $\{v_1,\dots,v_n\}$ is a basis for a real inner product space V. Does there exist $v \in V$ which is orthogonal to every vector in this basis?
8. Use the Cauchy-Schwarz inequality to prove that $(a\cos\theta + b\sin\theta)^2 \le a^2 + b^2$ for every $a,b,\theta \in \mathbb{R}$. [Hint: First find a suitable real inner product space.]
9. Prove Proposition 9F.
10. Show that $\langle u,v\rangle = \frac14\|u+v\|^2 - \frac14\|u-v\|^2$ for any u and v in a real inner product space.
11. Suppose that $\{v_1,\dots,v_n\}$ is an orthonormal basis of a real inner product space V. Show that for every $u \in V$, we have $\|u\|^2 = \langle u,v_1\rangle^2 + \dots + \langle u,v_n\rangle^2$.
12. Show that if $v_1,\dots,v_n$ are pairwise orthogonal in a real inner product space V, then
$$\|v_1 + \dots + v_n\|^2 = \|v_1\|^2 + \dots + \|v_n\|^2.$$
13. Show that $v_1 = (2,2,1)$, $v_2 = (2,-1,-2)$ and $v_3 = (1,-2,2)$ form an orthogonal basis of $\mathbb{R}^3$ under the euclidean inner product. Then write $u = (1,0,2)$ as a linear combination of $v_1,v_2,v_3$.
14. Let $u_1 = (2,2,1)$, $u_2 = (4,1,-1)$ and $u_3 = (1,10,-5)$. Show that $\{u_1,u_2,u_3\}$ is a basis of $\mathbb{R}^3$, and apply the Gram-Schmidt process to this basis to find an orthonormal basis of $\mathbb{R}^3$.
15. Show that the vectors $u_1 = (0,2,1,0)$, $u_2 = (1,-1,0,0)$, $u_3 = (1,2,0,-1)$ and $u_4 = (1,0,0,1)$ form a basis of $\mathbb{R}^4$. Then apply the Gram-Schmidt process to find an orthogonal basis of $\mathbb{R}^4$. Find also the corresponding orthonormal basis of $\mathbb{R}^4$.
16. Consider the vector space $P_2$ with the inner product
$$\langle p,q\rangle = \int_0^1 p(x)q(x)\,dx.$$
Apply the Gram-Schmidt process to the basis $\{1,x,x^2\}$ to find an orthogonal basis of $P_2$. Find also the corresponding orthonormal basis of $P_2$.
17. Suppose that we apply the Gram-Schmidt process to non-zero vectors $u_1,\dots,u_n$ without first checking that these form a basis of the inner product space, and obtain $v_s = 0$ for some $s = 1,\dots,n$. What conclusion can we draw concerning the collection $\{u_1,\dots,u_n\}$?
LINEAR ALGEBRA
W W L CHEN
© W W L Chen, 1997, 2008.
This chapter is available free to all individuals, on the understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied, with or without permission from the author. However, this document may not be kept on any information storage and retrieval system without permission from the author, unless such system is not accessible to any individuals other than its owners.
Chapter 10
ORTHOGONAL MATRICES
10.1. Introduction
Definition. A square matrix A with real entries and satisfying the condition $A^{-1} = A^t$ is called an orthogonal matrix.
Example 10.1.1. Consider the euclidean space $\mathbb{R}^2$ with the euclidean inner product. The vectors $u_1 = (1,0)$ and $u_2 = (0,1)$ form an orthonormal basis $B = \{u_1,u_2\}$. Let us now rotate $u_1$ and $u_2$ anticlockwise by an angle $\theta$ to obtain $v_1 = (\cos\theta, \sin\theta)$ and $v_2 = (-\sin\theta, \cos\theta)$. Then $C = \{v_1,v_2\}$ is also an orthonormal basis.
[Figure: the orthonormal basis vectors $u_1$, $u_2$ rotated anticlockwise through the angle $\theta$ to give $v_1$, $v_2$.]
The transition matrix from the basis C to the basis B is given by
$$P = \begin{pmatrix} [v_1]_B & [v_2]_B \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
Clearly
$$P^{-1} = P^t = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.$$
In fact, our example is a special case of the following general result.
PROPOSITION 10A. Suppose that $B = \{u_1,\dots,u_n\}$ and $C = \{v_1,\dots,v_n\}$ are two orthonormal bases of a real inner product space V. Then the transition matrix P from the basis C to the basis B is an orthogonal matrix.
Example 10.1.2. The matrix
$$A = \begin{pmatrix} 1/3 & 2/3 & 2/3 \\ 2/3 & 1/3 & -2/3 \\ 2/3 & -2/3 & 1/3 \end{pmatrix}$$
is orthogonal, since
$$A^tA = \begin{pmatrix} 1/3 & 2/3 & 2/3 \\ 2/3 & 1/3 & -2/3 \\ 2/3 & -2/3 & 1/3 \end{pmatrix}\begin{pmatrix} 1/3 & 2/3 & 2/3 \\ 2/3 & 1/3 & -2/3 \\ 2/3 & -2/3 & 1/3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Note also that the row vectors of A, namely $(1/3, 2/3, 2/3)$, $(2/3, 1/3, -2/3)$ and $(2/3, -2/3, 1/3)$, are orthonormal. So are the column vectors of A.
In fact, our last observation is not a coincidence.
PROPOSITION 10B. Suppose that A is an $n \times n$ matrix with real entries. Then
(a) A is orthogonal if and only if the row vectors of A form an orthonormal basis of $\mathbb{R}^n$ under the euclidean inner product; and
(b) A is orthogonal if and only if the column vectors of A form an orthonormal basis of $\mathbb{R}^n$ under the euclidean inner product.
Proof. We shall only prove (a), since the proof of (b) is almost identical. Let $r_1,\dots,r_n$ denote the row vectors of A. Then
$$AA^t = \begin{pmatrix} r_1 \cdot r_1 & \dots & r_1 \cdot r_n \\ \vdots & & \vdots \\ r_n \cdot r_1 & \dots & r_n \cdot r_n \end{pmatrix}.$$
It follows that $AA^t = I$ if and only if for every $i,j = 1,\dots,n$, we have
$$r_i \cdot r_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j, \end{cases}$$
if and only if $r_1,\dots,r_n$ are orthonormal. ∎
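Proposition 10B gives a practical numerical test for orthogonality: form $AA^t$ and compare with I. A minimal sketch in Python, applied to the matrix of Example 10.1.2:

```python
import numpy as np

A = np.array([[1/3,  2/3,  2/3],
              [2/3,  1/3, -2/3],
              [2/3, -2/3,  1/3]])

# A is orthogonal iff A A^t = I, i.e. iff its rows are orthonormal.
print(np.allclose(A @ A.T, np.eye(3)))      # True
print(np.allclose(np.linalg.inv(A), A.T))   # True: A^{-1} = A^t
```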
PROPOSITION 10C. Suppose that A is an $n \times n$ matrix with real entries. Suppose further that the inner product in $\mathbb{R}^n$ is the euclidean inner product. Then the following are equivalent:
(a) A is orthogonal.
(b) For every $x \in \mathbb{R}^n$, we have $\|Ax\| = \|x\|$.
(c) For every $u,v \in \mathbb{R}^n$, we have $Au \cdot Av = u \cdot v$.
Proof. ((a)⇒(b)) Suppose that A is orthogonal, so that $A^tA = I$. It follows that for every $x \in \mathbb{R}^n$, we have
$$\|Ax\|^2 = Ax \cdot Ax = x^tA^tAx = x^tIx = x^tx = x \cdot x = \|x\|^2.$$
((b)⇒(c)) Suppose that $\|Ax\| = \|x\|$ for every $x \in \mathbb{R}^n$. Then for every $u,v \in \mathbb{R}^n$, we have
$$Au \cdot Av = \tfrac14\|Au+Av\|^2 - \tfrac14\|Au-Av\|^2 = \tfrac14\|A(u+v)\|^2 - \tfrac14\|A(u-v)\|^2 = \tfrac14\|u+v\|^2 - \tfrac14\|u-v\|^2 = u \cdot v.$$
((c)⇒(a)) Suppose that $Au \cdot Av = u \cdot v$ for every $u,v \in \mathbb{R}^n$. Then
$$Iu \cdot v = u \cdot v = Au \cdot Av = v^tA^tAu = A^tAu \cdot v,$$
so that
$$(A^tA - I)u \cdot v = 0.$$
In particular, this holds when $v = (A^tA - I)u$, so that
$$(A^tA - I)u \cdot (A^tA - I)u = 0,$$
whence
$$(A^tA - I)u = 0, \tag{1}$$
in view of Proposition 9A(d). But then (1) is a system of n homogeneous linear equations in n unknowns satisfied by every $u \in \mathbb{R}^n$. Hence the coefficient matrix $A^tA - I$ must be the zero matrix, and so $A^tA = I$. ∎
Proof of Proposition 10A. For every $u \in V$, we can write
$$u = \beta_1u_1 + \dots + \beta_nu_n = \gamma_1v_1 + \dots + \gamma_nv_n, \quad\text{where } \beta_1,\dots,\beta_n,\gamma_1,\dots,\gamma_n \in \mathbb{R},$$
and where $B = \{u_1,\dots,u_n\}$ and $C = \{v_1,\dots,v_n\}$ are two orthonormal bases of V. Then
$$\|u\|^2 = \langle u,u\rangle = \langle \beta_1u_1 + \dots + \beta_nu_n,\, \beta_1u_1 + \dots + \beta_nu_n\rangle = \sum_{i=1}^n\sum_{j=1}^n \beta_i\beta_j\langle u_i,u_j\rangle = \sum_{i=1}^n \beta_i^2 = (\beta_1,\dots,\beta_n)\cdot(\beta_1,\dots,\beta_n).$$
Similarly,
$$\|u\|^2 = \langle u,u\rangle = \langle \gamma_1v_1 + \dots + \gamma_nv_n,\, \gamma_1v_1 + \dots + \gamma_nv_n\rangle = \sum_{i=1}^n\sum_{j=1}^n \gamma_i\gamma_j\langle v_i,v_j\rangle = \sum_{i=1}^n \gamma_i^2 = (\gamma_1,\dots,\gamma_n)\cdot(\gamma_1,\dots,\gamma_n).$$
It follows that in $\mathbb{R}^n$ with the euclidean norm, we have $\|[u]_B\| = \|[u]_C\|$, and so $\|P[u]_C\| = \|[u]_C\|$ for every $u \in V$. Hence $\|Px\| = \|x\|$ holds for every $x \in \mathbb{R}^n$. It now follows from Proposition 10C that P is orthogonal. ∎
10.2. Eigenvalues and Eigenvectors
In this section, we give a brief review on eigenvalues and eigenvectors first discussed in Chapter 7.
Suppose that
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}$$
is an $n \times n$ matrix with real entries. Suppose further that there exist a number $\lambda \in \mathbb{R}$ and a non-zero vector $v \in \mathbb{R}^n$ such that $Av = \lambda v$. Then we say that $\lambda$ is an eigenvalue of the matrix A, and that v is an eigenvector corresponding to the eigenvalue $\lambda$. In this case, we have $Av = \lambda v = \lambda Iv$, where I is the $n \times n$ identity matrix, so that $(A - \lambda I)v = 0$. Since $v \in \mathbb{R}^n$ is non-zero, it follows that we must have
$$\det(A - \lambda I) = 0. \tag{2}$$
In other words, we must have
$$\det\begin{pmatrix} a_{11}-\lambda & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22}-\lambda & & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn}-\lambda \end{pmatrix} = 0.$$
Note that (2) is a polynomial equation. The polynomial $\det(A - \lambda I)$ is called the characteristic polynomial of the matrix A. Solving this equation (2) gives the eigenvalues of the matrix A.
On the other hand, for any eigenvalue $\lambda$ of the matrix A, the set
$$\{v \in \mathbb{R}^n : (A - \lambda I)v = 0\} \tag{3}$$
is the nullspace of the matrix $A - \lambda I$, and forms a subspace of $\mathbb{R}^n$. This space (3) is called the eigenspace corresponding to the eigenvalue $\lambda$.
Suppose now that A has eigenvalues $\lambda_1,\dots,\lambda_n \in \mathbb{R}$, not necessarily distinct, with corresponding eigenvectors $v_1,\dots,v_n \in \mathbb{R}^n$, and that $v_1,\dots,v_n$ are linearly independent. Then it can be shown that
$$P^{-1}AP = D,$$
where
$$P = \begin{pmatrix} v_1 & \dots & v_n \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}.$$
In fact, we say that A is diagonalizable if there exists an invertible matrix P with real entries such that $P^{-1}AP$ is a diagonal matrix with real entries. It follows that A is diagonalizable if its eigenvectors form a basis of $\mathbb{R}^n$. In the opposite direction, one can show that if A is diagonalizable, then it has n linearly independent eigenvectors in $\mathbb{R}^n$. It therefore follows that the question of diagonalizing a matrix A with real entries is reduced to one of linear independence of its eigenvectors.
We now summarize our discussion so far.
DIAGONALIZATION PROCESS. Suppose that A is an $n \times n$ matrix with real entries.
(1) Determine whether the n roots of the characteristic polynomial $\det(A - \lambda I)$ are real.
(2) If not, then A is not diagonalizable. If so, then find the eigenvectors corresponding to these eigenvalues. Determine whether we can find n linearly independent eigenvectors.
(3) If not, then A is not diagonalizable. If so, then write
$$P = \begin{pmatrix} v_1 & \dots & v_n \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix},$$
where $\lambda_1,\dots,\lambda_n \in \mathbb{R}$ are the eigenvalues of A and where $v_1,\dots,v_n \in \mathbb{R}^n$ are respectively their corresponding eigenvectors. Then $P^{-1}AP = D$.
In particular, it can be shown that if A has distinct eigenvalues $\lambda_1,\dots,\lambda_n \in \mathbb{R}$, with corresponding eigenvectors $v_1,\dots,v_n \in \mathbb{R}^n$, then $v_1,\dots,v_n$ are linearly independent. It follows that all such matrices A are diagonalizable.
10.3. Orthonormal Diagonalization
We now consider the euclidean space $\mathbb{R}^n$ as an inner product space with the euclidean inner product. Given any $n \times n$ matrix A with real entries, we wish to find out whether there exists an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of A.
Recall that in the Diagonalization process discussed in the last section, the columns of the matrix P are eigenvectors of A, and these vectors form a basis of $\mathbb{R}^n$. It follows from Proposition 10B that this basis is orthonormal if and only if the matrix P is orthogonal.
Definition. An $n \times n$ matrix A with real entries is said to be orthogonally diagonalizable if there exists an orthogonal matrix P with real entries such that $P^{-1}AP = P^tAP$ is a diagonal matrix with real entries.
First of all, we would like to determine which matrices are orthogonally diagonalizable. For those that are, we then need to discuss how we may find an orthogonal matrix P to carry out the diagonalization.
To study the first question, we have the following result which gives a restriction on those matrices that are orthogonally diagonalizable.
PROPOSITION 10D. Suppose that A is an orthogonally diagonalizable matrix with real entries. Then A is symmetric.
Proof. Suppose that A is orthogonally diagonalizable. Then there exists an orthogonal matrix P and a diagonal matrix D, both with real entries and such that $P^tAP = D$. Since $PP^t = P^tP = I$ and $D^t = D$, we have
$$A = PDP^t = PD^tP^t,$$
so that
$$A^t = (PD^tP^t)^t = (P^t)^t(D^t)^tP^t = PDP^t = A,$$
whence A is symmetric. ∎
Our first question is in fact answered by the following result which we state without proof.
PROPOSITION 10E. Suppose that A is an $n \times n$ matrix with real entries. Then it is orthogonally diagonalizable if and only if it is symmetric.
The remainder of this section is devoted to finding a way to orthogonally diagonalize a symmetric matrix with real entries. We begin by stating without proof the following result. The proof requires results from the theory of complex vector spaces.
PROPOSITION 10F. Suppose that A is a symmetric matrix with real entries. Then all the eigenvalues of A are real.
Our idea here is to follow the Diagonalization process discussed in the last section, knowing that since A is diagonalizable, we shall find a basis of $\mathbb{R}^n$ consisting of eigenvectors of A. We may then wish to orthogonalize this basis by the Gram-Schmidt process. This last step is considerably simplified in view of the following result.
PROPOSITION 10G. Suppose that $u_1$ and $u_2$ are eigenvectors of a symmetric matrix A with real entries, corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$ respectively. Then $u_1 \cdot u_2 = 0$. In other words, eigenvectors of a symmetric real matrix corresponding to distinct eigenvalues are orthogonal.
Proof. Note that if we write $u_1$ and $u_2$ as column matrices, then since A is symmetric, we have
$$Au_1 \cdot u_2 = u_2^tAu_1 = u_2^tA^tu_1 = (Au_2)^tu_1 = u_1 \cdot Au_2.$$
It follows that
$$\lambda_1 u_1 \cdot u_2 = Au_1 \cdot u_2 = u_1 \cdot Au_2 = u_1 \cdot \lambda_2 u_2,$$
so that $(\lambda_1 - \lambda_2)(u_1 \cdot u_2) = 0$. Since $\lambda_1 \ne \lambda_2$, we must have $u_1 \cdot u_2 = 0$. ∎
We can now follow the procedure below.
ORTHOGONAL DIAGONALIZATION PROCESS. Suppose that A is a symmetric $n \times n$ matrix with real entries.
(1) Determine the n real roots $\lambda_1,\dots,\lambda_n$ of the characteristic polynomial $\det(A - \lambda I)$, and find n linearly independent eigenvectors $u_1,\dots,u_n$ of A corresponding to these eigenvalues as in the Diagonalization process.
(2) Apply the Gram-Schmidt orthogonalization process to the eigenvectors $u_1,\dots,u_n$ to obtain orthogonal eigenvectors $v_1,\dots,v_n$ of A, noting that eigenvectors corresponding to distinct eigenvalues are already orthogonal.
(3) Normalize the orthogonal eigenvectors $v_1,\dots,v_n$ to obtain orthonormal eigenvectors $w_1,\dots,w_n$ of A. These form an orthonormal basis of $\mathbb{R}^n$. Furthermore, write
$$P = \begin{pmatrix} w_1 & \dots & w_n \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix},$$
where $\lambda_1,\dots,\lambda_n \in \mathbb{R}$ are the eigenvalues of A and where $w_1,\dots,w_n \in \mathbb{R}^n$ are respectively their orthogonalized and normalized eigenvectors. Then $P^tAP = D$.
Remark. Note that if we apply the Gram-Schmidt orthogonalization process to eigenvectors corresponding to the same eigenvalue, then the new vectors that result from this process are also eigenvectors corresponding to this eigenvalue. Why?
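In numerical practice the whole procedure is delegated to a library routine. For a real symmetric matrix, numpy's eigh returns real eigenvalues together with orthonormal eigenvector columns, so its output is precisely a pair P, D with $P^tAP = D$. A minimal sketch, using the matrix of Example 10.3.1 below:

```python
import numpy as np

A = np.array([[2.0, 2.0, 1.0],
              [2.0, 5.0, 2.0],
              [1.0, 2.0, 2.0]])   # the symmetric matrix of Example 10.3.1

# eigh is designed for symmetric matrices: eigenvalues come out real,
# and the eigenvector columns are already orthonormal.
eigenvalues, P = np.linalg.eigh(A)
D = np.diag(eigenvalues)

print(eigenvalues)                        # [1. 1. 7.]
print(np.allclose(P.T @ A @ P, D))        # True: P^t A P = D
print(np.allclose(P.T @ P, np.eye(3)))    # True: P is orthogonal
```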
Example 10.3.1. Consider the matrix
$$A = \begin{pmatrix} 2 & 2 & 1 \\ 2 & 5 & 2 \\ 1 & 2 & 2 \end{pmatrix}.$$
To find the eigenvalues of A, we need to find the roots of
$$\det\begin{pmatrix} 2-\lambda & 2 & 1 \\ 2 & 5-\lambda & 2 \\ 1 & 2 & 2-\lambda \end{pmatrix} = 0;$$
in other words, $(\lambda - 7)(\lambda - 1)^2 = 0$. The eigenvalues are therefore $\lambda_1 = 7$ and (double root) $\lambda_2 = \lambda_3 = 1$. An eigenvector corresponding to $\lambda_1 = 7$ is a solution of the system
$$(A - 7I)u = \begin{pmatrix} -5 & 2 & 1 \\ 2 & -2 & 2 \\ 1 & 2 & -5 \end{pmatrix}u = 0, \quad\text{with root}\quad u_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}.$$
Eigenvectors corresponding to $\lambda_2 = \lambda_3 = 1$ are solutions of the system
$$(A - I)u = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}u = 0, \quad\text{with roots}\quad u_2 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} \quad\text{and}\quad u_3 = \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}$$
which are linearly independent. Next, we apply the Gram-Schmidt orthogonalization process to $u_2$ and $u_3$, and obtain
$$v_2 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} \quad\text{and}\quad v_3 = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}$$
which are now orthogonal to each other. Note that we do not have to do anything to $u_1$ at this stage, in view of Proposition 10G. We now conclude that
$$v_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}$$
form an orthogonal basis of $\mathbb{R}^3$. Normalizing each of these, we obtain respectively
$$w_1 = \begin{pmatrix} 1/\sqrt 6 \\ 2/\sqrt 6 \\ 1/\sqrt 6 \end{pmatrix}, \quad w_2 = \begin{pmatrix} 1/\sqrt 2 \\ 0 \\ -1/\sqrt 2 \end{pmatrix}, \quad w_3 = \begin{pmatrix} 1/\sqrt 3 \\ -1/\sqrt 3 \\ 1/\sqrt 3 \end{pmatrix}.$$
We now take
$$P = \begin{pmatrix} w_1 & w_2 & w_3 \end{pmatrix} = \begin{pmatrix} 1/\sqrt 6 & 1/\sqrt 2 & 1/\sqrt 3 \\ 2/\sqrt 6 & 0 & -1/\sqrt 3 \\ 1/\sqrt 6 & -1/\sqrt 2 & 1/\sqrt 3 \end{pmatrix}.$$
Then
$$P^{-1} = P^t = \begin{pmatrix} 1/\sqrt 6 & 2/\sqrt 6 & 1/\sqrt 6 \\ 1/\sqrt 2 & 0 & -1/\sqrt 2 \\ 1/\sqrt 3 & -1/\sqrt 3 & 1/\sqrt 3 \end{pmatrix} \quad\text{and}\quad P^tAP = \begin{pmatrix} 7 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Example 10.3.2. Consider the matrix
$$A = \begin{pmatrix} -1 & 6 & -12 \\ 0 & -13 & 30 \\ 0 & -9 & 20 \end{pmatrix}.$$
To find the eigenvalues of A, we need to find the roots of
$$\det\begin{pmatrix} -1-\lambda & 6 & -12 \\ 0 & -13-\lambda & 30 \\ 0 & -9 & 20-\lambda \end{pmatrix} = 0;$$
in other words, $(\lambda + 1)(\lambda - 2)(\lambda - 5) = 0$. The eigenvalues are therefore $\lambda_1 = -1$, $\lambda_2 = 2$ and $\lambda_3 = 5$. An eigenvector corresponding to $\lambda_1 = -1$ is a solution of the system
$$(A + I)u = \begin{pmatrix} 0 & 6 & -12 \\ 0 & -12 & 30 \\ 0 & -9 & 21 \end{pmatrix}u = 0, \quad\text{with root}\quad u_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.$$
An eigenvector corresponding to $\lambda_2 = 2$ is a solution of the system
$$(A - 2I)u = \begin{pmatrix} -3 & 6 & -12 \\ 0 & -15 & 30 \\ 0 & -9 & 18 \end{pmatrix}u = 0, \quad\text{with root}\quad u_2 = \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix}.$$
An eigenvector corresponding to $\lambda_3 = 5$ is a solution of the system
$$(A - 5I)u = \begin{pmatrix} -6 & 6 & -12 \\ 0 & -18 & 30 \\ 0 & -9 & 15 \end{pmatrix}u = 0, \quad\text{with root}\quad u_3 = \begin{pmatrix} -1 \\ 5 \\ 3 \end{pmatrix}.$$
Note that while $u_1,u_2,u_3$ correspond to distinct eigenvalues of A, they are not orthogonal. The matrix A is not symmetric, and so Proposition 10G does not apply in this case.
Example 10.3.3. Consider the matrix
$$A = \begin{pmatrix} 5 & -2 & 0 \\ -2 & 6 & 2 \\ 0 & 2 & 7 \end{pmatrix}.$$
To find the eigenvalues of A, we need to find the roots of
$$\det\begin{pmatrix} 5-\lambda & -2 & 0 \\ -2 & 6-\lambda & 2 \\ 0 & 2 & 7-\lambda \end{pmatrix} = 0;$$
in other words, $(\lambda - 3)(\lambda - 6)(\lambda - 9) = 0$. The eigenvalues are therefore $\lambda_1 = 3$, $\lambda_2 = 6$ and $\lambda_3 = 9$. An eigenvector corresponding to $\lambda_1 = 3$ is a solution of the system
$$(A - 3I)u = \begin{pmatrix} 2 & -2 & 0 \\ -2 & 3 & 2 \\ 0 & 2 & 4 \end{pmatrix}u = 0, \quad\text{with root}\quad u_1 = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix}.$$
An eigenvector corresponding to $\lambda_2 = 6$ is a solution of the system
$$(A - 6I)u = \begin{pmatrix} -1 & -2 & 0 \\ -2 & 0 & 2 \\ 0 & 2 & 1 \end{pmatrix}u = 0, \quad\text{with root}\quad u_2 = \begin{pmatrix} -2 \\ 1 \\ -2 \end{pmatrix}.$$
An eigenvector corresponding to $\lambda_3 = 9$ is a solution of the system
$$(A - 9I)u = \begin{pmatrix} -4 & -2 & 0 \\ -2 & -3 & 2 \\ 0 & 2 & -2 \end{pmatrix}u = 0, \quad\text{with root}\quad u_3 = \begin{pmatrix} 1 \\ -2 \\ -2 \end{pmatrix}.$$
Note now that the eigenvalues are distinct, so it follows from Proposition 10G that $u_1,u_2,u_3$ are orthogonal, so we do not have to apply Step (2) of the Orthogonal diagonalization process. Normalizing each of these vectors, we obtain respectively
$$w_1 = \begin{pmatrix} 2/3 \\ 2/3 \\ -1/3 \end{pmatrix}, \quad w_2 = \begin{pmatrix} -2/3 \\ 1/3 \\ -2/3 \end{pmatrix}, \quad w_3 = \begin{pmatrix} 1/3 \\ -2/3 \\ -2/3 \end{pmatrix}.$$
We now take
$$P = \begin{pmatrix} w_1 & w_2 & w_3 \end{pmatrix} = \begin{pmatrix} 2/3 & -2/3 & 1/3 \\ 2/3 & 1/3 & -2/3 \\ -1/3 & -2/3 & -2/3 \end{pmatrix}.$$
Then
$$P^{-1} = P^t = \begin{pmatrix} 2/3 & 2/3 & -1/3 \\ -2/3 & 1/3 & -2/3 \\ 1/3 & -2/3 & -2/3 \end{pmatrix} \quad\text{and}\quad P^tAP = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 9 \end{pmatrix}.$$
Problems for Chapter 10
1. Prove Proposition 10B(b).
2. Let
$$A = \begin{pmatrix} a+b & b-a \\ a-b & b+a \end{pmatrix},$$
where $a,b \in \mathbb{R}$. Determine when A is orthogonal.
3. Suppose that A is an orthogonal matrix with real entries. Prove that
a) $A^{-1}$ is an orthogonal matrix; and
b) $\det A = \pm 1$.
4. Suppose that A and B are orthogonal matrices with real entries. Prove that AB is orthogonal.
5. Verify that for every $a \in \mathbb{R}$, the matrix
$$A = \frac{1}{1+2a^2}\begin{pmatrix} 1 & -2a & 2a^2 \\ 2a & 1-2a^2 & -2a \\ 2a^2 & 2a & 1 \end{pmatrix}$$
is orthogonal.
6. Suppose that $\lambda$ is an eigenvalue of an orthogonal matrix A with real entries. Prove that $1/\lambda$ is also an eigenvalue of A.
7. Suppose that
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
is an orthogonal matrix with real entries. Explain why $a^2 + b^2 = c^2 + d^2 = 1$ and $ac + bd = 0$, and quote clearly any result that you use. Deduce that A has one of the two possible forms
$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \quad\text{or}\quad A = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix},$$
where $\theta \in [0,2\pi)$.
8. Consider the matrix
$$A = \begin{pmatrix} 1 & -\sqrt 6 & -\sqrt 3 \\ -\sqrt 6 & 2 & -\sqrt 2 \\ -\sqrt 3 & -\sqrt 2 & 3 \end{pmatrix}.$$
a) Find the characteristic polynomial of A and show that A has eigenvalues 4 (twice) and -2.
b) Find an eigenvector of A corresponding to the eigenvalue -2.
c) Find two orthogonal eigenvectors of A corresponding to the eigenvalue 4.
d) Find an orthonormal basis of $\mathbb{R}^3$ consisting of eigenvectors of A.
e) Using the orthonormal basis in part (d), find a matrix P such that $P^tAP$ is a diagonal matrix.
9. Apply the Orthogonal diagonalization process to each of the following matrices:
a) $A = \begin{pmatrix} 5 & 0 & 6 \\ 0 & 11 & 6 \\ 6 & 6 & 2 \end{pmatrix}$  b) $A = \begin{pmatrix} 0 & 2 & 0 \\ 2 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$  c) $A = \begin{pmatrix} 1 & 4 & 2 \\ 4 & 1 & 2 \\ 2 & 2 & 2 \end{pmatrix}$
d) $A = \begin{pmatrix} 2 & 0 & 36 \\ 0 & 3 & 0 \\ 36 & 0 & 23 \end{pmatrix}$  e) $A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$  f) $A = \begin{pmatrix} 7 & 24 & 0 & 0 \\ 24 & 7 & 0 & 0 \\ 0 & 0 & 7 & 24 \\ 0 & 0 & 24 & 7 \end{pmatrix}$
10. Suppose that B is an $m \times n$ matrix with real entries. Prove that the matrix $A = B^tB$ has an orthonormal set of n eigenvectors.
LINEAR ALGEBRA
W W L CHEN
© W W L Chen, 1997, 2008.
This chapter is available free to all individuals, on the understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied, with or without permission from the author. However, this document may not be kept on any information storage and retrieval system without permission from the author, unless such system is not accessible to any individuals other than its owners.
Chapter 11
APPLICATIONS OF REAL INNER PRODUCT SPACES
11.1. Least Squares Approximation
Given a continuous function $f : [a,b] \to \mathbb{R}$, we wish to approximate f by a polynomial $g : [a,b] \to \mathbb{R}$ of degree at most k, such that the error
$$\int_a^b |f(x) - g(x)|^2\,dx$$
is minimized. The purpose of this section is to study this problem using the theory of real inner product spaces. Our argument is underpinned by the following simple result in the theory.
PROPOSITION 11A. Suppose that V is a real inner product space, and that W is a finite-dimensional subspace of V. Given any $u \in V$, the inequality
$$\|u - \mathrm{proj}_W u\| \le \|u - w\|$$
holds for every $w \in W$.
In other words, the distance from u to any $w \in W$ is minimized by the choice $w = \mathrm{proj}_W u$, the orthogonal projection of u on the subspace W. Alternatively, $\mathrm{proj}_W u$ can be thought of as the vector in W closest to u.
Proof of Proposition 11A. Note that
$$u - \mathrm{proj}_W u \in W^\perp \quad\text{and}\quad \mathrm{proj}_W u - w \in W.$$
It follows from Pythagoras' theorem that
$$\|u - w\|^2 = \|(u - \mathrm{proj}_W u) + (\mathrm{proj}_W u - w)\|^2 = \|u - \mathrm{proj}_W u\|^2 + \|\mathrm{proj}_W u - w\|^2,$$
so that
$$\|u - w\|^2 - \|u - \mathrm{proj}_W u\|^2 = \|\mathrm{proj}_W u - w\|^2 \ge 0.$$
The result follows immediately. ∎
Let V denote the vector space $C[a,b]$ of all continuous real valued functions on the closed interval $[a,b]$, with inner product
$$\langle f,g\rangle = \int_a^b f(x)g(x)\,dx.$$
Then
$$\int_a^b |f(x) - g(x)|^2\,dx = \langle f-g, f-g\rangle = \|f - g\|^2.$$
It follows that the least squares approximation problem is reduced to one of finding a suitable polynomial g to minimize the norm $\|f - g\|$.
Now let $W = P_k[a,b]$ be the collection of all polynomials $g : [a,b] \to \mathbb{R}$ with real coefficients and of degree at most k. Note that W is essentially $P_k$, although the variable is restricted to the closed interval $[a,b]$. It is easy to show that W is a subspace of V. In view of Proposition 11A, we conclude that
$$g = \mathrm{proj}_W f$$
gives the best least squares approximation among polynomials in $W = P_k[a,b]$. This subspace is of dimension $k+1$. Suppose that $\{v_0,v_1,\dots,v_k\}$ is an orthogonal basis of $W = P_k[a,b]$. Then by Proposition 9L, we have
$$g = \frac{\langle f,v_0\rangle}{\|v_0\|^2}v_0 + \frac{\langle f,v_1\rangle}{\|v_1\|^2}v_1 + \dots + \frac{\langle f,v_k\rangle}{\|v_k\|^2}v_k.$$
Example 11.1.1. Consider the function $f(x) = x^2$ in the interval $[0,2]$. Suppose that we wish to find a least squares approximation by a polynomial of degree at most 1. In this case, we can take $V = C[0,2]$, with inner product
$$\langle f,g\rangle = \int_0^2 f(x)g(x)\,dx,$$
and $W = P_1[0,2]$, with basis $\{1,x\}$. We now apply the Gram-Schmidt orthogonalization process to this basis to obtain an orthogonal basis $\{1, x-1\}$ of W, and take
$$g = \frac{\langle x^2,1\rangle}{\|1\|^2}1 + \frac{\langle x^2,x-1\rangle}{\|x-1\|^2}(x-1).$$
Now
$$\langle x^2,1\rangle = \int_0^2 x^2\,dx = \frac83 \quad\text{and}\quad \|1\|^2 = \langle 1,1\rangle = \int_0^2 dx = 2,$$
while
$$\langle x^2,x-1\rangle = \int_0^2 x^2(x-1)\,dx = \frac43 \quad\text{and}\quad \|x-1\|^2 = \langle x-1,x-1\rangle = \int_0^2 (x-1)^2\,dx = \frac23.$$
It follows that
$$g = \frac43 + 2(x-1) = 2x - \frac23.$$
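Since every quantity above is an inner product, the whole computation can be verified by numerical integration. A minimal sketch, assuming scipy is available (quad supplies the integrals, and the result simply re-derives the g found above):

```python
import numpy as np
from scipy.integrate import quad

# <f, g> on C[0, 2], computed by numerical quadrature
ip = lambda f, g: quad(lambda x: f(x) * g(x), 0, 2)[0]

f  = lambda x: x**2
b0 = lambda x: 1.0        # orthogonal basis {1, x - 1} of P1[0, 2]
b1 = lambda x: x - 1.0

c0 = ip(f, b0) / ip(b0, b0)   # (8/3) / 2     = 4/3
c1 = ip(f, b1) / ip(b1, b1)   # (4/3) / (2/3) = 2
g  = lambda x: c0 + c1 * (x - 1.0)

print(c0, c1)                                 # 1.333..., 2.0
print(np.isclose(g(0.75), 2 * 0.75 - 2 / 3))  # True: g(x) = 2x - 2/3
```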
Example 11.1.2. Consider the function $f(x) = e^x$ in the interval $[0,1]$. Suppose that we wish to find a least squares approximation by a polynomial of degree at most 1. In this case, we can take $V = C[0,1]$, with inner product
$$\langle f,g\rangle = \int_0^1 f(x)g(x)\,dx,$$
and $W = P_1[0,1]$, with basis $\{1,x\}$. We now apply the Gram-Schmidt orthogonalization process to this basis to obtain an orthogonal basis $\{1, x - 1/2\}$ of W, and take
$$g = \frac{\langle e^x,1\rangle}{\|1\|^2}1 + \frac{\langle e^x,x-1/2\rangle}{\|x-1/2\|^2}\left(x - \frac12\right).$$
Now
$$\langle e^x,1\rangle = \int_0^1 e^x\,dx = e - 1 \quad\text{and}\quad \langle e^x,x\rangle = \int_0^1 e^x x\,dx = 1,$$
so that
$$\left\langle e^x, x - \frac12\right\rangle = \langle e^x,x\rangle - \frac12\langle e^x,1\rangle = \frac32 - \frac e2.$$
Also
$$\|1\|^2 = \langle 1,1\rangle = \int_0^1 dx = 1 \quad\text{and}\quad \left\|x - \frac12\right\|^2 = \left\langle x - \frac12, x - \frac12\right\rangle = \int_0^1 \left(x - \frac12\right)^2 dx = \frac1{12}.$$
It follows that
$$g = (e-1) + (18 - 6e)\left(x - \frac12\right) = (18 - 6e)x + (4e - 10).$$
Remark. From the proof of Proposition 11A, it is clear that $\|u - w\|$ is minimized by the unique choice $w = \mathrm{proj}_W u$. It follows that the least squares approximation problem posed here has a unique solution.
11.2. Quadratic Forms
A real quadratic form in n variables $x_1,\dots,x_n$ is an expression of the form
$$\sum_{\substack{i,j=1 \\ i\le j}}^n c_{ij}x_ix_j, \tag{1}$$
where $c_{ij} \in \mathbb{R}$ for every $i,j = 1,\dots,n$ satisfying $i \le j$.
Example 11.2.1. The expression $5x_1^2 + 6x_1x_2 + 7x_2^2$ is a quadratic form in two variables $x_1$ and $x_2$. It can be written in the form
$$5x_1^2 + 6x_1x_2 + 7x_2^2 = \begin{pmatrix} x_1 & x_2 \end{pmatrix}\begin{pmatrix} 5 & 3 \\ 3 & 7 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
Example 11.2.2. The expression $4x_1^2 + 5x_2^2 + 3x_3^2 + 2x_1x_2 + 4x_1x_3 + 6x_2x_3$ is a quadratic form in three variables $x_1$, $x_2$ and $x_3$. It can be written in the form
$$4x_1^2 + 5x_2^2 + 3x_3^2 + 2x_1x_2 + 4x_1x_3 + 6x_2x_3 = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix}\begin{pmatrix} 4 & 1 & 2 \\ 1 & 5 & 3 \\ 2 & 3 & 3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$$
Note that in both examples, the quadratic form can be described in terms of a real symmetric matrix. In fact, this is always possible. To see this, note that given any quadratic form (1), we can write, for every $i,j = 1,\dots,n$,
$$a_{ij} = \begin{cases} c_{ij} & \text{if } i = j, \\ \frac12 c_{ij} & \text{if } i < j, \\ \frac12 c_{ji} & \text{if } i > j. \end{cases} \tag{2}$$
Then
$$\sum_{\substack{i,j=1 \\ i\le j}}^n c_{ij}x_ix_j = \sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j = \begin{pmatrix} x_1 & \dots & x_n \end{pmatrix}\begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$
The matrix
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}$$
is clearly symmetric, in view of (2).
We are interested in the case when $x_1,\dots,x_n$ take real values. In this case, we can write
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$
It follows that a quadratic form can be written as
$$x^tAx,$$
where A is an $n \times n$ real symmetric matrix and x takes values in $\mathbb{R}^n$.
Many problems in mathematics can be studied using quadratic forms. Here we shall restrict our attention to two fundamental problems which are in fact related. The first is the question of what conditions the matrix A must satisfy in order that the inequality
$$x^tAx > 0$$
holds for every non-zero $x \in \mathbb{R}^n$. The second is the question of whether it is possible to have a change of variables of the type $x = Py$, where P is an invertible matrix, such that the quadratic form $x^tAx$ can be represented in the alternative form $y^tDy$, where D is a diagonal matrix with real entries.
Definition. A quadratic form $x^tAx$ is said to be positive definite if $x^tAx > 0$ for every non-zero $x \in \mathbb{R}^n$. In this case, we say that the symmetric matrix A is a positive definite matrix.
To answer our first question, we shall prove the following result.
PROPOSITION 11B. A quadratic form $x^tAx$ is positive definite if and only if all the eigenvalues of the symmetric matrix A are positive.
Our strategy here is to prove Proposition 11B by first studying our second question. Since the matrix A is real and symmetric, it follows from Proposition 10E that it is orthogonally diagonalizable. In other words, there exists an orthogonal matrix P and a diagonal matrix D such that $P^tAP = D$, and so $A = PDP^t$. It follows that
$$x^tAx = x^tPDP^tx,$$
and so, writing
$$y = P^tx,$$
we have
$$x^tAx = y^tDy.$$
Also, since P is an orthogonal matrix, we also have $x = Py$. This answers our second question. Furthermore, in view of the Orthogonal diagonalization process, the diagonal entries in the matrix D can be taken to be the eigenvalues of A, so that
$$D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix},$$
where $\lambda_1,\dots,\lambda_n \in \mathbb{R}$ are the eigenvalues of A. Writing
$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},$$
we have
$$x^tAx = y^tDy = \lambda_1y_1^2 + \dots + \lambda_ny_n^2. \tag{3}$$
Note now that $x = 0$ if and only if $y = 0$, since P is an invertible matrix. Proposition 11B now follows immediately from (3).
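Proposition 11B reduces positive definiteness to an eigenvalue computation, which is easy to carry out numerically. A minimal sketch (the helper name is our own; the two test matrices are those of Examples 11.2.3 and 11.2.5 below):

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Test x^t A x > 0 for all nonzero x, via Proposition 11B."""
    A = np.asarray(A, dtype=float)
    assert np.allclose(A, A.T), "quadratic forms are given by symmetric matrices"
    return bool(np.all(np.linalg.eigvalsh(A) > tol))   # all eigenvalues positive?

print(is_positive_definite([[2, 2, 1], [2, 5, 2], [1, 2, 2]]))  # True  (eigenvalues 7, 1, 1)
print(is_positive_definite([[1, 1], [1, 1]]))                   # False (eigenvalues 2, 0)
```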
Example 11.2.3. Consider the quadratic form $2x_1^2 + 5x_2^2 + 2x_3^2 + 4x_1x_2 + 2x_1x_3 + 4x_2x_3$. This can be written in the form $x^tAx$, where
$$A = \begin{pmatrix} 2 & 2 & 1 \\ 2 & 5 & 2 \\ 1 & 2 & 2 \end{pmatrix} \quad\text{and}\quad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$$
The matrix A has eigenvalues $\lambda_1 = 7$ and (double root) $\lambda_2 = \lambda_3 = 1$; see Example 10.3.1. Furthermore, we have $P^tAP = D$, where
$$P = \begin{pmatrix} 1/\sqrt 6 & 1/\sqrt 2 & 1/\sqrt 3 \\ 2/\sqrt 6 & 0 & -1/\sqrt 3 \\ 1/\sqrt 6 & -1/\sqrt 2 & 1/\sqrt 3 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 7 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Writing $y = P^tx$, the quadratic form becomes $7y_1^2 + y_2^2 + y_3^2$, which is clearly positive definite.
Example 11.2.4. Consider the quadratic form $5x_1^2 + 6x_2^2 + 7x_3^2 - 4x_1x_2 + 4x_2x_3$. This can be written in the form $x^tAx$, where
$$A = \begin{pmatrix} 5 & -2 & 0 \\ -2 & 6 & 2 \\ 0 & 2 & 7 \end{pmatrix} \quad\text{and}\quad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$$
The matrix A has eigenvalues $\lambda_1 = 3$, $\lambda_2 = 6$ and $\lambda_3 = 9$; see Example 10.3.3. Furthermore, we have $P^tAP = D$, where
$$P = \begin{pmatrix} 2/3 & -2/3 & 1/3 \\ 2/3 & 1/3 & -2/3 \\ -1/3 & -2/3 & -2/3 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 9 \end{pmatrix}.$$
Writing $y = P^tx$, the quadratic form becomes $3y_1^2 + 6y_2^2 + 9y_3^2$, which is clearly positive definite.
Example 11.2.5. Consider the quadratic form $x_1^2 + x_2^2 + 2x_1x_2$. Clearly this is equal to $(x_1 + x_2)^2$ and is therefore not positive definite. The quadratic form can be written in the form $x^tAx$, where
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \quad\text{and}\quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
It follows from Proposition 11B that the eigenvalues of A are not all positive. Indeed, the matrix A has eigenvalues $\lambda_1 = 2$ and $\lambda_2 = 0$, with corresponding eigenvectors
$$\begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$
Hence we may take
$$P = \begin{pmatrix} 1/\sqrt 2 & 1/\sqrt 2 \\ 1/\sqrt 2 & -1/\sqrt 2 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}.$$
Writing $y = P^tx$, the quadratic form becomes $2y_1^2$, which is not positive definite.
11.3. Real Fourier Series
Let E denote the collection of all functions $f : [-\pi,\pi] \to \mathbb{R}$ which are piecewise continuous on the interval $[-\pi,\pi]$. This means that any $f \in E$ has at most a finite number of points of discontinuity, at each of which f need not be defined but must have one sided limits which are finite. We further adopt the convention that any two functions $f,g \in E$ are considered equal, denoted by $f = g$, if $f(x) = g(x)$ for every $x \in [-\pi,\pi]$ with at most a finite number of exceptions.
It is easy to check that E forms a real vector space. More precisely, let $\theta \in E$ denote the zero function $\theta : [-\pi,\pi] \to \mathbb{R}$, where $\theta(x) = 0$ for every $x \in [-\pi,\pi]$. Then the following conditions hold:
For every $f,g \in E$, we have $f + g \in E$.
For every $f,g,h \in E$, we have $f + (g+h) = (f+g) + h$.
For every $f \in E$, we have $f + \theta = \theta + f = f$.
For every $f \in E$, we have $f + (-f) = \theta$.
For every $f,g \in E$, we have $f + g = g + f$.
For every $c \in \mathbb{R}$ and $f \in E$, we have $cf \in E$.
For every $c \in \mathbb{R}$ and $f,g \in E$, we have $c(f+g) = cf + cg$.
For every $a,b \in \mathbb{R}$ and $f \in E$, we have $(a+b)f = af + bf$.
For every $a,b \in \mathbb{R}$ and $f \in E$, we have $(ab)f = a(bf)$.
For every $f \in E$, we have $1f = f$.
We now give this vector space E more structure by introducing an inner product. For every $f,g \in E$, write
$$\langle f,g\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)g(x)\,dx.$$
The integral exists since the function $f(x)g(x)$ is clearly piecewise continuous on $[-\pi,\pi]$. It is easy to check that the following conditions hold:
For every $f,g \in E$, we have $\langle f,g\rangle = \langle g,f\rangle$.
For every $f,g,h \in E$, we have $\langle f,g+h\rangle = \langle f,g\rangle + \langle f,h\rangle$.
For every $f,g \in E$ and $c \in \mathbb{R}$, we have $\langle cf,g\rangle = c\langle f,g\rangle$.
For every $f \in E$, we have $\langle f,f\rangle \ge 0$, and $\langle f,f\rangle = 0$ if and only if $f = \theta$.
Hence E is a real inner product space.
The difficulty here is that the inner product space E is not finite-dimensional. It is not straightforward to show that the set
$$\left\{\frac{1}{\sqrt 2}, \sin x, \cos x, \sin 2x, \cos 2x, \sin 3x, \cos 3x, \dots\right\} \tag{4}$$
in E forms an orthonormal basis for E. The difficulty is to show that the set spans E.
Remark. It is easy to check that the elements in (4) form an orthonormal system. For every $k,m \in \mathbb{N}$, we have
$$\left\langle \frac{1}{\sqrt 2}, \frac{1}{\sqrt 2}\right\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac12\,dx = 1;$$
$$\left\langle \frac{1}{\sqrt 2}, \sin kx\right\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{1}{\sqrt 2}\sin kx\,dx = 0;$$
$$\left\langle \frac{1}{\sqrt 2}, \cos kx\right\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{1}{\sqrt 2}\cos kx\,dx = 0;$$
as well as
$$\langle \sin kx, \sin mx\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \sin kx\sin mx\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac12(\cos(k-m)x - \cos(k+m)x)\,dx = \begin{cases} 1 & \text{if } k = m, \\ 0 & \text{if } k \ne m; \end{cases}$$
$$\langle \cos kx, \cos mx\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \cos kx\cos mx\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac12(\cos(k-m)x + \cos(k+m)x)\,dx = \begin{cases} 1 & \text{if } k = m, \\ 0 & \text{if } k \ne m; \end{cases}$$
and
$$\langle \sin kx, \cos mx\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \sin kx\cos mx\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac12(\sin(k-m)x + \sin(k+m)x)\,dx = 0.$$
Let us assume that we have established that the set (4) forms an orthonormal basis for E. Then a natural extension of Proposition 9H gives rise to the following: Every function $f \in E$ can be written uniquely in the form
$$\frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n\cos nx + b_n\sin nx), \tag{5}$$
known usually as the (trigonometric) Fourier series of the function f, with Fourier coefficients
$$\frac{a_0}{\sqrt 2} = \left\langle f, \frac{1}{\sqrt 2}\right\rangle = \frac{1}{\sqrt 2\,\pi}\int_{-\pi}^{\pi} f(x)\,dx,$$
and, for every $n \in \mathbb{N}$,
$$a_n = \langle f, \cos nx\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx \quad\text{and}\quad b_n = \langle f, \sin nx\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx.$$
Note that the constant term in the Fourier series (5) is given by
$$\left\langle f, \frac{1}{\sqrt 2}\right\rangle \frac{1}{\sqrt 2} = \frac{a_0}{2}.$$
Example 11.3.1. Consider the function $f : [-\pi,\pi] \to \mathbb{R}$, given by $f(x) = x$ for every $x \in [-\pi,\pi]$. For every $n \in \mathbb{N} \cup \{0\}$, we have
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x\cos nx\,dx = 0,$$
since the integrand is an odd function. On the other hand, for every $n \in \mathbb{N}$, we have
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x\sin nx\,dx = \frac{2}{\pi}\int_0^{\pi} x\sin nx\,dx,$$
since the integrand is an even function. On integrating by parts, we have
$$b_n = \frac{2}{\pi}\left(\left[-\frac{x\cos nx}{n}\right]_0^{\pi} + \int_0^{\pi} \frac{\cos nx}{n}\,dx\right) = \frac{2}{\pi}\left(\left[-\frac{x\cos nx}{n}\right]_0^{\pi} + \left[\frac{\sin nx}{n^2}\right]_0^{\pi}\right) = \frac{2(-1)^{n+1}}{n}.$$
We therefore have the (trigonometric) Fourier series
$$\sum_{n=1}^{\infty} \frac{2(-1)^{n+1}}{n}\sin nx.$$
Note that the function f is odd, and this plays a crucial role in eschewing the Fourier coefficients $a_n$ corresponding to the even part of the Fourier series.
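The coefficient formulas are easy to confirm by numerical integration. A minimal sketch, assuming scipy is available, for the function of this example, comparing the integrals for $a_n$ and $b_n$ with the closed form $2(-1)^{n+1}/n$:

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: x   # the function of Example 11.3.1

def a(n):  # a_n = (1/pi) * integral of f(x) cos(nx) over [-pi, pi]
    return quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi)[0] / np.pi

def b(n):  # b_n = (1/pi) * integral of f(x) sin(nx) over [-pi, pi]
    return quad(lambda x: f(x) * np.sin(n * x), -np.pi, np.pi)[0] / np.pi

for n in range(1, 5):
    print(n, round(a(n), 10), round(b(n), 6), 2 * (-1) ** (n + 1) / n)
# a_n vanishes for every n; b_n matches 2(-1)^(n+1)/n: 2, -1, 0.666667, -0.5
```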
Example 11.3.2. Consider the function $f : [-\pi,\pi] \to \mathbb{R}$, given by $f(x) = |x|$ for every $x \in [-\pi,\pi]$. For every $n \in \mathbb{N} \cup \{0\}$, we have
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} |x|\cos nx\,dx = \frac{2}{\pi}\int_0^{\pi} x\cos nx\,dx,$$
since the integrand is an even function. Clearly
$$a_0 = \frac{2}{\pi}\int_0^{\pi} x\,dx = \pi.$$
Furthermore, for every $n \in \mathbb{N}$, on integrating by parts, we have
$$a_n = \frac{2}{\pi}\left(\left[\frac{x\sin nx}{n}\right]_0^{\pi} - \int_0^{\pi} \frac{\sin nx}{n}\,dx\right) = \frac{2}{\pi}\left(\left[\frac{x\sin nx}{n}\right]_0^{\pi} + \left[\frac{\cos nx}{n^2}\right]_0^{\pi}\right) = \begin{cases} 0 & \text{if } n \text{ is even}, \\ -\dfrac{4}{\pi n^2} & \text{if } n \text{ is odd}. \end{cases}$$
On the other hand, for every $n \in \mathbb{N}$, we have
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} |x|\sin nx\,dx = 0,$$
since the integrand is an odd function. We therefore have the (trigonometric) Fourier series
$$\frac{\pi}{2} - \sum_{\substack{n=1 \\ n\text{ odd}}}^{\infty} \frac{4}{\pi n^2}\cos nx = \frac{\pi}{2} - \sum_{k=1}^{\infty} \frac{4}{\pi(2k-1)^2}\cos(2k-1)x.$$
Note that the function f is even, and this plays a crucial role in eschewing the Fourier coefficients $b_n$ corresponding to the odd part of the Fourier series.
Example 11.3.3. Consider the function $f : [-\pi,\pi] \to \mathbb{R}$, given for every $x \in [-\pi,\pi]$ by
$$f(x) = \mathrm{sgn}(x) = \begin{cases} +1 & \text{if } 0 < x \le \pi, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } -\pi \le x < 0. \end{cases}$$
For every $n \in \mathbb{N} \cup \{0\}$, we have
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} \mathrm{sgn}(x)\cos nx\,dx = 0,$$
since the integrand is an odd function. On the other hand, for every $n \in \mathbb{N}$, we have
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} \mathrm{sgn}(x)\sin nx\,dx = \frac{2}{\pi}\int_0^{\pi} \sin nx\,dx,$$
since the integrand is an even function. It is easy to see that
$$b_n = \frac{2}{\pi}\left[-\frac{\cos nx}{n}\right]_0^{\pi} = \begin{cases} 0 & \text{if } n \text{ is even}, \\ \dfrac{4}{\pi n} & \text{if } n \text{ is odd}. \end{cases}$$
We therefore have the (trigonometric) Fourier series
$$\sum_{\substack{n=1 \\ n\text{ odd}}}^{\infty} \frac{4}{\pi n}\sin nx = \sum_{k=1}^{\infty} \frac{4}{\pi(2k-1)}\sin(2k-1)x.$$
Example 11.3.4. Consider the function f : [−π, π] → R, given by f(x) = x² for every x ∈ [−π, π]. For every n ∈ N ∪ {0}, we have
aₙ = (1/π) ∫_{−π}^{π} x² cos nx dx = (2/π) ∫_{0}^{π} x² cos nx dx,
since the integrand is an even function. Clearly
a₀ = (2/π) ∫_{0}^{π} x² dx = 2π²/3.
Furthermore, for every n ∈ N, on integrating by parts, we have
aₙ = (2/π) ( [x² sin nx/n]_{0}^{π} − ∫_{0}^{π} (2x sin nx/n) dx )
   = (2/π) ( [x² sin nx/n]_{0}^{π} + [2x cos nx/n²]_{0}^{π} − ∫_{0}^{π} (2 cos nx/n²) dx )
   = (2/π) ( [x² sin nx/n]_{0}^{π} + [2x cos nx/n²]_{0}^{π} − [2 sin nx/n³]_{0}^{π} ) = 4(−1)ⁿ/n².
On the other hand, for every n ∈ N, we have
bₙ = (1/π) ∫_{−π}^{π} x² sin nx dx = 0,
since the integrand is an odd function. We therefore have the (trigonometric) Fourier series
π²/3 + Σ_{n=1}^{∞} (4(−1)ⁿ/n²) cos nx.
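Assuming convergence at x = π, where the periodic extension of x² is continuous, substituting cos nπ = (−1)ⁿ gives π² = π²/3 + 4 Σ_{n=1}^{∞} 1/n², recovering the well-known value Σ_{n=1}^{∞} 1/n² = π²/6. A quick numerical check (added sketch):

import numpy as np

# partial sums of sum 1/n^2 approach pi^2/6
s = sum(1.0 / n ** 2 for n in range(1, 100001))
assert abs(s - np.pi ** 2 / 6) < 1e-4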
Problems for Chapter 11
1. Consider the function f : [−1, 1] → R : x ↦ x³. We wish to find a polynomial g(x) = ax + b which minimizes the error
∫_{−1}^{1} |f(x) − g(x)|² dx.
Follow the steps below to find this polynomial g:
a) Consider the real vector space C[−1, 1]. Write down a suitable real inner product on C[−1, 1] for this problem, explaining carefully the steps that you take.
b) Consider now the subspace P₁[−1, 1] of all polynomials of degree at most 1. Describe the polynomial g in terms of f and orthogonal projection with respect to the inner product in part (a). Give a brief explanation for your choice.
c) Write down a basis of P₁[−1, 1].
d) Apply the Gram-Schmidt process to your basis in part (c) to obtain an orthogonal basis of P₁[−1, 1].
e) Describe your polynomial in part (b) as a linear combination of the elements of your basis in part (d), and find the precise values of the coefficients.
2. For each of the following functions, find the best least squares approximation by linear polynomials of the form ax + b, where a, b ∈ R:
a) f : [0, π/2] → R : x ↦ sin x
b) f : [0, 1] → R : x ↦ x³
c) f : [0, 2] → R : x ↦ e^x
3. Consider the quadratic form 2x₁² + x₂² + x₃² + 2x₁x₂ + 2x₁x₃ in three variables x₁, x₂, x₃.
a) Write the quadratic form in the form xᵗAx, where
x = (x₁, x₂, x₃)ᵗ
and where A is a symmetric matrix with real entries.
b) Apply the Orthogonal diagonalization process to the matrix A.
c) Find a transformation of the type x = Py, where P is an invertible matrix, so that the quadratic form can be written as yᵗDy, where
y = (y₁, y₂, y₃)ᵗ
and where D is a diagonal matrix with real entries. You should give the matrices P and D explicitly.
d) Is the quadratic form positive definite? Justify your assertion both in terms of the eigenvalues of A and in terms of your solution to part (c).
4. For each of the following quadratic forms in three variables, write it in the form xᵗAx, find a substitution x = Py so that it can be written as a diagonal form in the variables y₁, y₂, y₃, and determine whether the quadratic form is positive definite:
a) x₁² + x₂² + 2x₃² − 2x₁x₂ + 4x₁x₃ + 4x₂x₃
b) 3x₁² + 2x₂² + 3x₃² + 2x₁x₃
c) 3x₁² + 5x₂² + 4x₃² + 4x₁x₃ − 4x₂x₃
d) 5x₁² + 2x₂² + 5x₃² + 4x₁x₂ − 8x₁x₃ − 4x₂x₃
e) x₁² − 5x₂² − x₃² + 4x₁x₂ + 6x₂x₃
5. Determine which of the following matrices are positive definite:
a) ( 0 1 1 )
   ( 1 0 1 )
   ( 1 1 0 )
b) ( 3 1 1 )
   ( 1 1 2 )
   ( 1 2 1 )
c) ( 6 1 7 )
   ( 1 1 2 )
   ( 7 2 9 )
d) ( 6 2 1 )
   ( 2 6 1 )
   ( 1 1 5 )
e) ( 3 2 4 )
   ( 2 6 2 )
   ( 4 2 3 )
f) ( 2 0 0 0 )
   ( 0 1 0 1 )
   ( 0 0 2 0 )
   ( 0 1 0 1 )
6. Find the trigonometric Fourier series for each of the following functions f : [−π, π] → C:
a) f(x) = x|x| for every x ∈ [−π, π]
b) f(x) = |sin x| for every x ∈ [−π, π]
c) f(x) = |cos x| for every x ∈ [−π, π]
d) f(x) = 0 for every x ∈ [−π, 0] and f(x) = x for every x ∈ (0, π]
e) f(x) = sin x for every x ∈ [−π, 0] and f(x) = cos x for every x ∈ (0, π]
f) f(x) = cos x for every x ∈ [−π, 0] and f(x) = sin x for every x ∈ (0, π]
g) f(x) = cos(x/2) for every x ∈ [−π, π]
h) f(x) = sin(x/2) for every x ∈ [−π, π]
LINEAR ALGEBRA
W W L CHEN
© W W L Chen, 1997, 2008.
This chapter is available free to all individuals, on the understanding that it is not to be used for financial gain,
and may be downloaded and/or photocopied, with or without permission from the author.
However, this document may not be kept on any information storage and retrieval system without permission
from the author, unless such system is not accessible to any individuals other than its owners.
Chapter 12
COMPLEX VECTOR SPACES
12.1. Complex Inner Products
Our task in this section is to define a suitable complex inner product. We begin by giving a reminder of
the basics of complex vector spaces or vector spaces over C.
Definition. A complex vector space V is a set of objects, known as vectors, together with vector addition + and multiplication of vectors by elements of C, and satisfying the following properties:
(VA1) For every u, v ∈ V, we have u + v ∈ V.
(VA2) For every u, v, w ∈ V, we have u + (v + w) = (u + v) + w.
(VA3) There exists an element 0 ∈ V such that for every u ∈ V, we have u + 0 = 0 + u = u.
(VA4) For every u ∈ V, there exists −u ∈ V such that u + (−u) = 0.
(VA5) For every u, v ∈ V, we have u + v = v + u.
(SM1) For every c ∈ C and u ∈ V, we have cu ∈ V.
(SM2) For every c ∈ C and u, v ∈ V, we have c(u + v) = cu + cv.
(SM3) For every a, b ∈ C and u ∈ V, we have (a + b)u = au + bu.
(SM4) For every a, b ∈ C and u ∈ V, we have (ab)u = a(bu).
(SM5) For every u ∈ V, we have 1u = u.
Remark. Subspaces of complex vector spaces can be defined in a similar way as for real vector spaces.
An example of a complex vector space is the euclidean space Cⁿ consisting of all vectors of the form u = (u₁, . . . , uₙ), where u₁, . . . , uₙ ∈ C. We shall first generalize the concepts of dot product, norm and distance, first developed for Rⁿ in Chapter 9.
Definition. Suppose that u = (u₁, . . . , uₙ) and v = (v₁, . . . , vₙ) are vectors in Cⁿ. The complex euclidean inner product of u and v is defined by
u · v = u₁v̄₁ + . . . + uₙv̄ₙ,
the complex euclidean norm of u is defined by
‖u‖ = (u · u)^(1/2) = (|u₁|² + . . . + |uₙ|²)^(1/2),
and the complex euclidean distance between u and v is defined by
d(u, v) = ‖u − v‖ = (|u₁ − v₁|² + . . . + |uₙ − vₙ|²)^(1/2).
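These definitions translate directly into numpy, as in the added sketch below; note that np.vdot conjugates its first argument, whereas the definition above conjugates the second, so the u · v defined here corresponds to np.vdot(v, u):

import numpy as np

u = np.array([1 + 1j, 2 - 1j])
v = np.array([3j, 1 + 2j])

dot = np.vdot(v, u)                 # u . v = u_1 conj(v_1) + u_2 conj(v_2)
norm = np.sqrt(np.vdot(u, u).real)  # (|u_1|^2 + |u_2|^2)^(1/2)
dist = np.linalg.norm(u - v)        # complex euclidean distance

assert np.isclose(norm, np.linalg.norm(u))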
Corresponding to Proposition 9A, we have the following result.
PROPOSITION 12A. Suppose that u, v, w ∈ Cⁿ and c ∈ C. Then
(a) u · v = \overline{v · u};
(b) u · (v + w) = (u · v) + (u · w);
(c) c(u · v) = (cu) · v; and
(d) u · u ≥ 0, and u · u = 0 if and only if u = 0.
The following definition is motivated by Proposition 12A.
Definition. Suppose that V is a complex vector space. By a complex inner product on V, we mean a function ⟨· , ·⟩ : V × V → C which satisfies the following conditions:
(IP1) For every u, v ∈ V, we have ⟨u, v⟩ = \overline{⟨v, u⟩}.
(IP2) For every u, v, w ∈ V, we have ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.
(IP3) For every u, v ∈ V and c ∈ C, we have ⟨cu, v⟩ = c⟨u, v⟩.
(IP4) For every u ∈ V, we have ⟨u, u⟩ ≥ 0, and ⟨u, u⟩ = 0 if and only if u = 0.
Definition. A complex vector space with an inner product is called a complex inner product space or a unitary space.
Definition. Suppose that u and v are vectors in a complex inner product space V. Then the norm of u is defined by
‖u‖ = ⟨u, u⟩^(1/2),
and the distance between u and v is defined by
d(u, v) = ‖u − v‖.
Using this inner product, we can discuss orthogonality, orthogonal and orthonormal bases, the Gram-Schmidt orthogonalization process, as well as orthogonal projections, in a similar way as for real inner product spaces. In particular, the results in Sections 9.4 and 9.5 can be generalized to the case of complex inner product spaces.
12.2. Unitary Matrices
For matrices with real entries, orthogonal matrices and symmetric matrices play an important role in the
orthogonal diagonalization problem. For matrices with complex entries, the analogous roles are played
by unitary matrices and hermitian matrices respectively.
Definition. Suppose that A is a matrix with complex entries. Suppose further that the matrix Ā is obtained from the matrix A by replacing each entry of A by its complex conjugate. Then the matrix A* = Āᵗ is called the conjugate transpose of the matrix A.
PROPOSITION 12B. Suppose that A and B are matrices with complex entries, and that c ∈ C. Then
(a) (A*)* = A;
(b) (A + B)* = A* + B*;
(c) (cA)* = c̄A*; and
(d) (AB)* = B*A*.
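Property (d), with its reversal of the factors, is the one most easily misremembered; the added sketch below checks it on random complex matrices, writing A.conj().T for A*:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

# (AB)* = B* A*, with the order of the factors reversed
assert np.allclose((A @ B).conj().T, B.conj().T @ A.conj().T)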
Definition. A square matrix A with complex entries and satisfying the condition A⁻¹ = A* is said to be a unitary matrix.
Corresponding to Proposition 10B, we have the following result.
PROPOSITION 12C. Suppose that A is an n × n matrix with complex entries. Then
(a) A is unitary if and only if the row vectors of A form an orthonormal basis of Cⁿ under the complex euclidean inner product; and
(b) A is unitary if and only if the column vectors of A form an orthonormal basis of Cⁿ under the complex euclidean inner product.
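Equivalently, A is unitary precisely when A*A = I, which is another way of saying that the columns are orthonormal. A small added sketch (the example matrix is my own):

import numpy as np

A = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)       # a unitary matrix

assert np.allclose(A.conj().T @ A, np.eye(2))       # columns orthonormal
assert np.allclose(np.linalg.inv(A), A.conj().T)    # A^(-1) = A*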
12.3. Unitary Diagonalization
Corresponding to the orthogonal diagonalization problem in Section 10.3, we now discuss the following unitary diagonalization problem.
Definition. A square matrix A with complex entries is said to be unitarily diagonalizable if there exists a unitary matrix P with complex entries such that P⁻¹AP = P*AP is a diagonal matrix with complex entries.
First of all, we would like to determine which matrices are unitarily diagonalizable. For those that are, we then need to discuss how we may find a unitary matrix P to carry out the diagonalization. As before, we study the question of eigenvalues and eigenvectors of a given matrix; these are defined as for the real case without any change.
In Section 10.3, we have indicated that a square matrix with real entries is orthogonally diagonalizable if and only if it is symmetric. The most natural extension to the complex case is the following.
Definition. A square matrix A with complex entries is said to be hermitian if A = A*.
Unfortunately, it is not true that a square matrix with complex entries is unitarily diagonalizable
if and only if it is hermitian. While it is true that every hermitian matrix is unitarily diagonalizable,
there are unitarily diagonalizable matrices that are not hermitian. The explanation is provided by the
following.
Definition. A square matrix A with complex entries is said to be normal if AA* = A*A.
Remark. Note that every hermitian matrix is normal and every unitary matrix is normal.
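The added sketch below uses a matrix (my own example) that is normal but not hermitian, anticipating the proposition that follows:

import numpy as np

A = np.array([[1, 1j], [1j, 1]])                     # A A* = A* A = 2I

assert np.allclose(A @ A.conj().T, A.conj().T @ A)   # normal
assert not np.allclose(A, A.conj().T)                # not hermitian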
Corresponding to Propositions 10E and 10G, we have the following results.
PROPOSITION 12D. Suppose that A is an n × n matrix with complex entries. Then it is unitarily diagonalizable if and only if it is normal.
PROPOSITION 12E. Suppose that u₁ and u₂ are eigenvectors of a normal matrix A with complex entries, corresponding to distinct eigenvalues λ₁ and λ₂ respectively. Then u₁ · u₂ = 0. In other words, eigenvectors of a normal matrix corresponding to distinct eigenvalues are orthogonal.
We can now follow the procedure below.
UNITARY DIAGONALIZATION PROCESS. Suppose that A is a normal n × n matrix with complex entries.
(1) Determine the n complex roots λ₁, . . . , λₙ of the characteristic polynomial det(A − λI), and find n linearly independent eigenvectors u₁, . . . , uₙ of A corresponding to these eigenvalues as in the Diagonalization process.
(2) Apply the Gram-Schmidt orthogonalization process to the eigenvectors u₁, . . . , uₙ to obtain orthogonal eigenvectors v₁, . . . , vₙ of A, noting that eigenvectors corresponding to distinct eigenvalues are already orthogonal.
(3) Normalize the orthogonal eigenvectors v₁, . . . , vₙ to obtain orthonormal eigenvectors w₁, . . . , wₙ of A. These form an orthonormal basis of Cⁿ. Furthermore, write
P = ( w₁ . . . wₙ )   and   D = diag(λ₁, . . . , λₙ),
where λ₁, . . . , λₙ ∈ C are the eigenvalues of A and where w₁, . . . , wₙ ∈ Cⁿ are respectively their orthogonalized and normalized eigenvectors. Then P*AP = D.
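In floating-point practice one rarely carries out the process verbatim. One convenient alternative (a suggestion added here, not part of the text) is the complex Schur decomposition A = QTQ*, in which Q is always unitary and, when A is normal, T comes out diagonal, so that Q can play the role of P:

import numpy as np
from scipy.linalg import schur

A = np.array([[1, 1j], [1j, 1]])            # a normal matrix (see above)

T, Q = schur(A, output='complex')           # A = Q T Q*, with Q unitary
assert np.allclose(Q.conj().T @ Q, np.eye(2))    # Q is unitary
assert np.allclose(T, np.diag(np.diag(T)))       # T is diagonal for normal A
assert np.allclose(Q.conj().T @ A @ Q, T)        # P*AP = D with P = Q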
We conclude this chapter by discussing the following important result which implies Proposition 10F, that all the eigenvalues of a symmetric real matrix are real.
PROPOSITION 12F. Suppose that A is a hermitian matrix. Then all the eigenvalues of A are real.
Sketch of Proof. Suppose that A is a hermitian matrix. Suppose further that λ is an eigenvalue of A, with corresponding eigenvector v. Then
Av = λv.
Multiplying on the left by the conjugate transpose v* of v, we obtain
v*Av = v*(λv) = λv*v.
To show that λ is real, it suffices to show that the 1 × 1 matrices v*Av and v*v both have real entries. Now
(v*Av)* = v*A*(v*)* = v*Av
and
(v*v)* = v*(v*)* = v*v.
It follows that both v*Av and v*v are hermitian. It is easy to prove that hermitian matrices must have real entries on the main diagonal. Since v*Av and v*v are 1 × 1 matrices, it follows that they are real.
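A numerical illustration of this result (an added sketch; the matrix is my own example):

import numpy as np

A = np.array([[2, 1 - 1j], [1 + 1j, 3]])
assert np.allclose(A, A.conj().T)            # A is hermitian

eigenvalues = np.linalg.eigvals(A)
assert np.allclose(eigenvalues.imag, 0)      # eigenvalues 1 and 4, both real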
Problems for Chapter 12
1. Consider the set V of all matrices of the form
( z  0 )
( 0  z̄ ),
where z ∈ C, with matrix addition and scalar multiplication. Determine whether V forms a complex vector space.
2. Is Rⁿ a subspace of Cⁿ? Justify your assertion.
3. Prove Proposition 12A.
4. Suppose that u, v, w are elements of a complex inner product space, and that c ∈ C.
a) Show that ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩.
b) Show that ⟨u, cv⟩ = c̄⟨u, v⟩.
5. Let V be the vector space of all continuous functions f : [0, 1] → C. Show that
⟨f, g⟩ = ∫_{0}^{1} f(x)\overline{g(x)} dx
defines a complex inner product on V.
6. Suppose that u, v are elements of a complex inner product space, and that c ∈ C.
a) Show that ⟨u − cv, u − cv⟩ = ⟨u, u⟩ − c̄⟨u, v⟩ − c\overline{⟨u, v⟩} + cc̄⟨v, v⟩.
b) Deduce that ⟨u, u⟩ − c̄⟨u, v⟩ − c\overline{⟨u, v⟩} + cc̄⟨v, v⟩ ≥ 0.
c) Prove the Cauchy-Schwarz inequality, that |⟨u, v⟩|² ≤ ⟨u, u⟩⟨v, v⟩.
7. Generalize the results in Sections 9.4 and 9.5 to the case of complex inner product spaces. Try to
prove as many results as possible.
8. Prove Proposition 12B.
9. Prove Proposition 12C.
10. Prove that the diagonal entries of every hermitian matrix are all real.
11. Suppose that A is a square matrix with complex entries.
a) Prove that det(Ā) = \overline{det A}.
b) Deduce that det(A*) = \overline{det A}.
c) Prove that if A is hermitian, then det A is real.
d) Prove that if A is unitary, then |det A| = 1.
12. Apply the Unitary diagonalization process to each of the following matrices:
a) A = ( 4      1 − i )
       ( 1 + i  5     )
b) A = ( 3  i )
       ( i  3 )
c) A = ( 5  0      0     )
       ( 0  1      1 + i )
       ( 0  1 − i  0     )
13. Suppose that λ₁ and λ₂ are distinct eigenvalues of a hermitian matrix A, with eigenvectors u₁ and u₂ respectively.
a) Show that u₁*Au₂ = λ₁u₁*u₂ and u₁*Au₂ = λ₂u₁*u₂.
b) Complete the proof of Proposition 12E.
14. Suppose that A is a square matrix with complex entries, and that A* = −A.
a) Show that iA is a hermitian matrix.
b) Show that A is unitarily diagonalizable but has purely imaginary eigenvalues.