Curtis
Linear Algebra
An Introductory Approach
With 37 Illustrations
Springer-Verlag
New York Berlin Heidelberg Tokyo
Charles W. Curtis
Department of Mathematics
University of Oregon
Eugene, OR 97403
U.S.A.
Editorial Board

F. W. Gehring
Department of Mathematics
University of Michigan
Ann Arbor, MI 48109
U.S.A.

P. R. Halmos
Department of Mathematics
Indiana University
Bloomington, IN 47405
U.S.A.
Previous editions of this book were published by Allyn and Bacon, Inc.: Boston.
Chapter 1

Introduction to Linear Algebra
This section should be read quickly the first time through, with the
objective of getting some motivation, but not a thorough understanding
of the details.
FIGURE 1.1
(1.2)
x + 2y - z = 1
2x + y + z = 0.
This time it isn't so easy to eliminate the unknowns. Yet we see that if z
is given a particular value, say 0, we can solve the resulting system

x + 2y = 1        (z = 0)
2x + y = 0.
FIGURE 1.2
(1.3)
x + 2y - 3z + t = 1
x + y + z + t = 0.
(1.5)
x + 2y - 3z + t = 0
x + y + z + t = 0.
(ii) If u = (α, β, γ, δ) and u′ = (α′, β′, γ′, δ′) are solutions of the
homogeneous system (1.5), then so are

(α + α′, β + β′, γ + γ′, δ + δ′)

and

(λα, λβ, λγ, λδ)

for an arbitrary number λ.†

† See the list of Greek letters on p. 332.
(a - α, b - β, c - γ, d - δ)

is a solution of the homogeneous system, and setting ξ = a - α,
η = b - β, λ = c - γ, μ = d - δ,
where F is some fixed function. Then we see that all three of our state-
ments about Problem A are satisfied. The difference of two solutions of
the nonhomogeneous system (1.7) is a solution of the homogeneous
system. The solutions of the homogeneous system satisfy statement (ii),
if the operations of adding functions and multiplying functions by con-
stants replace the corresponding operations on vectors. Finally, an
arbitrary solution of the nonhomogeneous system is obtained from a
particular one f₀ by adding to f₀ a solution of the homogeneous system.
It is reasonable to believe that the analogous behavior of the solutions
of Problems A and B is not a pure coincidence. The common framework
underlying both problems will be provided by the concept of a vector
space, to be introduced in Chapter 2. Before beginning the study of
vector spaces, we have one more introductory section in which we
review facts about sets and numbers which will be needed.
We call it the empty set, and denote it by ∅. Thus, for every object
x, x ∉ ∅. For example, the set of all real numbers x for which the
inequalities x < 0 and x > 1 hold simultaneously is the empty set. The
reader will check that from our definition of subset it follows logically
that the empty set ∅ is a subset of every set (why?).
There are two important constructions which can be applied to
subsets of a set and yield new subsets. Suppose U and V are subsets of a
given set X. We define U ∩ V to be the set consisting of all elements
belonging to both U and V, and call U ∩ V the intersection of U and V.
Question: What is the intersection of the set of real numbers x such that
x > 0 with the set of real numbers y such that y < 5? If we have many
subsets of X, their intersection is defined as the set of all elements which
belong to all the given subsets.
The second construction is the union U ∪ V of U and V; this is the
subset of X consisting of all elements which belong either to U or to V.
(When we say "either ... or," it is understood that we mean "either ...
or ... or both.")
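As a quick illustration (a minimal sketch in Python, not part of the original text), the two constructions on finite sets:

    # U and V are subsets of a common set X = {0, 1, ..., 9}.
    U = {x for x in range(10) if x > 2}   # {3, 4, ..., 9}
    V = {x for x in range(10) if x < 5}   # {0, 1, 2, 3, 4}

    print(U & V)   # intersection U ∩ V: {3, 4}
    print(U | V)   # union U ∪ V: all of X in this example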
It is frequently useful to illustrate statements about sets by drawing
diagrams. Although they have no mathematical significance, they do
give us confidence that we are making sense, and sometimes they suggest
important steps in an argument. For example, the statement X ⊂ Y is
illustrated by Figure 1.3. In Figure 1.4 the shaded portion indicates
U ∪ V, while the cross-hatched portion denotes U ∩ V.
FIGURE 1.3
FIGURE 1.4
number (a, 0) ∈ R′ has the properties that (a) (a, 0) = (a′, 0) if and only
if a = a′ and that (b) it preserves the operations in the respective number
systems, according to the equations (2.3) above. A correspondence be-
tween two fields with the properties (a) and (b) is called an isomorphism,
the word coming from Greek words meaning "to have the same form."
The existence of the isomorphism a → (a, 0) between the fields R and R′
means that if we ignore all other properties not given in the definition of a
field, then R and R′ are indistinguishable. Thus we shall identify the
elements of R with the corresponding elements of R′, and in this sense
we have the following inclusions among the number systems defined so
far:

Z ⊂ Q ⊂ R ⊂ C.
(2.6)    1 + 2 + 3 + ⋯ + n = n(n + 1)/2.

Adding n + 1 to both sides, we obtain

1 + 2 + ⋯ + n + (n + 1) = n(n + 1)/2 + (n + 1)
                        = (n + 1)(n/2 + 1) = (n + 1)(n + 2)/2,

which is the statement (2.6) for n + 1. By the principle of mathematical
induction, we conclude that (2.6) is true for all positive integers n.
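The formula (2.6) is also easy to check numerically; a minimal sketch in Python (ours, for illustration only):

    # Verify 1 + 2 + ... + n == n(n + 1)/2 for the first hundred n.
    for n in range(1, 101):
        assert sum(range(1, n + 1)) == n * (n + 1) // 2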
Not all of the statements given below are proved in detail. Those
appearing with starred numbers [for example, (2.9)*] are left as exercises
for the reader.
(2.11) -(-a) = a.

Proof. The result comes from examining the equation a + (-a) = 0
from a different point of view. For we have also -(-a) + (-a) = 0;
by the cancellation law we obtain -(-a) = a.
(2.15)* (a⁻¹)⁻¹ = a if a ≠ 0.
Thus far we haven't used the distributive law [2.1(3)]. In a way,
this is the most powerful axiom and most of the more exotic theorems in
elementary algebra, such as (- 1)( - 1) = 1, follow from the distributive
law.
(2.19) (-1)(-1) = 1.
EXERCISES
3. The binomial coefficients $\binom{n}{k}$, for n = 1, 2, ... and, for each n, k = 0, 1,
2, ..., n, are certain positive integers defined by induction as follows:
$\binom{1}{0} = \binom{1}{1} = 1$. Assuming $\binom{n-1}{k}$ has been defined for some n, and for
k = 0, 1, ..., n - 1, the binomial coefficient $\binom{n}{k}$ is defined by $\binom{n}{0} = \binom{n}{n} = 1$
and

$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$,    1 ≤ k ≤ n - 1.
This chapter contains the basic definitions and facts about vector spaces,
together with a thorough discussion of the application of the general
results on vector spaces to the determination of the solutions of systems of
linear equations. The chapter concludes with an optional section on the
geometrical interpretation of the theory of systems of linear equations.
Some motivation for the definition of a vector space and the theorems to
be proved in this chapter was given in Section 1.
3. VECTOR SPACES
have quite different meanings. We then defined the sum of two vectors
u and u′ as above by

u + u′ = (α + α′, β + β′, γ + γ′, δ + δ′)

and also the product of a vector u = (α, β, γ, δ) by a real number λ, by

λu = (λα, λβ, λγ, λδ).

With these definitions, and from the field properties of the real numbers,
we see that vectors satisfy the following laws with respect to the operations
u + u′ and λu that we have defined.
1. u + u′ = u′ + u (commutative law).
2. (u + u′) + u″ = u + (u′ + u″) (associative law).
3. There exists a vector 0 such that u + 0 = u for all u.
4. For each vector u, there exists a vector -u such that u + (-u) = 0.
5. α(βu) = (αβ)u for all vectors u and real numbers α and β.
6. (α + β)u = αu + βu  (distributive laws).
7. α(u + u′) = αu + αu′
8. 1·u = u for all vectors u.
The proofs of these facts are immediate from the field axioms for the real
numbers, and the definition of equality of two vectors. For example we
shall give a proof of (2). Let

u = (α, β, γ, δ), u′ = (α′, β′, γ′, δ′), u″ = (α″, β″, γ″, δ″).

Then

(u + u′) + u″ = ((α + α′) + α″, (β + β′) + β″, (γ + γ′) + γ″, (δ + δ′) + δ″)

and

u + (u′ + u″) = (α + (α′ + α″), β + (β′ + β″), γ + (γ′ + γ″), δ + (δ′ + δ″)).

Since we have (α + α′) + α″ = α + (α′ + α″), etc., by the associative law
in R, it follows that the vectors (u + u′) + u″ and u + (u′ + u″) are
equal, and the associative law for vectors is proved. The other statements
can be verified in a similar way.
Now let us recall the second problem discussed in Section 1 (Problem
B). In that problem we were interested in functions on the real line.
Certain algebraic operations on functions were introduced, namely,
taking the sum f + g of two functions f and g and multiplying a function
by a real number. More precisely, the definition of the sum f + g is
(f + g)(x) = f(x) + g(x), x ∈ R, and for λ ∈ R, the function λf is defined
by (λf)(x) = λ·f(x), x ∈ R. Without much effort, it can be checked that
the operations f + g and λf satisfy the conditions (1) to (8) which we just
verified for vectors u = (α, β, γ, δ).
The common properties of these operations in the two examples
lead to the following abstract concept of a vector space over a field.
The point is, if we are to investigate the problems in Section 1 as fully as
possible, we are practically forced to invent the concept of a vector space.
(3.1) DEFINITION. Let F be an arbitrary field. A vector space V over F
is a nonempty set V of objects {v}, called vectors, together with two
operations, one of which assigns to each pair of vectors v and w a vector
v + w called the sum of v and w, and the other of which assigns to each
element α ∈ F and each vector v ∈ V a vector αv called the product of v by
the element α ∈ F. The operations are assumed to satisfy the following
axioms, for α, β ∈ F and for u, v, w ∈ V.

1. u + (v + w) = (u + v) + w, and u + v = v + u.
2. There is a vector 0 such that u + 0 = u for all u ∈ V.†
3. For each vector u there is a vector -u such that u + (-u) = 0.
4. α(u + v) = αu + αv.
5. (α + β)u = αu + βu.
6. (αβ)u = α(βu).
7. 1u = u.
In this text we shall generally use Roman letters {x, y, u, v, ...} to
denote vectors and Greek letters (see p. 332) {α, β, γ, δ, ξ, η, θ, λ, ...} to
denote elements of the field involved. Elements of the field are often
called scalars (or numbers, when the field is R).
We can now check that our examples do satisfy the axioms. In the
first example, of vectors u = (α, β, γ, δ), there is of course nothing special
about taking quadruples. We could have taken pairs, triples, etc. To
cover all these cases, we introduce the idea of an n-tuple of real numbers,
for some positive integer n.
Notice that nothing is said about the real numbers {αᵢ} in ⟨α₁, ..., αₙ⟩
being different from one another. ⟨0, 0, 0⟩ and ⟨1, 0, 1⟩ are perfectly
legitimate 3-tuples. We note also that ⟨1, 0, 1⟩ ≠ ⟨0, 1, 1⟩, for example.
(3.3) DEFINITION. THE VECTOR SPACE Rⁿ. The vector space Rⁿ over
the field of real numbers R is the algebraic system consisting of all n-tuples
a = ⟨α₁, ..., αₙ⟩ with αᵢ ∈ R, together with the operations of addition
and multiplication of n-tuples by real numbers, to be defined below.
The n-tuples a ∈ Rⁿ are called vectors, and the real numbers αᵢ are called
the components of the vector a = ⟨α₁, ..., αₙ⟩. Two vectors a =
⟨α₁, ..., αₙ⟩ and b = ⟨β₁, ..., βₙ⟩ are said to be equal, and we write
a = b, if and only if αᵢ = βᵢ, i = 1, ..., n. The sum a + b of the vectors
a = ⟨α₁, ..., αₙ⟩ and b = ⟨β₁, ..., βₙ⟩ is defined by

a + b = ⟨α₁ + β₁, ..., αₙ + βₙ⟩.

The product of the vector a by the real number λ is defined by

λa = ⟨λα₁, ..., λαₙ⟩.

It is straightforward to verify that Rⁿ is a vector space over R,
according to Definition (3.1). In particular, the zero vector is given by
0 = ⟨0, ..., 0⟩, and if a = ⟨α₁, ..., αₙ⟩, then -a = ⟨-α₁, ..., -αₙ⟩.
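The operations of Definition (3.3) are mechanical; the following Python sketch (an illustration of the definition, not part of the text) represents vectors in Rⁿ as tuples:

    def add(a, b):
        # the sum a + b, componentwise as in Definition (3.3)
        return tuple(x + y for x, y in zip(a, b))

    def scale(lam, a):
        # the product of the vector a by the number lam
        return tuple(lam * x for x in a)

    a, b = (-1, 2, 1), (2, 1, -3)
    print(add(a, b))              # (1, 3, -2)
    print(scale(2, a))            # (-2, 4, 2)
    print(add(a, scale(-1, a)))   # (0, 0, 0), since a + (-a) = 0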
The proofs of (i) to (vi) are identical (word for word!) with the proofs
of the corresponding facts about fields. We give the proof of (vii). We
are given that

αu = αv and α ≠ 0.

Then α⁻¹ exists, and the substitution principle (which applies to vector
spaces as well as to fields) implies that

α⁻¹(αu) = α⁻¹(αv).

By [3.1(6)] we obtain

(α⁻¹α)u = (α⁻¹α)v,

and since α⁻¹α = 1 and 1u = u, 1v = v by [3.1(7)], we have u = v as
required.
We proceed to derive a few other consequences of the definition
which will be needed in the next section. The associative law states that
for a₁, a₂, a₃ in V,

(a₁ + a₂) + a₃ = a₁ + (a₂ + a₃).

If we have four vectors a₁, a₂, a₃, a₄, there are the following possible
sums we can form:

a₁ + [a₂ + (a₃ + a₄)],
a₁ + [(a₂ + a₃) + a₄],
[a₁ + (a₂ + a₃)] + a₄,
[(a₁ + a₂) + a₃] + a₄.
The reader may check that all these expressions represent the same vector.
More generally, it can be proved by mathematical induction that all
possible ways of adding n vectors a₁, ..., aₙ together to form a single
sum yield a uniquely determined vector which we shall denote by

(3.6)    a₁ + ⋯ + aₙ = Σᵢ₌₁ⁿ aᵢ.
The other rules can also be generalized to sums of more than two vectors:

(Σᵢ₌₁ⁿ λᵢ)a = Σᵢ₌₁ⁿ λᵢa,    λ(Σᵢ₌₁ⁿ aᵢ) = Σᵢ₌₁ⁿ λaᵢ.
The main point is that these rules do require proof from the basic
rules (3.1), and the reader will find it an interesting exercise in the use of
mathematical induction to supply, for example, a proof of (3.6).
We conclude this introductory section on vector spaces with some
examples to show that our definition of the vector space Rn is consistent
with the interpretation of vectors sometimes used in geometry and physics.
The study of these examples can be skipped or postponed without inter-
rupting the continuity of the discussion of vector spaces.
FIGURE 2.1

FIGURE 2.2 (the points A = (-1), 0, B = (2), C = (3), D = (6) on a line)

FIGURE 2.3 (the points A = (1, 1), B = (3, 1))

FIGURE 2.4 (the points C, D)
length and direction, and according to Definition (3.7), they represent the
same directed line segment, because AB = B - A = ⟨2, 0⟩ and CD =
D - C = ⟨2, 0⟩.
The meaning of Definition (3.7) in general is simply that two directed
line segments are equal provided they have the same length and direction.
We can now show that addition of directed line segments is given by
the "parallelogram law."
-" -" -"
(3.8) THEOREM. Let A, B, C be points in R 2 • Then AB + BC = Ae.
(See Figure 2.5).
-"
Proof By Definition (3.7), we have AB = B - A, BC = C - B, and
hence
-" -"
AB + BC = (B - A) + (C - B) = C- A = AC,
as required.
-" -"
Theorem (3.8) shows that if AB and BC are directed line segments,
arranged so that the end point of the first is the starting point of the
second, then their sum is the directed line segment given by the diagonal
of the parallelogram with adjacent sides AB and Be.
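Theorem (3.8) is easy to confirm on particular points; a small Python check (the points are our own choices, not from the text):

    # A directed line segment AB is represented by the vector B - A.
    def segment(P, Q):
        return tuple(q - p for p, q in zip(P, Q))

    A, B, C = (1, 1), (3, 1), (4, 3)
    AB, BC, AC = segment(A, B), segment(B, C), segment(A, C)
    # AB + BC = AC, the parallelogram law
    assert tuple(x + y for x, y in zip(AB, BC)) == AC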
FIGURE 2.5
FIGURE 2.6
parallelogram are defined to be the line segments AB, CD, AC, and BD;
the diagonals are the line segments AD and BC.
EXERCISES
1. In the vector space R³, compute (that is, express in the form ⟨λ, μ, ν⟩)
the following vectors formed from a = ⟨-1, 2, 1⟩, b = ⟨2, 1, -3⟩,
c = ⟨0, 1, 0⟩:
a. a + b + c.
b. 2a - b + c.
c. -a + 2b.
d. αa + βb + γc.
2. In R³, letting a, b, c be as in Exercise 1, solve the equations below, for
x = ⟨α₁, α₂, α₃⟩ in R³.
a. x + a = b.
b. 2x - 3b = c.
c. b + x = a - 2c.
3. Check that the axioms for a vector space are satisfied for the vector
spaces Rⁿ, Fⁿ, and ℱ(R).
4. In R", let a = <ai, ... ,an), b = <fib ... ,fin)' Show that b - a =
<fil - al , ... , fin - an).
The following exercises are based on Example A and (3.9)-(3.12).

5. Let AB be a directed line segment in R². Show that AB = -BA.

6. Show that the directed line segments AB and CD are equal if and only
if there exists a vector X such that C = A + X and D = B + X.
(The interpretation is that two directed line segments are equal if and
only if one is carried onto the other by a translation. A translation of
R² is a rule which sends each vector A to the vector A + X, for some
fixed vector X.)
7. Find the midpoints of the line segment AB in the following cases.
a. A = ⟨0, 0⟩, B = ⟨2, 1⟩.
b. A = ⟨1, -1⟩, B = ⟨3, 2⟩.
This section begins with the concept needed to study the examples given
in Section 1.
The axioms (3.1) for a vector space are clearly satisfied for any
subspace S of a vector space V; so we see that a subspace is simply a
vector space contained in a larger vector space, in which the operations
of addition and multiplication by scalars are the same as those in the
larger vector space.
all real numbers λ. This time the solutions of the homogeneous differential
equation (4.2) form a subspace of the vector space ℱ(R) defined in
Section 3, and as in Example A, the problem of finding all solutions of the
differential equation comes down to finding exactly which functions
belong to this subspace of ℱ(R).
Example C. It may have occurred to the reader by now that the concept of
subspace provides the framework for other questions discussed in
elementary calculus. For example, some standard results (which ones?)
about continuous and differentiable functions show that the following
subsets of ℱ(R) are actually subspaces:
(i) The set C(R) of all continuous real functions defined on the real
line.
(ii) The set D(R) consisting of all differentiable real valued functions
on R.
Another familiar example of a subspace of ℱ(R) is the following
one:
(iii) The set P(R) of polynomial functions on R, where a polynomial
function is a function f ∈ ℱ(R) such that, for some fixed set of real
numbers α₀, α₁, ..., αₙ,

f(x) = α₀ + α₁x + ⋯ + αₙxⁿ

for all x ∈ R.
(4.4) If S is a subspace of V containing the vectors a₁, ..., aₘ, then every
linear combination of a₁, ..., aₘ belongs to S.

Proof. We prove the result by induction on m. If m = 1, the result is
true by the definition of a subspace. Suppose that any linear combination
of m - 1 vectors in S belongs to S, and consider a linear combination

a = λ₁a₁ + ⋯ + λₘaₘ

of m vectors belonging to S. Letting a′ = λ₂a₂ + ⋯ + λₘaₘ, we have
(4.5) Let {a₁, ..., aₘ} be a set of vectors in V, for m ≥ 1; then the set
of all linear combinations of the vectors a₁, ..., aₘ forms a subspace
S = S(a₁, ..., aₘ). S is the smallest subspace containing {a₁, ..., aₘ}, in
the sense that if T is any subspace containing a₁, ..., aₘ, then S ⊂ T.

Proof. Let a = Σᵢ₌₁ᵐ λᵢaᵢ and b = Σᵢ₌₁ᵐ μᵢaᵢ. Then
can replace the set of generators a₁, ..., aₘ by a smaller set. The state-
ment that aᵢ is a linear combination of a₁, ..., aᵢ₋₁, aᵢ₊₁, ..., aₘ means
that there exist λ₁, ..., λᵢ₋₁, λᵢ₊₁, ..., λₘ in F such that

aᵢ = λ₁a₁ + ⋯ + λᵢ₋₁aᵢ₋₁ + λᵢ₊₁aᵢ₊₁ + ⋯ + λₘaₘ.

Adding -aᵢ = (-1)aᵢ to both sides and using the commutative law, we
obtain

0 = λ₁a₁ + ⋯ + λᵢ₋₁aᵢ₋₁ + (-1)aᵢ + λᵢ₊₁aᵢ₊₁ + ⋯ + λₘaₘ,

and we have shown that there exist elements μ₁, ..., μₘ in F, not all equal
to zero, such that

(4.7)    μ₁a₁ + ⋯ + μₘaₘ = 0.

(In this case, μⱼ = λⱼ for j ≠ i, and μᵢ = -1 ≠ 0.) Conversely, suppose a
formula such as (4.7) holds, where not all the μᵢ are equal to zero. Then
we shall prove that some aⱼ is a linear combination of the remaining
vectors {aᵢ, i ≠ j}. Suppose that μⱼ ≠ 0. Then using the commutative
law and adding -μⱼaⱼ to both sides, we obtain

(4.8)    (-μⱼ⁻¹)(-μⱼaⱼ) = (-μⱼ⁻¹)μ₁a₁ + ⋯ + (-μⱼ⁻¹)μⱼ₋₁aⱼ₋₁
                        + (-μⱼ⁻¹)μⱼ₊₁aⱼ₊₁ + ⋯ + (-μⱼ⁻¹)μₘaₘ.

Using Theorem (3.5), we have, for the left-hand side,

(-μⱼ⁻¹)(-μⱼaⱼ) = μⱼ⁻¹μⱼaⱼ = 1aⱼ = aⱼ.

Substituting this back in (4.8), we have proved that aⱼ ∈ S(a₁, ..., aⱼ₋₁,
aⱼ₊₁, ..., aₘ). To summarize, we have proved the following important
result.
(4.9) THEOREM. Let {a₁, ..., aₘ} be a set of vectors in V, for m ≥ 1.
Some vector aᵢ can be expressed as a linear combination of the remaining
vectors {a₁, ..., aᵢ₋₁, aᵢ₊₁, ..., aₘ} if and only if there exist elements
μ₁, ..., μₘ in F, not all equal to zero, such that

μ₁a₁ + ⋯ + μₘaₘ = 0.
The new concept which was mentioned at the beginning of the proof
of Theorem (4.9) is the rather subtle but natural condition on the vectors
{a1' ... , am} which emerged in the course of the proof.
in V. The set of vectors {a₁, ..., aₘ} is said to be linearly dependent if there
exist elements of F, λ₁, ..., λₘ, not all zero, such that

λ₁a₁ + λ₂a₂ + ⋯ + λₘaₘ = 0.
Such a formula will be called a relation of linear dependence. A set of
vectors which is not linearly dependent is said to be linearly independent.
Thus, the set {a₁, ..., aₘ} is linearly independent if and only if
Do there exist numbers α, β, γ, not all zero, satisfying these equations?
Try setting γ = 1; then the equations become

α + β + 2 = 0
-α + β + 1 = 0,

and we obtain the solution

α = -1/2, β = -3/2, γ = 1.
A solution is, first of all, a vector ⟨α₁, α₂, α₃⟩ in R³, and the set of all
solutions is a subspace S of R³. Setting x₃ = 0, we see that

u₁ = ⟨1, -1/2, 0⟩ ∈ S.

Similarly, setting x₁ = 0, we have

u₂ = ⟨0, 1, 2⟩ ∈ S.

We will show that S = S(u₁, u₂); in other words, that u₁ and u₂ are
generators of the subspace consisting of all solutions of the equation.
Suppose u = ⟨α, β, γ⟩ is a solution. If α = β = γ = 0, then u ∈ S(u₁, u₂),
and there is nothing further to show. Now suppose α ≠ 0. Then

u - αu₁ = ⟨0, β + α/2, γ⟩ = ⟨0, γ/2, γ⟩ = (γ/2)u₂,
EXERCISES
2. Verify that the subsets C(R), D(R), and P(R) of ℱ(R) defined in
Example C are subspaces of ℱ(R).
3. Determine which of the following subsets of C(R) are subspaces of
C(R).
a. The set of polynomial functions in C(R).
b. The set of all f ∈ C(R) such that f(t) is a rational number.
c. The set of all f ∈ C(R) such that f(1) = 0.
d. The set of all f ∈ C(R) such that ∫₀¹ f(t) dt = 1.
e. The set of all f ∈ C(R) such that ∫₀¹ f(t) dt = 0.
f. The set of all f ∈ C(R) such that df/dt = 0.
g. The set of all f ∈ C(R) such that, for some α, β, γ ∈ R,

α d²f/dt² + β df/dt + γf = 0.
6. Show that the set of polynomial functions P(R) (see Example C) is not
a finitely generated vector space.
7. Is the intersection of two subspaces always a subspace? Prove your
answer.
8. Is the union of two subspaces always a subspace? Explain.
9. Let a in Rⁿ be a linear combination of vectors b₁, ..., bᵣ in Rⁿ, and let
each vector bᵢ, 1 ≤ i ≤ r, be a linear combination of vectors c₁, ..., cₛ.
Prove that a is a linear combination of c₁, ..., cₛ.
10. Show that a set of vectors, which contains a set of linearly dependent
vectors, is linearly dependent. What is the analogous statement about
linearly independent vectors?
showing that u₁ and u₂ are linearly dependent. But we know that u₁ and
u₂ are linearly independent; so we have arrived at a contradiction. There-
fore our original assumption, that S = S(u), must have been incorrect,
and we conclude that every set of generators of S contains at least two
vectors.
Our objective is to use a similar argument to prove the following
basic result, which is a general version of what we have just proved in
connection with the example. The proof of the theorem is followed by a
numerical example, which the reader may prefer to study before tackling
the proof.
where at least one of the coefficients {μ₂, ..., μₘ} is different from zero.
This completes the proof that {b₁, ..., bₘ} are linearly dependent, and
the theorem is proved.
Example B. Let n be a fixed positive integer. We shall prove that the vector
space Rⁿ, defined in Section 3, has dimension n.
All we have to do is find one basis of Rⁿ, containing n vectors. Let

e₁ = ⟨1, 0, 0, ..., 0⟩
e₂ = ⟨0, 1, 0, ..., 0⟩
EXERCISES
Example A. We shall find a basis for the subspace of R⁴ generated by the
vectors

a = ⟨-3, 2, 1, 4⟩, b = ⟨4, 1, 0, 2⟩, c = ⟨-10, 3, 2, 6⟩.

Letting e₁, e₂, e₃, e₄ be the basis of R⁴ consisting of the unit vectors

e₁ = ⟨1, 0, 0, 0⟩, ..., e₄ = ⟨0, 0, 0, 1⟩,

we have, as in Example B of Section 5,

(*)  a = -3e₁ + 2e₂ + e₃ + 4e₄
     b = 4e₁ + e₂ + 2e₄
     c = -10e₁ + 3e₂ + 2e₃ + 6e₄.
It is not obvious whether a, b, c are linearly independent or not. The
best way to find out, and to find a basis for S(a, b, c), is to experiment
with a, b, c by adding multiples of the vectors to each other, to arrive at
a new set of generators of S(a, b, c) for which the relations (*) are as
simple as possible. Some operations that lead to new sets of generators
are the following:
(I) Interchanging two of the vectors [for example, S(b, a, c) =
S(a, b, c), since every linear combination of the vectors {b, a, c} is
certainly a linear combination of the vectors a, b, c, and conversely].
(II) Replacing one vector by the sum of that vector and a scalar multiple
of another vector. For example, S(a - 2b, b, c) =
S(a, b, c). To check this statement, every linear combination of
{a - 2b, b, c} is certainly a linear combination of {a, b, c} (why?).
Therefore S(a - 2b, b, c) ⊂ S(a, b, c). Conversely, suppose

x = λa + μb + νc

is a linear combination of {a, b, c}. Since

a = (a - 2b) + 2b,

we have

x = λ(a - 2b) + (2λ + μ)b + νc,

and we have shown that x ∈ S(a - 2b, b, c).
The next thing to observe is that these operations on vectors really
involve only the coefficients in the equations (*).
A good way to visualize the situation is to introduce the concept of a
matrix.
(6.2)
[ -3   2  1  4 ]
[  4   1  0  2 ]
[ -10  3  2  6 ]
In general, the rows {r₁, r₂, r₃} of a 3-by-4 matrix will be given as follows:

r₁ = ⟨a₁₁, a₁₂, a₁₃, a₁₄⟩,
r₂ = ⟨a₂₁, a₂₂, a₂₃, a₂₄⟩,
r₃ = ⟨a₃₁, a₃₂, a₃₃, a₃₄⟩.
The numbers {aᵢⱼ} are called the entries of the matrix. The notation is
chosen so that the entry aᵢⱼ is the jth entry in the ith row. For example, if
we apply this notation to the matrix (6.2), a₁₁ = -3, a₂₁ = 4, a₃₃ = 2,
a₁₄ = 4, etc.
[ -3   2  1  4 ]        [ -11  0  1  0 ]
[  4   1  0  2 ]   II   [  4   1  0  2 ]
[ -10  3  2  6 ]        [ -10  3  2  6 ]
We shall call these operations of types I and II elementary row operations
on the matrix and write
(6.3)
[ -3   2  1  4 ]       [ a₁₁ a₁₂ a₁₃ a₁₄ ]
[  4   1  0  2 ]   ~   [ a₂₁ a₂₂ a₂₃ a₂₄ ]
[ -10  3  2  6 ]       [ a₃₁ a₃₂ a₃₃ a₃₄ ]

to mean that the matrix

A′ = [ a₁₁ a₁₂ a₁₃ a₁₄ ]
     [ a₂₁ a₂₂ a₂₃ a₂₄ ]
     [ a₃₁ a₃₂ a₃₃ a₃₄ ]

is obtained from the first matrix A in the following way. There exist a
finite number of matrices A = A₁, A₂, ..., Aₛ = A′, all of the same size
[ -3   2  1  4 ]
[  4   1  0  2 ]
[ -10  3  2  6 ]
This means that all we have to do to find a basis for S(a, b, c) is to apply
elementary row operations to the matrix (6.2) until we find vectors
{a′, b′, c′} as in (6.4) where it is easy to test for linear independence of the
nonzero vectors obtained.
One way to do this is called gaussian elimination; we apply elementary
row operations to eliminate (or replace by zero) the coefficients of e₁ in
all but one of the vectors, and then proceed to eliminate coefficients of
the next basis vector involved, etc.
In our example we shall eliminate the coefficients of e₁ in the second
and third rows. By replacing the second row by the second row plus 4/3
times the first row, we have

[ -3   2     1    4    ]
[  0   11/3  4/3  22/3 ]
[ -10  3     2    6    ]
In the second matrix we replace the third row by the third row minus 10/3
times the first row, giving

[ -3  2      1     4     ]
[  0  11/3   4/3   22/3  ]
[  0  -11/3  -4/3  -22/3 ]
Now we can simply replace the third row by the third row plus the second
row, obtaining

[ -3  2     1    4    ]
[  0  11/3  4/3  22/3 ]
[  0  0     0    0    ]
Putting our work down in a more economical form (which the reader
should use in doing problems of this sort), we have

[ -3   2  1  4 ]     [ -3  2     1    4    ]
[  4   1  0  2 ]  ~  [  0  11/3  4/3  22/3 ]
[ -10  3  2  6 ]     [  0  0     0    0    ]
It follows that λ = 0, since {e₁, e₂, e₃, e₄} are linearly independent. The
equation (6.6) now becomes

μb′ = 0,

and we have μ = 0 since b′ ≠ 0.
We conclude that the vectors a′ and b′ form a basis for the subspace
S(a, b, c), because they are linearly independent, and generate the
subspace.
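The elimination used above is mechanical enough to automate. The following Python sketch (our own illustration, not the author's; it uses exact fractions and produces a reduced form whose nonzero rows are in echelon form) applies elementary row operations of types I, II, and III:

    from fractions import Fraction

    def echelon(rows):
        # Reduce the rows by elementary row operations; the nonzero
        # rows of the result are in echelon form.
        rows = [[Fraction(x) for x in r] for r in rows]
        pivot = 0
        for col in range(len(rows[0])):
            # type I: find a row with a nonzero entry in this column
            for r in range(pivot, len(rows)):
                if rows[r][col] != 0:
                    rows[pivot], rows[r] = rows[r], rows[pivot]
                    break
            else:
                continue
            # type III: make the leading entry equal to one
            p = rows[pivot][col]
            rows[pivot] = [x / p for x in rows[pivot]]
            # type II: eliminate the column from the other rows
            for r in range(len(rows)):
                if r != pivot and rows[r][col] != 0:
                    m = rows[r][col]
                    rows[r] = [x - m * y
                               for x, y in zip(rows[r], rows[pivot])]
            pivot += 1
        return rows

    for row in echelon([[-3, 2, 1, 4], [4, 1, 0, 2], [-10, 3, 2, 6]]):
        print(row)
    # Two nonzero rows appear, so S(a, b, c) has a basis of two vectors,
    # and {a, b, c} is linearly dependent, as found above.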
The columns of A are defined to be the vectors in Fᵐ,
Example B. Let

A = [  2  1   0 ]
    [ -3  1   2 ]
    [  1  1  -1 ]
    [  2  1   4 ]

Then A is a 4-by-3 matrix with rows

r₁ = ⟨2, 1, 0⟩
r₂ = ⟨-3, 1, 2⟩
r₃ = ⟨1, 1, -1⟩
r₄ = ⟨2, 1, 4⟩.

The element in the third row, first column is 1, for example. What is λ₂₂?
λ₃₃? λ₁₃?
(6.11) THEOREM. Let V be a vector space with a finite basis {a₁, ..., aₙ}.
Let {b₁, ..., bₘ} be vectors in V, and let

b₁ = λ₁₁a₁ + ⋯ + λ₁ₙaₙ
Let
Proof. It is sufficient to prove the result for the case where A′ is obtained
from A by one elementary row operation. If the row operation is of
type I or II, the conclusion of the theorem was checked in Example A,
and we shall not repeat the details. Suppose now that the elementary
row operation is of type III, so that the ith row rᵢ′ of A′ is given by

rᵢ′ = μrᵢ

for some μ ≠ 0. We have to prove that the subspaces generated by the
vectors {b₁, ..., bₘ} and {b₁, ..., bᵢ₋₁, μbᵢ, bᵢ₊₁, ..., bₘ} are the same.
To show that
(6.12)
let
nonzero entry in bᵢ is to the left of the position of the first nonzero entry
in bᵢ₊₁, for i = 1, 2, ..., p - 1. For example,

⟨1, 0, -2, 3⟩, ⟨0, 1, 2, 0⟩, ⟨0, 0, 0, 1⟩

are in echelon form, while

⟨0, 1, 0, 0⟩, ⟨1, 0, 0, 0⟩

are not.
of a set of vectors {b₁, ..., bₘ} in terms of a set of basis vectors {a₁, ..., aₙ}
of a vector space V [as in (6.7)]. Suppose that the rows r₁, ..., rₘ of the
matrix A are in echelon form. Then the vectors {b₁, ..., bₘ} are linearly
independent.

Proof. We shall use induction on m. If m = 1, then by Definition
(6.13), b₁ ≠ 0, and {b₁} is a linearly independent set. As an induction
hypothesis, we assume that m > 1, and that the result holds for a matrix
with m - 1 rows. Now let A be as in the statement of the lemma, and
suppose that

(6.15)    β₁b₁ + β₂b₂ + ⋯ + βₘbₘ = 0.

We show that β₁ = 0. Let λ₁ⱼ be the first nonzero entry of r₁. Then by
the definition of echelon form, when (6.15) is expressed as a linear com-
bination of the basis vectors {a₁, ..., aₙ}, the coefficient of aⱼ is λ₁ⱼβ₁,
which must be equal to zero. Since λ₁ⱼ ≠ 0, we have β₁ = 0. Now
consider the vectors {b₂, ..., bₘ}. The coefficient matrix of these vectors
has m - 1 rows, which are still in echelon form (why?). The relation
(6.15) now has the form

β₂b₂ + ⋯ + βₘbₘ = 0,

and applying our induction hypothesis we have

β₂ = ⋯ = βₘ = 0.

This completes the proof of the lemma.
such that either all but the first row of A′ consists of zeros, or rows
{r₁′, ..., rₖ′} of A′ are in echelon form and the remaining rows are zero.
It is then clear from the definition of echelon form that all the nonzero rows
of A′ are in echelon form, and the first statement is proved.
Let {b₁′, ..., bₖ′} be the vectors corresponding to the first k rows of
A′. By Theorem (6.11),

S(b₁, ..., bₘ) = S(b₁′, ..., bₖ′),

and by Lemma (6.14), the vectors {b₁′, ..., bₖ′} are linearly independent.
Therefore {b₁′, ..., bₖ′} is a basis for S(b₁, ..., bₘ). Moreover, any other
matrix satisfying the conditions satisfied by A′ will have the property that
its nonzero rows give a basis for S(b₁, ..., bₘ). It follows from Theorem
(5.3) that the number of nonzero rows of A′ is uniquely determined. At
this point we have proved statements (i) and (ii) of the theorem. State-
ment (iii) follows, as in Example A, by another application of Theorem
(5.3). In fact, the vectors {b₁, ..., bₘ} are linearly independent if and
only if m is the number of basis vectors of S(b₁, ..., bₘ). But the
number of basis vectors in a basis of S(b₁, ..., bₘ) is equal to the number
of nonzero rows in A′, which is k. This completes the proof of the
theorem.
Example C. We give one more example of how the techniques in this section
are used.
Find a basis for the subspace of R³ generated by the vectors ⟨1, 3, 4⟩,
⟨4, 0, 1⟩, ⟨3, 1, 2⟩. Test the vectors for linear dependence.
Following our procedure, we express the vectors in terms of a basis
of R³, consisting of the unit vectors {e₁, e₂, e₃}:

b₁ = ⟨1, 3, 4⟩ = e₁ + 3e₂ + 4e₃
b₂ = ⟨4, 0, 1⟩ = 4e₁ + e₃
b₃ = ⟨3, 1, 2⟩ = 3e₁ + e₂ + 2e₃.

The matrix of coefficients is

A = [ 1  3  4 ]
    [ 4  0  1 ]
    [ 3  1  2 ]
We proceed to find a matrix A′ ~ A whose first k rows are in echelon
form.

A ~ [ 1   3    4  ]  ~  [ 1   3    4  ]  ~  [ 1  3  4   ]
    [ 0  -12  -15 ]     [ 0   1   5/4 ]     [ 0  1  5/4 ]
    [ 0  -8   -10 ]     [ 0  -8  -10  ]     [ 0  0  0   ]
Notice how the elementary row operation of type III was used to make the
first nonzero entry in the second row equal to one. This step, while not
essential, will often simplify the calculations.
Applying Theorem (6.16), we conclude that the vectors

b₁′ = e₁ + 3e₂ + 4e₃
b₂′ = e₂ + (5/4)e₃

form a basis for S(b₁, b₂, b₃), and that the original vectors {b₁, b₂, b₃}
are linearly dependent.
EXERCISES
1. For each of the following matrices A, find a matrix A′ ~ A such that the
nonzero rows of A′ are in echelon form.
a. ...
b. ...
c. ...
2. Test the following sets of vectors for linear dependence. Assume the
vectors all belong to Rⁿ for the appropriate n.
a. ⟨-1, 1⟩, ⟨1, 2⟩, ⟨1, 3⟩.
b. ⟨2, 1⟩, ⟨1, 0⟩, ⟨-2, 1⟩.
c. ⟨1, 4, 3⟩, ⟨3, 0, 1⟩, ⟨4, 1, 2⟩.
d. ⟨1, 1, 2⟩, ⟨2, 1, 3⟩, ⟨4, 0, -1⟩, ⟨-1, 0, 1⟩.
e. ⟨0, 1, 1, 2⟩, ⟨3, 1, 5, 2⟩, ⟨-2, 1, 0, 1⟩, ⟨1, 0, 3, -1⟩.
f. ⟨1, 1, 0, 0, 1⟩, ⟨-1, 1, 1, 0, 0⟩, ⟨2, 1, 0, 1, 1⟩, ⟨0, -1, -1, -1, 0⟩.
3. Find bases in echelon form for the vector spaces with the sets of vectors
as generators given in parts (a) to (f) of Exercise 2.
4. Let f₁, f₂, f₃ be functions in ℱ(R).
a. For a set of real numbers x₁, x₂, x₃, let (fᵢ(xⱼ)) be the 3-by-3 matrix
whose (i, j) entry is fᵢ(xⱼ), for 1 ≤ i, j ≤ 3. Prove that the functions
f₁, f₂, f₃ are linearly independent if the rows of the matrix (fᵢ(xⱼ)) are
linearly independent.
b. Assume the functions f₁, f₂, f₃ have first and second derivatives on some
interval (a, b), and let W(x) be the 3-by-3 matrix whose (i, j) entry is
fᵢ⁽ʲ⁻¹⁾(x), for 1 ≤ i, j ≤ 3, where f⁽⁰⁾ = f, f⁽¹⁾ = f′ and f⁽²⁾ = f″ for a dif-
ferentiable function f. Prove that f₁, f₂, f₃ are linearly independent if,
for some x in (a, b), the rows of the matrix W(x) are linearly independent.
Show that the following sets of functions are linearly independent.
c. f₁(x) = -x² + x + 1, f₂(x) = x² + 2x, f₃(x) = x² - 1.
d. f₁(x) = e⁻ˣ, f₂(x) = x, f₃(x) = e²ˣ.
e. f₁(x) = eˣ, f₂(x) = sin x, f₃(x) = cos x.
Note that if the tests in (a) or (b) fail, it is not guaranteed that the functions are
linearly dependent.
5. Test the following sets of polynomials for linear dependence. (Hint: Use
Exercise 3, p. 37.)
a. x² + 2x + 1, 2x + 1, 2x² - 2x - 1.
b. 1, x - 1, (x - 1)², (x - 1)³.
(7.1) LEMMA. If {a₁, ..., aₘ} is linearly dependent and if {a₁, ..., aₘ₋₁}
is linearly independent, then aₘ is a linear combination of a₁, ..., aₘ₋₁.

Proof. By the hypothesis, we have

λ₁a₁ + ⋯ + λₘaₘ = 0,

where some λᵢ ≠ 0. If λₘ = 0, then some λᵢ ≠ 0 for 1 ≤ i ≤ m - 1,
and the equation of linear dependence becomes

as we wished to prove.
Then S + T = S(u₁, ..., u_c, v₁, ..., v_d, w₁, ..., w_e), and Theorem (7.5)
will be proved if we can show that these vectors are linearly independent.
Suppose we have

and hence there exist elements of F, ξ₁, ..., ξ_c, such that
EXERCISES
Find dim S, dim T, dim (S + T), and, using Theorem (7.5), find
dim (S ∩ T).
6. Let F be the field of 2 elements,† and let V be a two-dimensional vector
space over F. How many vectors are there in V? How many one-
dimensional subspaces? How many different bases are there?
where the aᵢⱼ and βᵢ are fixed real numbers and the x₁, ..., xₙ are the
unknowns. The indexing is chosen such that for 1 ≤ i ≤ m, the ith
equation is

aᵢ₁x₁ + aᵢ₂x₂ + ⋯ + aᵢₙxₙ = βᵢ,

where the first index i appearing with aᵢⱼ stands for the equation in which
aᵢⱼ appears and the second index j denotes the unknown xⱼ of which aᵢⱼ
is the coefficient. Thus a₂₁ is the coefficient of x₁ in the second equation,
and so forth.
The matrix whose rows are the vectors

rᵢ = ⟨aᵢ₁, ..., aᵢₙ⟩,    i = 1, ..., m,

is called the coefficient matrix of the system (8.1). (We recall that in
Section 6 matrices were introduced in a slightly different situation.)
As in Section 6, we use the notation

A = [ a₁₁ ... a₁ₙ ]
    [ ⋮         ⋮ ]
    [ aₘ₁ ... aₘₙ ]

for the matrix with rows r₁, ..., rₘ as above. The entries {aᵢⱼ} of the
matrix are arranged in such a way that aᵢⱼ is the jth entry of the ith row.
For example, a₁₃ is the third entry in the first row, a₂₁ the first entry in the
second row, and so forth. We shall often use the more compact notation
A or (aᵢⱼ) to denote matrices.
Although matrices were defined in terms of their rows, the notation
suggests that with each m-by-n matrix we can associate two sets of vectors
in the vector spaces Rᵐ and Rⁿ, respectively, namely, the row vectors
We should also notice that row and column vectors are special kinds of
matrices, so we may write (using boldface for matrices, as usual),
The row subspace of the m-by-n matrix (aᵢⱼ) is the subspace S(r₁, ..., rₘ)
of Rⁿ, and the column subspace is the subspace S(c₁, ..., cₙ) of Rᵐ.
A solution of the system (8.1) is an n-tuple of real numbers ⟨λ₁, ..., λₙ⟩
such that

aᵢ₁λ₁ + aᵢ₂λ₂ + ⋯ + aᵢₙλₙ = βᵢ,    i = 1, ..., m.

In other words, the numbers {λᵢ} in the solution satisfy the equations (8.1)
upon being substituted for the unknowns. We may identify a solution
with a vector in Rⁿ and may therefore speak of a solution vector of the
system (8.1). Recalling the definition of the column vectors, we see that
x = ⟨λ₁, ..., λₙ⟩ is a solution of the system (8.1) if and only if

(8.2)    λ₁c₁ + ⋯ + λₙcₙ = b,

where b = ⟨β₁, ..., βₘ⟩, and we may describe the original system of
equations by the more economical notation:

(8.3)    x₁c₁ + ⋯ + xₙcₙ = b.
A system of homogeneous equations, or a homogeneous system, is a
system (8.3) in which the vector b = 0; if we allow the possibility b ≠ 0,
we speak of a nonhomogeneous system. If we have a homogeneous system,

(8.4)    x₁c₁ + ⋯ + xₙcₙ = 0,

then the zero vector ⟨0, ..., 0⟩ is always a solution vector, called the
trivial solution. A solution different from ⟨0, ..., 0⟩ is called a nontrivial
solution.
The next result is immediate from our definitions and Theorem (8.5).
(8.11)
x₀ + x is a solution of (8.10) and all solutions of (8.10) can be expressed in
this form.

Proof. Suppose first that x = ⟨α₁, ..., αₙ⟩ is a solution of the homoge-
neous system and that x₀ = ⟨α₁⁽⁰⁾, ..., αₙ⁽⁰⁾⟩ is a solution of (8.10). Then
we have:

(8.12)

and
Example A. Test the following system of equations for solvability, and find
a solution if there is one:

x₁ + 2x₂ - 3x₃ + x₄ = 1
x₁ + x₂ + x₃ + x₄ = 0.

In this case, the matrix of the system is

(8.14)
[ 1  2  -3  1  1 ]
[ 1  1   1  1  0 ]
Rewriting the system in the form (8.3), we have

x₁c₁ + x₂c₂ + x₃c₃ + x₄c₄ = b,

where

c₁ = ⟨1, 1⟩, c₂ = ⟨2, 1⟩, c₃ = ⟨-3, 1⟩, c₄ = ⟨1, 1⟩, b = ⟨1, 0⟩.
The rank of the matrix is 2.
It follows that S(c₁, c₂, c₃, c₄) = R², and hence b ∈ S(c₁, c₂, c₃, c₄).
By Theorem (8.5), we conclude that a solution exists.
To find a solution, it is more convenient to work with the rows of
the augmented matrix (8.14). We shall prove that if a 2-by-5 matrix
A′ is obtained from (8.14) by elementary row operations, then the system
associated with the new matrix A′ has exactly the same solutions as the
original system. When this occurs, we shall call the new system
equivalent to the original system of equations. It is sufficient to check
this statement in case A′ is obtained from A by a single elementary row
operation. For an elementary row operation of type I (interchanging
two rows) or III (multiplying a row by a nonzero constant) the result is
clear. In order to discuss an elementary row operation of type II, let us
write the original equations in the form
L₁ = 0
L₂ = 0,

where L₁ = x₁ + 2x₂ - 3x₃ + x₄ - 1 and L₂ = x₁ + x₂ + x₃ + x₄. We
have to prove that if L₁′ = L₁ + λL₂ and L₂′ = L₂, then the solutions of
L₁ = 0 and L₂ = 0 are the same as the solutions of

L₁′ = L₁ + λL₂ = 0,    L₂′ = L₂ = 0.

Certainly, L₁ = 0 and L₂ = 0 imply L₁′ = 0 and L₂′ = 0. On the other
hand, if L₁′ = 0 and L₂′ = 0, then L₂ = 0, and L₁ + λL₂ = 0, so that
L₁ = 0. This completes the proof.
Solving the second equation for the variables x₂, x₃, x₄ which were not
eliminated by putting the matrix in echelon form, we have, for example,
[a chain of matrices related by elementary row operations, reducing the
augmented matrix to echelon form]
EXERCISES
b. x₁ + x₂ - x₃ = 3.
   x₁ - 3x₂ + 2x₃ = 1.
   2x₁ - 2x₂ + x₃ = 4.
c. x₁ + x₂ - 5x₃ = -1.
d. 2x₁ + x₂ + 3x₃ - x₄ = 1.
   3x₁ + x₂ - 2x₃ + x₄ = 0.
   2x₁ + x₂ - x₃ + 2x₄ = -1.
e. -x₁ + x₂ + x₄ = 0.
   x₂ + x₃ = 1.
f. x₁ + 2x₂ + 4x₃ = 1.
   2x₁ + x₂ + 5x₃ = 0.
   3x₁ - x₂ + 5x₃ = 0.
g. 3x₁ + 4x₂ = -1.
   -x₁ - x₂ = 1.
   x₁ - 2x₂ = 0.
   2x₁ + 3x₂ = 0.
h. 2x₁ + x₂ - x₃ = 0.
   x₁ - x₃ = 0.
   x₁ + x₂ + x₃ = 1.
i. x₁ + 4x₂ + 3x₃ = 1.
   3x₁ + x₃ = 1.
   4x₁ + x₂ + 2x₃ = 1.
2. For what values of a does the following system of equations have a
solution?

3x₁ - x₂ + ax₃ = 1
3x₁ - x₂ + x₃ = 5
As in the earlier sections in this chapter, this section begins with some
general facts about solutions of systems of homogeneous equations,
whose purpose is partly to provide a language for talking about the
problems in a precise way, and partly to obtain some theorems which
enable us to predict what will happen in a particular case before becoming
bogged down in numerical calculations. Later in the section an efficient
computational method will be given for solving particular systems of
equations.
From this theorem and the results of the preceding sections, we will
know all solutions of a homogeneous system as soon as we find a basis for
the solution space, that is, the set of solutions of the system.
Then for r + 1 ≤ i ≤ n,

uᵢ = ⟨λ₁⁽ⁱ⁾, ..., λᵣ⁽ⁱ⁾, 0, ..., 0, -1, 0, ..., 0⟩,

with the -1 in the ith position of uᵢ, are solutions of the system. It
remains to show that the {uᵢ} are linearly independent and that they
generate the solution space.
To show that they are linearly independent, suppose we have a
possible relation of linear dependence:

(9.3)    μᵣ₊₁uᵣ₊₁ + ⋯ + μₙuₙ = 0,    for μᵢ ∈ R.

The left side is a vector in Rⁿ all of whose components are zero. For
r + 1 ≤ i ≤ n, the ith component of (9.3) is -μᵢ (why?), and it follows
that μᵣ₊₁ = ⋯ = μₙ = 0. Now let x = ⟨α₁, ..., αₙ⟩ be an arbitrary
solution of the original system. From the definition of uᵣ₊₁, ..., uₙ, we
have

x + Σₖ₌ᵣ₊₁ⁿ αₖuₖ = ⟨ξ₁, ..., ξᵣ, 0, ..., 0⟩,

where ξ₁, ..., ξᵣ are some elements of R. Since the set of solutions is a
subspace of Rⁿ, the vector y = ⟨ξ₁, ..., ξᵣ, 0, ..., 0⟩ is a solution of the
original system and we have, by (8.4),
This result is immediate from our definition of the rank as the dimen-
sion of the column space of the coefficient matrix.
We shall now apply our result to derive a useful and unexpected
result about the rank of a matrix

A = [ a₁₁ ... a₁ₙ ]
    [ ⋮         ⋮ ]
    [ aₘ₁ ... aₘₙ ]
with columns c₁, ..., cₙ and rows r₁, ..., rₘ. Let us define the row rank
of A as the dimension of the row space S(r₁, ..., rₘ). Some writers call
the rank as we have defined it the "column rank" but, as the next
theorem shows, the row rank and the column rank are always equal.
With the matrix A, let us consider the homogeneous system

(9.5)    x₁c₁ + ⋯ + xₙcₙ = 0.

It is convenient for this proof and for some arguments in the next section
to use the notation

rᵢ · x = aᵢ₁x₁ + ⋯ + aᵢₙxₙ

for the two vectors rᵢ and x, so that the system (9.5) can be described
also by the system of equations

(9.6)    r₁ · x = 0
         ⋮
         rₘ · x = 0.

The "inner product" r · x has the property that

(λr + μs) · x = λ(r · x) + μ(s · x), for λ and μ ∈ R,

and for r and s ∈ Rⁿ.
Example A. Find a basis for the solution space of the system of homogeneous
equations

x₁ + 2x₂ - 3x₃ + x₄ = 0
x₁ + x₂ + x₃ + x₄ = 0.
[ 1  2  -3  1 ]   ~   [ 1   2  -3  1 ]
[ 1  1   1  1 ]       [ 0  -1   4  0 ]
An equivalent system is therefore

(9.9)    x₁ + 2x₂ - 3x₃ + x₄ = 0
         0x₁ - x₂ + 4x₃ + 0x₄ = 0.

Since the rank of the coefficient matrix is 2 [by computing the row rank,
for example, taking account of Theorem (9.7)], we know that the
dimension of the solution space is 4 - 2 = 2. The second equation of
(9.9), viewed as an equation in {x₂, x₃, x₄}, has two linearly independent
solutions given by

⟨1, 1/4, 0⟩ and ⟨0, 0, 1⟩.

Substituting these in the first equation gives us two linearly independent
solutions for the original system:

u₁ = ⟨-5/4, 1, 1/4, 0⟩ and u₂ = ⟨-1, 0, 0, 1⟩.
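The two candidate solutions can be verified by substitution; a quick Python sketch (ours, for illustration):

    from fractions import Fraction as F

    def is_solution(x):
        x1, x2, x3, x4 = x
        return (x1 + 2*x2 - 3*x3 + x4 == 0
                and x1 + x2 + x3 + x4 == 0)

    u1 = (F(-5, 4), 1, F(1, 4), 0)
    u2 = (-1, 0, 0, 1)
    assert is_solution(u1) and is_solution(u2)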
Example B. Find the general solution (i.e., all solutions) of the system of
nonhomogeneous equations

x₁ + 2x₂ - 3x₃ + x₄ = 1
x₁ + x₂ + x₃ + x₄ = 0.

By Theorem (8.9), the general solution is given by

u = x₀ + x,

where x₀ is a solution of the nonhomogeneous system and x ranges over
the solutions of the homogeneous system. Applying the results of
Example A of Section 8 and Example A of this section, we have for the
general solution,
x₁ + x₂ - 2x₃ + x₄ = 0
-x₁ - x₂ + x₃ + 3x₄ = 0
2x₁ + 2x₂ + 5x₃ = 0.
[ 1   1  -2  1 ]     [ 1  1  -2  1 ]     [ 1  1  -2  1 ]
[ -1 -1   1  3 ]  ~  [ 0  0  -1  4 ]  ~  [ 0  0  -1  4 ]
[ 2   2   5  0 ]     [ 0  0   9 -2 ]     [ 0  0   0  1 ]
This time an equivalent system of equations is

x₁ + x₂ - 2x₃ + x₄ = 0
-x₃ + 4x₄ = 0
x₄ = 0.

The rank of the coefficient matrix (which can always be found by the
methods of Section 6) is 3; so the dimension of the solution space is
4 - 3 = 1. This time the last equation gives only the information
x₄ = 0. Substituting this information in the second equation yields
x₃ = 0. We get a basis for our solution space from the first equation,
which yields

u = ⟨1, -1, 0, 0⟩

as the desired basis vector. (Remember always to substitute back in the
original equations as a check.)
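In the same spirit as the parenthetical advice, the substitution check can be mechanized; a sketch with this system hard-coded (our illustration):

    def residuals(x):
        x1, x2, x3, x4 = x
        return (x1 + x2 - 2*x3 + x4,
                -x1 - x2 + x3 + 3*x4,
                2*x1 + 2*x2 + 5*x3)

    print(residuals((1, -1, 0, 0)))   # (0, 0, 0): u is indeed a solution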
Example E. The purpose of this last example is to show that the theory of
vector spaces is really deeper and more far-reaching than the study of
Rn and systems of linear equations. Let us return to the second problem
discussed in Section 1. We want to find all solutions of the differential
equation
68 VECTOR SPACES AND SYSTEMS OF UNEAR EQUATIONS CHAP. 2
EXERCISES
-x₁ + 2x₂ + x₃ + 4x₄ = 0
2x₁ + x₂ - x₃ + x₄ = 1.
4. In each of the problems in Exercise 2 of Section 6, find a relation of linear
dependence (with nonzero coefficients) if one exists.
5. In plane analytic geometry, given two points, such as (3, 1) and (-1,0),
a method is given for finding an equation
Ax+By+C=O
such that both points are solutions of the equation. Show that this
problem is equivalent to solving the homogeneous system
p = b + λa.

By analogy we might then define a plane in Rⁿ as a two-dimensional

FIGURE 2.7
FIGURE 2.8
Then

p - q = (b + a) - (b + a′) = a - a′ ∈ S.
(10.4)    b₁ · x = 0
          ⋮
          bᵣ · x = 0.
But these obviously do not solve the problem, since it may happen (and
usually does) that, say, b₁ · b₁ ≠ 0, so that b₁ is not, in general, a solution
vector of (10.4). Let S* be the solution space of (10.4). By (9.7) the rank
of the matrix whose rows are b₁, ..., bᵣ is r; hence S* has dimension
n - r, by (9.4). Let {c₁, ..., cₙ₋ᵣ} be a basis for S* and consider the
system of equations

c₁ · x = 0
⋮
cₙ₋ᵣ · x = 0.

By the same reasoning, the solution space S** of this system has dimension
r and, clearly, S ⊂ S** (why?). Since dim S = r, we have by Exercise 3
of Section 7 the result that S = S**, and the lemma is proved.
x₁ - x₄ = 0
-x₁ + x₂ + x₃ = 0.
The next result completes our geometrical interpretation of systems
of linear equations.
Then the nonhomogeneous system x₁c₁ + ⋯ + xₙcₙ = b has for its
solutions exactly the set y + S = V. This completes the proof of the
theorem.
For example, let us find a system of equations for the linear manifold
V whose directing space S has a basis
and contains the vector ⟨1, 2, 3, 4⟩. As we showed in the first example in
this section, S is the solution space of the system

x₁ - x₄ = 0
-x₁ + x₂ + x₃ = 0.

As in the proof of Theorem (10.6), we substitute the vector ⟨1, 2, 3, 4⟩ in
the system to obtain

1 - 4 = -3
-1 + 2 + 3 = 4.

Then the equations whose solutions form the linear manifold V are

x₁ - x₄ = -3
-x₁ + x₂ + x₃ = 4,
by Theorem (10.6).
EXERCISES
6. Find two distinct vectors on the line belonging to the intersection of the
hyperplanes:

x₁ + 2x₂ - x₃ = -1, 2x₁ + x₂ + 4x₃ = 2 in R³;
x₁ + x₂ = 0, x₂ - x₃ = 0, x₂ - 2x₄ = 0 in R⁴.
7. Prove that, if p and q are vectors belonging to a linear manifold V, then
the line through p and q is contained in V.
8. Let S and T be subspaces of Rⁿ, which are represented as the solution
spaces of homogeneous systems

a₁ · x = 0, ..., aᵣ · x = 0
and b₁ · x = 0, ..., bₛ · x = 0,

respectively. Prove that S ∩ T is the solution space of the system

a₁ · x = 0, ..., aᵣ · x = 0, b₁ · x = 0, ..., bₛ · x = 0.

Use this remark to find a basis for S ∩ T, where S and T are as in
Exercise 5 of Section 7.
9. Let S₁ = S(e₁, e₂, e₃) in R⁴, where e₁ = ⟨1, 0, 0, 0⟩, e₂ = ⟨0, 1, 0, 0⟩,
and e₃ = ⟨0, 0, 1, 0⟩. Let S₂ = S(a₁, a₂, a₃), where a₁ = ⟨1, 1, 0, 1⟩,
a₂ = ⟨2, -1, 3, -1⟩, a₃ = ⟨-1, 0, 0, 2⟩. Find dim (S₁ + S₂) and
dim (S₁ ∩ S₂). Find a basis for S₁ ∩ S₂.
10. Find the point in R³ where the line joining the points ⟨1, -1, 0⟩ and
⟨-2, 1, 1⟩ pierces the plane

3x₁ - x₂ + x₃ - 1 = 0.
Chapter 3

Linear Transformations and Matrices
f = f′, if they have the same domain X and if, for all x ∈ X, f(x) = f′(x).
The function f is said to be one-to-one if x₁ ≠ x₂ in X implies that f(x₁) ≠
f(x₂). [Note that this is equivalent to the statement that f(x₁) = f(x₂)
implies x₁ = x₂.] The function f is said to be onto Y if every y ∈ Y can
be expressed in the form y = f(x) for some x ∈ X; we shall say that f is a
function of X into Y when we want to allow the possibility that f is not
onto Y. A one-to-one function f of a set X onto a set Y is called a one-
to-one correspondence of X onto Y.
is one-to-one [because g′(x) > 0 for all x, so that the function is strictly
increasing in the sense that x₁ < x₂ implies g(x₁) < g(x₂)]. But g is not
onto, since g(x) > 0 for all real numbers x. The function h: R* → R
defined by

h(x) = logₑ |x|,    x ∈ R* = {x ∈ R, x ≠ 0},

is onto, but not one-to-one, since h(x) = h(-x) for all x.
We are now ready to define linear transformations from one vector
space to another. Throughout this section, F denotes an arbitrary field.
(11.3)    x₁ + x₂ = y

defines a function of R² → R¹, namely, the function T: ⟨α₁, α₂⟩ →
⟨α₁ + α₂⟩. For example, ⟨2, -3⟩ → ⟨-1⟩, ⟨1, 1⟩ → ⟨2⟩, ⟨1, -1⟩ → ⟨0⟩.
In general, the system (11.3) defines a function T from Fⁿ into Fᵐ,
which assigns to each n-tuple ⟨x₁, ..., xₙ⟩ in Fⁿ the m-tuple ⟨y₁, ..., yₘ⟩
in Fᵐ. We leave it to the reader to verify that the function defined by a
system (11.3), with a fixed coefficient matrix, is always a linear transforma-
tion of Fⁿ → Fᵐ.
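As a sketch of this point of view (ours, in Python): a fixed coefficient matrix determines the function T, and its linearity is inherited from the arithmetic of the field:

    def apply(A, x):
        # y_i = a_i1*x_1 + ... + a_in*x_n, one row of A per y_i
        return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)

    A = [[1, 1]]                 # the system x1 + x2 = y of (11.3)
    print(apply(A, (2, -3)))     # (-1,)
    print(apply(A, (1, 1)))      # (2,)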
For ξ ∈ F we have

(S + T)(ξv) = S(ξv) + T(ξv) = ξ[S(v)] + ξ[T(v)]
            = ξ[S(v) + T(v)] = ξ[(S + T)(v)].

Turning to the mapping αS, we have

(αS)(v₁ + v₂) = α[S(v₁) + S(v₂)] = αS(v₁) + αS(v₂)

and, for ξ ∈ F,

(αS)(ξv) = α[S(ξv)] = α[ξS(v)]
         = (αξ)S(v) = (ξα)S(v)
         = ξ[(αS)(v)],

since F satisfies the commutative law for multiplication.
The linear transformation 0, which sends each v ∈ V into the zero
vector, satisfies the condition that†

T + 0 = T,    T ∈ L(V, W),

and the transformation -T, defined by

(-T)v = -T(v),

satisfies the condition that

T + (-T) = 0.
The verification of the other axioms is left to the reader.
† The symbol 0 is given still another meaning, but the context will always indicate
which meaning is intended.
and

[(S + T)U](v) = (S + T)(U(v)) = S[U(v)] + T[U(v)]
             = (SU)(v) + (TU)(v) = (SU + TU)(v).
The properties of the identity transformation 1 are immediate. This
completes the proof of the theorem.
1. a + b = b + a.
2. (a + b) + c = a + (b + c).
3. There is an element 0 such that a + 0 = a for all a ∈ ℛ, and an ele-
ment 1 such that a1 = 1a = a for all a.
4. For each a ∈ ℛ there is an element -a such that a + (-a) = 0.
5. (ab)c = a(bc).
6. a(b + c) = ab + ac, (a + b)c = ac + bc.
7. If the commutative law for multiplication (ab = ba, for a, b ∈ ℛ)
holds, then ℛ is called a commutative ring.
It is worth checking to what extent the proofs of the ring axioms for
L(V, V) depend on the assumption that the elements are linear transforma-
tions. To make this question more precise, let M(V, V) be the set of all
functions T: V → V and define S + T and ST as for linear transformations.
It can easily be verified that all the axioms for a ring hold for M(V, V),
with the exception of the one distributive law
S(T+ U) = ST+ SU,
which actually fails for suitably chosen S, T, and U in M(V, V).
The mapping T: ⟨α, β⟩ → ⟨β, 0⟩, for α, β ∈ R, is a linear transforma-
tion of R² such that T ≠ 0 and T² = 0. It is impossible for T to have a
reciprocal T′ such that TT′ = 1, since TT′ = 1 implies

T(TT′) = T · 1 = T,

while, because of the associative law,

T(TT′) = T²T′ = 0 · T′ = 0,

which produces the contradiction T = 0. Because of this phenomenon,
it is necessary to make the following definition.
Similarly, let T(w) = v, and let α ∈ F. Then T(αw) = αT(w) = αv, and
we have

T⁻¹(αv) = αw = αT⁻¹(v),

completing the proof that T⁻¹ ∈ L(V, V). Finally, to check that TT⁻¹ =
T⁻¹T = 1, we have, for v ∈ V, and w such that T(w) = v,

TT⁻¹(v) = T(w) = v = 1v,

and T⁻¹T(w) = T⁻¹(v) = w = 1w.
Since these equations hold for all vectors v and w, we conclude that T is
invertible, and the theorem is proved.
If two vector spaces are isomorphic, then they have the same structure
as vector spaces, and every fact that holds for one vector space can be
translated into a fact about the other. In fact, this is the reason for using
the word "isomorphic," derived from Greek words meaning "of the same
form." For example, if one space has dimension 5, so does the other,
since the given isomorphism will carry a basis of one space onto a basis of
the other. In case there is one isomorphism T: V ~ W, there are many
other isomorphisms different from T, and in translating properties of V
to properties of W, we have to be careful to indicate which isomorphism
we are using.
with coefficients αᵢ ∈ R. Since {v₁, v₂, v₃} are linearly independent, the
coefficients {α₁, α₂, α₃} are uniquely determined by the vector v (why?).
Therefore, the rule

T(v) = ⟨α₁, α₂, α₃⟩

defines a function T: V → R³. We have to check that T is linear, and
one-to-one, and onto. Suppose v is given as above, and let α ∈ R. Then

Therefore,
Similarly, T(v + v′) = T(v) + T(v′), since the coefficients of v + v′ with
respect to the given basis are the sums of the coefficients of v and v′,
respectively. The fact that every vector in V is a linear combination of
{v₁, v₂, v₃} shows that T is onto. Finally, suppose T(v) = T(v′) =
⟨α₁, α₂, α₃⟩. Then v = α₁v₁ + α₂v₂ + α₃v₃ = v′, and T is one-to-one.
Of course, there is nothing special about dimension 3; the same
argument will show that a vector space of dimension n over R is isomorphic
to Rⁿ.
T: y₁ = -x₁ + 2x₂        U: y₁ = x₁
   y₂ = 0                   y₂ = x₁

Find systems of equations defining T + U and TU.
We have T(⟨α₁, α₂⟩) = ⟨-α₁ + 2α₂, 0⟩, while U(⟨α₁, α₂⟩) =
⟨α₁, α₁⟩. Therefore,

(T + U)(⟨α₁, α₂⟩) = T⟨α₁, α₂⟩ + U⟨α₁, α₂⟩
                 = ⟨-α₁ + 2α₂, 0⟩ + ⟨α₁, α₁⟩
                 = ⟨2α₂, α₁⟩.

Therefore a system of equations defining T + U is given by

y₁ = 2x₂
y₂ = x₁.
x₁ - 2x₂ + x₃ = 0
x₁ + x₃ = 0

A = [ 1  -2  1 ]
    [ 1   0  1 ]

This is certainly the case, since A has rank 2, and we conclude that the
transformation is onto.
EXERCISES
The proof is the same as the verification that Fn is a vector space and
is omitted.
The definition of matrix multiplication is not as obvious. In order
to start, let us consider a problem similar to one discussed in the problems
in Section 11.
Let T: R² → R³ be the linear transformation

(12.3)    T: y₁ = x₁ + x₂
             y₂ = -x₁ + x₂
             y₃ = x₁

and let U correspond to the matrix

B = [ -1  0  1 ]
    [  1  1  0 ]
    [ -1  0  2 ]
The linear transformation UT maps R² into R³. Let us work out the
system of equations defining it. If

T(x) = y, U(y) = z,

then

UT(x) = U(y) = z.
Rewriting the system for U so that it carries {y₁, y₂, y₃} → {z₁, z₂, z₃}, we
have the system

(12.4)    U: z₁ = -y₁ + y₃
             z₂ = y₁ + y₂
             z₃ = -y₁ + 2y₃

Now we can substitute for {y₁, y₂, y₃} using (12.3) to find the system
associated with UT:

(12.5)    z₁ = -(x₁ + x₂) + x₁ = -x₂
          z₂ = (x₁ + x₂) + (-x₁ + x₂) = 2x₂
          z₃ = -(x₁ + x₂) + 2x₁ = x₁ - x₂

which is associated with the matrix

[ 0  -1 ]
[ 0   2 ]
[ 1  -1 ]
Then the entry in the (1, 1) position of the product matrix BA may be
obtained by taking the first row of B and the first column of A, multiplying
corresponding elements together, and adding. The other entries are seen
to be obtained by the same process.
We can now make a formal definition.
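The row-by-column rule translates directly into code; a minimal Python sketch (ours, not from the text), checked against the matrices of (12.3) and (12.4):

    def matmul(B, A):
        # entry (i, j) of BA: i-th row of B times j-th column of A
        return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
                 for j in range(len(A[0]))]
                for i in range(len(B))]

    B = [[-1, 0, 1], [1, 1, 0], [-1, 0, 2]]   # matrix of U, from (12.4)
    A = [[1, 1], [-1, 1], [1, 0]]             # matrix of T, from (12.3)
    print(matmul(B, A))   # [[0, -1], [0, 2], [1, -1]], the matrix of UT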
where, writing c₁ and c₂ for the columns of A,

γ₁₁ = (-1, 0, 1) · c₁ = 0
γ₁₂ = (-1, 0, 1) · c₂ = -1
γ₂₁ = (1, 1, 1) · c₁ = 1
γ₂₂ = (1, 1, 1) · c₂ = 2,

with the notation r · c standing for multiplying corresponding elements
and adding.
The next result is perhaps the most useful fact about matrix
multiplication.
(12.8)
we have
with zeros except in the (1, 1), (2, 2), ... , (n, n) positions where the entries
are all equal to one.
rank n. Therefore the matrix whose rows are in echelon form, which we
obtained in the first part of the argument, will have the form
Let
A = ( 1  -1
      2   0 ).
We do the work in two columns; in one column we apply elementary row
operations to reduce A to the identity matrix, and in the other column we
apply the same elementary row operations to I.

A ∼ ( 1  -1        I ∼ (  1   0
      0   2 )            -2   1 )

    ( 1  -1            (  1   0
      0   1 )            -1  1/2 )

    ( 1   0            (  0  1/2
      0   1 )            -1  1/2 ).

We conclude that
A⁻¹ = (  0  1/2
        -1  1/2 ),
and can easily check that this is the case.
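The two-column procedure is easy to express as an algorithm. Here is a minimal sketch, assuming NumPy; the function name and the pivot-selection step are our own, not the book's.

```python
import numpy as np

def invert(A):
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])  # work on [A | I]
    for i in range(n):
        p = i + np.argmax(np.abs(M[i:, i]))      # choose a nonzero pivot
        if np.isclose(M[p, i], 0.0):
            raise ValueError("A is not invertible")
        M[[i, p]] = M[[p, i]]                    # interchange rows
        M[i] /= M[i, i]                          # scale the pivot row
        for j in range(n):
            if j != i:
                M[j] -= M[j, i] * M[i]           # subtract a multiple of row i
    return M[:, n:]                              # the I-side now holds A^{-1}

A = np.array([[1, -1], [2, 0]])
print(invert(A))   # [[ 0.   0.5], [-1.   0.5]]
```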
be a function such that if x = ⟨x1, x2⟩, then T(x) = y = ⟨y1, y2, y3⟩,
where
y1 = F1(x1, x2)
y2 = F2(x1, x2)
y3 = F3(x1, x2).
Suppose the partial derivatives ∂Fi/∂xj exist and are continuous at x,
and form the jacobian matrix
J(T)_x = ( (∂Fi/∂xj)_x ),
where the notation (·)_x means that the derivatives are evaluated at
x. For example, for the map x → ⟨cos x, sin x⟩ the jacobian matrix is the column
( -sin x
   cos x ).
Now suppose U: R³ → R³ is a function such that
U(y) = z, where z = ⟨z1, z2, z3⟩, and
zi = Zi(y1, y2, y3),
for 1 ≤ i ≤ 3. Suppose that the partial derivatives ∂zi/∂yj also exist and
are continuous at the point y and form the jacobian matrix J(U)_y.
The point is that these chain rules can all be expressed (and remembered!)
in the simple form
J(UT)_x = J(U)_y J(T)_x,   y = T(x).
Moreover,

J(U)_y = ( ∂z1/∂y1  ∂z1/∂y2  ∂z1/∂y3
           ∂z2/∂y1  ∂z2/∂y2  ∂z2/∂y3
           ∂z3/∂y1  ∂z3/∂y2  ∂z3/∂y3 ).
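The matrix form of the chain rule can be checked numerically. The following sketch uses forward differences; the maps T and U below are hypothetical examples chosen for illustration, not the ones in the text.

```python
import numpy as np

def T(x):                      # T: R^2 -> R^3
    x1, x2 = x
    return np.array([x1 + x2, x1 * x2, x1])

def U(y):                      # U: R^3 -> R^3
    y1, y2, y3 = y
    return np.array([y1 * y2, y2 * y3, y1 - y3])

def jacobian(f, x, h=1e-6):    # forward-difference approximation of J(f)_x
    fx = f(x)
    return np.column_stack([(f(x + h * e) - fx) / h for e in np.eye(len(x))])

x = np.array([0.3, -0.7])
lhs = jacobian(lambda t: U(T(t)), x)        # J(UT)_x
rhs = jacobian(U, T(x)) @ jacobian(T, x)    # J(U)_y J(T)_x, with y = T(x)
print(np.allclose(lhs, rhs, atol=1e-4))     # True
```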
EXERCISES
D = ( λ1       0
          ⋱
      0       λn )
to denote a diagonal matrix. Figure out the effect of multiplying an
arbitrary matrix on the left and on the right by a diagonal matrix D.
5. Two matrices A and B are said to commute if AB = BA. Prove that the
only n-by-n matrices which commute with all the n-by-n diagonal matrices
over F are the diagonal matrices themselves.
6. Verify the statements made in Example C about the effect of multiplying
an arbitrary n-by-n matrix A on the left by the elementary matrices
Pij, Bij(λ), and Di(μ).
7. Test the following matrices for invertibility, and if invertible, find the
inverse by the method of Example C.
a. ( -2  -1        b. ( 2  1
      2   1 )           1  1 )
c. G _~ !)
e. (~ i ! !),
1 I 0 1
8. Show that the equation Ax = b, where A is an n-by-n matrix, and x and
b column vectors, has a unique solution if and only if A is an invertible
matrix. In case A is invertible, show that the solution of the equation is
given by x = A -lb.
(13.1) THEOREM. Let {v1, ..., vn} be a basis of V over F. If S and T are
elements of L(V, W) such that S(vi) = T(vi), 1 ≤ i ≤ n, then S = T.
Moreover, let w1, ..., wn be arbitrary vectors in W. Then there exists
one and only one linear transformation T ∈ L(V, W) such that T(vi) = wi.
Proof. Let v = Σ_{i=1}^n ξivi. Then S(vi) = T(vi), 1 ≤ i ≤ n, implies that
S(v) = Σ ξiS(vi) = Σ ξiT(vi) = T(v), so that S = T.
Now consider a fixed basis {VI' ... , vn } of V over F and for simplicity
let T ∈ L(V, V). Then for each i, T(vi) is a linear combination of
v1, ..., vn, and the coefficients can be used to define the rows or columns
of an n-by-n matrix which together with the basis {v1, ..., vn} determines
completely the linear transformation T, because of the preceding theorem.
The question whether we should let T(vi) give the rows or columns of the
matrix corresponding to T is answered by requiring that the matrix of a
product of two transformations be the product of their corresponding
matrices.
For example, let V be a two-dimensional vector space over F with
basis {v1, v2}. Let S and T in L(V, V) be defined by
S(v1) = -v1 + 2v2,    T(v1) = 2v1 + 3v2,
S(v2) = v1 + v2,      T(v2) = -v2.
Then ST is the linear transformation given by
ST(v1) = S(2v1 + 3v2) = 2(-v1 + 2v2) + 3(v1 + v2) = v1 + 7v2,
ST(v2) = S(-v2) = -(v1 + v2) = -v1 - v2.
The matrices corresponding to S, T, ST, if we let S(vi) correspond
to the rows of the matrix of S, etc., are respectively
( -1  2        ( 2   3        (  1   7
   1  1 ),       0  -1 ),       -1  -1 ).
But
( -1  2  ( 2   3      ( -2  -5        (  1   7
   1  1 )  0  -1 )  =   2   2 )  ≠     -1  -1 ).
Let us see if we have better luck by letting S(vi) correspond to the columns
of the matrix of S, etc. In this case the matrices corresponding to
S, T, ST are respectively
( -1  1        ( 2  0        ( 1  -1
   2  1 ),       3  -1 ),      7  -1 ),
and this time
( -1  1  ( 2   0      ( 1  -1
   2  1 )  3  -1 )  =   7  -1 ),
as desired.
(13.2) DEFINITION. Let {VI' ... , vn } be a basis of Vand let T E L(V, V).
The matrix of T with respect to the basist {VI' ... , vn} of V is the n-by-n
t The matrix of T depends not only on the set of basis vectors {Vl, ... , vn }, but on
the order in which they are given. The order is usually clear from the context, but
in complicated situations we shall sometimes say ordered basis, to emphasize a
particular order we have in mind.
matrix whose ith column, for 1 ≤ i ≤ n, is the set of coefficients obtained
when T(vi) is expressed as a linear combination of v1, ..., vn. Thus the
matrix (αij) of T is described by the equations
T(vi) = Σ_{j=1}^n αji vj,   1 ≤ i ≤ n.
To give another example, let T be the linear transformation of a three-
dimensional vector space with basis {v1, v2, v3} such that
T(v1) = 2v1 - 3v3
T(v2) = v2 + 5v3
T(v3) = v1 - v2.
Then the matrix of T with respect to the basis {v1, v2, v3} is
(  2  0   1
   0  1  -1
  -3  5   0 ).
Let us check whether Definition (13.2) is consistent with the interpreta-
tion of the matrix of a linear transformation given by a system of equations.
For example, let T be defined by the system
Y1 = 3X1 - X2
Y2 = Xl + 2X2'
Let
e1 = ⟨1, 0⟩,  e2 = ⟨0, 1⟩;
then e1 and e2 form a basis for R², and we can compute the matrix of T
with respect to this basis according to Definition (13.2). Using the results
of the last example in the preceding section, we have
T(e1) = ⟨3, 1⟩ = 3e1 + e2,  T(e2) = ⟨-1, 2⟩ = -e1 + 2e2,
so that the matrix of T is
( 3  -1
  1   2 ).
The reader can easily check that for a general system of n equations in n
unknowns with matrix A, the matrix of the corresponding linear trans-
formation with respect to the basis {e1' ... , en} defined as above, is A,
according to Definition (13.2).
T(S(vi)) = T( Σ_{j=1}^n βji vj )
         = Σ_{j=1}^n βji T(vj)
         = Σ_{j=1}^n βji Σ_{k=1}^n αkj vk
         = Σ_{k=1}^n ( Σ_{j=1}^n αkj βji ) vk.
Thus the (k, i) entry of the matrix of TS is Σ_{j=1}^n αkj βji, which is also the
(k, i) entry of the product (αij)(βij). This completes the proof.
The proof is immediate by Theorems (13.3) and (11.6) and does not
require any computation at all.
We assert that {w1, ..., wn} is another basis of V if and only if the matrix
(μij) is invertible [see Definition (12.10)]. To see this, suppose first that
{w1, ..., wn} is a basis. Then we can express each
vj = Σ_{k=1}^n ηkj wk,   1 ≤ j ≤ n.
Substituting in wi = Σ_j μji vj and comparing coefficients, we obtain
Σ_{j=1}^n ηkj μji = { 1  if i = k
                    { 0  if i ≠ k,
and we have proved that (ηij)(μij) = I. Similarly, (μij)(ηij) = I and we
have shown that (μij) is an invertible matrix. We leave as an exercise the
proof that if (μij) is invertible then {w1, ..., wn} is a basis.
(13.6) THEOREM. Let {v1, ..., vn} be a basis of V and let {w1, ..., wn}
be another basis such that
wi = Σ_{j=1}^n μji vj,
where (μrs) is an invertible matrix. Let T ∈ L(V, V), and let (αij) and
(α′ij) be the matrices of T with respect to the bases {v1, ..., vn} and
{w1, ..., wn}, respectively. Then we have
(μij)(α′ij) = (αij)(μij)
or
(α′ij) = (μij)⁻¹(αij)(μij).
Proof. We have
T(wi) = T( Σ_{j=1}^n μji vj ) = Σ_{j=1}^n μji ( Σ_{k=1}^n αkj vk )
      = Σ_{k=1}^n ( Σ_{j=1}^n αkj μji ) vk.
On the other hand,
T(wi) = Σ_{j=1}^n α′ji wj = Σ_{j=1}^n α′ji ( Σ_{k=1}^n μkj vk )
      = Σ_{k=1}^n ( Σ_{j=1}^n μkj α′ji ) vk.
Therefore
Σ_{j=1}^n μkj α′ji = Σ_{j=1}^n αkj μji,   1 ≤ i, k ≤ n,
where S is the matrix whose columns express the new basis in terms of the
original one:
For example, let T have the matrix
A = (  1  1
      -1  2 )
with respect to the basis {e1, e2} of R².
Let us find the matrix B of T with respect to the basis {Ul = el + e2,
u2 = e1 - e2}. We have
T(u1) = T(e1) + T(e2) = 2e1 + e2,
T(u2) = T(e1) - T(e2) = -3e2.
In order to express T(u1) and T(u2) as linear combinations of u1 and
u2, we have to solve for e1 and e2 in terms of u1 and u2. We obtain
e1 = (1/2)(u1 + u2),  e2 = (1/2)(u1 - u2),
and
T(u1) = (3/2)u1 + (1/2)u2,
T(u2) = -(3/2)u1 + (3/2)u2.
The matrix B of T with respect to the basis {u1, u2} is
B = ( 3/2  -3/2
      1/2   3/2 ).
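The result agrees with the formula B = S⁻¹AS, where S has the coordinate vectors of u1 and u2 as its columns. A minimal numerical check, assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 1.0], [-1.0, 2.0]])
S = np.array([[1.0, 1.0], [1.0, -1.0]])   # columns: u1 = e1 + e2, u2 = e1 - e2
B = np.linalg.inv(S) @ A @ S
print(B)    # [[ 1.5 -1.5], [ 0.5  1.5]]
```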
To conclude this section, we apply our basic results about basis and
dimension from Section 7, together with the results of this section, to
obtain some useful theorems about linear transformations. First of all,
let Vand Wbe finite dimensional vector spaces over F, and let TE L(V, W).
Then the subsets
T(V) = {w E WI w = T(v) for some v E V}
and
neT) = {v E V I T(v) = O}
(13.8) DEFINITION. Let T E L(V, W), and let T(V) and neT) be the
subspaces of Wand V, respectively, defined above. Then the subspace
T(V) of W is called the range of T, and its dimension is called the rank of
T. The subspace neT) of V is called the null space of T, and its dimension
is called the nullity of T.
The next result is certainly the most useful and important theorem
about linear transformations we have had so far.
T( Σ_{i=k+1}^n ηi vi ) = 0,
and Σ_{i=k+1}^n ηi vi ∈ n(T). Because of the way the basis {vi} for V was chosen,
it follows that all the {ηi} are zero, and the theorem is proved.
Proof. We prove that (1) implies (2), (2) implies (3), and (3) implies
(1). First we assume (1). Then there exists T⁻¹ ∈ L(V, V) such that
TT⁻¹ = T⁻¹T = 1. Suppose T(v1) = T(v2). Applying T⁻¹ we obtain
T⁻¹T(v1) = T⁻¹T(v2), and v1 = v2. Thus (1) implies (2).
Next assume (2). Then the null space n(T) = 0. By Theorem (13.9)
we have dim T(V) = n, and it follows that T(V) = V, and that T is onto.
Finally, assume T is onto. By Theorem (13.9) again, T is also
one-to-one. It follows that if {v1, ..., vn} is a basis for V, then {T(v1), ...,
T(vn)} is also a basis. By Theorem (13.1), there exists a linear transforma-
tion U such that UT(vi) = vi, i = 1, ..., n. By (13.1) again, we have
UT = 1. On the other hand, TU(Tvi) = Tvi for i = 1, ..., n, and
since {Tv1, ..., Tvn} is a basis, we have TU = 1. Therefore T is invertible,
and the theorem is proved.
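Theorem (13.9) is the rank-nullity relation, dim T(V) + dim n(T) = dim V, and it is easy to verify numerically for a sample matrix. A sketch, assuming NumPy and SciPy; the matrix is our own example, not one from the text.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1, -2, 1],
              [1,  0, 1]])            # a transformation from F^3 to F^2
rank = np.linalg.matrix_rank(A)       # dim T(V)
nullity = null_space(A).shape[1]      # dim n(T)
print(rank, nullity, rank + nullity)  # 2 1 3, and 3 = dim V
```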
EXERCISES
In all the Exercises, all vector spaces involved are assumed to have finite bases,
and the field F can be taken to be the field of real numbers in the numerical
problems.
1. Let S, T, U E L( V, V) be given by
S(U1) = U1 - U2, T(ul) = U2, U(Ul) = 2Ul
S(U2) = Ul, T(u2) = Ul, U(U2) = -2U2
where {Ul , U2} is a basis for V. Find the matrices of S, T, U with respect
to the basis {Ul, U2} and with respect to the new basis {Wl, W2} where
Wl = 3Ul - U2
W2 = Ul + U2.
Find invertible matrices X in each case such that X-lAX = A' where A
is the matrix of the transformation with respect to the old basis, and A'
the matrix with respect to the new basis.
2. Let S, T, U be linear transformations such that (letting {Ul, U2} or
{Ul, U2, ua} be bases of the vector spaces)
S(Ul) = Ul + U2,
S(U2) = -Ul - U2,
T(u1) = u1 - u2,
T(u2) = 2u2,
U(u1) = u1 + u2 - u3,
U(u2) = u2 - 3u3,
U(u3) = -u1 - 3u2 - 2u3.
The vector spaces V and Ware isomorphic via the bases {vJ and {Wj} to
the spaces Fm and Fn, respectively (see Section 11, Example E). Show
that if x ∈ Fm is the column vector corresponding to the vector x ∈ V
via the isomorphism, then Ax is the column vector in Fn corresponding
to Tx. In other words, the correspondence between linear transforma-
tions and matrices is such that the action of T on a vector x is realized
by matrix multiplication Ax.
Chapter 4

Vector Spaces with an Inner Product

14. THE CONCEPT OF SYMMETRY
The word symmetry has rich associations for most of us. A person familiar
with sculpture and painting knows the importance of symmetry to the
artist and how the distinctive features of certain types of architecture and
ornaments result from the use of symmetry. A geologist knows that
crystals are classified according to the symmetry properties they possess.
A naturalist knows the many appearances of symmetry in the shapes of
plants, shells, and fish. The chemist knows that the symmetry properties
of molecules are related to their chemical properties. In this section we
shall consider the mathematical concept of symmetry, which will provide
some worthwhile insight into all of the preceding examples.
FIGURE 4.1
Let us begin with a simple example from analytic geometry, Figure 4.1.
What does it mean to say the graph of the parabola y2 = x is symmetric
about the x axis? One way of looking at it is to say that if we fold the
plane along the x axis the two halves of the curve y2 = x fit together.
But this is a little too vague. A more precise description comes from the
observation that the operation of plane-folding defines a transformation
T, of the points in the plane, which assigns to a point (a, b) the point
(a, -b) which meets it after the plane is folded. The symmetry of the
parabola is now described by asserting that if (a, b) is a point on the
parabola so is the transformed point (a, -b). This sort of symmetry is
called bilateral symmetry.
Now consider the example of a triod, Figure 4.2. What sort of
symmetry does it possess? It clearly has bilateral symmetry about the
three lines joining the center with the points a, b, and c. The triod has
also a new sort of symmetry, rotational symmetry. If we rotate points in
a b
FIGURE 4.2
the plane through an angle of 120°, leaving the center fixed, then the triod
is carried onto itself. Does the triod have the same symmetry properties
as the winged triod of Figure 4.3? Clearly not; the winged triod has only
rotational symmetry and does not possess the bilateral symmetry of the
triod. Thus we see that the symmetry properties of figures may serve to
distinguish them.
FIGURE 4.3
FIGURE 4.4
c
FIGURE 4.5
through an angle of 120° and let S be the bilateral symmetry about the
arm a. The reader may verify that the symmetry group G of the triod
consists of the symmetries
{1, R, R², S, SR, SR²}.
We may also verify that S² = 1, R³ = 1, and SR = R⁻¹S, and that these
rules suffice to multiply arbitrary elements of G.
The symmetry group of the winged triod is easily seen to consist of
exactly the symmetries
{1, R, R²}.
More generally, let X be the n-armed figure (Figure 4.6), S the bilateral
FIGURE 4.6
symmetry about the arm a1, and R the rotation carrying a2 → a1. Then
the group of X consists exactly of the symmetries
(14.3) {1, R, ..., R^(n-1), S, SR, ..., SR^(n-1)}.
αγ + βδ = 0.
(14.5) A distance-preserving transformation T of R2 which leaves the zero
element fixed is a linear transformation.
FIGURE 4.7
Since both T and f are determined by their action on el and e2, we have
T= f.
FIGURE 4.8
A = ( α   β
      β  -α ).
In the former case T is a rotation through an angle θ such that cos θ = α,
while in the latter case T is a bilateral symmetry about the line making an
angle θ/2 with e1. Notice that in the second case the matrix of T² is the identity matrix I.
FIGURE 4.9
EXERCISES
1. For vectors a = ⟨α1, α2⟩, b = ⟨β1, β2⟩, define their inner product
(a, b) = α1β1 + α2β2. Show that the inner product satisfies the following
rules:
a. (a, b) = (b, a).
b. (a, b + c) = (a, b) + (a, c).
c. (λa, b) = λ(a, b), for all λ ∈ R.
d. ||a|| = √(a, a).
e. a ⊥ b if and only if (a, b) = 0.
f. a ⊥ b if and only if ||a + b|| = ||a - b||. (Draw a figure to illustrate
this statement.)
2. A line L in R² is defined to be the set of all vectors of the form p + x,
where p is a fixed vector, and x ranges over some one-dimensional subspace
S of R². Thus if S = S(a), the line L consists of all vectors of the form
p + λa, where λ ∈ R. We shall use the notation p + S for the line L
described above.†
a. Let p + S and q + S be two lines with the same one-dimensional
subspace S. Show that p + S and q + S either coincide or have no
vectors in common. In the latter case, we say that the lines are
parallel.
b. Show that there is one and only one line L containing two distinct
vectors p and q, and that L consists of all vectors of the form p +
λ(q - p), λ ∈ R.
c. Show that three distinct vectors p, q, r are collinear if and only if
S(q - p) = S(q - r).
d. Show that two distinct lines are either parallel or intersect in a unique
vector.
3. Let L be the line containing two distinct vectors p and q, and let r be a
vector not on L. Show that a vector u on L such that (u - r) ⊥ (q - p)
is a solution of the simultaneous equations
(u - r, q - p) = 0,
u = p + λ(q - p), λ ∈ R.
Show that there is a unique value of λ for which these equations are
satisfied. Derive a formula for the perpendicular distance from a point
to a line. Test your formula on some numerical examples.
4. Let
t This definition of a line in R2 is identical with the definition given in Section 10;
no results from Section 10 are needed to do any of these problems, however.
In both cases the reader may verify that the functions defined are actually
inner products.
-(u, v) ≤ 1.
Combining these inequalities we have |(u, v)| ≤ 1.
FIGURE 4.10
u
FIGURE 4.11
for the vectors u = ⟨α1, ..., αn⟩, v = ⟨β1, ..., βn⟩. The unit vectors
form an orthonormal basis of Rⁿ with respect to the inner product given
above.
(b) We shall prove that the functions
fn(x) = sin nx, n = 1,2, ...
1I'"
;; _" SIn .
nx SIn mx dx = {10 n=m
n"* m.
First, for n = m we have
~ I"
TT _"
(sin nx)2 dx = ~ I"
TT _" 2
~ (1 - cos 2nx) dx,
(w, uj) = ( w_{r+1} - Σ_{i=1}^r (w_{r+1}, ui)ui , uj )
        = (w_{r+1}, uj) - Σ_{i=1}^r (w_{r+1}, ui)(ui, uj)
FIGURE 4.12
FIGURE 4.13
= ⟨0, 4⟩ - (1/√10)(-4) · (1/√10)⟨-3, -1⟩
= ⟨0, 4⟩ + (2/5)⟨-3, -1⟩
= ⟨-6/5, 18/5⟩.

FIGURE 4.14
Since (T(u), T(u)) = (u, u) and (T(v), T(v)) = (v, v), the last equation
implies that (T(u), T(v)) = (u, v) for all u and v.
Statement (2) implies statement (3). Let {Ul' ... , un} be an ortho-
normal basis of V; then
(ui, ui) = 1,  (ui, uj) = 0, i ≠ j.
By statement (2) we have
(T(ui), T(ui)) = 1,  (T(ui), T(uj)) = 0, i ≠ j,
and {T(Ul)' ... , T(u n)} is an orthonormal set.
Statement (3) implies statement (1). Suppose that for some ortho-
normal basis {Ul' ... , un} of V the image vectors {T(Ul), ... , T(u n)} form
an orthonormal set. Let
Thus statement (1) is proved and we have shown the equivalence of the
first three statements.
Finally, we prove that statements (3) and (4) are equivalent. Suppose
that statement (3) holds and let {Ul' ... , un} be an orthonormal basis for
V. Let
and, if i ≠ j,
EXERCISES
v = Σ_{i=1}^n (v, ui)ui.
6. Let U be a vector in Rn such that Ilull = 1 (for the usual inner product).
Prove that there exists an n-by-n orthogonal matrix whose first row is u.
7. Let D( V) be the set of all orthogonal transformations on V. Prove
D( V) is a group with respect to the operation of multiplication.
8. Two vector spaces V and W with inner products (v1, v2) and [w1, w2],
respectively, are said to be isometric if there exists a one-to-one linear
transformation T of V onto W such that [Tv1, Tv2] = (v1, v2) for all
v1, v2 ∈ V. Such a linear transformation T is called an isometry. Let V
be a finite dimensional space with an inner product (u, v), and let
{v1, ..., vn} be an orthonormal basis. Prove that the mapping
where not all of α1, α2, α3 are zero. The two-dimensional subspace W
associated with the plane is the set of solutions of the homogeneous
equation
α1x1 + α2x2 + α3x3 = 0.
Note that if n = ⟨α1, α2, α3⟩, then (n, w) = 0 for all w ∈ W, and (n, p)
is a constant for all p ∈ P. The vector n is called a normal vector to the
plane.
a. Let n be a nonzero vector in R³, and α a fixed real number. Show
that the set of all vectors p such that
(p, n) = α
is a plane.
b. Find a normal vector to the plane defined by the equation
3x1 - x2 + x3 - 1 = 0.
c. Find the equation of the plane with normal vector n = ⟨1, -1, 2⟩,
and containing p = ⟨-1, 1, 0⟩. State the equation both in the form
α1x1 + α2x2 + α3x3 = β
and (n, p) = α.
d. Show that a vector x lies on the plane with normal vector n, passing
through p, if and only if
(x - p, n) = O.
Chapter 5

Determinants

16. DEFINITION OF DETERMINANTS
Let us begin with a study of the function A(al' a2) which assigns to each
pair of vectors al , a2 E R2 the area of the parallelogram with edges al and
a2 (Figure 5.1). Instead of working out a formula for this function in
FIGURE 5.1
terms of the components of al and a2, let us see what are some of the
general properties of the function A. We have, first of all,
FIGURE 5.2
FIGURE 5.3
(16.4) A(a1, a2) ≠ 0 if and only if a1 and a2 are linearly independent.
The statement (16.4) is perhaps the most important and interesting
property of the area function. It states that two vectors are linearly
dependent if and only if the area of the parallelogram determined by
them is zero. In other words, the computation of a single number (the
area) gives a test for linear dependence.
It is because of the obvious usefulness of the sort of test described in
(16.4) that we shall define a function with properties similar to the area
function in as general a setting as possible.
It will be convenient to begin by defining a function on sets of vectors
from Fn which satisfy Axioms (16.1), (16.2), and (16.3) for arbitrary
λ ∈ F. We shall derive consequences of these axioms in this section and
postpone to the next section the task of proving that such a function really
does exist. The reader will note that there is nothing logically wrong
with this procedure; it is, for example, what we do in euclidean geometry,
namely, to derive consequences of certain axioms before we have a
construction of certain objects that satisfy the axioms. We return to the
connection with areas and volumes in Section 19.
(i) D(a1, ..., a_{i-1}, ai + aj, a_{i+1}, ..., an) = D(a1, ..., an), for 1 ≤
i ≤ n and j ≠ i.
(ii) D(a1, ..., a_{i-1}, λai, a_{i+1}, ..., an) = λD(a1, ..., an), for all λ ∈ F.
(iii) D(e1, ..., en) = 1, if ei is the ith unit vector.
(b) D(a1, ..., an) = 0 if two of the vectors ai and aj are equal.
(c) D is unchanged if ai is replaced by ai + Σ_{j≠i} λj aj, for arbitrary
λj ∈ F.
(d) D(a1, ..., an) = 0, if {a1, ..., an} is linearly dependent.
(e) D(a1, ..., a_{i-1}, λai + μai′, a_{i+1}, ..., an)
    = λD(a1, ..., ai, ..., an) + μD(a1, ..., ai′, ..., an),
for 1 ≤ i ≤ n, for arbitrary field elements λ and μ, and for vectors ai
and ai′ ∈ Fn.
to indicate that the ith argument of the function D is a and the jth is b,
etc. Then we have, for arbitrary i and j,
= D( ... , 0, ... )
We may assume that {a2, ..., an} are linearly independent [otherwise,
by statement (d) both sides of (16.7) are zero and there is nothing to
prove]. By (7.4) the set {a2, ..., an} can be completed to a basis
{ā1, a2, ..., an} of Fn, and we have
(16.8) D(λ1ā1 + Σ_{i>1} λi ai, a2, ..., an) = λ1D(ā1, a2, ..., an), for all
choices of λ1, ..., λn.
Now let
a1 = λ1ā1 + Σ_{i>1} λi ai,   a1′ = μ1ā1 + Σ_{i>1} μi ai.
By (16.8) we have
D(a1 + a1′, a2, ..., an) = (λ1 + μ1)D(ā1, a2, ..., an),
D(a1, a2, ..., an) = λ1D(ā1, a2, ..., an),
D(a1′, a2, ..., an) = μ1D(ā1, a2, ..., an).
By the distributive law in F we obtain (16.7), and the proof of the theorem
is completed.
Remark. Many authors use statements (e) and (a) instead of Axiom (i)
in the definition of determinant. The point of our definition [using
Axiom (i)] is that we are assuming much less and can still prove the
fundamental rule (e) by using the fairly deep result (7.4) concerning sets
of linearly independent vectors in Fn.
D(A) = | α11 ··· α1n |
       | ·········· |
       | αn1 ··· αnn |.
The last notation for determinants is the most convenient one to use for
carrying out computations, but the reader is hereby warned that its use
has the disadvantage of having almost the same notations for the very
different objects of matrices and determinants.
The next theorem is the key to the practical calculation of deter-
minants; it is stated in terms of the elementary row operations defined in
Definition (6.9).
D(A') = - D(A) .
D(A') = D(A).
D(A') = μD(A).
| -1  0  1  1 |
|  2 -1  0  2 |
|  1  2  1 -1 |
| -1 -1  1  0 |
We know from Sections 6 and 12 that we can first apply elementary row
operations to obtain a matrix A′ row equivalent to the given matrix
A = ( -1  0  1  1
       2 -1  0  2
       1  2  1 -1
      -1 -1  1  0 )
such that the nonzero rows of A' are in echelon form. At this point we
will know whether or not the rows of A are linearly independent. If the
rows are linearly dependent then D(A) = 0 by Theorem (16.6)(d). If
the rows are linearly independent then the rows of A' will all be different
from zero, and from Section 12, Example C, we can apply further ele-
mentary row operations to reduce A' to the identity matrix. Theorem
(16.9) tells us how to keep track of the determinant at each step of the
process, and the computation will be finished using the fact that the
determinant of the identity matrix is 1, by Definition (16.5)(iii).
| -1  0  1  1 |   | -1  0  1  1 |
|  2 -1  0  2 | = |  0 -1  2  4 |    (replacing a2 by a2 + 2a1)
|  1  2  1 -1 |   |  1  2  1 -1 |
| -1 -1  1  0 |   | -1 -1  1  0 |

                  | -1  0  1  1 |
                = |  0 -1  2  4 |    (replacing a3 by a3 + a1,
                  |  0  2  2  0 |     and a4 by a4 - a1)
                  |  0 -1  0 -1 |

                  | -1  0  1  1 |
                = |  0 -1  2  4 |    (row operations of type II
                  |  0  0  6  8 |     applied to the last two rows)
                  |  0  0  0  -5 + 8/3 |

                  | -1  0  0  0 |
                = |  0 -1  0  0 |    (more row operations of type II
                  |  0  0  6  0 |     as in Section 12)
                  |  0  0  0  -7/3 |

                                      | 1  0  0  0 |
                = (-1)(-1)(6)(-7/3)   | 0  1  0  0 |   (elementary row operations
                                      | 0  0  1  0 |    of type III)
                                      | 0  0  0  1 |

                = -42/3 = -14.
We note that this whole procedure is easy to check for arithmetical errors.
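One such check is a direct numerical evaluation. A minimal sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[-1,  0, 1,  1],
              [ 2, -1, 0,  2],
              [ 1,  2, 1, -1],
              [-1, -1, 1,  0]])
print(np.linalg.det(A))   # -14.0, up to rounding error
```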
EXERCISES
1
a. ~) b. ( 11 01 2)1 o
1
-1 1 0
-1
d. The matrices in Exercise 7 of Section 12.
2. Let A be a matrix in triangular form (with zeros below the diagonal),
o
where the At are square matrices, possibly of different sizes. Prove that
Using (17.4) and (17.5) applied to the first position, then to the second
position, etc., we have
Δ(a1, ..., an) = Σ_{j1=1}^n α_{1j1} Δ( e_{j1}, Σ_{j2=1}^n α_{2j2} e_{j2}, a3, ..., an )
             = ···
             = Σ_{j1, ..., jn} α_{1j1} α_{2j2} ··· α_{njn} Δ(e_{j1}, ..., e_{jn}),
where the last sum consists of nⁿ terms, and is obtained by letting j1, ..., jn
range independently between 1 and n inclusive. By (17.2) and (17.3) it
follows easily by induction† that for all choices of j1, ..., jn, Δ(e_{j1}, ..., e_{jn})
= 0. Therefore Δ(a1, ..., an) = 0, and the uniqueness theorem is
proved.
(17.6) THEOREM. There exists a function D(al, ... , an) satisfying the
conditions of Definition (16.5).
Proof. We use induction on n. For n = 1, the function D(a) = α, α ∈ F,
satisfies the requirements. Now suppose that D is a function on F^(n-1)
that satisfies the conditions in Definition (16.5). Fix an index j, 1 ≤ j ≤ n,
and let the vectors a1, ..., an in Fn be given by
ai = ⟨αi1, ..., αin⟩,   αik ∈ F, 1 ≤ i ≤ n.
† What has to be proved by induction is that either Δ(e_{j1}, ..., e_{jn}) = 0 (if two of
the j's are equal) or Δ(e_{j1}, ..., e_{jn}) = ±Δ(e1, ..., en) (if the j's are distinct).
Then define:
(17.7) D(a1, ..., an) = (-1)^(1+j) α_{1j} D_{1j} + ··· + (-1)^(n+j) α_{nj} D_{nj},
where, for 1 ≤ i ≤ n, Dij is the determinant of the vectors a1^(j), ..., a_{n-1}^(j)
in F^(n-1) obtained from the n - 1 vectors a1, ..., a_{i-1}, a_{i+1}, ..., an by
deleting the jth component in each case.
We shall prove that the function D defined by (17.7) satisfies the
axioms for a determinant. By the uniqueness theorem (17.1) it will then
follow that all the expansions (17.7) for different j are equal, which is an
important result in its own right.
Let us look more closely at (17.7). It says that D(a1, ..., an) is
obtained by taking the coefficients of the jth column of the matrix A
with rows a1, ..., an and multiplying each of them by a power of (-1)
times the determinant of certain vectors, which form the rows of a matrix
obtained from A by deleting the jth column and one of the rows.
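The recursive character of (17.7) makes it straightforward to implement directly. A sketch of the expansion along the first column (j = 1 in the text; indices below are 0-based, and the function name is our own):

```python
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        # minor: delete row i and the first column
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** i * A[i][0] * det(minor)
    return total

print(det([[-1, 0, 1, 1], [2, -1, 0, 2], [1, 2, 1, -1], [-1, -1, 1, 0]]))  # -14
```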
First let e1, ... , en be the unit vectors in Fn; then the matrix A is
given by
where zeros fill all the vacant spaces. Then there is only one nonzero
entry in the jth column, namely, αjj = 1. The matrix from whose rows
Djj is computed is the (n - 1)-by-(n - 1) matrix
A′ =
A" =
A = ( α11 ··· α1n
      ··········
      αn1 ··· αnn ),
where the sum is taken over the nⁿ possible choices of (j1, ..., jn). Since
D(e_{j1}, ..., e_{jn}) = 0 when two of the entries are the same, we can rewrite
(17.11) in the form
D(A) of the matrix whose rows are a1, ..., an, then the complete ex-
pansion shows that, since the D(e_{j1}, ..., e_{jn}) are ±1, the determinant is a
sum (with coefficients ±1) of products of the coefficients of the matrix A.
If the matrix A has real coefficients, and is viewed as a point in the
n²-dimensional space, then (17.12) shows that D(A) is a continuous function
of A.
The idea of viewing the determinant D(al' ... , an) as a function of
a matrix A with rows a1, ..., an at once raises another problem. Let
c1, ..., cn be the columns of A. Then we can form D(c1, ..., cn) and ask
what is the relation of this function to D(a1, ..., an).
(17.13) THEOREM. Let A be an n-by-n matrix with rows a1, ..., an and
columns c1, ..., cn; then D(a1, ..., an) = D(c1, ..., cn).
Proof. Let us use the complete expansion (17.12) and view (17.12) as
defining a new function:
(17.14) D*(c1, ..., cn) = Σ_{j1, ..., jn} α_{1j1} ··· α_{njn} D(e_{j1}, ..., e_{jn})
                        = D(a1, ..., an).
We shall prove that D*(c1, ..., cn) satisfies the axioms for a determinant
function; then Theorem (17.1) will imply that D*(c1, ..., cn) =
D(c1, ..., cn).
First suppose that c1, ..., cn are the unit vectors e1, ..., en. Then
the row vectors a1, ..., an are also the unit vectors and we have
D*(e1, ..., en) = D(e1, ..., en) = 1.
From (17.14) it is clear, since each term in the sum has exactly one entry
from a given column, that
D*(..., λci, ...) = λD*(..., ci, ...).
Finally, let us consider
D*(..., ci + ck, ...),   k ≠ i,
where the sum ci + ck appears in the ith position. That means that, for
1 ≤ r ≤ n, αri is replaced by αri + αrk. Making this substitution in (17.14),
we can split up D*(..., ci + ck, ...) as a sum,
D*(..., ci + ck, ...) = D*(..., ci, ...) + D*(..., ck, ...),
and we shall be finished if we can show that D*(c1, ..., cn) = 0 if two of
the vectors, cr and cs, are equal, for r ≠ s. In (17.14), consider a term
α_{1j1} ··· α_{kjk} ··· α_{ljl} ··· α_{njn} D(e_{j1}, ..., e_{jk}, ..., e_{jl}, ..., e_{jn})
such that jk = r and jl = s. There will also be a term in (17.14) of the form
α_{1j1} ··· α_{kjl} ··· α_{ljk} ··· α_{njn} D(e_{j1}, ..., e_{jl}, ..., e_{jk}, ..., e_{jn}),
and the sum of these two terms will be zero, since cr = cs and
D(..., e_{jk}, ..., e_{jl}, ...) = -D(..., e_{jl}, ..., e_{jk}, ...).
Thus each term of D*(c1, ..., cn) is canceled by another, and we have
shown that D*(c1, ..., cn) = 0 if cr = cs, for r ≠ s. We have proved
that D* satisfies the axioms for a determinant function. By Theorem
(17.1) we conclude that D(a1, ..., an) = D(c1, ..., cn), and Theorem
(17.13) is proved.
EXERCISES

18. THE MULTIPLICATION THEOREM FOR DETERMINANTS

The definition we have given for determinants was chosen partly
because it leads to a simple proof of this theorem.
A natural question to ask is the following. Suppose that A = (aij)
and B = (fJij) are n-by-n matrices; then AB is an n-by-n matrix. Is there
any relation between D(AB) and D(A) and D(B)? (For 2-by-2 matrices
we have already settled this question in Exercise 4 of Section 14.)
The first step is the following preliminary result.t
It is clear that D' satisfies the Axioms (i), (ii), and (iii) of (16.5). Hence,
by Theorem (17.1), D' = D, and solving for f(al, ... , an) in (18.2) we
obtain the conclusion of the lemma.
Before giving the proof of the main theorem, we have to recall a fact
about linear transformations on Fn. In Section 12, we showed that each
n-by-n matrix A defined a linear transformation U of Fn into Fn , where the
action of U on a vector x = ⟨ξ1, ..., ξn⟩ is given by matrix multiplication
of A with x regarded as a column vector:
In the course of the proof of the next theorem, a matrix A is used to define
a linear transformation of Fn according to this definition.
We can now state the multiplication theorem:
t The idea of using this lemma as a key to the multiplication theorem comes from
Schreier and Sperner (see the Bibliography).
⟨ Σ_{k=1}^n βik αk1, ..., Σ_{k=1}^n βik αkn ⟩,   1 ≤ i ≤ n.
Proof. Part (d) of Theorem (16.6) shows that if A has rank less than n
then D(A) = 0. It remains to prove that if A has rank n then D(A) ≠ 0.
(For another proof see Exercise 3 of Section 8.) Let {a1, ..., an} be the
row vectors of A. Since A has rank n, {a1, ..., an} is a basis for Fn;
therefore for each i, 1 ≤ i ≤ n, we can express the ith unit vector ei as a
linear combination of {a1, ..., an}:
(18.6) ei = Σ_{j=1}^n βij aj,   1 ≤ i ≤ n.
Let B be the matrix (βij); then (18.6) implies that the matrix whose rows
are {e1, ..., en} is the product matrix BA. Since D(e1, ..., en) = 1, we
have by Theorem (18.3),
1 = D(BA) = D(B)D(A),
and D(A) ≠ 0 as required.
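The multiplication theorem D(BA) = D(B)D(A) is also easy to test numerically. A minimal sketch, assuming NumPy; the random matrices are our own example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, (4, 4))
B = rng.integers(-3, 4, (4, 4))
print(np.isclose(np.linalg.det(B @ A),
                 np.linalg.det(B) * np.linalg.det(A)))   # True
```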
EXERCISES
A
-2
~ (~ -~ o
1) 3
3
1
2' B -
30
_ ( -1 1
0 1
0)
oo 0
-2 0 .
-2 3 2 1 1
19. FURTHER PROPERTIES OF DETERMINANTS
In this section we take up a few of the many special topics one can study
in the theory of determinants.
Let A = (αij) be an n-by-n matrix. Define the (i, j) cofactor Aij as Aij =
(-1)^(i+j) Dij, where Dij is the determinant of the (n - 1)-by-(n - 1) matrix
obtained by deleting the ith row and jth column of A. Then the formulas
(17.7) can be stated in the form
(19.1) Σ_{k=1}^n αkj Akj = D(A),   j = 1, 2, ..., n,
and we also have
(19.2) Σ_{k=1}^n αkj Akl = 0,   j ≠ l.
This is easily obtained from (19.1) as follows. Consider the matrix A′
obtained from A by replacing the lth column of A by the jth column, for
j ≠ l; then A′ has two equal columns and hence D(A′) = 0, since the
determinant function satisfies the conditions (i), (ii), and (iii) of (16.5)
when considered as a function of either the row or the column vectors,
according to the proof of Theorem (17.13). Taking the expansion of
D(A′) along the lth column, we obtain (19.2).
Let tA be the transpose of A; then the column expansions of D(tA)
yield the following row expansions of D(A), since D(A) = D(tA) by (17.15):
(19.3) Σ_{k=1}^n αjk Ajk = D(A),   j = 1, 2, ..., n,
(19.4) Σ_{k=1}^n αjk Alk = 0,   j ≠ l.
We know that the matrix I plays the same role as 1 in the real number
system:
AI = IA = A.
For example, let
A = ( α  β
      γ  δ ),
and let D(A) = αδ - βγ ≠ 0.
(19.10) THEOREM. The system of equations Ax = b, where A is an n-by-n
matrix and x and b are column vectors,
has a unique solution if and only if the determinant of the coefficient matrix
D(A) ≠ 0. If D(A) ≠ 0, the solution is given by
xi = D(c1, ..., c_{i-1}, b, c_{i+1}, ..., cn) / D(A),   1 ≤ i ≤ n,
where c1, ..., cn are the columns of A and b = ⟨β1, ..., βn⟩.
Proof. By (18.5), D(A) ≠ 0 if and only if the columns c1, ..., cn are
linearly independent, and thus the statement about the existence of a
and we have
D(c1, ..., c_{i-1}, b, c_{i+1}, ..., cn) = D(c1, ..., c_{i-1}, Σ_{j=1}^n xj cj, c_{i+1}, ..., cn)
    = Σ_{j=1}^n xj D(c1, ..., c_{i-1}, cj, c_{i+1}, ..., cn)
    = xi D(c1, ..., cn) = xi D(A),
proving the theorem.
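The theorem translates directly into an algorithm (Cramer's rule). A sketch, assuming NumPy; the function name is our own, and the test system is the one from Exercise 2 below.

```python
import numpy as np

def cramer(A, b):
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("D(A) = 0: no unique solution")
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                       # replace the ith column by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, -1.0, 2.0]])
b = np.array([1.0, 2.0, -1.0])
print(cramer(A, b), np.linalg.solve(A, b))  # the two answers agree
```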
G -~ ~)
are
1 -11
12 3' 1-13 01
1·
V(e1, ..., en) = 1,   ei = unit vectors.
Such a function can be interpreted as the volume of the n-dimen-
sional parallelepiped, with edges a1, ..., an, which consists of all vectors
x = Σ λi ai, for 0 ≤ λi ≤ 1. The connection between volume functions
and determinants is given in the following theorem.
(19.12) THEOREM. There is one and only one volume function V(a1, ..., an)
on Rn, which is given by
V(a1, ..., an) = |D(a1, ..., an)|.

|D(a1, ..., an)| = ||a1|| ··· ||an|| |D( a1/||a1||, ..., an/||an|| )|
                 ≤ ||a1|| ··· ||an||,
since ai/||ai|| has length 1 for 1 ≤ i ≤ n.
Now assume u1, ..., un are vectors of length 1; we have to prove that
|D(u1, ..., un)| ≤ 1. If T is an arbitrary orthogonal transformation of
Rn, then formula (18.4) in the proof of Theorem (18.3) shows that
|D(T(u1), ..., T(un))| = |D(T)| |D(u1, ..., un)|.
Since T is an orthogonal transformation we have |D(T)| = 1 (see Exercise
5, Section 18), so that
|D(T(u1), ..., T(un))| = |D(u1, ..., un)|.
We may assume that {u1, ..., un} is a linearly independent set, otherwise
|D(u1, ..., un)| = 0, and the inequality holds in an obvious way. By
Exercise 10 of Section 15, there exists an orthogonal transformation T
of Rn such that T(ui) ∈ S(e2, ..., en) for i = 2, ..., n, where {e1, ..., en}
are the usual unit vectors in Rn. Then the matrix whose rows are
{T(u1), ..., T(un)} has the form
X = ( λ11  λ12  ···  λ1n
      0    λ22  ···  λ2n
      ⋮
      0    λn2  ···  λnn ).
Moreover, since ||T(ui)|| = 1 for 1 ≤ i ≤ n, we know that the sum of the
squares of the elements in each row is 1. Expanding D(X) along the first
column we have
|D(X)| = |λ11| | λ22 ··· λ2n |
               | ··········· |
               | λn2 ··· λnn |.
In this part of the section, we show how the multiplication theorem for
determinants can be used to derive some important facts about permu-
tations.
σ = ( 1   2  ···  n
      j1  j2 ···  jn ),
where σ(1) = j1, σ(2) = j2, ..., σ(n) = jn.
For example,
( 1  2  3
  3  1  2 )
stands for the permutation σ such that σ(1) = 3, σ(2) = 1, σ(3) = 2.
1 = ( 1 2 3      σ1 = ( 1 2 3      σ2 = ( 1 2 3
      1 2 3 ),          3 2 1 ),          2 1 3 ),
σ3 = ( 1 2 3     σ4 = ( 1 2 3      σ5 = ( 1 2 3
       1 3 2 ),         3 1 2 ),          2 3 1 ).
Using the definition στ(x) = σ(τ(x)) one can show that the multiplication
table of the group P(X) is given by

      |  1   σ1  σ2  σ3  σ4  σ5
   1  |  1   σ1  σ2  σ3  σ4  σ5
   σ1 |  σ1  1   σ5  σ4  σ3  σ2
   σ2 |
   σ3 |            etc.
   σ4 |
   σ5 |

where the entries in the table are computed as follows:
σ1σ2 = ( 1 2 3  ( 1 2 3     ( 1 2 3
         3 2 1 )  2 1 3 ) =   2 3 1 ) = σ5,
σ1σ3 = ( 1 2 3  ( 1 2 3     ( 1 2 3
         3 2 1 )  1 3 2 ) =   3 1 2 ) = σ4.
since
(Tσ Tτ)(vi) = Tσ(v_{τ(i)}) = v_{σ(τ(i))} = v_{στ(i)}
            = T_{στ}(vi),   i = 1, 2, ..., n.
and it follows from the definition of Tσ that the columns of Aσ are simply a
rearrangement of the columns of the identity matrix I. Therefore, since
interchanging two columns of Aσ changes the sign of the determinant, we
have
D(Aσ) = ±D(I) = ±1.
The second statement, that ε(στ) = ε(σ)ε(τ), follows from Theorem (18.3),
since
ε(στ) = D(A_{στ}) = D(Aσ Aτ)
      = D(Aσ)D(Aτ) = ε(σ)ε(τ).
σ2 = ( 1 2 3        σ3 = ( 1 2 3
       2 1 3 ),            1 3 2 ),
and it is easily checked that σ4 and σ5 are products of two transpositions.
Proof. The result is immediate from formula (17.12) and the definition
of ε(σ).
The formula for D(A) given above is useful because it can be used to
define the determinant of a matrix with elements in a commutative ring
(see the book by Jacobson, listed in the Bibliography).
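The definition ε(σ) = D(Aσ) can be computed directly. A sketch, assuming NumPy; σ is encoded as the list of images of 0, 1, ..., n-1, and the helper name is our own.

```python
import numpy as np

def sign(sigma):
    n = len(sigma)
    A = np.zeros((n, n))
    for i, j in enumerate(sigma):
        A[i, j] = 1.0          # A_sigma: a rearrangement of the rows of I
    return round(np.linalg.det(A))

s1, s2 = [2, 1, 0], [1, 0, 2]              # the transpositions sigma1, sigma2
composed = [s1[s2[i]] for i in range(3)]    # sigma1 sigma2 = sigma5
print(sign(s1), sign(s2), sign(composed))   # -1 -1 1: eps(st) = eps(s)eps(t)
```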
EXERCISES
1. List the methods you know for finding the inverse of a matrix. Test the
following matrices to see whether or not they are invertible; if they are
invertible, find an inverse.
-1)1 ' 1 0)
2 1 .
-1 2
2. Find a solution of the system
3x1 + x2 = 1
x1 + 2x2 + x3 = 2
-x2 + 2x3 = -1
(  1  2  3  4        ( -1  0  1  2
  -1  2  1  0 ),        1  1  3  0
                       -1  2  4  1 ).
4. Prove that the equation of the line through the distinct vectors ⟨α, β⟩,
⟨γ, δ⟩ in R² is given by
| x1  x2  1 |
| α   β   1 | = 0.
| γ   δ   1 |
5. Prove that the equation of the plane through three points ⟨α1, α2, α3⟩,
⟨β1, β2, β3⟩, ⟨γ1, γ2, γ3⟩ in R³, not on a line, is given by
| x1  x2  x3  1 |
| α1  α2  α3  1 |
| β1  β2  β3  1 | = 0.
| γ1  γ2  γ3  1 |
6. Show that the transformation defined by the system
y1 = 3x1 - x2
y2 = x1 + 2x2
carries the square consisting of all points P such that OP→ = λe1 + μe2,
for 0 ≤ λ and μ ≤ 1, onto a parallelogram. Show that the area of this
parallelogram is the absolute value of the determinant of the matrix of
the transformation T,
( 3  -1
  1   2 ).
7. Show that the area of the triangle in the plane with the vertices (α1, α2),
(β1, β2), (γ1, γ2) is given by the absolute value of
(1/2) | α1  α2  1 |
      | β1  β2  1 |
      | γ1  γ2  1 |.
8. Show that the volume of the tetrahedron with vertices (α1, α2, α3),
(β1, β2, β3), (γ1, γ2, γ3), (δ1, δ2, δ3) is given by the absolute value of
(1/6) | α1  α2  α3  1 |
      | β1  β2  β3  1 |
      | γ1  γ2  γ3  1 |
      | δ1  δ2  δ3  1 |.
1 ~ i ~ n.
Prove that Pl , ... , Pn lie on a linear manifold of dimension < n - 1 if
and only if
=0
for all x1, ..., xn in R. Prove that, if P1, ..., Pn do not lie on a linear
manifold of dimension < n - 1, then P1, ..., Pn lie on a unique
hyperplane whose equation is given by the above formula.
10. Prove the following formula for the van der Monde determinant:
| 1   ξ1   ξ1²  ···  ξ1^(n-1) |
| 1   ξ2   ξ2²  ···  ξ2^(n-1) |  =  Π_{i<j} (ξj - ξi).
| ··························· |
| 1   ξn   ξn²  ···  ξn^(n-1) |
(Hint: Let c1, c2, ..., cn be the columns of the van der Monde matrix.
Show that
D(c1, ..., cn) = D(c1, c2 - ξ1c1, ..., c_{n-1} - ξ1c_{n-2}, cn - ξ1c_{n-1}).
Then take the row expansion along the first row, factor out appropriate
factors from the result, and use induction.)
11. Show that if {a1, ..., an} are nonzero and mutually orthogonal vectors
in Rn, then |D(a1, ..., an)| = ||a1|| ||a2|| ··· ||an||.
Chapter 6
Polynomials and
Complex Numbers
20. POLYNOMIALS
We shall use the notation F [ x] for the set of polynomials with co-
efficients in F.
Example A. Compute
(3 - x + x²)(2 + 2x + x² - x³).
We see that x⁵ is the highest power of x appearing in the product with
a nonzero coefficient. Then we compute the product by leaving spaces
for the coefficients of x⁰, x¹, ..., x⁵ and filling them in by inspection.
Thus the product is
6 + 4x + 3x² - 2x³ + 2x⁴ - x⁵.
(20.9)  f = α0 + α1x + ··· + αr x^r,   αr ≠ 0, r ≥ 0,
        g = β0 + β1x + ··· + βs x^s,   βs ≠ 0, s ≥ 0.

f1 = Q1g + R.
† The notation f(x) is sometimes used for a polynomial f; this notation suppresses
the distinction between polynomials (which are sequences) and polynomial functions.
The need for the distinction arises from the fact that for finite fields F, two different
polynomials in F[x] may correspond to the same polynomial function. For example,
let F be the field of two elements [see Exercise 4 of Section 2]. Then x² - x and 0 are
distinct polynomials which define the same polynomial function.
The next two results govern the connection between finding the zeros
of a polynomial and factoring the polynomial. First we require an
important lemma.
where the {gi} are arbitrary polynomials. Then S contains the set of poly-
nomials {f1, ..., fk} and has the property that, if p ∈ S and h ∈ F[x],
then ph ∈ S. Since the degrees of nonzero elements of S are in N ∪ {0},
by the well-ordering principle (2.5B) we can find a nonzero polynomial
d = h1f1 + ··· + hkfk ∈ S
such that deg d ≤ deg d′ for all nonzero d′ ∈ S.
We prove first that d | fi for 1 ≤ i ≤ k. By the division process we
have, for 1 ≤ i ≤ k,
fi = dqi + ri,
where either ri = 0, or deg ri < deg d and
ri = fi - dqi ∈ S.
where the {pi} and {qj} are primes, implies that s = t, and for a suitable
indexing of the p's and q's we have

f1 = f - q1 ··· q_{s-1}(α1 pr).

By (20.20) we have
(20.21)  f1 = q1 ··· q_{s-1}(qs - α1 pr),
and so, by what has been said, f1 ≠ 0 and deg f1 < deg f. But from the
form of f1 we see that pr | f1 and, since prime factorization is unique for
polynomials of degree < deg f, we conclude from (20.21) that pr | (qs - α1 pr),
since pr is distinct from all the primes q1, ..., q_{s-1}. Then
qs - α1 pr = h pr
and qs = pr(h + α1),
which is a contradiction. This completes the proof of unique factori-
zation.†
We conclude this section with the observation familiar to us from high
school algebra that, although F [ x ] is not a field, F [ x ] can be embedded
in a field, and in exactly the way that the integers can be embedded in the
field of rational numbers. Some of the details will be omitted.
Consider all pairs (f, g), for f and g ∈ F[x], where g ≠ 0. Define
two such pairs (f, g) and (h, k) to be equivalent if fk = gh; in this case we
write (f, g) ~ (h, k). Then the relation ~ has the properties:
1. (f, g) ~ (f, g).
2. (f, g) ~ (h, k) implies (h, k) ~ (f, g).
3. (f, g) ~ (h, k), (h, k) ~ (p, q) imply (f, g) ~ (p, q).
[For the proof of property (3) the cancellation law (20.6) is required.]
Now define a fraction f/g, with g ≠ 0, to be the set of all pairs (h, k),
k ≠ 0, such that (h, k) ~ (f, g). Then we can state:
4. Every pair (f, g) belongs to one and only one fraction f/g.
5. Two fractions f/g and r/s coincide if and only if fs = gr.
Now we define:
6. f/g + r/s = (fs + gr)/gs.
7. (f/g)(r/s) = fr/gs.
It can be proved first of all that the operations of addition and
multiplication of fractions are defined independently of the representatives
of the fractions. In other words, one has to show that if f/g = f1/g1 and
r/s = r1/s1 then
(fs + gr)/gs = (f1s1 + g1r1)/g1s1,
and that a similar statement holds for multiplication.
† Note that the same argument establishes the uniqueness of factorization in the
ring of integers Z. In more detail, if uniqueness of factorization does not hold in Z,
there exists a smallest positive integer m = p1 ··· pr = q1 ··· qs which has two
essentially different factorizations. Then we may assume that no pi coincides with
a qj and that r and s are greater than 1. We may also assume that pr < qs and form
m1 = m - q1 ··· q_{s-1} pr. Then m1 < m, and pr | m1. Since m1 = q1 ··· q_{s-1}(qs - pr),
it follows that pr | (qs - pr), which is a contradiction. This argument first came to
the attention of the author in Courant and Robbins (see the Bibliography).
Now we shall state a result. The proof offers no difficulties, and will
be omitted.
(20.22) THEOREM. The set of fractions f/g, g ≠ 0, with respect to the
operations of addition and multiplication previously defined, forms a field
F(x). The mapping f → fg/g = φ(f), where f ∈ F[x], is a one-to-one
mapping of F[x] into F(x) such that φ(f + h) = φ(f) + φ(h) and φ(fh) =
φ(f)φ(h) for f, h ∈ F[x].
Proof. It is clear that d | f and d | g. Now let d′ | f and d′ | g. By the
uniqueness of factorization, we conclude that
d′ = ε p1^d1 ··· pr^dr,   ε a unit,
where di ≤ ai and di ≤ bi for 1 ≤ i ≤ r. It follows that d′ | d and the
theorem is proved.
To apply the theorem to our example, we have to factor f and g into
their prime factors in R[x]. We have
f = (x - 2)(x + 1),
g = (x + 1)(x² - x + 1).
EXERCISES
1. Use the method of proof of Theorem (20.8) to find the quotient Q and
the remainder R such that
f = Qg + R,
where f = 2x⁴ - x³ + x - 1,
g = 3x³ - x² + 3.
2. Prove that a polynomial f E F [ x ] has at most deg f distinct zeros in F,
where F is any field.
3. Let f = ax² + bx + c, for a, b, c real numbers and a ≠ 0. Prove that
f is a prime in R[x] if and only if b² - 4ac < 0. Prove that if b² -
4ac = D ≥ 0 then
f = a( x - (-b + √D)/2a )( x - (-b - √D)/2a ).
4. Let F be any field and let f E F [ x] be a polynomial of degree ~ 3.
Prove that f is a prime in F [ x ] if and only iff either has degree 1 or has
no zeros in F. Is the same result valid if deg f > 3?
5. Prove that, if a rational number m/n, for m and n relatively prime
integers, is a root of the polynomial equation
a0 + a1x + ··· + ar x^r = 0,
where the ai ∈ Z, then m | a0 and n | ar.† Use this result to list the
possible rational roots of the equations
2x³ - 6x² + 9 = 0,
x³ - 8x² + 12 = 0.
6. Prove that if m is a positive integer which is not a square in Z then
√m is irrational (use Exercise 5).
7. Factor the following polynomials into their prime factors in Q[x]
and R[x].
a. 2x³ - x² + x + 1.
b. 3x³ + 2x² - 4x + 1.
c. x⁶ + 1.
d. x⁴ + 16.
† This argument uses the fact that the law of unique factorization holds for the
integers Z, as we pointed out in the footnote to the proof of unique factorization
in F[x].
21. COMPLEX NUMBERS
The field of real numbers R has the drawback that not every quadratic
equation with real coefficients has a solution in R. This fact was cir-
cumvented by mathematicians of the eighteenth and nineteenth centuries
by assuming that the equation x² + 1 = 0 had a solution i, and they
investigated the properties of the new system of "imaginary" numbers
obtained by considering the real numbers together with the new number i.
Although today we do not regard the complex numbers as any more
imaginary than real numbers, it is clear that mathematicians such as
Euler used the "imaginary" number i with some hesitation, since it was
not constructed in a clear way from the real numbers.
Whatever the properties of the new number system, the eighteenth-
and nineteenth-century mathematicians insisted upon making the new
numbers obey the same rules of algebra as the real numbers. In par-
ticular, they reasoned, the new number system had to contain all such
expressions as
α + βi + γi² + ···,
where α, β, γ, ... were real numbers. Since i² = -1, i³ = -i, etc., any
such expression could be simplified to an expression like
α + βi,   α, β ∈ R.
whose determinant is α² + β². Since α and β are real numbers, and are
not both equal to zero, we have α² + β² > 0, and the equations can be
solved. Thus z⁻¹ exists, and C forms a field.
(21.4)  z⁻¹ = (α - βi)/(α² + β²),   for z = α + βi ≠ 0,
and
(21.5)  (α² + β²)(γ² + δ²) = (αγ - βδ)² + (αδ + βγ)²,
which asserts that a product of two sums of two squares can be expressed
as a sum of two squares.†
We come next to the important polar representation of complex
numbers. Let z = ⟨α, β⟩; then letting ρ = √(α² + β²) = |z|, we can write
z = ρ(cos θ + i sin θ),
where θ is the angle determined by the rays joining the origin to the points
(1, 0) and (α, β). Thus,‡
z = α + βi = ρ(cos θ + i sin θ) = |z|(cos θ + i sin θ),
where we note that
ρ = |z|,   |cos θ + i sin θ| = 1.
If w = |w|(cos φ + i sin φ), then we obtain
zw = |z||w|(cos θ + i sin θ)(cos φ + i sin φ)
   = |zw|[(cos θ cos φ - sin θ sin φ) + i(sin θ cos φ + cos θ sin φ)].
From the addition theorems for the sine and cosine functions, this formula
becomes
(21.6) zw = |z||w|[cos(θ + φ) + i sin(θ + φ)],
which says in geometrical terms that to multiply two complex numbers
we must multiply their absolute values and add the angles they make with
the "real" axis.
t For a discussion of this formula and analogous formulas for sums of four and
eight squares, see Section 35.
t We write i sin 8 instead of the more natural (sin 8)i in order to keep the number
of parentheses down to a minimum.
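Formula (21.6) is easy to confirm with Python's built-in complex numbers; a minimal sketch using the standard cmath module:

```python
import cmath

z, w = 1 + 1j, 2 - 1j
r1, t1 = cmath.polar(z)               # |z| and the angle of z
r2, t2 = cmath.polar(w)
lhs = z * w
rhs = cmath.rect(r1 * r2, t1 + t2)    # |z||w| (cos(t1+t2) + i sin(t1+t2))
print(abs(lhs - rhs) < 1e-12)         # True
```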
Many proofs of this theorem have been found, all of which rest on
the completeness axiom for the real numbers or on some topological
property of the real numbers which is equivalent to the completeness
axiom. The reader will find a proof very much in the spirit of this course
in Schreier and Sperner's book, and other proofs may be found in Birkhoff
and MacLane's book (see the Bibliography for both), or in any book on
functions of a complex variable.
(21.13) COROLLARY. Every prime polynomial in R[x] has the form (up
to a unit factor)
x - α,   or   x² + αx + β,   α² - 4β < 0.
Proof. Let f ∈ R[x] be a prime polynomial; then f has a zero u ∈ C.
If u = α ∈ R, then f = g(x - α) for some g ∈ R. If u ∉ R, then ū ≠ u
and, by (21.12),
(x - u)(x - ū) = x² - (u + ū)x + uū
EXERCISES
(3 + i)(-2 + 4i),   (2 + i)/(3 + 2i),   (2 + i)/(2 - i).
2. Derive formulas for cos 38 and sin 38 in terms of cos 8 and sin 8.
3. Find all solutions of the equation x 5 = 2.
4. Let a0 + a1x + ··· + a_{r-1}x^(r-1) + x^r = (x - u1)(x - u2) ··· (x - ur) be
a polynomial in C[x] with leading coefficient ar = 1 and with zeros
u1, ..., ur in C. Prove that a0 = ±u1u2 ··· ur and a_{r-1} = -(u1 + u2 +
··· + ur).
5. Prove that the field of complex numbers C is isomorphic† with the set of
all 2-by-2 matrices with real coefficients‡ of the form
( α  -β
  β   α ),   α, β ∈ R,
where the isomorphism carries
( cos θ  -sin θ
  sin θ   cos θ ) → cos θ + i sin θ.
† Two fields F and F′ are said to be isomorphic if there exists a one-to-one mapping
α → α′ of F onto F′ such that (α + β)′ = α′ + β′ and (αβ)′ = α′β′ for all α, β ∈ F.
‡ For the definitions and properties of addition and multiplication of matrices, see
Section 12.
SEC. 21 COMPLEX NUMBERS 183
Chapter 7

The Theory of a Single Linear Transformation

22. BASIC CONCEPTS
(22.1) DEFINITION. Let f(x) = λ0 + λ1x + ··· + λr x^r ∈ F[x] and let
T ∈ L(V, V); then f(T) denotes the linear transformation
f(T) = λ0·1 + λ1T + ··· + λr T^r,
where 1 is the identity transformation on V. Similarly, we may define
f(A) where A is an n-by-n matrix over F, with 1 replaced by the identity
matrix I.
(22.3) THEOREM. Let T ∈ L(V, V); then 1, T, T², ..., T^(n²) are linearly
dependent in L(V, V). Therefore there exists a uniquely determined integer
r ≤ n² such that
1, T, T², ..., T^(r-1) are linearly independent,
1, T, T², ..., T^(r-1), T^r are linearly dependent.
Then we have
T^r = ξ0·1 + ξ1T + ··· + ξ_{r-1}T^(r-1),   ξi ∈ F.
Let m(x) = x^r - ξ_{r-1}x^(r-1) - ··· - ξ1x - ξ0 ∈ F[x]. Then m(x) has the
following properties:
1. m(x) ≠ 0 in F[x] and m(T) = 0.
2. If f(x) is any polynomial in F[x] such that f(T) = 0, then m(x) | f(x)
in F[x].
Proof The existence of the polynomial m(x) and the statement (1)
concerning it follow from the introductory remarks in this section. Now
let f(x) be any polynomial in F[x] such that f(T) = 0. Because
1, T, ..., T^(r-1) are linearly independent, there does not exist a polynomial
R(x) ≠ 0 of degree < r such that R(T) = 0. Now apply the division
process to f(x) and m(x), and obtain
f(x) = m(x)Q(x) + R(x),
where either R(x) = 0 or deg R(x) < r = deg m(x). By Lemma (22.2) we
have
R(T) = (f - mQ)(T) = f(T) - m(T)Q(T) = 0,
and it follows that R(x) = 0 and m(x) | f(x).
The remarks about the uniqueness of m(x) are clear by part (2) of
Theorem (22.3). To see this, let m(x) and m'(x) be two nonzero polynomials
of degree r such that meT) = m'(T) = O. Then by the proof of part (2)
of Theorem (22.3) we have m(x) I m'(x) and m'(x) I m(x). It follows from
the discussion in Section 20 that m(x) and m'(x) differ by a unit factor in
F [ x ] and, since the units in F [ x ] are simply the constant polynomials,
the uniqueness of m(x) is proved.
We remark that Theorem (22.3) also holds for any matrix A E Mn(F).
If T E L(V, V) has the matrix A with respect to the basis {V1' ... , vn } of V,
it follows from Theorem (13.3) that T and A have the same minimal
polynomial.
A thorough understanding of the definition and properties of the
minimal polynomial will be absolutely essential in the rest of this chapter.
Example A. Let
A = ( α  β
      γ  δ ),   α, β, γ, δ ∈ F.
We have
A² = ( α² + βγ    αβ + βδ
       γα + δγ    γβ + δ² ).
Therefore,
(22.5) A² - (α + δ)A + (αδ - βγ)I = 0.
We have shown that A satisfies the equation
x² - (α + δ)x + (αδ - βγ) = 0. For example,
( 3  0
  0  3 )   has minimal polynomial   x - 3,
( 2  -1
  1   2 )   has minimal polynomial   x² - 4x + 5.
Then, for A = ( α β γ ; δ ε ζ ; η θ ξ ),
A² = ( α² + βδ + γη    αβ + βε + γθ    αγ + βζ + γξ
       δα + εδ + ζη    δβ + ε² + ζθ    δγ + εζ + ζξ
       ηα + θδ + ξη    ηβ + θε + ξθ    ηγ + θζ + ξ² ).
For example,
( 2  0  0
  0  0  1
  0  0  0 )   has minimal polynomial   x³ - 2x²,
( 0  1  0
  0  0  0
  0  0  0 )   has minimal polynomial   x².
When using formulas such as (22.5) and (22.6) to find the minimal
polynomial of A, it must be checked that A satisfies no polynomial
equation of lower degree. In the case of the last matrix above, this
check shows that x² is actually the minimal polynomial.
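Such a check can be done by substituting the matrix into the candidate polynomial, as in Definition (22.1). A sketch, assuming NumPy; the helper name is our own, and the test matrix is the one with minimal polynomial x² discussed above.

```python
import numpy as np

def poly_of_matrix(coeffs, A):
    # coeffs = [c0, c1, ..., cr] represents c0*I + c1*A + ... + cr*A^r
    n = A.shape[0]
    result, power = np.zeros((n, n)), np.eye(n)
    for c in coeffs:
        result += c * power
        power = power @ A
    return result

A = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
print(poly_of_matrix([0.0, 0.0, 1.0], A))  # m(A) = A^2 = 0, so m(T) = 0
print(np.allclose(A, 0))                   # False: no equation of degree 1
```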
equations of the first order shows that these are the only characteristic
vectors of d. (See Section 34.)
Example D. Test the functions {e^(α1 t), e^(α2 t), ..., e^(αs t)} in ℱ(R), with distinct αi,
for linear dependence. This is not too easy to do directly. But consider
the vector space generated by the functions,
V = S(e^(α1 t), ..., e^(αs t)).
Then the derivative d ∈ L(V, V) (why?). Moreover, the functions
e^(α1 t), ..., e^(αs t) are characteristic vectors of d belonging to distinct
characteristic roots, by Example C. Therefore, the functions are linearly
independent by Theorem (22.8).
Let
A = ( -3  -2
       2   2 ).
First of all, we check that A is the matrix of T with respect to the basis
{e1, e2}, so that
T(e1) = -3e1 + 2e2,   T(e2) = -2e1 + 2e2.
The characteristic roots are found from
D(A - αI) = | -3 - α    -2   |
            |   2      2 - α | = (-3 - α)(2 - α) + 4
          = α² + α - 2 = (α + 2)(α - 1).
Therefore -2 and 1 are characteristic roots.
We shall now find a characteristic vector for the characteristic
root -2. This means we have to solve the equation
(22.11) Tx = -2x.
We have
Tx = ( -3  -2  ( x1
        2   2 )  x2 ).
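The characteristic roots and vectors of this example can be found numerically as well. A minimal sketch, assuming NumPy (note that eig returns the roots in no guaranteed order):

```python
import numpy as np

A = np.array([[-3.0, -2.0], [2.0, 2.0]])
roots, vectors = np.linalg.eig(A)
print(roots)           # [-2.  1.], the characteristic roots found above
print(vectors[:, 0])   # a characteristic vector belonging to the first root
```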
EXERCISES
J) (~ J)
-1 0 2 0
and
o 2 o 2
o 0 o 0
have the same minimal polynomials.
5. Prove that if T E L( V, V) then T is invertible if and only if the constant
term of the minimal polynomial of T is different from zero. Describe
23. INVARIANT SUBSPACES
We can see now that T is a reflection with respect to the line through the
origin in the direction of the vector w1; it sends each vector in S(w1)
onto itself and sends w2 onto its mirror image -w2 with respect to the
line S(w1). Figure 7.1 shows how the image T(w) of an arbitrary vector
w can be described geometrically: if w = λw1 + μw2, then
T(w) = λ[T(w1)] + μ[T(w2)] = λw1 - μw2.

FIGURE 7.1
The concept illustrated here is the simplest case of the following basic
idea.
(23.2) LEMMA. Let T ∈ L(V, V) and let f(x) ∈ F[x]; then the set of
all vectors v ∈ V such that f(T)(v) = 0 [that is, the null space of f(T)] is a
T-invariant subspace (notation: n[f(T)]).
Proof. Since f(T) ∈ L(V, V), the null space n[f(T)] is a subspace of V.
We have to prove that if w ∈ n[f(T)] then T(w) ∈ n[f(T)]. We have
f(T)[T(w)] = [f(T)T](w) = [Tf(T)](w) = T[f(T)(w)] = 0,
since f(T)T = Tf(T) in L(V, V), and the lemma is proved.
and if, second, the expressions (23.4) are unique, in the sense that if
v1 + ··· + vs = v1′ + ··· + vs′,   vi, vi′ ∈ Vi, 1 ≤ i ≤ s,
then vi = vi′, 1 ≤ i ≤ s.
(23.6) LEMMA. Let V be a vector space over F, and suppose there exist
nonzero linear transformations {E1, ..., Es} in L(V, V) such that the
following conditions are satisfied.
a. 1 = E1 + ··· + Es.
b. EiEj = EjEi = 0 if i ≠ j, 1 ≤ i, j ≤ s.
Then we have Ei² = Ei, 1 ≤ i ≤ s. Moreover, V is the direct sum
V = E1V ⊕ ··· ⊕ EsV, and each subspace EiV is different from zero.
Proof. From (a) and (b) we have
Ei = Ei·1 = Ei(E1 + ··· + Es) = Ei² + Σ_{j≠i} EiEj = Ei²,
proving the first statement. For the second statement, we note that EiV
is a nonzero subspace for 1 ≤ i ≤ s, since Ei is a nonzero linear trans-
formation. Let v ∈ V; then
v = 1v = E1v + ··· + Esv,
proving that V = E1V + ··· + EsV. Now suppose v1 ∈ E1V, ..., vs ∈ EsV,
and v1 + ··· + vs = 0. Then
(23.7) Ei(v1 + ··· + vs) = 0,   1 ≤ i ≤ s.
Moreover, Eivj = 0 if i ≠ j because vj ∈ EjV, and EiEj = 0. Finally
Eivi = vi, since vi = Eiv for some v ∈ V, and Eivi = Ei²v = Eiv = vi,
using the fact that Ei² = Ei, 1 ≤ i ≤ s. The equation (23.7) implies that
v1 = ··· = vs = 0, and the lemma follows from Lemma (23.5).
Proof. Let
qi(x) = m(x)/pi(x)^ei,   1 ≤ i ≤ s.
Then
EiEj = qi(T)ai(T)qj(T)aj(T) = 0,
since m(x) | qi(x)qj(x) if i ≠ j, and hence qi(T)qj(T) = 0.
We have, for all v ∈ V,
pi(T)^ei Ei v = pi(T)^ei qi(T)ai(T)v = m(T)ai(T)v = 0,
proving that EiV ⊂ n(pi(T)^ei). Moreover, V = E1V + ··· + EsV from
what has been proved. If EiV = 0 for some i, then V = Σ_{j≠i} EjV, and
qi(T)V = Σ_{j≠i} qi(T)EjV = 0, since m(x) | qi(x)qj(x), so that
qi(T)Ej = qi(T)qj(T)aj(T) = 0
αi ∈ F,
(23.13) THEOREM. Let {Sl' S2, ... ,Sk} be a set of diagonable linear
transformations of V, such that SISj = SjSj for 1 :s; i,j :s; k. Then there
exists a basis of V such that the basis vectors are characteristic vectors
simultaneously for the linear transformations Sl, ... , Sk.
Proof. We use induction on dim V. Let

m(x) = (x − a₁) ⋯ (x − a_r),  r > 1,

be the minimal polynomial of S₁. Then by Theorem (23.9),

V = V₁ ⊕ ⋯ ⊕ V_r,  V_i = n(S₁ − a_i · 1),  1 ≤ i ≤ r,

where each of the subspaces is different from zero and invariant relative
to S₁. We shall prove that because the {S_j} commute, each subspace V_i is
invariant with respect to all the S_j, 1 ≤ j ≤ k. By Theorem (23.9), we
have V_i = E_iV, where E_i = f_i(S₁) for some polynomial f_i(x) ∈ F[x].
Then S₁S_j = S_jS₁ implies S_jE_i = E_iS_j, and hence S_jV_i = S_jE_iV =
E_iS_jV ⊂ E_iV = V_i, for 1 ≤ i ≤ r, 1 ≤ j ≤ k. Each subspace V_i has
smaller dimension than V, because r > 1 and V_i ≠ 0. Moreover, each
S_j acts as a diagonable linear transformation on V_i, because the minimal
polynomial of each S_j acting on V_i will divide the minimal polynomial of
S_j on V, so that the conditions of Theorem (23.11) will be satisfied for the
transformations {S_j} acting on V_i, 1 ≤ i ≤ r. By the induction hypothesis,
each subspace V_i, 1 ≤ i ≤ r, has a basis consisting of vectors which are
characteristic vectors for all the {S_j}. These bases taken together form a
basis of V with the required property, and the theorem is proved.
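The theorem can be illustrated numerically. The following sketch is an
addition to the text (it assumes numpy; the matrices S, D₁, D₂ are arbitrary
choices): two commuting diagonable matrices built from a common invertible
S are simultaneously diagonalized by the columns of S.

    import numpy as np

    S = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])   # columns = common characteristic vectors
    D1 = np.diag([2.0, -1.0, 3.0])
    D2 = np.diag([5.0, 5.0, -2.0])

    S1 = S @ D1 @ np.linalg.inv(S)    # diagonable, characteristic roots 2, -1, 3
    S2 = S @ D2 @ np.linalg.inv(S)

    assert np.allclose(S1 @ S2, S2 @ S1)   # the pair commutes
    # Each column of S is a characteristic vector for both S1 and S2.
    for j in range(3):
        v = S[:, j]
        assert np.allclose(S1 @ v, D1[j, j] * v)
        assert np.allclose(S2 @ v, D2[j, j] * v)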
EXERCISES
1. Test the following matrices to determine whether or not they are similar to
diagonal matrices in M₂(R). If so, find the matrices D and S, as in (23.12).
-2)
-1 '
-2) .
(23 -1
2. Show that the matrices
-2)
-1 '
are similar to diagonal matrices in M₂(C), where C is the complex field, but
not in M₂(R). In each case, find the matrices D and S in M₂(C), as in (23.12).
3. Show that the matrix

   ( 0 1 0 )
   ( 0 0 1 )
   ( 1 0 0 )

is similar to a diagonal matrix in M₃(C) but not in M₃(R).
4. Show that the differentiation transformation D: Pₙ → Pₙ is not diagonable.
(Recall that Pₙ has been used to denote the set of all polynomials
a₀ + a₁x + ⋯ + aₙxⁿ, with a_i ∈ R.)
5. Let V₁ and V₂ be nonzero subspaces of a vector space V. Prove that
V = V₁ ⊕ V₂ if and only if V = V₁ + V₂ and V₁ ∩ V₂ = {0}.
6. Let T ∈ L(V, V) be a linear transformation such that T² = 1. Prove that
V = V⁺ ⊕ V⁻, where V⁺ = {v ∈ V | T(v) = v} and V⁻ = {v ∈ V | T(v) = −v}.
7. Show that the following converse of Lemma (23.6) holds. Let V be a
direct sum, V = V₁ ⊕ ⋯ ⊕ V_s, of nonzero subspaces V_i. Show that
there exist linear transformations E₁, ..., E_s such that Σ E_i = 1, E_iE_j =
E_jE_i = 0, i ≠ j, and V_i = E_iV, 1 ≤ i ≤ s. (Hint: If v = Σ v_i, v_i ∈ V_i,
set E_iv = v_i.)
8. Let T ∈ L(V, V) have the minimal polynomial m(x) ∈ F[x]. Let
f(x) be an arbitrary polynomial in F[x]. Prove that

n[f(T)] = n[d(T)],

where d(x) is the greatest common divisor of f(x) and m(x).
where R is the real field); the other is that m(x) = (x − ξ₁)^{e₁} ⋯ (x − ξ_s)^{e_s}
with some e_i > 1. In the latter case it is desirable to have a theorem that
applies to all linear transformations and that comes as close to the
diagonal form theorem (23.11) as possible.
The main theorem is the following one.

A = ( A₁           0  )
    (     A₂          )
    (         ⋱       )
    ( 0           A_s ),

where each A_i is a d_i-by-d_i block for some integer d_i ≥ e_i, where 1 ≤ i ≤ s,
and each A_i can be expressed in the form of a matrix with the characteristic
root a_i at each diagonal entry, zeros below the diagonal and, possibly, nonzero
entries (∗) above. All entries of A not contained in one of the blocks {A_i}
are zero. To express it all in another way, given a square matrix B whose
minimal polynomial is m(x), there exists an invertible S such that SBS⁻¹ =
A, where A has the above-given form.
the next d₂ elements form a basis for V₂, and so on. Since each subspace
V_i is invariant relative to T, it is clear that the matrix of T relative to this
basis has the form

( A₁        0  )
(     ⋱        )
( 0        A_s ).

It remains only to prove that the blocks A_i can be chosen to have the
required form and that the inequalities d_i ≥ e_i hold, for 1 ≤ i ≤ s.
Each space V_i is the null space of (T − a_i · 1)^{e_i}. In other words, if
we let N_i = T − a_i · 1, then N_i ∈ L(V_i, V_i) and we have N_i^{e_i} = 0.
Let us give a formal definition of this important concept.
so that the lemma is a special case of the triangular form theorem. The
point is that this special case implies the whole theorem. We prove
Lemma (24.2) by induction. First find w₁ ≠ 0 such that Nw₁ = 0.
Any vector will do for w₁ if N = 0 and, if N^k ≠ 0 and N^{k+1} = 0, then
let w₁ = N^k(w) ≠ 0. Then N(w₁) = N^{k+1}(w) = 0. Suppose (as an
and we have shown that, relative to this basis, the matrix of T on the
space V_i has the required form.

It remains to prove the inequalities d_i ≥ e_i, 1 ≤ i ≤ r. From Lemma
(24.2) it follows that N_i^{d_i} = 0, where d_i is the dimension of V_i. Since
T = a_i · 1 + N_i on V_i, we have (T − a_i · 1)^{d_i} = 0 on V_i; and since V is
the direct sum of the subspaces V_i, we have

(24.4)  (T − a₁ · 1)^{d₁} ⋯ (T − a_s · 1)^{d_s} = 0.

1. m(x) | h(x).
2. Every zero of h(x) is a zero of m(x).
3. h(T) = 0.

(The last statement is called the Cayley-Hamilton theorem.)
† The matrix xI − A actually has entries in F[x]; the statement about the quotient
field is needed because the determinant of a matrix has been defined only for matrices
with entries in a field.
we obtain

D′ − D = N − N′.
Since N and N′ commute, we can apply the binomial theorem to show that

(N − N′)^k = N^k − kN^{k−1}N′ + ⋯ + (−1)^k(N′)^k.

A typical term involves N^i(N′)^j, with i + j = k. If k is large enough, then it
will follow that either N^i = 0 or (N′)^j = 0, and hence N − N′ is nilpotent.
On the other hand, by Theorem (23.13), there exists a basis of V consisting
of characteristic vectors for D and D′, and hence for D′ − D. The
matrix of D′ − D with respect to this basis will be diagonal; since D′ − D =
N − N′ is also nilpotent, this diagonal matrix must be zero, and therefore
D = D′ and N = N′.
h(x) = (x − a₁)^{d₁} ⋯ (x − a_s)^{d_s},

where {a₁, ..., a_s} are the distinct characteristic roots of T. Letting

h(x) = x^n − γ_{n−1}x^{n−1} + ⋯ + γ_n,  γ_i ∈ F,

be the expansion of h(x) in terms of the powers of x, we have

γ_{n−1} = d₁a₁ + ⋯ + d_s a_s  and  γ_n = (−1)^n a₁^{d₁} ⋯ a_s^{d_s}.

Moreover, if B = (β_ij) is the matrix of T with respect to an arbitrary basis
of the vector space, then

γ_{n−1} = Σ_{i=1}^n β_ii  and  γ_n = (−1)^n D(B).

The formulas in terms of the characteristic roots are obtained by expanding
the right-hand side of the formula for h(x) and comparing coefficients.
Now let B = (β_ij) be the matrix of T with respect to an arbitrary
basis. We have shown that the characteristic polynomial is independent
of the choice of the basis. Therefore

h(x) = | x − β₁₁    −β₁₂   ⋯    −β₁ₙ   |
       |  −β₂₁    x − β₂₂  ⋯    −β₂ₙ   |
       |    ⋮                     ⋮     |
       |  −βₙ₁     −βₙ₂    ⋯   x − βₙₙ |.

The constant term γ_n of h(x) is h(0) = (−1)^n D(B). One way to obtain the
fact that γ_{n−1} = β₁₁ + ⋯ + βₙₙ is to use the complete expansion of the
determinant [see (17.12) or (19.22)]. The terms in the complete expansion
are products of elements from the first row, j₁ column, second row, j₂
column, etc., multiplied by D(e_{j₁}, ..., e_{jₙ}). In order to have a nonzero
term, all the columns must be different. The only term in the
complete expansion of h(x) which can contribute to the coefficient of
x^{n−1} is

(x − β₁₁)(x − β₂₂) ⋯ (x − βₙₙ).

The sign associated with this term is D(e₁, ..., eₙ) = 1. The coefficient
of x^{n−1} in (x − β₁₁) ⋯ (x − βₙₙ) is −(β₁₁ + ⋯ + βₙₙ). This completes
the proof of the theorem.
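Both coefficient formulas can be checked by machine. A sketch (an addition
to the text; it assumes numpy, whose np.poly returns the coefficients of the
characteristic polynomial D(xI − B)):

    import numpy as np

    B = np.array([[-1.0, 3.0,  0.0],
                  [ 0.0, 2.0,  0.0],
                  [ 2.0, 1.0, -1.0]])
    n = B.shape[0]

    c = np.poly(B)   # c[0] x^n + c[1] x^(n-1) + ... + c[n] = D(xI - B)
    # The coefficient of x^(n-1) is minus the trace of B.
    assert np.isclose(c[1], -np.trace(B))
    # The constant term is h(0) = (-1)^n D(B).
    assert np.isclose(c[n], (-1)**n * np.linalg.det(B))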
h(x) = D(xI − A) = | x + 1    −3      0   |
                   |   0    x − 2     0   | = (x + 1)²(x − 2).
                   |  −2     −1    x + 1  |
At this point we know from Corollary (24.7) that the distinct characteristic
roots of T are {−1, 2} and that the minimal polynomial of T is either

(x + 1)(x − 2)  or  (x + 1)²(x − 2).

Step 2. Find the null spaces of T + 1, (T + 1)², T − 2. If V turns
out to be the direct sum of the null spaces of T + 1 and T − 2, we will
know that the minimal polynomial is (x + 1)(x − 2) (why?), and if not,
then we will know that the minimal polynomial is (x + 1)²(x − 2) and
will have to find the null space of (T + 1)².
We have

(T + 1)v₁ = 2v₃
(T + 1)v₂ = 3v₁ + 3v₂ + v₃
(T + 1)v₃ = 0.

The rank of T + 1 can now be found by determining the maximal
number of linearly independent vectors among ⟨0, 0, 2⟩, ⟨3, 3, 1⟩,
⟨0, 0, 0⟩. In this case the number is obviously two and, by Theorem
(13.9), the null space of T + 1 has dimension 3 − 2 = 1.

Similarly, we have

(T − 2)v₁ = −3v₁ + 2v₃
(T − 2)v₂ = 3v₁ + v₃
(T − 2)v₃ = −3v₃.
Step 3. Find the matrix of T with respect to the new basis. According
to Theorem (24.1), we should find a basis {w₁, w₂} for n[(T + 1)²] such
that (T + 1)w₁ = 0, (T + 1)w₂ ∈ S(w₁), and let w₃ be a basis of n(T − 2).
The matrix whose columns express the new basis in terms of the original
one (see (13.6)) is

S = ( 0 1 1 )
    ( 0 0 1 )
    ( 1 0 1 )

and the matrix of T with respect to {w₁, w₂, w₃} is, by the equations above,

B = ( −1  2  0 )
    (  0 −1  0 )
    (  0  0  2 ).
We should now recall that either SB = AS or BS = SA (to remember
which of the two holds is much too hard!). Checking the multiplications,
we see that

SB = AS,  or  B = S⁻¹AS.
B = ( −1  2  0 )   ( −1  0  0 )   ( 0 2 0 )
    (  0 −1  0 ) = (  0 −1  0 ) + ( 0 0 0 ).
    (  0  0  2 )   (  0  0  2 )   ( 0 0 0 )

Letting D and N be the linear transformations whose matrices with
respect to the basis {w₁, w₂, w₃} are

( −1  0  0 )        ( 0 2 0 )
(  0 −1  0 )  and   ( 0 0 0 ),
(  0  0  2 )        ( 0 0 0 )

respectively, we have T = D + N, with D and N diagonable and nilpotent,
respectively, and DN = ND. By the uniqueness part of Theorem
(24.9), D and N give the Jordan decomposition of T.
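The computations of this example can be replayed with a computer algebra
system. The sketch below is an addition to the text (it assumes the sympy
library); sympy's jordan_form returns a pair (P, J) with A = PJP⁻¹, and its
J places 1 (rather than the 2 appearing in B above) on the superdiagonal,
corresponding to a rescaled basis vector.

    from sympy import Matrix

    A = Matrix([[-1, 3,  0],
                [ 0, 2,  0],
                [ 2, 1, -1]])

    P, J = A.jordan_form()          # A = P * J * P**-1
    print(J)                        # one block of size 2 for -1, one of size 1 for 2
    assert A == P * J * P.inv()
    print(A.charpoly().as_expr())   # the characteristic polynomial found in Step 1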
We shall now give two examples which show how the trace function
is useful in some other parts of mathematics.
The pairs of points in E are called edges of the graph. A vertex v and an
edge (v′, v″) are said to be incident if v = v′ or v = v″.
A graph can be represented by a diagram. For example, the figure
below represents a graph with 5 vertices and 5 edges {(1, 2), (2, 3), (2, 4),
(3, 4), and (4, 5)}. (It is unnecessary to list the other pairs (2, 1), (3, 2),
etc., because of our assumption that (2, 1) ∈ E if and only if (1, 2) ∈ E.)
[Figure: the graph on the vertices 1, ..., 5 with edges (1, 2), (2, 3), (2, 4),
(3, 4), (4, 5).]

M = ( 0 1 0 0 0 )
    ( 1 0 1 1 0 )
    ( 0 1 0 1 0 )
    ( 0 1 1 0 1 )
    ( 0 0 0 1 0 ).
As an experiment, let's compute some powers of M. We have

M² = ( 1 0 1 1 0 )
     ( 0 3 1 1 1 )
     ( 1 1 2 1 1 )
     ( 1 1 1 3 0 )
     ( 0 1 1 0 1 )

M³ = ( 0 3 1 1 1 )
     ( 3 2 4 5 1 )
     ( 1 4 2 4 1 )
     ( 1 5 4 2 3 )
     ( 1 1 1 3 0 ).
Tr(M²) = Σ_{i=1}^n Σ_{j=1}^n m_ij m_ji  and  Tr(M³) = Σ_{i=1}^n Σ_{j=1}^n Σ_{k=1}^n m_ij m_jk m_ki.

We have m_ij m_ji ≠ 0 if and only if (i, j) is an edge of the graph, and in
that case m_ij m_ji = 1. In case m_ij m_ji = 1, we shall also have m_ji m_ij = 1.
Therefore,

Tr(M²) = 2e,

where e is the number of edges in the graph. In our example, Tr(M²) =
10, and there are 5 edges.
Turning to M³, we see that m_ij m_jk m_ki ≠ 0 only when there is a
triangle in the graph, where (i, j, k) form a triangle if (i, j), (j, k), and (k, i)
are all edges in the graph. It is easy to check that for each such triangle,
the contribution to Tr(M³) is 6. Therefore,

Tr(M³) = 6t,

where t is the number of triangles. In our example,

Tr(M³) = 6,

and there is exactly one triangle.
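Both counts are easy to verify by machine. A sketch (an addition to the
text; it assumes numpy) for the five-vertex graph of this example:

    import numpy as np

    # Adjacency matrix of the graph with edges (1,2), (2,3), (2,4), (3,4), (4,5).
    M = np.array([[0, 1, 0, 0, 0],
                  [1, 0, 1, 1, 0],
                  [0, 1, 0, 1, 0],
                  [0, 1, 1, 0, 1],
                  [0, 0, 0, 1, 0]])

    M2 = M @ M
    M3 = M2 @ M
    print(np.trace(M2))   # 10 = 2e, so the graph has e = 5 edges
    print(np.trace(M3))   # 6 = 6t, so the graph has t = 1 triangle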
Example D. Let V be a vector space with basis {v₁, ..., vₙ} over the field of
real numbers. Let σ be a permutation of {1, 2, ..., n}, and let A_σ be the
matrix of the linear transformation T_σ (defined in Section 19),

T_σ(v_i) = v_{σ(i)}.

This time, Tr(A_σ) is equal to the number of basis vectors v_i such that
v_{σ(i)} = v_i. In other words, Tr(A_σ) counts the number of fixed points
of the permutation σ, where a fixed point is defined to be an integer
i, 1 ≤ i ≤ n, such that σ(i) = i.
EXERCISES
T(v₁) = −v₁ − v₂
T(v₂) = v₁ − 3v₂

T(v₁) = v₁ + iv₂
T(v₂) = −iv₁ + v₂,

where {v₁, v₂} is a basis for C².
3. Let V be a two-dimensional vector space over the real numbers R and
let T ∈ L(V, V) be defined by

T(v₁) = −αv₂
T(v₂) = βv₁,

where α and β are positive real numbers. Does there exist a basis of V
consisting of characteristic vectors of T? Explain.
Note: In Exercises 4 to 8, V denotes a finite dimensional vector space over the
complex numbers C.
4. Let T ∈ L(V, V) be a linear transformation whose characteristic roots
are all equal to zero. Prove that T is nilpotent: Tⁿ = 0 for some n.
5. Let T ∈ L(V, V) be a linear transformation such that T² = T. Discuss
whether or not there exists a basis of V consisting of characteristic
vectors of T.
6. Answer the question of Exercise 5 for the case of a transformation
T such that T^r = 1 for some positive integer r.
7. Let T be a linear transformation of rank 1, that is, dim T(V) = 1.
Then T(V) = S(v₀) for some vector v₀ ≠ 0. In particular, T(v₀) =
λv₀ for some λ ∈ C. Prove that

T² = λT.
b. Use (a) to show that if A and B are similar matrices, then Tr(A) =
Tr(B).
c. Show that the matrices of trace zero form a subspace of Mₙ(F) of
dimension n² − 1. [Hint: The mapping Tr: Mₙ(F) → F is a linear
transformation.]
d. Prove that the subspace of Mₙ(F) defined in (c) is generated by the
matrices AB − BA, A, B ∈ Mₙ(F).
10. A linear transformation T ∈ L(V, V), where V is a finite dimensional
vector space over an algebraically closed field F, is called unipotent if
T − 1 is nilpotent.
a. Show that T is unipotent if and only if 1 is the only characteristic
root of T.
b. Let T ∈ L(V, V), and suppose that T is invertible. Prove that there
exist linear transformations D and U such that T = DU, D is
diagonable, U is unipotent, and D and U can both be expressed as
polynomials in T.
c. Prove that D and U given in (b) are uniquely determined in the
sense that if T = D′U′, with D′ diagonable, U′ unipotent, and
D′U′ = U′D′, then D = D′, and U = U′.
A basic question about matrices is the following one. Given two n-by-n
matrices A and B with coefficients in F, how do we decide whether or not
A is similar to B?
(25.6) LEMMA. Let ⟨v⟩ be a nonzero cyclic subspace of V, and let T_⟨v⟩ be
the restriction† of T to the subspace ⟨v⟩. Then the order m_v(x) of v is equal
to the minimal polynomial of T_⟨v⟩ (up to a constant factor).

Proof. A polynomial f(x) ∈ F[x] has the property that f(T)v = 0 if and
only if f(T_⟨v⟩) = 0, because ⟨v⟩ is generated by {v, Tv, T²v, ...} and
f(T)T = Tf(T). Moreover, f(T)v = f(T_⟨v⟩)v. Letting q(x) be the
minimal polynomial of T_⟨v⟩, we have m_v(T) = 0 on ⟨v⟩, and hence
q(x) | m_v(x). Conversely, q(T)v = 0, so that m_v(x) | q(x), and hence
m_v(x) and q(x) differ by a scalar factor. This completes the proof of the
lemma.
The proof of the theorem will be given in Section 29. We shall spend
the remainder of this section on what the theorem means and how it is
used.
We remark first that the orders {p₁(x)^{e₁}, p₂(x)^{e₂}, ...} may contain
repetitions. The uniqueness part of the theorem, in more detail, states
that if

V = ⟨v₁′⟩ ⊕ ⋯ ⊕ ⟨v_s′⟩

for some other nonzero vectors {v_i′} with prime power orders {q₁(x)^{f₁}, ...,
q_s(x)^{f_s}}, then r = s, and the sets of prime powers,
A_{p(x)} = ( 0 0 ⋯ 0   a₀      )
           ( 1 0 ⋯ 0   a₁      )
           ( 0 1 ⋯ 0   a₂      )
           ( ⋮           ⋮      )
           ( 0 0 ⋯ 1   a_{d−1} )
Proof. By the proof of Lemma (25.4), a basis for ⟨v⟩ is given by {v,
Tv, ..., T^{d−1}v}, and we have

T(T^{d−1}v) = a₀v + a₁(Tv) + ⋯ + a_{d−1}(T^{d−1}v).

The form of the matrix is clear from these remarks.
(25.12) LEMMA. Let p(x) = x^d − a_{d−1}x^{d−1} − ⋯ − a₀, a_i ∈ F, and let
⟨v⟩ be a cyclic subspace such that the order of v is p(x)^e for some positive
integer e. Then the matrix of T_⟨v⟩ with respect to the basis

{p(T)^{e−1}v, Tp(T)^{e−1}v, ..., T^{d−1}p(T)^{e−1}v, p(T)^{e−2}v, Tp(T)^{e−2}v,
..., T^{d−1}p(T)^{e−2}v, ..., v, Tv, ..., T^{d−1}v}

is

( A B           )
(   A B         )
(     ⋱  ⋱      )
(        A B    )
(          A    )   (e diagonal blocks),

where A is the companion matrix A = A_{p(x)} and B is the d-by-d matrix

B = ( 0 ⋯ 0 1 )
    ( 0 ⋯ 0 0 )
    ( ⋮       ⋮ )
    ( 0 ⋯ 0 0 ).
Proof. The number of candidates given in the statement of the lemma,
for basis elements of ⟨v⟩, is de = deg p(x)^e. Therefore the vectors will
be a basis of ⟨v⟩ if we can show that they are linearly independent. A
relation of linear dependence, however, would produce a nonzero
polynomial g(x) such that g(T)v = 0 and deg g(x) < deg p(x)^e, contrary
to the assumption that the order of v is p(x)^e. We now compute the matrix
of T_⟨v⟩ with respect to the basis. We see that T maps each basis vector
onto the next one, except for the basis vectors

T^{d−1}p(T)^{e−1}v, T^{d−1}p(T)^{e−2}v, ..., T^{d−1}v.
For these, since T^d = a₀ · 1 + a₁T + ⋯ + a_{d−1}T^{d−1} + p(T), we have

T(T^{d−1}p(T)^i v) = a₀p(T)^i v + a₁Tp(T)^i v + ⋯ + a_{d−1}T^{d−1}p(T)^i v + p(T)^{i+1}v,

with p(T)^e v = 0. These equations give exactly the columns of the matrix
in the statement of the lemma, the blocks B recording the terms p(T)^{i+1}v.
C = ( C₁          0  )
    (     C₂         )
    (         ⋱      )
    ( 0          C_r ),
where C_i, 1 ≤ i ≤ r, is the companion matrix of the elementary divisor
p_i(x)^{e_i}. The rational canonical form of an n-by-n matrix A over F is defined
to be the rational canonical form of a linear transformation T on an
n-dimensional vector space over F, whose matrix with respect to some
basis is A.

Notice that except for the ordering of the basis vectors, every entry
of C is determined by a knowledge of the {p_i(x)^{e_i}}.
COROLLARY. Two n-by-n matrices with entries in F are similar if and only
if they have the same set of elementary divisors.
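The corollary gives a practical similarity test when canonical forms can be
computed. The following sketch is an addition to the text (it assumes the
sympy library, and it compares Jordan canonical forms over C instead of
computing elementary divisors directly):

    from sympy import Matrix

    A = Matrix([[0, 1], [0, 0]])
    B = Matrix([[0, 0], [1, 0]])   # the transpose of A

    _, JA = A.jordan_form()
    _, JB = B.jordan_form()
    # A and B are similar exactly when the canonical forms coincide
    # (after a common ordering of the blocks).
    print(JA == JB)   # True: both are a single nilpotent block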
(25.17) DEFINITION. The Jordan normal form (or Jordan canonical form)
of a linear transformation (or a matrix) is defined to be the rational canon-
ical form in case all the characteristic roots belong to the field F. In that
case the elementary divisors all have the form (x − a_i)^e, for a_i ∈ F, and
the companion matrix of (x − a_i)^e is the e-by-e matrix

( a_i  1             )
(     a_i  1         )
(         ⋱   ⋱      )
(            a_i  1  )
(                a_i ).
Example D. What is the rational canonical form of the matrix with rational
coefficients whose elementary divisors are (x − 1)², (x + 1)², and x − 2?
All the characteristic roots {1, −1, 2} belong to the rational field.
Therefore the Jordan canonical form is

( 1 1  0  0 0 )
( 0 1  0  0 0 )
( 0 0 −1  1 0 )
( 0 0  0 −1 0 )
( 0 0  0  0 2 ).
EXERCISES
2. Find the rational canonical forms, over the field of rational numbers†,
of the matrices

( ~  −1 )        ( 0 0 1 )
(    −1 ),       ( 1 0 0 ).
                 ( 0 1 0 )
3. Find the rational canonical forms, over the field of real numbers, of the
matrices given in Exercise 2.
4. Find the Jordan canonical forms, over the field of complex numbers, of
the matrices given in Exercise 2.
5. Prove that V is cyclic relative to a linear transformation T ∈ L(V, V) if
and only if the minimal polynomial of T is equal to the characteristic
polynomial.
6. Find the Jordan canonical forms of the following matrices, over C. Which
of these pairs of matrices are similar?
a.
a.
(~ ~), (~ ~).
b.
(- ~ ~), (~ -~).
c.
C' 0 0 D·
010
001
000 Go
-1
o 0 0
2 0 .
001
0 0)
7. Show that two n-by-n diagonal matrices with coefficients in F are similar
if and only if the diagonal elements of one matrix are a rearrangement of
the diagonal elements of the other.
8. Let F be an arbitrary field.
a. Let A_{p(x)} be the companion matrix of a prime polynomial p(x) ∈ F[x].
Prove that

D(xI − A_{p(x)}) = p(x).

(Hint: Expand the determinant along the last column.)
b. Let A be a matrix in block triangular form,
Prove that

D(xI − A) = p₁(x)^{e₁} ⋯ p_r(x)^{e_r}.

† The rational canonical form of a matrix A over a field F is the rational canonical
form of a linear transformation corresponding to A, on a vector space over F.
It is useful to begin our discussion with some ideas from set theory. Let
X be a set. A relation on X is an arbitrary set of ordered pairs ℛ =
{(a, b)}, a, b ∈ X, where ordered pair means that we identify (a, b) with
(a′, b′) only if a = a′ and b = b′. For example, the ordered pairs (1, 2)
and (2, 1) are not equal. An example of a relation is the set of ordered
pairs of integers ℛ = {(a, b)}, where (a, b) ∈ ℛ if and only if a < b. It
is often convenient to use the notation a ℛ b to mean that the pair
(a, b) ∈ ℛ.
Proof. The reflexive property of ∼ states that a ∼ a for all a ∈ X. Thus
a ∈ [a], and we have proved the first statement, that X is the union of the
equivalence classes [a]. Now suppose b ∈ [a] ∩ [a′]. We have to
prove that [a] = [a′]. First let c ∈ [a]. Then c ∼ a. Since b ∈ [a],
we have b ∼ a, and hence a ∼ b by the symmetric property of ∼. Applying
the transitivity, c ∼ a and a ∼ b imply that c ∼ b. Finally, c ∼ b and b ∼ a′
imply c ∼ a′, and c ∈ [a′], again by transitivity; this proves that
[a] ⊂ [a′]. The same argument, with a and a′ interchanged, shows
that [a′] ⊂ [a]. Therefore [a] = [a′], and the theorem is proved.
For example, if ∼ is the relation of equality, the equivalence class
[a] consists of the single element a. If f: X → Y is a function, and ∼ the
equivalence relation defined previously,

a ∼ b  if  f(a) = f(b),

then the equivalence class [a] is the set of all b ∈ X which are mapped by
f onto the same element as a, that is, [a] = {b ∈ X | f(b) = f(a)}.
be defined by

T_{V/Y}([v]) = [T(v)].

Then T_Y ∈ L(Y, Y) and T_{V/Y} ∈ L(V/Y, V/Y). The transformation T_Y is
called the restriction of T to Y, while T_{V/Y} is called the transformation on
V/Y induced by T.
T ∈ L(V, V). Let {v₁, ..., vₙ} be a basis of V such that {v₁, ..., v_k} is a
basis for Y. If Y ≠ V, then k < n, and [v_{k+1}], ..., [vₙ] is a basis of
V/Y. Let A = (a_ij) be the matrix of T with respect to the basis {v₁, ..., vₙ}.
Then A has the form

A = ( A₁   ∗  )
    ( 0    A₂ ).

To see that [v_{k+1}], ..., [vₙ] are linearly independent, suppose
β_{k+1}[v_{k+1}] + ⋯ + βₙ[vₙ] = 0. Then β_{k+1}v_{k+1} + ⋯ + βₙvₙ ∈ Y,
and hence

β_{k+1}v_{k+1} + ⋯ + βₙvₙ = α₁v₁ + ⋯ + α_kv_k

for some α_i ∈ F. Since {v₁, ..., vₙ} are linearly independent, we have
β_{k+1} = ⋯ = βₙ = 0.
Now let us compute the matrix of T with respect to the basis
{v₁, ..., vₙ}. Since Y is invariant relative to T, T(v_i) ∈ Y for 1 ≤ i ≤ k.
Therefore we have

T(v_i) = a_{1i}v₁ + ⋯ + a_{ki}v_k,  1 ≤ i ≤ k.

Then

T_{V/Y}([v_{k+1}]) = a_{k+1,k+1}[v_{k+1}] + ⋯ + a_{n,k+1}[vₙ],

and similarly for [v_{k+2}], ..., [vₙ]. We have shown that the matrix A of T
with respect to the basis {v₁, ..., vₙ} has the form stated in the theorem,
and that A₁ and A₂ are the matrices of T_Y and T_{V/Y}, respectively.
This completes the proof of the theorem.
A = ( ~!
0
~
0
-1 4)
6
2
1
1 '
o 0 -1 1
and let V be a vector space over R with basis {v₁, ..., v₄}. Let
T ∈ L(V, V) be the linear transformation whose matrix with respect to
the basis is A. By Theorem (26.7) we see that Y = S(v₁, v₂) is invariant
relative to T and that the matrices of T_Y and T_{V/Y} with respect to the bases
{v₁, v₂} and {[v₃], [v₄]} are the blocks A₁ and A₂ of A, respectively.
(26.9) LEMMA. Let {v₁, ..., vₙ} be a basis for V over F. Then there
exist linear functions {f₁, ..., fₙ} such that for each i,

f_i(v_i) = 1,  f_i(v_j) = 0,  j ≠ i.

The linear functions {f₁, ..., fₙ} form a basis for V* over F, called the
dual basis to {v₁, ..., vₙ}.
Proof. First of all, the linear functions exist because of Theorem (13.1),
which allows us to define linear transformations which map basis elements
of a vector space onto arbitrary vectors in the image space. We next
show that {f₁, ..., fₙ} are linearly independent. Suppose

a₁f₁ + a₂f₂ + ⋯ + aₙfₙ = 0.

Then applying both sides to the vector v₁, and using the definition of the
vector space operations in V*, we have

a₁f₁(v₁) + a₂f₂(v₁) + ⋯ + aₙfₙ(v₁) = 0.

Therefore a₁ = 0, because f₁(v₁) = 1 and f₂(v₁) = ⋯ = fₙ(v₁) = 0.
Similarly a₂ = ⋯ = aₙ = 0.

Finally we check that {f₁, ..., fₙ} form a set of generators for V*.
Let f ∈ V*, and let f(v_i) = a_i, i = 1, 2, ..., n. Then it is easily checked,
by applying both sides to the basis elements {v₁, ..., vₙ} in turn, that

f = a₁f₁ + ⋯ + aₙfₙ.
This completes the proof of the lemma.
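In coordinates, the lemma amounts to inverting a matrix: if the columns of
a matrix V hold the basis vectors v_i, the rows of V⁻¹ represent the dual
basis, since V⁻¹V = I says exactly that f_i(v_j) is 1 when i = j and 0
otherwise. A sketch (an addition to the text; it assumes numpy):

    import numpy as np

    V = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])   # columns are the basis {v1, v2, v3} of R^3

    F = np.linalg.inv(V)              # row i holds the linear function f_i

    # f_i(v_j) = 1 if i = j and 0 otherwise:
    assert np.allclose(F @ V, np.eye(3))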
(26.11) THEOREM. Let T ∈ L(V, V), let {v₁, ..., vₙ} be a basis of V, and
{f₁, ..., fₙ} the dual basis of V*, in the sense of Lemma (26.9). Let A be
the matrix of T with respect to the basis {v₁, ..., vₙ}. Then the matrix of
T* with respect to the basis {f₁, ..., fₙ} is the transpose matrix ᵗA. (We
recall that if a_ij is the element in the ith row, jth column of A, then the
element in the ith row, jth column of ᵗA is a_ji.)
Proof. We have

T(v_j) = Σ_{i=1}^n a_{ij}v_i.

Suppose

T*(f_j) = Σ_{i=1}^n β_{ij}f_i,

where {f₁, ..., fₙ} is the dual basis of V*, with respect to the basis
{v₁, ..., vₙ} of V. We have to show that β_{ij} = a_{ji} for all i and j. Apply
both sides of the second equation to an arbitrary basis vector v_k of V
to obtain

[T*(f_j)](v_k) = f_j(T(v_k)) = f_j(Σ_i a_{ik}v_i) = a_{jk},  while  (Σ_i β_{ij}f_i)(v_k) = β_{kj},

and we have shown that a_{jk} = β_{kj} for all j and k, completing the proof of
the theorem.
EXERCISES
We begin with a different and more general way of looking at the relation-
ship between a vector space and its dual.

v ∈ V, v′ ∈ V′, a uniquely determined element of F, B(v, v′), such that the
following conditions are satisfied:
Example A. Let V be a vector space over F, and V* the dual vector space,
V* = L(V, F). Let

B(v, f) = f(v),  f ∈ V*,  v ∈ V,

and define

Then B(v, w) defines a bilinear form on (V, W). The verification of this
fact is left to the exercises.
Since the usual arguments using the vector space axioms, applied to
bilinear forms, show that

B(0, v′) = B(v, 0) = 0

for all v, v′ in V and V′, respectively, the nondegeneracy condition asserts
that the equation B(v, v′) = 0, for all v′, holds only in the unique case
when it is forced to hold, that is, when v = 0.
In other words, V and W are dual with respect to B if, by the preced-
ing theorem, each vector space is isomorphic to the dual space of the
other, via the mappings defined by B.
(27.7) THEOREM. Let V and V′ be finite dimensional vector spaces which
are dual with respect to a bilinear form B. Let T ∈ L(V, V); then there
exists a uniquely determined linear transformation T′ ∈ L(V′, V′) such that
T′ and T are transposes of each other. Similarly each linear transforma-
tion S ∈ L(V′, V′) has a unique transpose in L(V, V).

Proof. Because of the symmetry in the situation, it is sufficient to prove
only the first statement. Let T ∈ L(V, V); we have to show that for each
v′ ∈ V′ there exists a unique element v₀′ ∈ V′ such that

(27.8)  B(v, v₀′) = B(T(v), v′),  v ∈ V,

so that we can define T′(v′) to be v₀′. Because T is a linear transformation,
the mapping

v → B(T(v), v′)

is an element of the dual space of V. By Theorem (27.3) there exists a
unique element v₀′ ∈ V′ such that (27.8) holds. Now we define T′: V′ → V′
as T′(v′) = v₀′. Then we have

(27.9)  B(v, T′(v′)) = B(T(v), v′)

for all v ∈ V, v′ ∈ V′. We have to prove that T′ ∈ L(V′, V′). The proof of
this fact is a good illustration of how dual vector spaces are used. We have
to show that for all α, β ∈ F, v₁′, v₂′ ∈ V′,

(27.10)  T′(αv₁′ + βv₂′) = αT′(v₁′) + βT′(v₂′).
(27.11) DEFINITION. Let V and V′ be dual vector spaces over F, with
respect to a nondegenerate bilinear form B. Let V₁ and V₁′ be
subspaces of V and V′, respectively. We define the annihilator V₁^⊥ of V₁
(with respect to B) to be the subspace of V′ consisting of all v′ such that
B(v₁, v′) = 0 for all v₁ ∈ V₁, or simply, B(V₁, v′) = 0. Similarly, we
define (V₁′)^⊥ = {v ∈ V | B(v, V₁′) = 0}.

We note that V₁^⊥ and (V₁′)^⊥ are really subspaces because of the fact
that B is a bilinear form. The nondegeneracy of B is equivalent to the
statements V^⊥ = 0 and (V′)^⊥ = 0. We remark also that there should be
no confusion from using the same symbol ⊥ for annihilators of subspaces
of V or V′, since the meaning will be clear from the situation.
The following result is the main theorem on annihilators. In the
statement, reference is made to quotient spaces and induced linear
transformations (introduced in Section 26).
EXERCISES
We are already familiar with direct sums of vector spaces from Chapter 7.
We shall start out in this section with an approach to direct sums from a
different point of view, to prepare the way for the more difficult concept of
tensor product.
Proof We shall give only a sketch of the proof, leaving some details to
the reader.
(a) First, we show that the vectors are linearly independent. If

α₁(u₁, 0) + ⋯ + α_k(u_k, 0) + β₁(0, v₁) + ⋯ + β_l(0, v_l) = 0,

then

α₁u₁ + ⋯ + α_ku_k = 0  and  β₁v₁ + ⋯ + β_lv_l = 0.

Since {u_i} and {v_j} are linearly independent sets in U and V, we have
α₁ = ⋯ = α_k = β₁ = ⋯ = β_l = 0. Now let (u, v) ∈ U + V. Then
u = Σ α_iu_i, v = Σ β_iv_i, and hence (u, v) = Σ α_i(u_i, 0) + Σ β_i(0, v_i), and
part (a) is proved.

(b) Follows from part (a).

(c) The proof that U₁ and V₁ are subspaces is omitted. To show that
U + V is their direct sum, we have to check that every vector in U + V
is a sum of vectors in U₁ and V₁, and that U₁ ∩ V₁ = 0. First, (u, v) =
(u, 0) + (0, v) ∈ U₁ + V₁. Next, (u, v) ∈ U₁ ∩ V₁ implies u = 0 and
v = 0, and hence (u, v) = 0.
We are going to show that there exists, for each pair of vector spaces
U and V, a vector space U ⊗ V, called their tensor product, with the
property that for every bilinear map λ: U × V → W, there exists a linear
transformation L: U ⊗ V → W which in a certain sense is equivalent to
the bilinear function λ. In order to make this idea more precise, it is
convenient to introduce the notation f ∘ g for the composite of two
mappings of sets, g: X → Y, f: Y → Z, given by

(f ∘ g)(x) = f(g(x)).

This is the usual "function of a function" idea from calculus, and was
already used in Chapter 3 to define the product of linear transformations.
as we see by applying both sides to (u₁, v₁), ..., (u_k, v_k) in turn, and
using the definition of u_i ∗ v_i. Moreover the coefficients in such a linear
combination are uniquely determined, because if

a₁(u₁ ∗ v₁) + ⋯ + a_k(u_k ∗ v_k) = a₁′(u₁ ∗ v₁) + ⋯ + a_k′(u_k ∗ v_k),

then, applying both sides to (u₁, v₁), (u₂, v₂), etc., we see that a₁ = a₁′,
a₂ = a₂′, etc.
Now we are ready to define U ⊗ V. Let Y be the subspace of
ℱ(U × V) generated by all functions

(u₁ + u₂) ∗ v − u₁ ∗ v − u₂ ∗ v,
u ∗ (v₁ + v₂) − u ∗ v₁ − u ∗ v₂,
(au) ∗ v − a(u ∗ v),
u ∗ (av) − a(u ∗ v),

with a ∈ F and u, u₁, u₂ ∈ U, v, v₁, v₂ ∈ V.
We next define a bilinear function t: U × V → U ⊗ V by setting

(28.6)  t(u, v) = [u ∗ v]

for all u ∈ U, v ∈ V. To show that t is bilinear, we start with the fact that
since U ⊗ V = ℱ(U × V)/Y, we have [y] = 0 in U ⊗ V for all y ∈ Y.
Because of the definition of Y, we have

[(u₁ + u₂) ∗ v] = [u₁ ∗ v] + [u₂ ∗ v],  [u ∗ (v₁ + v₂)] = [u ∗ v₁] + [u ∗ v₂],
[(au) ∗ v] = a[u ∗ v] = [u ∗ (av)],

for all the vectors and scalars involved. The equations translate into the
formulas

t(u₁ + u₂, v) = t(u₁, v) + t(u₂, v),
t(u, v₁ + v₂) = t(u, v₁) + t(u, v₂),
t(au, v) = at(u, v) = t(u, av),

which state precisely that t is a bilinear function.

Moreover, since every f ∈ ℱ(U × V) is a linear combination of the
functions u ∗ v, it follows that U ⊗ V is generated by the elements t(u, v),
u ∈ U, v ∈ V.
Finally, we have to check that bilinear functions can be factored in
the required way. Let λ: U × V → W be an arbitrary bilinear function.
Then we can define a linear transformation λ₁: ℱ(U × V) → W by
setting

(28.7)  λ₁(a₁(u₁ ∗ v₁) + ⋯ + a_k(u_k ∗ v_k)) = a₁λ(u₁, v₁) + ⋯ + a_kλ(u_k, v_k),

for all a_i ∈ F, and u_i ∈ U, v_i ∈ V. The definition of λ₁ is legitimate because
we have shown that every element of ℱ(U × V) is a linear combination
of the functions u_i ∗ v_i, and that the coefficients in such a linear com-
bination are uniquely determined. Checking that λ₁ is a linear transforma-
tion is by now a routine matter, and we shall omit the details.

The important point is to observe that because λ is bilinear, the
subspace Y is contained in the null space of λ₁. For example, using the
definition of λ₁ we have

λ₁((u₁ + u₂) ∗ v − u₁ ∗ v − u₂ ∗ v) = λ(u₁ + u₂, v) − λ(u₁, v) − λ(u₂, v) = 0,

since λ is bilinear. Similarly the other generators of Y belong to the null
space of λ₁.

Now we can define a linear transformation L: U ⊗ V → W by setting

(28.8)  L([f]) = λ₁(f),  f ∈ ℱ(U × V).

The definition of L is justified because if [f] = [f′], then f − f′ ∈ Y,
and hence λ₁(f − f′) = λ₁(f) − λ₁(f′) = 0. The fact that L is linear is
easily checked. (See also Exercise 5 of Section 26.)
We have to verify now that

L ∘ t = λ.

For u ∈ U, v ∈ V, we have by (28.6), (28.7), and (28.8),

(L ∘ t)(u, v) = L([u ∗ v]) = λ₁(u ∗ v) = λ(u, v),

and the theorem is proved.
[Figure 8.1: the maps t: U × V → U ⊗ V = ℱ(U × V)/Y and λ: U × V → W,
together with the induced maps λ₁: ℱ(U × V) → W and L: U ⊗ V → W
satisfying L ∘ t = λ.]
Figure 8.1 illustrates the various steps of the proof. This kind of a
proof is sometimes called a proof by "general nonsense," which means
that the proof is an application of the general ideas and constructions that
we have made and is not easy to illustrate in terms of numerical examples.
We shall now proceed to get a more concrete hold on U ⊗ V.

will be denoted by

t(u, v) = u ⊗ v.

[Figure 8.2: the bilinear map (u, v) → u ⊗ v from U × V to U ⊗ V.]
Therefore,

T = Σ_{i=1}^n u_i × v_i.

Then

and we have proved that L is onto. Since dim L(U*, V) = mn, we have
dim (U ⊗ V) ≥ mn, and it follows that the vectors {u_i ⊗ v_j} are linearly
independent. This completes the proof of part (a).

(b) Part (b) is an immediate consequence of part (a).
(c) We first use the fact that S and T are linear, and that u ⊗ v is a
bilinear function of u and v, to show that the function

μ: (u, v) → S(u) ⊗ T(v),  u ∈ U,  v ∈ V,

where A = (a_ij), B = (β_kl) are the matrices of S and T with respect to the
given bases. Then
[Figure 8.3: the basis vectors u₁ ⊗ v₁, ..., u₁ ⊗ vₙ, ..., u_m ⊗ vₙ, with the
corresponding diagonal entries of the matrix of S ⊗ T.]

Their sum is

(a₁₁ + ⋯ + a_mm)(β₁₁ + ⋯ + βₙₙ) = (Tr A)(Tr B),

as we wished to show. This completes the proof of the theorem.
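In terms of matrices, S ⊗ T is represented by the Kronecker product, and
the trace identity just proved can be tested directly. A sketch (an addition
to the text; it assumes numpy, whose np.kron builds the matrix of the tensor
product with respect to the basis {u_i ⊗ v_j}):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))   # matrix of S
    B = rng.standard_normal((4, 4))   # matrix of T

    K = np.kron(A, B)                 # matrix of S tensor T
    assert np.isclose(np.trace(K), np.trace(A) * np.trace(B))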
Proof. Let {v_{1i}} be a basis for V₁, {v_{2j}} a basis for V₂, and {v_{3k}} a basis
for V₃. By Theorem (28.10), the vectors {v_{1i} ⊗ (v_{2j} ⊗ v_{3k})} form a basis
for V₁ ⊗ (V₂ ⊗ V₃), while the vectors {(v_{1i} ⊗ v_{2j}) ⊗ v_{3k}} form a basis for
(V₁ ⊗ V₂) ⊗ V₃. There exists a linear transformation T which carries
v_{1i} ⊗ (v_{2j} ⊗ v_{3k}) onto (v_{1i} ⊗ v_{2j}) ⊗ v_{3k}, for all i, j, k, and T is an iso-
morphism of vector spaces. The fact that the isomorphism carries
v₁ ⊗ (v₂ ⊗ v₃) to (v₁ ⊗ v₂) ⊗ v₃ for all v₁ ∈ V₁, v₂ ∈ V₂ and v₃ ∈ V₃ is easily
checked by expanding v₁, v₂, v₃ as linear combinations of the basis
elements {v_{1i}}, {v_{2j}}, and {v_{3k}}, respectively.

We shall identify V₁ ⊗ (V₂ ⊗ V₃) with (V₁ ⊗ V₂) ⊗ V₃, and call this
vector space V₁ ⊗ V₂ ⊗ V₃ (without parentheses). We shall also write
v₁ ⊗ v₂ ⊗ v₃ for the element in V₁ ⊗ V₂ ⊗ V₃ corresponding to v₁ ⊗
(v₂ ⊗ v₃). Of course there is nothing special about a tensor product of
three vector spaces; we can equally well form the tensor product vector
space V₁ ⊗ V₂ ⊗ ⋯ ⊗ V_m of m vector spaces V₁, ..., V_m. The
elements of V₁ ⊗ V₂ ⊗ ⋯ ⊗ V_m are sums

Σ v_{1i₁} ⊗ v_{2i₂} ⊗ ⋯ ⊗ v_{mi_m}

with v_{1i₁} ∈ V₁, v_{2i₂} ∈ V₂, ..., v_{mi_m} ∈ V_m.
where the sum is taken over some set of k-tuples (i) = (i₁, ..., i_k), which
are used to index the vectors from V which are involved in the tensor t.

for all {v_{i₁}, ..., v_{i_k}} ∈ V. Notice that S_σ does not change the set of
vectors {v_{i₁}, ..., v_{i_k}} in the tensor v_{i₁} ⊗ ⋯ ⊗ v_{i_k}; it simply rearranges
their order of occurrence. For example, in ⊗² V, let σ = (12); then
(28.15) THEOREM.

a. The set of skew symmetric tensors in ⊗^k(V) form a subspace, which
will be denoted by ∧^k(V) (read: the k-fold wedge product of V).
b. Let v₁, ..., v_k be vectors in V. Then v₁ ∧ ⋯ ∧ v_k = Σ_σ ε(σ)(v_{σ(1)} ⊗
⋯ ⊗ v_{σ(k)}) is an element of ∧^k(V), for all {v_i} in V.
c. The wedge product v₁ ∧ ⋯ ∧ v_k has the following properties:
(i) v₁ ∧ ⋯ ∧ v_k is linear, when viewed as a function of any one
factor, for example,

Remark. Before giving the proof, we point out that (d) provides a far-
reaching extension of the idea which led to the determinant function.
The determinant provides a test for linear independence of a set of n
vectors in an n-dimensional space. Part (d) provides a similar test which
applies to k vectors in n-dimensional space, for k = 1, 2, ....
Proof. (a) The fact that the skew symmetric tensors form a subspace of
⊗^k V follows at once from the fact that the symmetry operators S_σ are
linear transformations.

(b) Let τ be a permutation of {1, 2, ..., k}. Then

since ε(στ) = ε(σ)ε(τ) from Section 19, and since the mapping σ → στ
is simply a rearrangement of the permutations of {1, 2, ..., k}.

(c) Property (i) is clear, and (ii) follows by applying S_σ, with σ the
transposition (ij), to v₁ ∧ ⋯ ∧ v_k.

(d) If v₁, ..., v_k are linearly dependent, then v₁ ∧ ⋯ ∧ v_k = 0, by
part (c) and exactly the reasoning given in Section 16 to prove the analogous
result for determinants.

Now suppose v₁, ..., v_k are linearly independent. Find vectors
v_{k+1}, ..., vₙ such that {v₁, ..., vₙ} is a basis of V. Then the vectors
appearing in the sum defining v₁ ∧ ⋯ ∧ v_k are distinct and form part of a
basis of ⊗^k V. Hence v₁ ∧ ⋯ ∧ v_k ≠ 0.

(e) The vectors {v_{j₁} ⊗ ⋯ ⊗ v_{j_k} | 1 ≤ j₁, ..., j_k ≤ n} form a basis
for ⊗^k V.
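Part (d) can be restated in coordinates: the coefficients of v₁ ∧ ⋯ ∧ v_k with
respect to the basis of ∧^k(V) are the k-by-k minors of the matrix whose
columns are v₁, ..., v_k, so the vectors are independent exactly when some
minor is different from zero. A sketch of this test (an addition to the text;
it assumes numpy, and the function name is ours):

    import numpy as np
    from itertools import combinations

    def wedge_is_nonzero(vectors):
        """Test linear independence of k vectors in R^n through the
        coordinates of their wedge product, i.e., the k-by-k minors."""
        A = np.column_stack(vectors)          # n-by-k matrix
        n, k = A.shape
        return any(abs(np.linalg.det(A[list(rows), :])) > 1e-10
                   for rows in combinations(range(n), k))

    v1, v2 = np.array([1.0, 2.0, 0.0]), np.array([0.0, 1.0, 1.0])
    print(wedge_is_nonzero([v1, v2]))        # True: independent
    print(wedge_is_nonzero([v1, 2 * v1]))    # False: the wedge vanishes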
EXERCISES
In all the exercises, the vector spaces involved are assumed to be finite
dimensional.
A1 = (~ !), (~ ~),
A2 = (-~ ~), (-11 0)0 .
Check that the final result agrees with A₁A₂ × B₁B₂.
2. Let {u₁, ..., u_k} be linearly independent vectors in U, and {v₁, ..., v_k}
arbitrary vectors in V. Show that Σ u_i ⊗ v_i = 0 in U ⊗ V implies that
v₁ = ⋯ = v_k = 0.
3. Let S ∈ L(U, U), T ∈ L(V, V), and let α and β be characteristic roots of
S and T, respectively, belonging to characteristic vectors u ∈ U and
v ∈ V. Prove that u ⊗ v ≠ 0 in U ⊗ V (using Exercise 2, for example)
and that u ⊗ v is a characteristic vector of S ⊗ T belonging to the
characteristic root αβ.
4. Suppose A and B are matrices in triangular form, with zeros above the
diagonal. Show that A × B has the same property, and hence that
every characteristic root of A × B can be expressed in the form αβ,
where α is a characteristic root of A and β a characteristic root of B.
5. Let S ∈ L(U, U), T ∈ L(V, V), and suppose the base field is algebraically
closed. Prove that every characteristic root of S ⊗ T can be expressed
in the form αβ, with α and β characteristic roots of S and T, respectively.
6. a. Prove that every linear transformation T ∈ L(U, V) can be expressed
in the form Σ f_i × v_i, with {f_i} in U* and {v_i} in V, where f × v is
the linear transformation in L(U, V) defined by (f × v)(u) =
f(u)v, u ∈ U, f ∈ U*, v ∈ V. [Hint: Follow the proof of Theorem
(28.10).]
b. Let X ∈ L(U, V) be expressed in the form Σ f_i × v_i, f_i ∈ U*, v_i ∈ V,
according to part (a). Let S ∈ L(U, U), T ∈ L(V, V). Show that

(S* ⊗ T⁻¹)X = X,
b: V × W → B(V, W)*

defined by

b(v, w)(f) = f(v, w),  f ∈ B(V, W),  v ∈ V,  w ∈ W.

10. Let ∧^k(V) be the k-fold wedge product of V, and let dim V = n. Prove
that dim ∧^k(V) = (n choose k) (binomial coefficient).
11. Let T ∈ L(V, V). Extend Theorem (28.10) to prove that there exists a
linear transformation T^{(k)} of ⊗^k(V) such that

T^{(k)}(v₁ ⊗ ⋯ ⊗ v_k) = T(v₁) ⊗ ⋯ ⊗ T(v_k).

12. Let dim V = n, and let {v₁, ..., vₙ} be a basis for V. Prove that the
tensor v₁ ∧ ⋯ ∧ vₙ is a basis for ∧ⁿ(V). Let T ∈ L(V, V). Prove
that

T^{(n)}(v₁ ∧ ⋯ ∧ vₙ) = D(T)(v₁ ∧ ⋯ ∧ vₙ),

for all T ∈ L(V, V). (Hint: Use the uniqueness of the determinant
function, proved in Section 17.)
(29.1) There exist vectors {v₁, ..., v_r}, whose orders are prime powers
{p₁(x)^{e₁}, ..., p_r(x)^{e_r}} in F[x], such that V is the direct sum of the cyclic
subspaces relative to T generated by {v₁, ..., v_r}.

(29.2) Suppose

V = ⟨v₁⟩ ⊕ ⋯ ⊕ ⟨v_r⟩ = ⟨v₁′⟩ ⊕ ⋯ ⊕ ⟨v_s′⟩,

where the vectors {v_i} and {v_i′} have prime power orders

{p₁(x)^{e₁}, ..., p_r(x)^{e_r}}  and  {q₁(x)^{f₁}, ..., q_s(x)^{f_s}},

respectively. Then r = s, and the polynomials

{p₁(x)^{e₁}, ..., p_r(x)^{e_r}}  and  {q₁(x)^{f₁}, ..., q_s(x)^{f_s}}

are the same, up to a rearrangement.
V* = W* ⊕ ⟨v₁⟩^⊥,

and that W* is invariant relative to T*. Then by Theorem (27.12),

where {v_{i₁}, ..., v_{i_a}} and {v′_{j₁}, ..., v′_{j_b}} are precisely the generators of the
cyclic subspaces in (29.2) whose orders are powers of p(x). Therefore it
is sufficient to prove (29.2) for the case in which we have V expressed as a
direct sum of cyclic spaces in two different ways, and the orders of the
generators {v_i} and {v_i′} are all powers of a single prime, which we shall
call p(x). We let the orders of the {v_i} be {p(x)^{a₁}, ..., p(x)^{a_r}},
with a_i > 0, and the orders of the {v_i′} will be denoted by
{p(x)^{b₁}, ..., p(x)^{b_s}} with all b_j > 0. To prove the uniqueness part of the
theorem, it will be sufficient to prove that first, r = s, then that the number
of a's and b's equal to one is the same, then that the number of a's and b's
equal to two is the same, etc.

We begin by computing n(p(T)), using the two decompositions in
turn. Let

v = f₁(T)v₁ + ⋯ + f_r(T)v_r ∈ ⟨v₁⟩ ⊕ ⋯ ⊕ ⟨v_r⟩,

where the f_i(x) ∈ F[x]. Suppose v ∈ n(p(T)). Then

p(T)f₁(T)v₁ + ⋯ + p(T)f_r(T)v_r = 0,

and because of the direct sum we have

p(T)f₁(T)v₁ = ⋯ = p(T)f_r(T)v_r = 0.

Therefore

n(p(T)) = ⟨p(T)^{a₁−1}v₁⟩ ⊕ ⋯ ⊕ ⟨p(T)^{a_r−1}v_r⟩,

and the dimension of n(p(T)) is rd, where d is the degree of p(x). A similar
calculation using the subspaces ⟨v_i′⟩ shows that the dimension of n(p(T))
is sd, and we conclude that r = s.
Now let x₁ denote the number of a_i's equal to one, x₂ the number of
a_i's equal to two, etc. Similarly, let y₁ denote the number of b_i's equal
to one, y₂ the number equal to two, etc.

We can use the same argument used to compute n(p(T)) to find
n(p(T)²) and obtain

n(p(T)²) = ⟨p(T)^{a₁−2}v₁⟩ ⊕ ⋯ ⊕ ⟨p(T)^{a_r−2}v_r⟩  (direct sum)
         = ⟨p(T)^{b₁−2}v₁′⟩ ⊕ ⋯ ⊕ ⟨p(T)^{b_s−2}v_s′⟩  (direct sum),

where a negative power of p(T) is understood to mean the identity trans-
formation. Computing the dimension of n(p(T)²) using the facts that the
dimension of ⟨v_i⟩ is d if a_i = 1 and the dimension of ⟨p(T)^{a_i−2}v_i⟩ is 2d
if a_i ≥ 2, we have
EXERCISES
The matrix

A = ( cos 2π/3   −sin 2π/3 )   =   ( −1/2   −√3/2 )
    ( sin 2π/3    cos 2π/3 )       (  √3/2   −1/2 )

is an orthogonal matrix such that A³ = I. Its minimal polynomial is
x² + x + 1, which is a prime in the polynomial ring R[x]. Therefore, by Theorem
(23.11), A cannot be diagonalized over the real field and cannot even be
put in triangular form, since a triangular 2-by-2 orthogonal matrix can
easily be shown to be diagonal. Thus the methods of Chapter 7 yield
little new information, even in this simple case.
The difficulty is that the real field is not algebraically closed. It is
here that, as Hermann Weyl remarked, Euclid enters the scene, brandishing
his ruler and his compass. The ideas necessary to treat orthogonal trans-
formations on a real vector space will throw additional light on the
problems considered in Chapter 7, as well. The existence of an inner
product allows us to get information about minimal polynomials, etc.,
that would have seemed almost impossible from the methods of Chapter
7 alone. We begin with some general definitions and theorems.
( cos θ   −sin θ )
( sin θ    cos θ ).

In other words, T_W is a rotation in the two-dimensional space W (see
Section 14).
‖T(w)‖ = ‖w‖

implies |λ| = 1. Therefore, λ = ±1 and T(w) = ±w.

If dim W = 2, then the minimal polynomial of T has the form

x² + αx + β,  α² − 4β < 0.

Let {w₁, w₂} be an orthonormal basis for W and let

T(w₁) = λw₁ + μw₂.

Then λ² + μ² = 1 and, since (T(w₁), T(w₂)) = 0, T(w₂) is either −μw₁ +
λw₂ or μw₁ − λw₂. In the first case, the matrix of T is

( λ   −μ )
( μ    λ ),

and we can find θ such that cos θ = λ, sin θ = μ, since λ² + μ² = 1. In
the latter case, the matrix is
V = W₁ ⊕ W₁^⊥.

Clearly, W₁ ∩ W₁^⊥ = {0}, since (w, w) ≠ 0 if w ≠ 0. Now let w ∈ V
and let {w₁, ..., w_s} be an orthonormal basis for W₁, where s = 1 or 2.
Then

w = Σ_i (w, w_i)w_i + (w − Σ_i (w, w_i)w_i),

and since Σ_i (w, w_i)w_i ∈ W₁ and w − Σ_i (w, w_i)w_i ∈ W₁^⊥, we have V =
W₁ + W₁^⊥. This fact together with the result that W₁ ∩ W₁^⊥ = {0}
implies that V = W₁ ⊕ W₁^⊥.

Now we prove the key result that W₁^⊥ is also an invariant subspace.
Let w′ ∈ W₁^⊥. Since T is orthogonal, T(W₁) = W₁ and (W₁, w′) =
(T(W₁), T(w′)) = (W₁, T(w′)) = 0. Therefore T(w′) is also orthogonal
to all the vectors in W₁.

Since T(W₁^⊥) ⊂ W₁^⊥, T is an orthogonal transformation of W₁^⊥ and,
by the induction hypothesis, W₁^⊥ is a direct sum of pairwise orthogonal
irreducible invariant subspaces. Since V = W₁ ⊕ W₁^⊥, the same is true
for V, and the theorem is proved.
( 1                                       )
(   ⋱                                     )
(     −1                                  )
(        −1                               )
(           cos θ₁  −sin θ₁               )
(           sin θ₁   cos θ₁               )
(                     cos θ₂  −sin θ₂     )
(                     sin θ₂   cos θ₂     )
(                                    ⋱    )

with zeros except in the 1-by-1 or 2-by-2 blocks along the diagonal.
Proof. The proof is immediate by Theorems (30.3) and (30.4), since we
can choose orthonormal bases for the individual subspaces W_i in Theorem
(30.4) which, when taken together, will form an orthonormal basis of V.
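Conversely, every matrix assembled from blocks of this kind is orthogonal,
which gives a quick machine check of the statement. A sketch (an addition
to the text; it assumes numpy and scipy):

    import numpy as np
    from scipy.linalg import block_diag

    def rot(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s], [s, c]])

    # Blocks +1, -1 and two plane rotations, as in the corollary above.
    T = block_diag(1.0, -1.0, rot(0.7), rot(2.1))
    assert np.allclose(T.T @ T, np.eye(6))        # T is orthogonal
    assert np.isclose(abs(np.linalg.det(T)), 1.0)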
EXERCISES
( −1  0 )
(  0  1 ).

2. An orthogonal transformation T of R³ is called a rotation if D(T) = 1.
Prove that if T is a rotation in R³ there exists an orthonormal basis of R³
such that the matrix of T with respect to this basis is

( 1     0        0    )
( 0   cos θ   −sin θ  )
( 0   sin θ    cos θ  )

for some real number θ.
3. Prove that an orthogonal transformation T in R^m has 1 as a characteristic
root if D(T) = 1 and m is odd. What can you say if m is even?
(31.5) THEOREM. Let Q be a quadratic form on V and let {e₁, ..., eₙ} be
a basis of V over R. Define an n-by-n matrix S = (σ_ij) by setting

σ_ij = B(e_i, e_j),  1 ≤ i, j ≤ n,

where B is the bilinear form in Definition (31.4). Then ᵗS = S, and for all
a = Σ α_ie_i ∈ V we have

(31.6)  Q(a) = B(a, a) = Σ_{i,j=1}^n α_iα_jσ_ij = Σ_{i=1}^n α_i²σ_ii + 2 Σ_{i<j} α_iα_jσ_ij.
Proof of Theorem (31.5). The first part of the theorem is immediate from
Definition (31.4). For the converse, let S = ᵗS be given and define a
function

B(a, b) = Σ_{i,j=1}^n α_iβ_jσ_ij

for a = Σ α_ie_i, b = Σ β_ie_i. Then it is readily checked that B is a symmetric
bilinear form on V such that B(e_i, e_j) = σ_ij, and defines a quadratic form
Q(a) = B(a, a), as in Definition (31.4).
Let f_i = Σ_{j=1}^n γ_{ji}e_j, 1 ≤ i ≤ n. Then the matrix of Q with respect to
the basis {f₁, ..., fₙ} is given by

S′ = ᵗC S C,

where C = (γ_ij).

Proof. Let S′ = (σ_ij′). Then, for 1 ≤ i, j ≤ n,
t For a different proof and an application of this theorem to mechanics, see Synge
and Griffith, p. 318 (listed in the Bibliography).
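The transformation rule S′ = ᵗCSC can be confirmed numerically: the value
of the quadratic form at a vector must not depend on the basis in which it
is computed. A sketch (an addition to the text; it assumes numpy; the
random matrices stand for an arbitrary symmetric S and an arbitrary,
almost surely invertible, change-of-basis matrix C):

    import numpy as np

    rng = np.random.default_rng(1)
    S = rng.standard_normal((3, 3)); S = (S + S.T) / 2   # symmetric matrix of Q
    C = rng.standard_normal((3, 3))                      # change-of-basis matrix

    S_new = C.T @ S @ C                                  # S' = tC S C

    b = rng.standard_normal(3)   # coordinates of a vector in the new basis
    a = C @ b                    # the same vector in the old basis
    assert np.isclose(a @ S @ a, b @ S_new @ b)          # Q(a) agrees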
The proof is similar to the proof of part (4) of Theorem (15.11) and
will be omitted.
since Tw ∈ W and w′ ∈ W^⊥. We may now conclude that V is the direct
sum of pairwise orthogonal irreducible invariant subspaces {W₁, ..., W_s}.
It is now sufficient to prove that for all i, where 1 ≤ i ≤ s, dim W_i = 1.
Since dim W_i = 1 or 2, it is sufficient to prove that a symmetric transforma-
tion T on a two-dimensional real vector space W always has a characteristic
vector. Let {w₁, w₂} be an orthonormal basis for W; then, relative to
{w₁, w₂}, T has a symmetric matrix
EXERCISES
X = G! ~).
a. Find the characteristic roots of X.
b. Define a symmetric transformation T in R³ whose matrix with
respect to the orthonormal basis of unit vectors {e₁, e₂, e₃} is X.
Find by the methods of Chapter 7 a basis of R³ consisting of charac-
teristic vectors of T [we know that can be done, by Theorem
(31.12)]. Modify this basis to obtain an orthonormal basis
{f₁, f₂, f₃} of R³ consisting of characteristic vectors of T. Let

f_i = Σ_j μ_{ji}e_j  and  Tf_i = a_if_i,  a_i ∈ R,  i = 1, 2, 3.

Then M = (μ_ij) is an orthogonal matrix, such that M⁻¹XM is
diagonal. Since M is orthogonal, M⁻¹ = ᵗM, so this computational
procedure also can be applied to find the principal axes of a vector
space with respect to a quadratic form.
t Exercises 9 and 10 require some familiarity with the calculus of functions of several
variables.
[Figure 9.1]
carries Q(x₁, ..., xₙ) into λ₁y₁² + ⋯ + λₙyₙ², where the λ_i are the
(positive) characteristic roots of S. Then use the formula for changing
the variable in a multiple integral (see R. C. Buck, loc. cit.) to show that
(32.1) THEOREM. Let u = ⟨a₁, ..., aₙ⟩, v = ⟨b₁, ..., bₙ⟩ belong to the
vector space V = Cⁿ, and define

(u, v) = Σ_{i=1}^n a_i b̄_i.

for (u, u). If (u, u) = 0, then because all a_iā_i ≥ 0, we have a_iā_i = 0, and
a_i = 0 for all i. Therefore (u, u) = 0 if and only if u = 0, and the theorem
is proved.

The reader should note that because of the property (u, av) = ā(u, v),
the function (u, v) is not a bilinear form on the vector space V.
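A small computational sketch (an addition to the text; it assumes numpy).
Since the book conjugates the second argument, the product is written out
directly rather than taken from a library routine:

    import numpy as np

    u = np.array([1 + 2j, 3 - 1j])
    v = np.array([2 + 0j, 1 + 1j])

    def herm(u, v):
        # (u, v) = sum a_i * conj(b_i), conjugate-linear in the second argument
        return np.sum(u * np.conj(v))

    assert np.isclose(herm(u, v), np.conj(herm(v, u)))   # (u, v) = conj (v, u)
    assert np.isclose(herm(u, u).imag, 0)                # (u, u) is real
    assert herm(u, u).real > 0                           # and positive for u != 0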
Note that since (v, w) = 0 implies (w, v) = 0, the definition could have
been worded just as well

W^⊥ = {v ∈ V | (w, v) = 0 for all w ∈ W}.

The properties of (u, v) imply that W^⊥ is a subspace. Since W^⊥ is defined
in terms of a particular scalar product on V, there is no danger of confusion
with the annihilators V₁^⊥ defined in Chapter 8, with reference to a bilinear
form on a pair of vector spaces.
Proof. Statement (a) is clear from the definition of W^⊥. Statement (c)
follows by the same argument used in the proof of Theorem (30.4) to
show the corresponding result for real vector spaces, and we shall not
repeat the details. From part (b), we have

dim W^⊥ = dim V − dim W,

and hence

dim W^⊥⊥ = dim W.

Since W ⊂ W^⊥⊥ from the definition of W^⊥, we conclude that W = W^⊥⊥,
and part (b) is proved. This completes the proof of the theorem.
A ᵗĀ = I,
Proof. Once again the proof is similar to the corresponding result for
orthogonal transformations. First suppose U is unitary. Applying the
condition ‖Uv‖ = ‖v‖ to v + w and v + iw yields

(Uv, Uw) + (Uw, Uv) = (v, w) + (w, v)

and

(iU(w), U(v)) + (U(v), iU(w)) = (iw, v) + (v, iw).

Using the fact that ī = −i, the second equation yields

i(U(w), U(v)) − i(U(v), U(w)) = i(w, v) − i(v, w).

Upon canceling i, we can add the equations to obtain

(U(v), U(w)) = (v, w),

which is (a).

Now assume (a). If {v₁, ..., vₙ} is an orthonormal basis of V, then
(a) implies that {U(v₁), ..., U(vₙ)} is an orthonormal set, and hence an
orthonormal basis, since vectors in an orthonormal set are linearly in-
dependent. Therefore (a) implies (b).
Next assume (b), and let {v₁, ..., vₙ} be an orthonormal basis. Letting
A = (a_ij) be the matrix of U with respect to the basis, we have

and hence aā = 1, proving that |a| = 1. We prove the second part of
the theorem by induction on dim V, the result being clear if dim V = 1.
Suppose dim V > 1. Since C is an algebraically closed field, there exists
a characteristic vector v₁ belonging to some characteristic root of U, and
we may assume that ‖v₁‖ = 1. Let W = S(v₁); then by Theorem (32.5),
V = W ⊕ W^⊥, and dim W^⊥ = dim V − 1. If we can show that W^⊥ is
invariant relative to U, then the restriction of U to W^⊥ will be a unitary
transformation of W^⊥, and the theorem will follow from the induction
hypothesis. Let (v₁, w) = 0; we have to show that (v₁, U(w)) = 0.
Since Uv₁ = av₁ for some a ≠ 0, we have
product (u, v). Let T ∈ L(V, V). A linear transformation T′ ∈ L(V, V)
is called an adjoint of T if

(Tu, v) = (u, T′v)

for all u, v ∈ V.

The next theorem shows that the adjoint always exists, and is uniquely
determined.
and
while
Therefore T′ is an adjoint of T, and has the matrix ᵗĀ with respect to the
basis. The uniqueness of T′ follows from the fact that if T″ is another
adjoint, then

(v, (T′ − T″)w) = 0

for all v and w, and hence T′ = T″. The proofs of the formulas for
(aT)′, (T₁ + T₂)′, (T₁T₂)′, and T″ follow directly from the uniqueness of
the adjoint and are left to the reader as exercises. This completes the
proof.
and similarly

E_i′ = E_i′ · 1 = E_i′(E₁ + ⋯ + E_s) = E_i′E_i.

Therefore E_i = E_i′, and the lemma is proved.
This result has essentially been proved along with Theorem (32.16),
and we leave the proof of the corollary as an exercise.

We conclude this section with one of the striking applications of the
Spectral Theorem. We recall that each complex number z = α + βi ≠ 0
can be expressed in polar form, z = r(cos θ + i sin θ), where r is a
positive real number, and cos θ + i sin θ is a complex number of absolute
value one. Our application is a far-reaching generalization of this fact
about complex numbers, to transformations on a vector space with a
hermitian scalar product. The analog of a complex number of absolute
value one is certainly a unitary transformation. We now define the
analog of a positive real number.

(32.18) DEFINITION. A linear transformation T on V is called positive if
T is self-adjoint and (Tv, v) is real and positive for all vectors v ≠ 0.
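For matrices this generalization is the polar decomposition: an invertible
transformation factors as a unitary transformation times a positive one. A
sketch (an addition to the text; it assumes scipy, whose linalg.polar
computes the factorization numerically; the example is over R, where
unitary means orthogonal):

    import numpy as np
    from scipy.linalg import polar

    A = np.array([[2.0, 1.0],
                  [0.0, 1.0]])

    U, P = polar(A)      # A = U P, the analog of z = (cos t + i sin t) r
    assert np.allclose(U @ P, A)
    assert np.allclose(U.T @ U, np.eye(2))     # U is orthogonal
    assert np.all(np.linalg.eigvalsh(P) > 0)   # P is symmetric positive definite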
EXERCISES
2. Show that a unitary matrix in triangular form (with zeros below the
diagonal) must be in diagonal form.
3. Show that
A= (
-,
~ i)0
is a unitary matrix. Find a unitary matrix B such that BAB⁻¹ is a
diagonal matrix.
4. Let V* be the dual space of V. Prove that if f ∈ V*, then there exists a
unique vector w ∈ V such that f(v) = (v, w) for all v ∈ V.
5. Show that U ∈ L(V, V) is unitary if and only if UU′ = 1, where U′ is
the adjoint of U.
6. Show that if T ∈ L(V, V) is normal, then W is a T-invariant subspace of
V if and only if W^⊥ is a T′-invariant subspace.
7. Prove that if T is a normal transformation with spectral decomposition
T = Σ a_iE_i according to Theorem (32.16), then T′ = Σ ā_iE_i is a spectral
decomposition of T′.
8. Show that if U is a unitary transformation, then U is normal.
9. Show that a normal transformation T with spectral decomposition
T = Σ a_iE_i is unitary if and only if |a_i| = 1 for all characteristic roots
a_i of T.
Some Applications of
Linear Algebra
so that we have

T² = 1,  D(T) = −1

for all reflections T. Our first important result is the following theorem,
due to E. Cartan and J. Dieudonné, which asserts that every orthogonal
transformation is a product of reflections.

and since dim(H₁ + H₂) ≤ 3 we have dim(H₁ ∩ H₂) ≥ 1. Let x be a
nonzero vector in H₁ ∩ H₂; then Tx = x, and the corollary is proved.
              F    E    V
tetrahedron    4    6    4
cube           6   12    8
octahedron     8   12    6
dodecahedron  12   30   20
icosahedron   20   30   12

[Figure 10.1: the dodecahedron and the icosahedron.]
[Figure 10.2]
Proof. Let p ∼ p′, let ℋ′ be the subgroup of 𝒢 consisting of the identity
1 together with the elements which have p as a pole, and let ℋ″ be the
corresponding subgroup for p′. Let p = Tp′. Then an easy verification
shows that the mapping X → TXT⁻¹ is a one-to-one mapping of ℋ″
onto ℋ′, and this establishes the lemma.
Proof. Let ℋ be the subgroup associated with p and, for X ∈ 𝒢, let Xℋ
denote the set of all elements Y ∈ 𝒢 such that Y = XT for some T ∈ ℋ.
We observe first that, for all X ∈ 𝒢, Xp is a pole and that poles Xp and
X′p are the same if and only if X′ ∈ Xℋ. Thus the number of poles
equivalent to p is equal to the number of distinct sets of the form Xℋ.
The set Xℋ is called the left coset of ℋ containing X. We prove now
that each element of 𝒢 belongs to one and only one left coset and that the
number of elements in each left coset is ν_p. If X ∈ 𝒢, then X = X·1 ∈
Xℋ since 1 ∈ ℋ. Now suppose X ∈ X′ℋ ∩ X″ℋ; we show that
X′ℋ = X″ℋ. We have, for Y ∈ X′ℋ, Y = X′T for some T ∈ ℋ.
Moreover, X ∈ X′ℋ ∩ X″ℋ implies that X = X′T′ = X″T″ for T′ and
T″ ∈ ℋ. Then Y = X′T = X(T′)⁻¹T = X″T″(T′)⁻¹T ∈ X″ℋ. Thus,
X′ℋ ⊂ X″ℋ and, similarly, X″ℋ ⊂ X′ℋ. Therefore X′ℋ = X″ℋ,
and we have proved that each element of 𝒢 belongs to a unique left coset.
Now let Xℋ be a left coset. Then the mapping T → XT is a one-to-one
mapping of ℋ onto Xℋ, and each left coset contains ν_p elements, where
ν_p is the order of ℋ.

We have shown that the number of poles equivalent to p is equal to
the number of left cosets of ℋ in 𝒢. From what has been shown, this
number is equal to N/ν_p, and the lemma is proved.
Now we can finish the proof of Theorem (33.4). Consider the set of
all pairs (T, p) with T ∈ 𝒢, T ≠ 1, and p a pole of T. Counting the pairs
in two different ways,† we obtain

2(N − 1) = Σ_p (ν_p − 1).

† We use first the fact that T ≠ 1 is associated with two poles and next that with each
pole p are associated (ν_p − 1) elements T ∈ 𝒢 such that T ≠ 1.
(33.7)  2 − 2/N = Σ_C (1 − 1/ν_C).

The left side is > 1 and < 2; therefore there are at least two classes C
and at most three classes. The rest of the argument is an arithmetical
study of the equation (33.7).

Case 1. There are two classes of poles of orders ν₁, ν₂. Then (33.7) yields

2/N = 1/ν₁ + 1/ν₂,

and we have N/ν₁ = N/ν₂ = 1. This case occurs if and only if 𝒢 is cyclic.
Case 2. There are three classes of poles of orders ν₁, ν₂, ν₃, where we
may assume ν₁ ≤ ν₂ ≤ ν₃. Then (33.7) implies that

2 − 2/N = (1 − 1/ν₁) + (1 − 1/ν₂) + (1 − 1/ν₃),

or

1 + 2/N = 1/ν₁ + 1/ν₂ + 1/ν₃.

Not all the ν_i can be greater than 2; hence ν₁ = 2 and we have

1/2 + 2/N = 1/ν₂ + 1/ν₃.

Not both ν₂ and ν₃ can be ≥ 4; hence ν₂ = 2 or 3.
EXERCISES
2. Let 𝒢 be a finite group of orthogonal transformations and let ℋ be the
rotations contained in 𝒢. Prove that ℋ is a subgroup of 𝒢 and that if
ℋ ≠ 𝒢 then, for any element X ∈ 𝒢 with X ∉ ℋ, we have

and prove that ⟨⟨x, y⟩⟩ has the required properties. Note that the same
argument can be applied to finite groups of invertible linear transforma-
tions in Rⁿ for n arbitrary.]
4. Let 𝒢 be the set of linear transformations on C², where C is the complex
field, whose matrices with respect to a basis are

± ( 1 0 )    ± ( i  0 )    ± ( 0  1 )    ± ( 0 i )
  ( 0 1 ),     ( 0 −i ),     ( −1 0 ),     ( i 0 ).
Prove that 𝒢 forms a finite group and that there exists no basis of C²
such that the matrices of the elements of 𝒢 with respect to the new basis
all have real coefficients. (Hint: If such a basis did exist, then 𝒢 would
be isomorphic either with a cyclic group or with a dihedral group, by
Exercise 3 and the results of Section 14.)
(34.1)  dy_i/dt = a_{i1}y₁ + ⋯ + a_{in}yₙ,  1 ≤ i ≤ n.

When n = 1, the system becomes the single equation dy/dt = ay,
and in this case we know from elementary calculus that the function

y(t) = y(0)e^{at}

solves the differential equation and takes on the initial value y(0) when
t = 0.
We shall show how matrix theory can be used to solve a general
system (34.1) in an equally simple way. We should also point out that
our discussion includes as a special case the problem of solving an nth-
order linear differential equation with constant coefficients.
(34.2)  a₀ dⁿy/dtⁿ + a₁ dⁿ⁻¹y/dtⁿ⁻¹ + ⋯ + aₙy = 0,

where the a_i are real constants and a₀ ≠ 0. This equation can be replaced
by a system of the form (34.1) if we view y(t) as the unknown function y₁(t)
and rename the derivatives as follows:

y_{i+1} = dy_i/dt,  1 ≤ i ≤ n − 1.

Then the original equation (34.2) becomes

dyₙ/dt = −(aₙ/a₀)y₁ − (a_{n−1}/a₀)y₂ − ⋯ − (a₁/a₀)yₙ,
and, conversely, any set of solutions of the system (34.3) will also yield a
solution y₁(t) = y(t) of the original equation (34.2). The initial conditions
in this case amount to specifying the values of y(t) and its first n − 1
derivatives at t = 0.
Now let us proceed with the discussion. We may represent the
functions y₁(t), ..., yₙ(t) in vector form (or as an n-by-1 matrix):

y(t) = ( y₁(t) )
       (   ⋮   )
       ( yₙ(t) ).

We have here a function which, viewed abstractly, assigns to a real
number t the vector y(t) ∈ Rⁿ. We may define limits of such functions
as follows:

lim_{t→t₀} y(t) = ( lim_{t→t₀} y₁(t) )
                  (         ⋮        )
                  ( lim_{t→t₀} yₙ(t) ).
For a complex-valued function a(t) of the real variable t, we say that lim_{t→t₀} a(t) = u for some complex number u, provided that for each ε > 0 there exists a δ > 0 such that 0 < |t − t₀| < δ implies |a(t) − u| < ε, where |a(t) − u| denotes the distance between the points a(t) and u in the complex plane.
By using the fact (proved in Section 15) that

$$|u + v| \le |u| + |v|$$
for complex numbers u and v, it is easy to show that the usual limit theorems of elementary calculus carry over to complex-valued functions.
We may then define, as in the case of a vector function y(t), for a matrix-valued function A(t) = (α_ij(t)),

$$\lim_{t \to t_0} A(t) = \Bigl(\lim_{t \to t_0} \alpha_{ij}(t)\Bigr), \qquad
\frac{dA}{dt} = \lim_{h \to 0} \frac{A(t + h) - A(t)}{h} = \left(\frac{d\alpha_{ij}}{dt}\right),$$

and, for a sequence of matrices {A_n},

$$\lim_{n \to \infty} A_n = \Bigl(\lim_{n \to \infty} \alpha_{ij}^{(n)}\Bigr),$$

the limits being taken entry by entry.
In this notation our system of differential equations becomes

$$(34.4)\qquad \frac{dy}{dt} = Ay,$$

where

$$y(t) = \begin{pmatrix} y_1(t)\\ \vdots\\ y_n(t) \end{pmatrix}$$

and A = (α_ij) is the matrix of coefficients.
To solve (34.4) we require the exponential of a matrix. For an n-by-n matrix B = (β_ij), set

$$E^{(n)} = I + B + \frac{B^2}{2!} + \cdots + \frac{B^n}{n!};$$

we have to show that the limit of the sequence {E^{(n)}} exists. Let ρ be some upper bound for the |β_ij|; then |β_ij| ≤ ρ for all (i, j). Let B^k = (β_ij^{(k)}). Then the (i, j) entry of E^{(n)} is

$$\sum_{k=0}^{n} \frac{\beta_{ij}^{(k)}}{k!},$$

and we have to show that this sequence tends to a limit. This means that the infinite series

$$\sum_{k=0}^{\infty} \frac{\beta_{ij}^{(k)}}{k!}$$

converges. An easy induction shows that |β_ij^{(k)}| ≤ n^{k−1}ρ^k for k ≥ 1, so each term of the series is dominated in absolute value by the corresponding term in the series of positive terms

$$\sum_{k=0}^{\infty} \frac{n^{k-1}\rho^k}{k!},$$

and this series converges for all ρ by the ratio test. This completes the proof that e^B exists for all matrices B.
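The convergence proof suggests a simple numerical experiment (our illustration, not the text's; it assumes the NumPy and SciPy libraries, with SciPy's expm as a reference):

import numpy as np
from scipy.linalg import expm  # reference implementation of e^B

def exp_series(B, terms=40):
    """Partial sums I + B + B^2/2! + ... + B^terms/terms! of the exponential series."""
    E = np.eye(B.shape[0])
    term = np.eye(B.shape[0])
    for k in range(1, terms + 1):
        term = term @ B / k   # term now equals B^k / k!
        E = E + term
    return E

B = np.array([[2.0, 2.0], [-2.0, -3.0]])
print(np.allclose(exp_series(B), expm(B)))  # True: the series converges entry by entry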
We now list some properties of the function B → e^B, whose proofs are left as exercises.
Suppose A = S(D + N)S⁻¹, where

$$D = \begin{pmatrix}
\alpha_1 & & & & \\
& \alpha_1 & & & \\
& & \alpha_2 & & \\
& & & \alpha_2 & \\
& & & & \ddots
\end{pmatrix}$$

is a diagonal matrix displaying the characteristic roots, N is nilpotent, and DN = ND. Then e^{tD} is the diagonal matrix with diagonal entries e^{α_i t}, while

$$e^{tN} = I + tN + \frac{t^2 N^2}{2!} + \cdots + \frac{t^{r-1} N^{r-1}}{(r-1)!}$$

if N^r = 0. The solution vector is y(t) = S e^{tD} e^{tN} S⁻¹ y₀.
Example A. Consider the vector differential equation

$$\frac{dy}{dt} = Ay, \qquad A = \begin{pmatrix} 2 & 2\\ -2 & -3 \end{pmatrix}.$$

We shall calculate a solution y of the differential equation satisfying the initial condition y(0) = y₀. The minimal polynomial of A is x² + x − 2, and we see that A is diagonable (why?). The characteristic roots of A are −2 and 1, with corresponding characteristic vectors

$$\begin{pmatrix} 1\\ -2 \end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix} 2\\ -1 \end{pmatrix},$$

so that y(t) = S e^{Dt} S⁻¹ y₀, where

$$S = \begin{pmatrix} 1 & 2\\ -2 & -1 \end{pmatrix}, \qquad e^{Dt} = \begin{pmatrix} e^{-2t} & 0\\ 0 & e^{t} \end{pmatrix}.$$
For a matrix which is not diagonable, the nilpotent part enters. For example, if A = D + N with

$$D = \begin{pmatrix} -1 & 0\\ 0 & -1 \end{pmatrix}, \qquad N = \begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix},$$

then, since ND = DN, we have e^{At} = e^{Dt} e^{Nt}, where

$$e^{Dt} = \begin{pmatrix} e^{-t} & 0\\ 0 & e^{-t} \end{pmatrix}, \qquad e^{Nt} = I + tN = \begin{pmatrix} 1 & t\\ 0 & 1 \end{pmatrix}.$$
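Example A can be checked numerically; the following sketch (ours, using the characteristic vectors computed above, and assuming NumPy/SciPy) compares the diagonalization formula with a library matrix exponential and then verifies the differential equation by a finite difference:

import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 2.0], [-2.0, -3.0]])
S = np.array([[1.0, 2.0], [-2.0, -1.0]])   # columns: characteristic vectors for -2 and 1
y0 = np.array([1.0, 1.0])

t = 0.7
eDt = np.diag([np.exp(-2 * t), np.exp(t)])   # e^{Dt} for D = diag(-2, 1)
y_diag = S @ eDt @ np.linalg.inv(S) @ y0     # y(t) = S e^{Dt} S^{-1} y0
y_expm = expm(t * A) @ y0                    # the same thing via e^{tA}
print(np.allclose(y_diag, y_expm))           # True

# dy/dt = Ay holds, up to O(h^2) finite-difference error:
h = 1e-6
dydt = (expm((t + h) * A) @ y0 - expm((t - h) * A) @ y0) / (2 * h)
print(np.allclose(dydt, A @ y_expm))         # True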
EXERCISES
1. Let y(t) be any solution of dy/dt = Ay with y(0) = y₀. Show that (d/dt)(e^{−At}y) = 0, and hence that y = e^{At}·y₀. Thus the solution of the differential equation dy/dt = Ay satisfying the given initial condition is uniquely determined.
5. Prove that ᵗ(e^A) = e^{ᵗA}, where ᵗA is the transpose of A.
6. Define a matrix A to be skew symmetric if ᵗA = −A; for example,

$$\begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}.$$
7. Apply the methods of this section to prove that the differential equation (considered in Section 1)

$$\frac{d^2 y}{dt^2} + my = 0, \qquad m \text{ a positive constant},$$

has the solutions described there.

$$\begin{pmatrix} \ast & -3\\ \ast & -2 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0 \end{pmatrix}.$$
10. Let A be the coefficient matrix of the vector differential equation

$$\frac{dy}{dt} = Ay + f(t),$$

where f(t) is a given continuous vector function of t. Attempt to find a solution of the form y(t) = e^{At}c(t), where c(t) is a vector function to be determined. Differentiate to obtain

$$\frac{dy}{dt} = A e^{At} c(t) + e^{At} \frac{dc(t)}{dt} = Ay + f(t).$$

Since A e^{At} c(t) = Ay, we obtain [since (e^{At})⁻¹ = e^{−At}]

$$\frac{dc}{dt} = e^{-At} f(t).$$
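The recipe of Exercise 10 can be carried out numerically. The sketch below is ours (the forcing term f is an arbitrary choice for illustration; NumPy/SciPy are assumed): it computes y(t) = e^{At}(y₀ + ∫₀ᵗ e^{−As} f(s) ds) with a trapezoid-rule integral and compares the result with a general-purpose ODE solver.

import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[2.0, 2.0], [-2.0, -3.0]])
f = lambda s: np.array([np.cos(s), 0.0])   # a hypothetical continuous forcing term
y0 = np.array([1.0, 0.0])

def variation_of_parameters(t, steps=2001):
    """y(t) = e^{At}(y0 + integral_0^t e^{-As} f(s) ds), trapezoid rule for the integral."""
    s = np.linspace(0.0, t, steps)
    vals = np.array([expm(-A * si) @ f(si) for si in s])
    integral = np.sum((vals[1:] + vals[:-1]) / 2 * np.diff(s)[:, None], axis=0)
    return expm(A * t) @ (y0 + integral)

# Compare with a general-purpose ODE solver:
sol = solve_ivp(lambda s, y: A @ y + f(s), (0.0, 1.0), y0, rtol=1e-10, atol=1e-10)
print(np.allclose(sol.y[:, -1], variation_of_parameters(1.0), atol=1e-5))  # True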
Since |x|, |y|, and |x ∗ y| are the square roots of the sums of squares of their components, Problem I can now be reformulated as follows.
For n = 4 a solution is provided by the multiplication of quaternions: x ∗ y = (x₁C₁ + x₂C₂ + x₃C₃ + x₄C₄)y, where the matrices may be taken to be

$$C_1 = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
C_2 = \begin{pmatrix} 0 & -1 & 0 & 0\\ 1 & 0 & 0 & 0\\ 0 & 0 & 0 & -1\\ 0 & 0 & 1 & 0 \end{pmatrix},$$

$$C_3 = \begin{pmatrix} 0 & 0 & -1 & 0\\ 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 \end{pmatrix}, \qquad
C_4 = \begin{pmatrix} 0 & 0 & 0 & -1\\ 0 & 0 & -1 & 0\\ 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0 \end{pmatrix}.$$
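With the matrices C₁, …, C₄ as reconstructed above (any consistent sign convention works), the identity |x ∗ y| = |x| |y| can be spot-checked numerically; this illustration is ours and assumes NumPy:

import numpy as np

# Left-multiplication matrices of the quaternion units 1, i, j, k.
C = [
    np.eye(4),
    np.array([[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]]),
    np.array([[0, 0, -1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, -1, 0, 0]]),
    np.array([[0, 0, 0, -1], [0, 0, -1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]),
]

rng = np.random.default_rng(0)
x, y = rng.standard_normal(4), rng.standard_normal(4)

# z = (x1 C1 + x2 C2 + x3 C3 + x4 C4) y, and |z| should equal |x| |y|.
z = sum(xi * Ci for xi, Ci in zip(x, C)) @ y
print(np.isclose(np.linalg.norm(z), np.linalg.norm(x) * np.linalg.norm(y)))  # True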
(35.5) LEMMA.
a. Every n-by-n matrix A can be expressed

$$A = \tfrac{1}{2}(A + {}^tA) + \tfrac{1}{2}(A - {}^tA)$$

as a sum of a symmetric matrix ½(A + ᵗA) and a skew symmetric matrix ½(A − ᵗA). Moreover, if A = S₁ + S₂, with S₁ symmetric and S₂ skew symmetric, then S₁ = ½(A + ᵗA) and S₂ = ½(A − ᵗA).
b. Suppose S is an invertible n-by-n skew symmetric matrix. Then n is even, n = 2m, for some m.

Proof. We first check that ½(A + ᵗA) and ½(A − ᵗA) are symmetric and skew symmetric, respectively. We have

$${}^t\bigl[\tfrac{1}{2}(A + {}^tA)\bigr] = \tfrac{1}{2}({}^tA + A) = \tfrac{1}{2}(A + {}^tA),$$

and

$${}^t\bigl[\tfrac{1}{2}(A - {}^tA)\bigr] = \tfrac{1}{2}({}^tA - A) = -\tfrac{1}{2}(A - {}^tA),$$
as required. The fact that A is the sum of the two matrices in question
is clear. Now suppose
$$A = S_1 + S_2 = T_1 + T_2,$$

where S₁ and T₁ are symmetric, and S₂ and T₂ are skew symmetric. Then

$$S_1 - T_1 = T_2 - S_2$$
is both symmetric and skew symmetric, and the only such matrix is the
zero matrix. This completes the proof of part (a).
To prove (b), we have, from Chapter 5,

$$D(S) = D({}^tS) = D(-S) = (-1)^n D(S).$$

Since D(S) ≠ 0 by assumption, we have

$$(-1)^n = 1,$$
and n is even.
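Both parts of the lemma are easy to check numerically; this illustration is ours and assumes NumPy:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

S1 = (A + A.T) / 2   # symmetric part
S2 = (A - A.T) / 2   # skew symmetric part
print(np.allclose(S1, S1.T), np.allclose(S2, -S2.T), np.allclose(A, S1 + S2))  # all True

# Part (b): a skew symmetric matrix of odd size is never invertible.
M = rng.standard_normal((3, 3))
K = (M - M.T) / 2
print(np.isclose(np.linalg.det(K), 0.0))  # True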
Proof of Theorem (35.3). We derive consequences of the assumption that an identity as in Problem II holds for all x's and y's. We may assume n > 1.

We first put, for a given set of x's,

$$z_i = \sum_{j=1}^{n} \alpha_{ij}\, y_j, \qquad 1 \le i, j \le n,$$

where the coefficients α_ij are linear forms in x₁, …, x_n.
Then the identity of Problem II becomes

$$(35.6)\qquad (x_1^2 + \cdots + x_n^2)(y_1^2 + \cdots + y_n^2) = z_1^2 + \cdots + z_n^2.$$

For a fixed set of x's, this identity holds for all y's. Therefore we can set each y_j in turn equal to one, and the rest equal to zero, to obtain the equations

$$\alpha_{1j}^2 + \alpha_{2j}^2 + \cdots + \alpha_{nj}^2 = x_1^2 + \cdots + x_n^2, \qquad 1 \le j \le n.$$
Cancelling the terms involving the y_j² from both sides of the equation (35.6), we have

$$0 = 2 \sum_{i < j} (\alpha_{1i}\alpha_{1j} + \alpha_{2i}\alpha_{2j} + \cdots + \alpha_{ni}\alpha_{nj})\, y_i y_j.$$

Now set yᵢ = y_j = 1 and the other y's equal to zero to obtain

$$0 = 2(\alpha_{1i}\alpha_{1j} + \cdots + \alpha_{ni}\alpha_{nj}),$$
for 1 ≤ i, j ≤ n, i ≠ j. The equations we have derived can be put in the economical form:

$${}^tA\,A = \Bigl(\sum_{i=1}^{n} x_i^2\Bigr) I, \qquad A = (\alpha_{ij}) = x_1 B_1 + \cdots + x_{n-1} B_{n-1} + x_n B_n,$$

where the Bᵢ are constant matrices, and we may take B_n = I. Hence

$$(35.9)\qquad (x_1\,{}^tB_1 + \cdots + x_{n-1}\,{}^tB_{n-1} + x_n I)(x_1 B_1 + \cdots + x_{n-1} B_{n-1} + x_n I) = \Bigl(\sum_{i=1}^{n} x_i^2\Bigr) I.$$
Comparing coefficients of the xᵢ² on both sides of (35.9) gives ᵗBᵢBᵢ = I. Cancelling these terms, and setting xᵢ = x_j = 1 for i ≠ j, and the other x's equal to zero, we have

$$(35.10)\qquad {}^tB_i = -B_i, \qquad B_i^2 = -I, \qquad 1 \le i \le n - 1,$$

and

$$B_i B_j = -B_j B_i, \qquad 1 \le i, j \le n - 1,\ i \ne j.$$
Consider now the products B_{i₁} B_{i₂} ⋯ B_{i_k} which can be formed from the Bᵢ's. Notice that in such a product, if B₁ occurs it can be moved past the other Bᵢ's to the first position, by (35.10), leaving the product otherwise unchanged except for a ± sign in front.
$$\binom{5}{1} + \binom{5}{2} + 1 = 16$$

skew symmetric ones. But we can prove directly (see the exercise) that there are exactly

$$1 + 2 + 3 + 4 + 5 = 15$$

linearly independent skew symmetric 6-by-6 matrices, and this contradiction completes the argument for that case.
Bibliography
Some books which influenced the writing of the present one, and which develop the subject of linear algebra more deeply in some respects, are those of Bourbaki, Halmos, Jacobson, Kaplansky, Noble (Applied Linear Algebra), and Schreier and Sperner. The books of Birkhoff and MacLane, MacLane and Birkhoff, and van der Waerden all contain thorough investigations of the axiomatic systems of modern algebra (groups, rings, fields, modules, etc.), which have all appeared naturally, and in a concrete way, in this book.
The real purpose of these notes, however, is to show the reader
where he can study the connections between linear algebra and other
parts of mathematics. The books of Artin, Benson and Grove, Gruenberg
and Weir, and Kaplansky explore many aspects of the fascinating interplay
between geometry and linear algebra. The two books by Noble will
provide the reader with an orientation towards the computational aspects
of linear algebra, and to the many, and sometimes unexpected, applications
of linear algebra to the sciences and engineering. The usefulness of linear
algebra in analysis is illustrated from several points of view in the books
of Boyce and DePrima, Halmos, and Smith.
Solutions of Selected Exercises
SECTION 2
1. (a) The statement holds for k = 1. Assume it holds for k ≥ 1. Then
1 + 3 + 5 + ⋯ + (2k − 1) + [2(k + 1) − 1] = k² + 2(k + 1) − 1 = (k + 1)².
2. (a) u = α/β and v = γ/δ are solutions of the equations βu = α and δv = γ, respectively. Multiply the equations by δ and β, respectively, and add, obtaining βδ(u + v) = αδ + βγ.
3. Suppose for some n,

$$(\alpha + \beta)^n = \binom{n}{0}\alpha^n + \binom{n}{1}\alpha^{n-1}\beta + \cdots + \binom{n}{n}\beta^n.$$

Then

$$(\alpha + \beta)^{n+1} = (\alpha + \beta)^n(\alpha + \beta) = \binom{n}{0}\alpha^{n+1} + \binom{n}{0}\alpha^n\beta + \cdots$$

Collect terms and use the definition of $\binom{n+1}{k}$ to complete the proof.
SECTION 3
1. (a) ⟨1, 4, −2⟩.
(b) ⟨−2, 4, 2⟩ + ⟨−2, −1, 3⟩ + ⟨0, 1, 0⟩ = ⟨−4, 4, 5⟩.
(d) ⟨−α + 2β, 2α + β + γ, α − 3β⟩.
2. Check your answers by substitution in the equations.
SECTION 4
1. (a) Not a subspace; (b) subspace; (c) subspace; (d) not a subspace;
(e) subspace; (f) this set is a subspace if and only if B = 0; (g) not a
subspace.
3. (a) Subspace; (b) not a subspace; (c) subspace; (d) not a subspace;
(e) subspace; (f) subspace; (g) subspace; (h) this set is a subspace if and
only if g is the zero function: g(x) = 0 for all x.
4. (a) Linearly independent; (b) linearly dependent, −3⟨1, 1⟩ + ⟨2, 1⟩ + ⟨1, 2⟩ = 0; (c) linearly independent; (d) linearly dependent, β⟨0, 1⟩ + α⟨1, 0⟩ − ⟨α, β⟩ = 0; (e) linearly dependent, ⟨1, 1, 2⟩ − ⟨3, 1, 2⟩ − 2⟨−1, 0, 0⟩ = 0; (f) linearly independent; (g) linearly dependent.
5. {⟨1, 1, 0⟩, ⟨0, 1, 1⟩} is one solution of the problem.
6. Let f₁, …, f_n be a finite set of polynomial functions. Let xᵐ be the highest power of x appearing with a nonzero coefficient in any of the polynomials {fᵢ}. Then every linear combination of f₁, …, f_n has the form a₀ + a₁x + ⋯ + a_m xᵐ. But there certainly exist polynomials, such as x^{m+1}, which cannot be expressed in this form. To be certain on this point, we observe that upon differentiating m + 1 times, all linear combinations of f₁, …, f_n become zero, while there are polynomials whose (m + 1)st derivative is different from zero.
7. Yes. Let S and T be subspaces. Let a, b ∈ S ∩ T. Then a + b ∈ S and a + b ∈ T, so a + b ∈ S ∩ T. Similarly, if a ∈ S ∩ T and α ∈ F, then αa ∈ S ∩ T.
8. No. For example, in R², let S = S(⟨1, 0⟩), T = S(⟨0, 1⟩). Then ⟨1, 1⟩ = ⟨1, 0⟩ + ⟨0, 1⟩ ∉ S ∪ T.
SECTION 5
1. Suppose b₁ = 0, for example. Then 1·b₁ + 0·b₂ + ⋯ + 0·b_r = 0, and the vectors {b₁, …, b_r} are linearly dependent.
SECTION 6
SECTION 7
1. It does not. The subspace spanned by ⟨1, 3, 4⟩, ⟨4, 0, 1⟩, and ⟨3, 1, 2⟩ has a basis ⟨1, 3, 4⟩, ⟨0, 4, 5⟩. An arbitrary linear combination of these vectors has the form ⟨α, 3α + 4β, 4α + 5β⟩. If ⟨1, 1, 1⟩ is one of these linear combinations, then α = 1, 3 + 4β = 1, β = −1/2, and 4α + 5β = 4 − 5/2 = 3/2 ≠ 1.
2. It does. A basis for the subspace, in echelon form, is ⟨1, −1, 1, 0⟩, ⟨0, 2, 1, −1⟩, ⟨0, 0, −7/2, −1/2⟩. A typical element in the subspace is ⟨α, −α + 2β, α + β − (7/2)γ, −β − (1/2)γ⟩. Comparing with ⟨2, 0, −4, −2⟩, we obtain α = 2, β = 1, γ = 2.
3. Let {t₁, …, t_m} be a basis for T, and let S ⊆ T. Then every set of m + 1 vectors in S is linearly dependent, by Theorem (5.1). Let {s₁, …, s_k} be a set of linearly independent vectors in S such that every set of k + 1 vectors in S is linearly dependent. By Lemma (7.1), every vector in S is a linear combination of {s₁, …, s_k}. Therefore {s₁, …, s_k} is a basis for S, and dim S ≤ dim T. Finally, let dim S = dim T and S ⊆ T. Let {s₁, …, s_m} be a basis for S. Then by Theorem (5.1), {s₁, …, s_m, t} is linearly dependent for all t ∈ T, since dim T = m. By Lemma (7.1) again, t ∈ S and S = T.
4. dim (S + T) ≤ 3. Therefore dim (S ∩ T) = dim S + dim T − dim (S + T) ≥ 1, by Theorem (7.5).
5. dim S = dim T = 3, dim (S + T) = 4, dim (S ∩ T) = 2.
6. There are 4 vectors in V, 3 one-dimensional subspaces, and 3 different
bases.
SECTION 8
1. (a) Solvable; (b) solvable; (c) solvable; (d) solvable; (e) solvable; (f)
not solvable; (g) not solvable.
2. There is a solution if and only if a ≠ 1.
3. It is sufficient to prove that the column vectors of an m-by-n matrix, with n > m, are linearly dependent. The column vectors belong to Rᵐ, and since there are n of them, they are linearly dependent by Theorem (5.1).
4. This result is also a consequence of Theorem (5.1).
SECTION 9
1. The dimension of the solution space is 2. Let c₁, c₂, c₃, c₄ be the column vectors. Then c₁ + 3c₂ − 4c₃ = 0 and c₁ + c₂ − 2c₄ = 0. Therefore a basis for the solution space is ⟨1, 3, −4, 0⟩ and ⟨1, 1, 0, −2⟩.
2. The dimensions of the solution spaces are as follows. The actual
solutions should be checked:
(a) zero (b) one (c) two (d) one
(e) two (f) one (g) zero
3. By Theorem (8.9), every solution has the form x₀ + x, where x₀ is a particular solution of the nonhomogeneous system and x is a solution of the homogeneous system. The dimension of the solution space of the homogeneous system is two, and the actual solutions found may be checked by substitution.
5. A, B, and C must satisfy the equations

3A + B + C = 0
−A + C = 0

or

$$A\begin{pmatrix} 3\\ -1 \end{pmatrix} + B\begin{pmatrix} 1\\ 0 \end{pmatrix} + C\begin{pmatrix} 1\\ 1 \end{pmatrix} = 0.$$
The solution space of the system has dimension one, so that any two
nontrivial solutions are multiples of each other.
6. We have to find a nontrivial solution to the system

$$A\begin{pmatrix} \alpha\\ \gamma \end{pmatrix} + B\begin{pmatrix} \beta\\ \delta \end{pmatrix} + C\begin{pmatrix} 1\\ 1 \end{pmatrix} = 0.$$

The rank of the matrix is at most two, so there certainly exists a nonzero solution. To show that two such solutions are proportional, we have to show that the rank is two. If the rank is one, then α = γ and β = δ, and the points are not distinct, contrary to assumption.
SECTION 10
3. The result is immediate from Theorem (10.6).
4. Since L is one-dimensional, L = p + V where V is the directing space, and p ∈ L. By (10.2), q − p ∈ V, and since dim V = 1, V consists of all scalar multiples of q − p.
SECTION 11
1. The mappings in (c), (d), (e) are linear transformations; the others are
not.
3. 2T: y₁ = −6x₁ + 2x₂, y₂ = 2x₁ − 2x₂.
T − U: y₁ = −4x₁, y₂ = −x₂.
T²: y₁ = 10x₁ − 4x₂, y₂ = −4x₁ + 2x₂.
To find the system for TU, we let U: ⟨x₁, x₂⟩ → ⟨y₁, y₂⟩, where y₁ = x₁ + x₂, y₂ = x₁; and T: ⟨y₁, y₂⟩ → ⟨z₁, z₂⟩, where z₁ = −3y₁ + y₂, z₂ = y₁ − y₂. Then TU: ⟨x₁, x₂⟩ → ⟨z₁, z₂⟩, where z₁ = −2x₁ − 3x₂, z₂ = x₂. TU ≠ UT.
4. (DM)f(x) = D[xf(x)] = xf′(x) + f(x), while (MD)f(x) = (Mf′)(x) = xf′(x). DM ≠ MD.
5. From 0 + 0 = 0, we obtain T(0) + T(0) = T(0); hence T(0) = 0. From v + (−v) = 0, we obtain T(v) + T(−v) = T(0) = 0; hence T(−v) = −T(v).
6. The linear transformations defined in (a) and (c) are one to one; the
others are not.
8. The linear transformations defined in (a) and (b) are onto; the others are
not.
9. Suppose T is one-to-one. Then the homogeneous system of equations, defined by setting all the yᵢ equal to zero, has only the trivial solution. Therefore the rank of the coefficient matrix is n, and it follows from Section 8 that T is onto. Conversely, if T is onto, the rank of the coefficient matrix is n, and from Section 9, the homogeneous system has only the trivial solution. Therefore T is one-to-one. These remarks prove the equivalence of parts (a), (b), and (c). The equivalence with (d) is proved in Theorem (11.13).
10. D maps the constant polynomials into zero, so that D is not one-to-one. I maps no polynomial onto a constant polynomial ≠ 0, so that I is not onto. The equation DI = 1 implies that D is onto and that I is one-to-one.
SECTION 12
1.
$$\begin{pmatrix} -1 & 2\\ -1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 1\\ 0 & 1 \end{pmatrix} = \begin{pmatrix} -1 & 1\\ -1 & -1 \end{pmatrix}.$$
7. The matrices in (b), (c), (d) and (e) are invertible; (a) is not. Check the
formulas for the inverses by matrix multiplication.
8. Every linear transformation on the space of column vectors has the form x → B·x for some n-by-n matrix B. The linear transformation x → A·x is invertible if and only if there exists a matrix B such that B(A·x) = A(B·x) = x for all x. By the associative law (Exercise 3), these equations are equivalent to (BA)x = (AB)x = x for all x. Then show that for an n-by-n matrix C, Cx = x for all x is equivalent to C = I. Thus the equations simply mean that the matrix A is invertible. If A is invertible, then x = A⁻¹b is a solution of the equation Ax = b because A(A⁻¹b) = (AA⁻¹)b = Ib = b, by the associative law. If x′ is another solution, then Ax = Ax′, and multiplying by A⁻¹ we obtain A⁻¹(Ax) = A⁻¹(Ax′). It follows that x = x′.
SECTION 13
1. The matrices with respect to the basis {u₁, u₂} are
2. S: rank 1, nullity 1, not invertible.
T: rank 2, nullity 0, invertible.
U: rank 3, nullity 0, invertible.
3.
$$D = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0\\
0 & 0 & 2 & \cdots & 0\\
\vdots & & & \ddots & \vdots\\
0 & 0 & 0 & \cdots & k\\
0 & 0 & 0 & \cdots & 0
\end{pmatrix}, \qquad \text{rank} = k, \quad \text{nullity} = 1.$$
SECTION 14
1. (e) Two vectors are perpendicular if and only if the cosine of the angle between them is zero. From the law of cosines, we have

$$\|b - a\|^2 = \|a\|^2 + \|b\|^2 - 2\|a\|\,\|b\|\cos\theta.$$

Since

$$\|b - a\|^2 = (b - a, b - a) = \|a\|^2 + \|b\|^2 - 2(a, b),$$

we have

$$\cos\theta = \frac{(a, b)}{\|a\|\,\|b\|}.$$

Therefore a ⊥ b if and only if (a, b) = 0.
(f) ‖a + b‖ = ‖a − b‖ if and only if (a + b, a + b) = (a − b, a − b). This statement holds if and only if (a, b) = 0.
2. (a) If x belongs to both p + S and q + S, then x = p + s₁ = q + s₂, with s₁ and s₂ ∈ S. It follows that p ∈ q + S and q ∈ p + S, and hence that p + S = q + S.
(b) The set L = p + S of all vectors of the form {p + λ(q − p)} is easily shown to be a line containing p and q. Let L′ = p + S′ be any line containing p and q. Then q − p ∈ S′, and since S and S′ are one-dimensional, S = S′. Since the lines L and L′ intersect and have the same one-dimensional subspace, L = L′ by (a).
(c) p, q, r are collinear if they belong to the same line L = p + S. In that case q − p ∈ S and q − r ∈ S, and since S is one-dimensional, S(q − p) = S(q − r). Conversely, if S(q − p) = S(q − r), the lines determined by the points p and q, and q and r, have the same one-dimensional spaces. Hence they coincide by (a).
(d) Let L = p + S and L′ = p′ + S′ be the given lines. From parts (a) and (b) we may assume S ≠ S′. Since dim S = dim S′ = 1, S ∩ S′ = 0. If x₁ and x₂ belong to L ∩ L′, then x₁ − x₂ ∈ S ∩ S′, hence x₁ = x₂. In order to show that L ∩ L′ is not empty, let S = S(s), S′ = S(s′); then we have to find λ and λ′ such that p + λs = p′ + λ′s′, and this is true since any 3 vectors in R² are linearly dependent.
3. λ = −(p − r, q − p)/‖q − p‖². The perpendicular distance is ‖(p − r) + λ(q − p)‖.
SECTION 15
1. (a) (1/√3)⟨1, 1, 1, 0⟩, (1/√51)⟨5, −1, −4, −3⟩.
2. 1, √12(x − 1/2), a(x² − x + 1/6), where a is chosen to make ‖f‖ = 1.
3. The distance is given by ‖v − (v, u₁)u₁‖, where v and u₁ are as follows:
(a) v = (−1, −1), u₁ = (2, −1)/√5;
(b) v = (1, 0), u₁ = (1, 1)/√2;
(c) v = (2, 2), u₁ = (1, −1)/√2.
5. (a) (v, w) = Σ_{i,j} ξᵢηⱼ(uᵢ, uⱼ) = Σᵢ ξᵢηᵢ.
(b) Let v = Σᵢ₌₁ⁿ ξᵢuᵢ. Then (v, u_k) = Σᵢ₌₁ⁿ ξᵢ(uᵢ, u_k) = ξ_k.
10. It can be shown that there exist orthonormal bases {vᵢ} and {wᵢ} of V such that {v₁, …, v_d} is a basis for W₁, and {w₁, …, w_d} is a basis for W₂. By Theorem (15.11), there exists an orthogonal transformation T such that Tvᵢ = wᵢ for each i. Then T(W₁) = W₂.
11. (a) By Exercise 7, dim S(n)⊥ = 2. It is sufficient to prove that the set P of all p such that (p, n) = a is the set of solutions of a linear equation. Let {v₁, v₂, v₃} be an orthonormal basis for R³, and let n = α₁v₁ + α₂v₂ + α₃v₃. Then x = x₁v₁ + x₂v₂ + x₃v₃ satisfies (x, n) = a if and only if α₁x₁ + α₂x₂ + α₃x₃ = a. This shows that the set of all p such that (p, n) = a is a plane. Using the results of Section 10, we can assert that P = p₀ + S(n)⊥, where p₀ is a fixed solution of (p, n) = a.
(b) Normal vector: ⟨3, −1, 1⟩.
(c) x₁ − x₂ + x₃ = −2, or (n, p) = −2.
(d) The plane with normal vector n, passing through p, is the set of all vectors x such that (x, n) = (p, n), or (x − p, n) = 0.
(f) The normal vector n must be perpendicular to ⟨2, 0, −1⟩ − ⟨1, 1, 1⟩ and to ⟨2, 0, −1⟩ − ⟨0, 0, 1⟩, by (d). Then use the method of part (e).
(g) Following the hint, we write the second equation in the form p₀ − p = λn + (u − p). Taking the inner product with n yields

$$\lambda(n, n) + (u - p, n) = 0.$$

(h) We have u = ⟨1, 1, 2⟩, p = ⟨0, 0, 1⟩, n = ⟨1, 1, −1⟩. Then 3λ + (u − p, n) = 0, λ = −1/3. Then the distance is ‖p₀ − u‖ = (1/3)‖n‖.
SECTION 16
1. (a) 2; (b) 0; (c) 0. (d) The determinants are: (a) 0, (b) 1, (c) 13, (d) −2, (e) 3.
2.
$$A' = \begin{pmatrix}
\ast & \cdots & \ast & 1\\
\ast & \cdots & 1 & 0\\
\vdots & & & \vdots\\
1 & 0 & \cdots & 0
\end{pmatrix}, \qquad
D(A') = \pm D\begin{pmatrix}
1 & \ast & \cdots & \ast\\
0 & 1 & \cdots & \ast\\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix},
$$

the sign being determined by the row interchanges used to reverse the order of the rows.
3. Show that by applying row operations of type II only (see Definition (6.9)) to the rows involving A₁, we obtain

$$D(A) = D\begin{pmatrix}
\alpha_{11} & \ast & \cdots & \ast & & \\
0 & \alpha_{22} & \cdots & \ast & & \ast\\
\vdots & & \ddots & \vdots & & \\
0 & 0 & \cdots & \alpha_{d_1 d_1} & & \\
& & & & A_2 & \\
& 0 & & & & \ddots\\
& & & & & \quad A_r
\end{pmatrix},$$

with the block corresponding to A₁ in triangular form, and this is immediate from assumptions (c) and (d) about D.
SECTION 17
1. Using the definition and Theorem (16.6), we have D(⟨ξ, η⟩, ⟨λ, μ⟩) = D(⟨ξ, 0⟩, ⟨λ, 0⟩) + D(⟨ξ, 0⟩, ⟨0, μ⟩) + D(⟨0, η⟩, ⟨λ, 0⟩) + D(⟨0, η⟩, ⟨0, μ⟩) = ξμ − ηλ. Here D(⟨ξ, 0⟩, ⟨λ, 0⟩) = 0 because the vectors involved are linearly dependent, D(⟨ξ, 0⟩, ⟨0, μ⟩) = ξμ D(e₁, e₂) = ξμ, while D(⟨0, η⟩, ⟨λ, 0⟩) = −D(⟨λ, 0⟩, ⟨0, η⟩) = −λη.
SECTION 18
SECTION 19
4. Expanding the determinant along the first row, we see that it has the form Ax₁ + Bx₂ + C. Substituting (α, β) or (γ, δ) for (x₁, x₂) makes two rows of the determinant equal, and hence the equation Ax₁ + Bx₂ + C = 0 is satisfied by the points (α, β) and (γ, δ).
6. The image of the square is the parallelogram {λT(e₁) + μT(e₂) : 0 ≤ λ, μ ≤ 1}. The area of the parallelogram is |D(T(e₁), T(e₂))|, where T(e₁) and T(e₂) are the columns of the matrix of T with respect to the basis {e₁, e₂} of R².
7. Since

$$\begin{vmatrix}
\alpha_1 & \alpha_2 & 1\\
\beta_1 & \beta_2 & 1\\
\gamma_1 & \gamma_2 & 1
\end{vmatrix} = \begin{vmatrix}
\alpha_1 & \alpha_2 & 1\\
\beta_1 - \alpha_1 & \beta_2 - \alpha_2 & 0\\
\gamma_1 - \alpha_1 & \gamma_2 - \alpha_2 & 0
\end{vmatrix},$$

the determinant is

$$(\beta_1 - \alpha_1)(\gamma_2 - \alpha_2) - (\beta_2 - \alpha_2)(\gamma_1 - \alpha_1).$$
SECTION 20
1. Q = (1/2)x − 1, R = −x² − x − 1/2.
2. Let α₁, …, α_k be distinct zeros of f. Then f = (x − α₁)g₁. Since α₂ ≠ α₁, x − α₂ is a prime polynomial which divides f but not x − α₁. Therefore x − α₂ divides g₁, and f = (x − α₁)(x − α₂)g₂. Continuing in this way, f = (x − α₁) ⋯ (x − α_k)g, and hence deg f ≥ k.
4. If the degree of f is two or three, any nontrivial factorization will involve a linear factor, and hence a zero of f. The result is false if deg f > 3. For example, f = (x² + 1)² has no zeros in R, but is not a prime in R[x].
5. Suppose m/n satisfies the equation. Multiplying the resulting equation by nʳ, we obtain an equation with integer terms. Then n divides a₀mʳ, and since n does not divide mʳ, n | a₀. Similarly m | a_r.
7. In Q[x], the prime factors are: (a) (2x + 1)(x² − x + 1); (c) (x² + 1)(x⁴ − x² + 1).
In R[x], the prime factors are: (a) (2x + 1)(x² − x + 1); (c) (x² + 1)(x² + √3·x + 1)(x² − √3·x + 1).
8. The process must terminate; otherwise we would have an infinite decreasing sequence of nonnegative integers, contrary to the principle of well-ordering. Now suppose r_k ≠ 0 and r_{k+1} = 0. From the way the rᵢ's are defined, r_k | r_{k−1}. From the preceding equation we see that r_k | r_{k−2}. Continuing in this way, we obtain r_k | a and r_k | b. On the other hand, starting from the top, if d | a and d | b, then d | r₀. From the next equation we get d | r₁. Continuing, we obtain eventually d | r_k. Thus r_k = (a, b).
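The division process described here is easy to carry out mechanically. The following sketch is ours (poly_gcd is a hypothetical helper name, and the eps tolerance handles floating-point remainders; NumPy is assumed):

import numpy as np

def poly_gcd(a, b, eps=1e-9):
    """Euclidean algorithm for polynomials given as coefficient arrays
    (highest degree first); returns a monic greatest common divisor."""
    a, b = np.atleast_1d(np.array(a, float)), np.atleast_1d(np.array(b, float))
    while b.size > 1 or abs(b[0]) > eps:
        _, r = np.polydiv(a, b)        # a = q*b + r with deg r < deg b
        r = np.trim_zeros(r, 'f')      # strip leading zero coefficients
        a, b = b, (r if r.size and np.max(np.abs(r)) > eps else np.array([0.0]))
    return a / a[0]                    # normalize to a monic polynomial

# gcd of (x - 1)^2 (x + 2) and (x - 1)(x + 3) is x - 1:
p = np.polymul(np.polymul([1, -1], [1, -1]), [1, 2])
q = np.polymul([1, -1], [1, 3])
print(poly_gcd(p, q))   # approximately [ 1. -1.]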
9. (a) 2x + 1.
SECTION 21
$$\alpha + i\beta \mapsto \begin{pmatrix} \alpha & -\beta\\ \beta & \alpha \end{pmatrix}.$$
9. The equation x 2 = -1 has a solution in the field of complex numbers,
but cannot have a solution in an ordered field.
SECTION 22
2. Let {v₁, …, v_m} be a basis for V and {w₁, …, w_n} a basis for W. For each pair (i, j), let E_ij be the linear transformation such that E_ij v_j = w_i, and E_ij v_k = 0 if k ≠ j. Then the mn linear transformations {E_ij} form a basis for L(V, W).
3. The minimal polynomials are (x − 2)(x + 1), x³ − 1, x² + x − 1, and x³, respectively.
4. (a) For each vᵢ, f(T)vᵢ = (T − ξ₁) ⋯ (T − ξ_n)vᵢ = 0, since the factors T − ξ_k commute and (T − ξᵢ)vᵢ = 0.
(b) Let m(x) = Π(x − ξⱼ), where the ξⱼ are the distinct characteristic roots of T. By the argument of part (a), m(T) = 0. Then the minimal polynomial of T divides m(x). It is enough to show that if m′(x) = Π_{j ≠ j₀}(x − ξⱼ), then m′(T) ≠ 0. We have
SECTION 23
SECTION 24
1. (a) (x + 2)².
(b) (x + 2)².
(c) −2 (appearing twice in the characteristic and minimal polynomials).
(d) No, because the minimal polynomial is not a product of distinct linear factors.
(e) Let v = x₁v₁ + x₂v₂ be a vector with unknown coefficients such that (T + 2)v = 0. Using the definition of T, this leads to a system of homogeneous equations with the nontrivial solution (1, −1). Then v₁ − v₂ is a characteristic vector for T.
(f) A basis which puts the matrix of T in triangular form is w₁ = v₁ − v₂, w₂ = v₂. Then SB = AS, where

$$B = \begin{pmatrix} -2 & 1\\ 0 & -2 \end{pmatrix}.$$

(g) The Jordan decomposition of T is T = D + N, where D and N are the linear transformations whose matrices are −2I and $\begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix}$ with respect to the basis {w₁, w₂}.
3. The minimal polynomial of T is x² + αβ, where αβ > 0. The minimal polynomial is not a product of distinct linear factors in R[x], and the answer to the question is no.
4. Use the triangular form theorem.
5. The minimal polynomial of T divides x² − x, and therefore has distinct linear factors in C[x]. There does exist a basis of V consisting of characteristic vectors.
SECTION 25
1. (a)
(b)
$$\begin{pmatrix}
-1 & 0 & 0 & 0 & 0\\
0 & 0 & -1 & 0 & 0\\
0 & 1 & -1 & 0 & 0\\
0 & 0 & 0 & 0 & -1\\
0 & 0 & 0 & 1 & -1
\end{pmatrix}.$$
2. The rational canonical forms are

$$\begin{pmatrix} 0 & \ast\\ 1 & \ast \end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix} 0 & 0 & \ast\\ 1 & 0 & \ast\\ 0 & 1 & -1 \end{pmatrix},$$

respectively.
3. The rational canonical forms over R are

$$\begin{pmatrix} 0 & \ast\\ 1 & \ast \end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix} 0 & 0 & \ast\\ 1 & 0 & \ast\\ 0 & 1 & -1 \end{pmatrix},$$

and over C,

$$\begin{pmatrix} \dfrac{-1 + \sqrt{-3}}{2} & 0\\[6pt] 0 & \dfrac{-1 - \sqrt{-3}}{2} \end{pmatrix} \qquad\text{and}\qquad \cdots,$$

respectively.
5. Let {v, Tv, …, T^{d−1}v} be a basis for V, and suppose T^d v = a₀v + a₁Tv + ⋯ + a_{d−1}T^{d−1}v. The matrix of T with respect to this basis is

$$A = \begin{pmatrix}
0 & 0 & \cdots & 0 & a_0\\
1 & 0 & \cdots & 0 & a_1\\
0 & 1 & \cdots & 0 & a_2\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & 1 & a_{d-1}
\end{pmatrix}.$$
List of Greek Letters

α alpha
β beta
γ gamma
δ delta
ε epsilon
ζ zeta
η eta
θ theta
λ lambda
μ mu
ν nu
ξ xi (ksi)
π pi
ρ rho
σ sigma
τ tau
φ phi
ψ psi
ω omega
Index

B
Basis of vector space, 36
Bilinear form, 236
  nondegenerate, 237
  skew symmetric, 243
  symmetric, 243
  vector spaces dual to, 239
Bilinear function, 119, 245
Binomial coefficient, 15

C
Cartesian product, 243
Cauchy-Schwartz inequality, 120
Cayley-Hamilton theorem, 206
Characteristic polynomial, 205
Characteristic root, 189
Characteristic vector, 189
Coefficient matrix, 53
Cofactor, 150
Column expansion, 150
Column subspace, 54
Column vector, 54
Commutative ring, 81
Companion matrix, 221, 222

D
DeMoivre's theorem, 180
Determinant, column expansion of, 150
  as volume function, 154
  complete expansion of, 144, 159
  definition of, 134
  Hadamard's inequality, 155
  minor, 153
  row expansion of, 151
  van der Monde, 161
Diagonable linear transformation, 198
Diagonal matrix, 198
Dihedral group Dn, 114
Dimension of vector space, 37
Direct sum, 195, 243
Division process (for polynomials), 167
Dual space, 234
  and bilinear form, 239

E
Echelon form, 44
Eigenvalue, 189
Eigenvector, 189
S
Solution space, 62
Solution vector, 54
Spectral theorem, 285
Spectrum, 284
Subfield, 9
Subspace, 26
  column, 54
  cyclic, 217
  dual, 234
  finitely generated, 28
  generators of, 28
  indecomposable, 261
  invariant, 194
  quotient, 231
  row, 54
  spanned by vectors, 28
Symmetric bilinear form, 243
Symmetric transformation, 274
Symmetry group of figure, 112
System of linear equations, coefficient matrix of, 53
  Cramer's rule, 54
  first-order differential, 298
  homogeneous, 54
  m equations in n unknowns, 53
  nonhomogeneous, 54

T
Tensor, k-fold, 255
  skew symmetric, 256
Tensor product, 249, 250
  k-fold, 255
Trace of linear transformation, 210
Transpose, of linear transformation, 234
  of matrix, 146
Transposition, 158
Triangle inequality, 121
Triangular form theorem, 202
Trivial solution, 54

U
Unique factorization theorem, 171
Unit, 169
Unit vectors, 37
Unitary transformation, 280

V
van der Monde determinant, 161
Vector, characteristic, 189
  column, 54
  echelon form, 44
  length, 120
  linearly dependent, 30
  linearly independent, 30
  orthonormal set, 123
  proper, 189
  row, 53
  solution, 54
  unit, 37
Vector space, 18
  basis of, 36
  completely reducible, 265
  dimension of, 37
  dual, 239
  Fn, 19
  of functions, 19
  irreducible, 265
  isometric, 130
  Rn, 19
Volume function, 154

W
Wedge product, 256
Well-ordering principle, 10