Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Linear Algebra
1 Vector spaces
You are no doubt familiar with vectors in R2 or R3 , i.e.
1.1
2
x= , y = 0 . (1)
3
5
From the point of view of algebra, vectors are much more general objects. They are elements
of sets called vector spaces that satisfy the following definition.
Definition 1.1 (Vector space). A vector space consists of a set V and two operations + and
satisfying the following conditions.
4. For any x V there exists an additive inverse y such that x + y = 0, usually denoted
as x.
x + y = y + x, (x + y) + z = x + (y + z). (2)
( x) = ( ) x. (3)
7. Scalar and vector sums are both distributive, i.e. for all , R and x, y V
( + ) x = x + x, (x + y) = x + y. (4)
A subspace of a vector space V is a subset of V that is also itself a vector space.
From now on, for ease of notation we will ignore the symbol for the scalar product , writing
x as x.
Remark 1.2 (More general definition). We can define vector spaces over an arbitrary field,
instead of R, such as the complex numbers C. We refer to any linear algebra text for more
details.
We can easily check that Rn is a valid vector space together with the usual vector addition
T
and vector-scalar product. In this case the zero vector is the all-zero vector 0 0 0 . . . .
When thinking about vector spaces it is a good idea to have R2 or R3 in mind to gain
intuition, but it is also important to bear in mind that we can define vector sets over many
other objects, such as infinite sequences, polynomials, functions and even random variables
as in the following example.
The definition of vector space guarantees that any linear combination of vectors in a vector
space V, obtained by adding the vectors after multiplying by scalar coefficients, belongs to
V. Given a set of vectors, a natural question to ask is whether they can be expressed as
linear combinations of each other, i.e. if they are linearly dependent or independent.
Definition 1.3 (Linear dependence/independence). A set of m vectors x1 , x2 , . . . , xm is
linearly dependent if there exist m scalar coefficients 1 , 2 , . . . , m which are not all equal
to zero and such that
Xm
i xi = 0. (5)
i=1
Equivalently, at least one vector in a linearly dependent set can be expressed as the linear
combination of the rest, whereas this is not the case for linearly independent sets.
Let us check the equivalence. Equation (5) holds with j 6= 0 for some j if and only if
1 X
xj = i xi . (6)
j
i{1,...,m}/{j}
We define the span of a set of vectors {x1 , . . . , xm } as the set of all possible linear combina-
tions of the vectors:
( m
)
X
span (x1 , . . . , xm ) := y | y = i xi for some 1 , 2 , . . . , m R . (7)
i=1
2
Lemma 1.4. The span of any set of vectors x1 , . . . , xm belonging to a vector space V is a
subspace of V.
Proof. The span is a subset of V due to Conditions 1 and 2 in Definition 1.1. We now show
that it is a vector space. Conditions 5, 6 and 7 in Definition 1.1 hold because V is a vector
space. We check Conditions 1, 2, 3 and 4 by proving that for two arbitrary elements of the
span
m
X m
X
y1 = i xi , y2 = i xi , 1 , . . . , m , 1 , . . . , m R, (8)
i=1 i=1
When working with a vector space, it is useful to consider the set of vectors with the smallest
cardinality that spans the space. This is called a basis of the vector space.
Definition 1.5 (Basis). A basis of a vector space V is a set of independent vectors {x1 , . . . , xm }
such that
An important property of all bases in a vector space is that they have the same cardinality.
Theorem 1.6. If a vector space V has a basis with finite cardinality then every basis of V
contains the same number of vectors.
This theorem, which is proven in Section A of the appendix, allows us to define the dimen-
sion of a vector space.
Definition 1.7 (Dimension). The dimension dim (V) of a vector space V is the cardinality
of any of its bases, or equivalently the smallest number of linearly independent vectors that
span V.
This definition coincides with the usual geometric notion of dimension in R2 and R3 : a
line has dimension 1, whereas a plane has dimension 2 (as long as they contain the origin).
Note that there exist infinite-dimensional vector spaces, such as the continuous real-valued
functions defined on [0, 1] or an iid sequence X1 , X2 , . . ..
3
The vector space that we use to model a certain problem is usually called the ambient
space and its dimension the ambient dimension. In the case of Rn the ambient dimension
is n.
Lemma 1.8 (Dimension of Rn ). The dimension of Rn is n.
Example 2.2 (Dot product). We define the dot product between two vectors x, y Rn
as
X
x y := x [i] y [i] , (14)
i
where x [i] is the ith entry of x. It is easy to check that the dot product is a valid inner
product. Rn endowed with the dot product is usually called an Euclidean space of dimension
n.
4
The norm of a vector is a generalization of the concept of length.
Definition 2.3 (Norm). Let V be a vector space, a norm is a function |||| from V to R that
satisfies the following conditions.
A vector space equipped with a norm is called a normed space. Distances in a normed space
can be measured using the norm of the difference between vectors.
Definition 2.4 (Distance). The distance between two vectors in a normed space with norm
|||| is
Inner-product spaces are normed spaces because we can define a valid norm using the inner
product. The norm induced by an inner product is obtained by taking the square root of
the inner product of the vector with itself,
p
||x||h,i := hx, xi. (18)
The norm induced by an inner product is clearly homogeneous by linearity and symmetry
of the inner product. ||x||h,i = 0 implies x = 0 because the inner product is positive
semidefinite. We only need to establish that the triangle inequality holds to ensure that the
inner-product is a valid norm. This follows from a classic inequality in linear algebra, which
is proved in Section B of the appendix.
Theorem 2.5 (Cauchy-Schwarz inequality). For any two vectors x and y in an inner-product
space
5
Assume ||x||h,i 6= 0,
||y||h,i
hx, yi = ||x||h,i ||y||h,i y = x, (20)
||x||h,i
||y||h,i
hx, yi = ||x||h,i ||y||h,i y = x. (21)
||x||h,i
Corollary 2.6. The norm induced by an inner product satisfies the triangle inequality.
Proof.
||x + y||2h,i = ||x||2h,i + ||y||2h,i + 2 hx, yi (22)
||x||2h,i + ||y||2h,i
+ 2 ||x||h,i ||y||h,i by the Cauchy-Schwarz inequality (23)
2
= ||x||h,i + ||y||h,i . (24)
Example 2.7 (Euclidean norm). The Euclidean or `2 norm is the norm induced by the dot
product in Rn ,
v
u n
uX
||x||2 := x x = t x2i . (25)
i=1
3 Orthogonality
An important concept in linear algebra is orthogonality.
Definition 3.1 (Orthogonality). Two vectors x and y are orthogonal if
hx, yi = 0. (26)
6
Two sets of S1 , S2 are orthogonal if for any x S1 , y S2
hx, yi = 0. (28)
Distances between orthogonal vectors measured in terms of the norm induced by the inner
product are easy to compute.
Theorem 3.2 (Pythagorean theorem). If x and y are orthogonal vectors
hx, bi i = 0, 1 i n, (33)
then x is orthogonal to S.
P n
Proof. Any vector v S can be represented as v = i i=1 bi for 1 , . . . , n R, from (33)
* +
X X
n n
hx, vi = x, i=1 bi = i=1 hx, bi i = 0. (34)
i i
7
It is very easy to find the coefficients of a vector in an orthonormal basis: we just need to
compute the dot products with the basis vectors.
Lemma 3.5 (Coefficients in an orthonormal basis). If {u1 , . . . , un } is an orthonormal basis
of a vector space V, for any vector x V
n
X
x= hui , xi ui . (35)
i=1
Immediately,
* m
+ m
X X
hui , xi = ui , i ui = i hui , ui i = i (37)
i=1 i=1
For any subspace of Rn we can obtain an orthonormal basis by applying the Gram-Schmidt
method to a set of linearly independent vectors spanning the subspace.
Algorithm 3.6 (Gram-Schmidt).
This implies in particular that we can always assume that a subspace has an orthonormal
basis.
Theorem 3.7. Every finite-dimensional vector space has an orthonormal basis.
Proof. To see that the Gram-Schmidt method produces an orthonormal basis for the span of
the input vectors we can check that span (x1 , . . . , xi ) = span (u1 , . . . , ui ) and that u1 , . . . , ui
is set of orthonormal vectors.
8
4 Projections
The projection of a vector x onto a subspace S is the vector in S that is closest to x. In
order to define this rigorously, we start by introducing the concept of direct sum. If two
subspaces are disjoint, i.e. their only common point is the origin, then a vector that can be
written as a sum of a vector from each subspace is said to belong to their direct sum.
Definition 4.1 (Direct sum). Let V be a vector space. For any subspaces S1 , S2 V such
that
S1 S2 = {0} (39)
the direct sum is defined as
S1 S2 := {x | x = s1 + s2 s1 S1 , s2 S2 } . (40)
We can now define the projection of a vector x onto a subspace S by separating the vector
into a component that belongs to S and another that belongs to its orthogonal complement.
Definition 4.3 (Orthogonal projection). Let V be a vector space. The orthogonal projection
of a vector x V onto a subspace S V is a vector denoted by PS x such that xPS x S .
Theorem 4.4 (Properties of orthogonal projections). Let V be a vector space. Every vector
x V has a unique orthogonal projection PS x onto any subspace S V of finite dimension.
In particular x can be expressed as
x = PS x + PS x. (42)
For any vector s S
hx, si = hPS x, si . (43)
For any orthonormal basis b1 , . . . , bm of S,
m
X
PS x = hx, bi i bi . (44)
i=1
9
Proof. Since V has finite dimension, so does S, which consequently has an orthonormal basis
with finite cardinality b01 , . . . , b0m . Consider the vector
m
X
p := hx, b0i i b0i . (45)
i=1
Computing the norm of the projection of a vector onto a subspace is easy if we have access
to an orthonormal basis (as long as the norm is induced by the inner product).
Lemma 4.5 (Norm of the projection). The norm of the projection of an arbitrary vector
x V onto a subspace S V of dimension d can be written as
v
u d
uX
||PS x||h,i = t hbi , xi2 (49)
i
10
Proof. By (44)
||PS x||2h,i = hPS x, PS xi (50)
* d d
+
X X
= hbi , xi bi , hbj , xi bj (51)
i j
d X
X d
= hbi , xi hbj , xi hbi , bj i (52)
i j
d
X
= hbi , xi2 . (53)
i
Finally, we prove indeed that of a vector x onto a subspace S is the vector in S that is closest
to x in the distance induced by the inner-product norm.
Theorem 4.7 (The orthogonal projection is closest). The orthogonal projection of a vector
x onto a subspace S belonging to the same inner-product space is the closest vector to x that
belongs to S in terms of the norm induced by the inner product. More formally, PS x is the
solution to the optimization problem
minimize ||x u||h,i (55)
u
subject to u S. (56)
11
5 Matrices
A matrix is a rectangular array of numbers. We denote the vector space of m n matrices
by Rmn . We denote the ith row of a matrix A by Ai: , the jth column by A:j and the (i, j)
entry by Aij . The transpose of a matrix is obtained by switching its rows and columns.
AT ij = Aji .
(60)
Definition 5.2 (Matrix-vector product). The product of a matrix A Rmn and a vector
x Rn is a vector in Ax Rn , such that
n
X
(Ax)i = Aij x [j] (61)
j=1
= hAi: , xi , (62)
i.e. the ith entry of Ax is the dot product between the ith row of A and x.
Equivalently,
n
X
Ax = A:j x [j] , (63)
j=1
One can easily check that the transpose of the product of two matrices A and B is equal to
the the transposes multiplied in the inverse order,
(AB)T = B T AT . (64)
We can now we can express the dot product between two vectors x and y as
hx, yi = xT y = y T x. (65)
12
Definition 5.3 (Identity matrix). The identity matrix in Rnn is
1 0 0
0 1 0
I= . (66)
0 0 1
Definition 5.4 (Matrix multiplication). The product of two matrices A Rmn and B
Rnp is a matrix AB Rmp , such that
n
X
(AB)ij = Aik Bkj = hAi: , B:,j i , (67)
k=1
i.e. the (i, j) entry of AB is the dot product between the ith row of A and the jth column of
B.
Equivalently, the jth column of AB is the result of multiplying A and the jth column of B
n
X
AB = Aik Bkj = hAi: , B:,j i , (68)
k=1
and ith row of AB is the result of multiplying the ith row of A and B.
Square matrices may have an inverse. If they do, the inverse is a matrix that reverses the
effect of the matrix of any vector.
Definition 5.5 (Matrix inverse). The inverse of a square matrix A Rnn is a matrix
A1 Rnn such that
AA1 = A1 A = I. (69)
M = A1 AM by (69) (70)
= A1 . (71)
13
Definition 5.7 (Orthogonal matrix). An orthogonal matrix is a square matrix such that its
inverse is equal to its transpose,
UT U = UUT = I (72)
By definition, the columns U:1 , U:2 , . . . , U:n of any orthogonal matrix have unit norm and
orthogonal to each other, so they form an orthonormal basis (its somewhat confusing that
orthogonal matrices are not called orthonormal matrices instead). We can interpret applying
U T to a vector x as computing the coefficients of its representation in the basis formed by
the columns of U . Applying U to U T x recovers x by scaling each basis vector with the
corresponding coefficient:
n
X
T
x = UU x = hU:i , xi U:i . (73)
i=1
Applying an orthogonal matrix to a vector does not affect its norm, it just rotates the vector.
Lemma 5.8 (Orthogonal matrices preserve the norm). For any orthogonal matrix U Rnn
and any vector x Rn ,
6 Eigendecomposition
An eigenvector v of A satisfies
Av = v (78)
for a real number which is the corresponding eigenvalue. Even if A is real, its eigenvectors
and eigenvalues can be complex.
14
Lemma 6.1 (Eigendecomposition). If a square matrix A Rnn has n linearly independent
eigenvectors v1 , . . . , vn with eigenvalues 1 , . . . , n it can be expressed in terms of a matrix
Q, whose columns are the eigenvectors, and a diagonal matrix containing the eigenvalues,
1 0 0
0 2 0
v1 v2 vn 1
A = v1 v2 vn (79)
0 0 n
= QQ1 (80)
Proof.
AQ = Av1 Av2 Avn (81)
= 1 v1 2 v2 2 vn (82)
= Q. (83)
As we will establish later on, if the columns of a square matrix are all linearly independent,
then the matrix has an inverse, so multiplying the expression by Q1 on both sides completes
the proof.
which implies that v2 = 0 and hence v1 = 0, since we have assumed that 6= 0. This implies
that the matrix does not have nonzero eigenvalues associated to nonzero eigenvectors.
AA Ax = Ak x, (86)
15
i.e. we want to apply A to x k times. Ak cannot be computed by taking the power of its
entries (try out a simple example to convince yourself). However, if A has an eigendecom-
position,
using the fact that for diagonal matrices applying the matrix repeatedly is equivalent to
taking the power of the diagonal entries. This allows to compute the k matrix products
using just 3 matrix products and taking the power of n numbers.
From high-school or undergraduate algebra you probably remember how to compute eigen-
vectors using determinants. In practice, this is usually not a viable option due to stability
issues. A popular technique to compute eigenvectors is based on the following insight. Let
A Rnn be a matrix with eigendecomposition QQ1 and let x be an arbitrary vector in
Rn . Since the columns of Q are linearly independent, they form a basis for Rn , so we can
represent x as
n
X
x= i Q:i , i R, 1 i n. (90)
i=1
If we assume that the eigenvectors are ordered according to their magnitudes and that the
magnitude of one of them is larger than the rest, |1 | > |2 | . . ., and that 1 6= 0 (which
happens with high probability if we draw a random x) then as k grows larger the term
1 k1 Q:1 dominates. The term will blow up or tend to zero unless we normalize every time
before applying A. Adding the normalization step to this procedure results in the power
method or power iteration, an algorithm of great importance in numerical linear algebra.
16
v2 x1 v2 v2
x2 x3
v1 v1 v1
Figure 1: Illustration of the first three iterations of the power method for a matrix with eigen-
vectors v1 and v2 , whose corresponding eigenvalues are 1 = 1.05 and 2 = 0.1661.
Figure 1 illustrates the power method on a simple example, where the matrix which was
just drawn at random is equal to
0.930 0.388
A= . (94)
0.237 0.286
The convergence to the eigenvector corresponding to the eigenvalue with the largest magni-
tude is very fast.
uTi uj = 0. (95)
17
Proof. Since A = AT
1
uTi uj = (Aui )T uj (96)
i
1
= uTi AT uj (97)
i
1
= uTi Auj (98)
i
j
= uTi uj . (99)
i
This is only possible if uTi uj = 0.
It turns out that every nn symmetric matrix has n linearly independent vectors. The proof
of this is beyond the scope of these notes. An important consequence is that all symmetric
matrices have an eigendecomposition of the form
A = U DU T (100)
where U = u1 u2 un is an orthogonal matrix.
The eigenvalues of a symmetric matrix 1 , 2 , . . . , n can be positive, negative or zero. They
determine the value of the quadratic form:
n
T
X 2
q (x) := x Ax = i xT ui (101)
i=1
18
A Proof of Theorem 1.6
We prove the claim by contradiction. Assume that we have two bases {x1 , . . . , xm } and
{y1 , . . . , yn } such that m < n (or the second set has infinite cardinality). The proof follows
from applying the following lemma m times (setting r = 0, 1, . . . , m 1) to show that
{y1 , . . . , ym } spans V and hence {y1 , . . . , yn } must be linearly dependent.
Lemma A.1. Under the assumptions of the theorem, if {y1 , y2 , . . . , yr , xr+1 , . . . , xm } spans
V then {y1 , . . . , yr+1 , xr+2 , . . . , xm } also spans V (possibly after rearranging the indices r +
1, . . . , m) for r = 0, 1, . . . , m 1.
where at least one of the j is non zero, as {y1 , . . . , yn } is linearly independent by assumption.
Without loss of generality (here is where we might need to rearrange the indices) we assume
that r+1 6= 0, so that
r m
!
1 X X
xr+1 = i yi i xi . (107)
r+1 i=1 i=r+2
This implies that any vector in the span of {y1 , y2 , . . . , yr , xr+1 , . . . , xm }, i.e. in V, can be rep-
resented as a linear combination of vectors in {y1 , . . . , yr+1 , xr+2 , . . . , xm }, which completes
the proof.
19
Let us prove (20) by proving both implications.
() Assume hx, yi = ||x||h,i ||y||h,i . Then (108) equals zero, so ||y||h,i x = ||x||h,i y
because the inner product is positive semidefinite.
() Assume ||y||h,i x = ||x||h,i y. Then one can easily check that (108) equals zero, which
implies hx, yi = ||x||h,i ||y||h,i .
The proof of (21) is identical (using (109) instead of (108)).
where
m
X
||hk ||22 = i2 = 1, (111)
i=k
20