A Tutorial On Principal Component Analysis

A Tutorial on Principal Component Analysis
Jonathon Shlens
Center for Neural Science, New York University

New York City, NY 10003-6603 and
Systems Neurobiology Laboratory, Salk Insitute for Biological Studies
La Jolla, CA 92037
(Dated: April 22, 2009; Version 3.01)
Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used
but (sometimes) poorly understood. The goal of this paper is to dispel the magic behind this black box. This
manuscript focuses on building a solid intuition for how and why principal component analysis works. This
manuscript crystallizes this knowledge by deriving from simple intuitions, the mathematics behind PCA. This
tutorial does not shy away from explaining the ideas informally, nor does it shy away from the mathematics. The
hope is that by addressing both aspects, readers of all levels will be able to gain a better understanding of PCA as
well as the when, the how and the why of applying this technique.
I. INTRODUCTION
Principal component analysis (PCA) is a standard tool in mod-
ern data analysis - in diverse elds from neuroscience to com-
puter graphics - because it is a simple, non-parametric method
for extracting relevant information from confusing data sets.
With minimal effort PCA provides a roadmap for how to re-
duce a complex data set to a lower dimension to reveal the
sometimes hidden, simplied structures that often underlie it.
The goal of this tutorial is to provide both an intuitive feel for
PCA, and a thorough discussion of this topic. We will begin
with a simple example and provide an intuitive explanation
of the goal of PCA. We will continue by adding mathemati-
cal rigor to place it within the framework of linear algebra to
provide an explicit solution. We will see how and why PCA
is intimately related to the mathematical technique of singular
value decomposition (SVD). This understanding will lead us
to a prescription for howto apply PCAin the real world and an
appreciation for the underlying assumptions. My hope is that
a thorough understanding of PCA provides a foundation for
approaching the elds of machine learning and dimensional
reduction.
The discussion and explanations in this paper are informal in
the spirit of a tutorial. The goal of this paper is to educate.
Occasionally, rigorous mathematical proofs are necessary al-
though relegated to the Appendix. Although not as vital to the
tutorial, the proofs are presented for the adventurous reader
who desires a more complete understanding of the math. My
only assumption is that the reader has a working knowledge
of linear algebra. My goal is to provide a thorough discussion
by largely building on ideas from linear algebra and avoiding
challenging topics in statistics and optimization theory (but
see Discussion). Please feel free to contact me with any sug-
gestions, corrections or comments.
Electronic address: shlens@salk.edu

II. MOTIVATION: A TOY EXAMPLE
Here is the perspective: we are an experimenter. We are trying
to understand some phenomenon by measuring various quan-
tities (e.g. spectra, voltages, velocities, etc.) in our system.
Unfortunately, we can not gure out what is happening be-
cause the data appears clouded, unclear and even redundant.
This is not a trivial problem, but rather a fundamental obstacle
in empirical science. Examples abound from complex sys-
tems such as neuroscience, web indexing, meteorology and
oceanography - the number of variables to measure can be
unwieldy and at times even deceptive, because the underlying
relationships can often be quite simple.
Take for example a simple toy problem from physics dia-
grammed in Figure 1. Pretend we are studying the motion
of the physicists ideal spring. This system consists of a ball
of mass m attached to a massless, frictionless spring. The ball
is released a small distance away from equilibrium (i.e. the
spring is stretched). Because the spring is ideal, it oscillates
indenitely along the x-axis about its equilibrium at a set fre-
quency.
This is a standard problem in physics in which the motion
along the x direction is solved by an explicit function of time.
In other words, the underlying dynamics can be expressed as
a function of a single variable x.
However, being ignorant experimenters we do not know any
of this. We do not know which, let alone how many, axes
and dimensions are important to measure. Thus, we decide to
measure the balls position in a three-dimensional space (since
we live in a three dimensional world). Specically, we place
three movie cameras around our system of interest. At 120 Hz
each movie camera records an image indicating a two dimen-
sional position of the ball (a projection). Unfortunately, be-
cause of our ignorance, we do not even know what are the real
x, y and z axes, so we choose three camera positionsa,
b andc
at some arbitrary angles with respect to the system. The angles
between our measurements might not even be 90
o
! Now, we
2
camera A camera B camera C
FIG. 1 A toy example. The position of a ball attached to an oscillat-
ing spring is recorded using three cameras A, B and C. The position
of the ball tracked by each camera is depicted in each panel below.
record with the cameras for several minutes. The big question
remains: how do we get from this data set to a simple equation
of x?
We know a-priori that if we were smart experimenters, we
would have just measured the position along the x-axis with
one camera. But this is not what happens in the real world.
We often do not know which measurements best reect the
dynamics of our system in question. Furthermore, we some-
times record more dimensions than we actually need.
Also, we have to deal with that pesky, real-world problem of
noise. In the toy example this means that we need to deal
with air, imperfect cameras or even friction in a less-than-ideal
spring. Noise contaminates our data set only serving to obfus-
cate the dynamics further. This toy example is the challenge
experimenters face everyday. Keep this example in mind as
we delve further into abstract concepts. Hopefully, by the end
of this paper we will have a good understanding of how to
systematically extract x using principal component analysis.
III. FRAMEWORK: CHANGE OF BASIS
The goal of principal component analysis is to identify the
most meaningful basis to re-express a data set. The hope is
that this new basis will lter out the noise and reveal hidden
structure. In the example of the spring, the explicit goal of
PCA is to determine: the dynamics are along the x-axis. In
other words, the goal of PCA is to determine that x, i.e. the
unit basis vector along the x-axis, is the important dimension.
Determining this fact allows an experimenter to discern which
dynamics are important, redundant or noise.
A. A Naive Basis
With a more precise denition of our goal, we need a more
precise denition of our data as well. We treat every time
sample (or experimental trial) as an individual sample in our
data set. At each time sample we record a set of data consist-
ing of multiple measurements (e.g. voltage, position, etc.). In
our data set, at one point in time, camera A records a corre-
sponding ball position (x
A
, y
A
). One sample or trial can then
be expressed as a 6 dimensional column vector
X =
x
A
y
A
x
B
y
B
x
C
y
C
where each camera contributes a 2-dimensional projection of

the balls position to the entire vector
X. If we record the balls

position for 10 minutes at 120 Hz, then we have recorded 10
60120 = 72000 of these vectors.
With this concrete example, let us recast this problem in ab-
stract terms. Each sample

X is an m-dimensional vector,
where m is the number of measurement types. Equivalently,
every sample is a vector that lies in an m-dimensional vec-
tor space spanned by some orthonormal basis. From linear
algebra we know that all measurement vectors form a linear
combination of this set of unit length basis vectors. What is
this orthonormal basis?
This question is usually a tacit assumption often overlooked.
Pretend we gathered our toy example data above, but only
looked at camera A. What is an orthonormal basis for (x
A
, y
A
)?
A naive choice would be {(1, 0), (0, 1)}, but why select this
basis over {(
2
2
,
2
2
), (
2
2
,
2
2
)} or any other arbitrary rota-
tion? The reason is that the naive basis reects the method we
gathered the data. Pretend we record the position (2, 2). We
did not record 2
2 in the (
2
2
,
2
2
) direction and 0 in the per-
pendicular direction. Rather, we recorded the position (2, 2)
on our camera meaning 2 units up and 2 units to the left in our
camera window. Thus our original basis reects the method
we measured our data.
How do we express this naive basis in linear algebra? In the
two dimensional case, {(1, 0), (0, 1)} can be recast as individ-
ual row vectors. A matrix constructed out of these row vectors
is the 22 identity matrix I. We can generalize this to the m-
dimensional case by constructing an mm identity matrix
B =
b
1
b
2
.
.
.
b
m
1 0 0
0 1 0
.
.
.
.
.
.
.
.
.
.
.
.
0 0 1
= I
3
where each row is an orthornormal basis vector b
i
with m
components. We can consider our naive basis as the effective
starting point. All of our data has been recorded in this basis
and thus it can be trivially expressed as a linear combination
of {b
i
}.
B. Change of Basis
With this rigor we may now state more precisely what PCA
asks: Is there another basis, which is a linear combination of
the original basis, that best re-expresses our data set?
A close reader might have noticed the conspicuous addition of
the word linear. Indeed, PCA makes one stringent but power-
ful assumption: linearity. Linearity vastly simplies the prob-
lem by restricting the set of potential bases. With this assump-
tion PCA is now limited to re-expressing the data as a linear
combination of its basis vectors.
Let X be the original data set, where each column is a single
sample (or moment in time) of our data set (i.e.

X). In the toy
example X is an mn matrix where m = 6 and n = 72000.
Let Y be another mn matrix related by a linear transfor-
mation P. X is the original recorded data set and Y is a new
representation of that data set.
PX = Y (1)
Also let us dene the following quantities.
1
p
i
are the rows of P
x
i
are the columns of X (or individual

X).
y
i
are the columns of Y.
Equation 1 represents a change of basis and thus can have
many interpretations.
1. P is a matrix that transforms X into Y.
2. Geometrically, P is a rotation and a stretch which again
transforms X into Y.
3. The rows of P, {p
1
, . . . , p
m
}, are a set of new basis vec-
tors for expressing the columns of X.
The latter interpretation is not obvious but can be seen by writ-
1
In this section x
i
and y
i
are column vectors, but be forewarned. In all other
sections x
i
and y
i
are row vectors.
ing out the explicit dot products of PX.
PX =
p
1
.
.
.
p
m
x
1
x
n
Y =
p
1
x
1
p
1
x
n
.
.
.
.
.
.
.
.
.
p
m
x
1
p
m
x
n
We can note the form of each column of Y.

y
i
=
p
1
x
i
.
.
.
p
m
x
i
We recognize that each coefcient of y

i
is a dot-product of
x
i
with the corresponding row in P. In other words, the j
th
coefcient of y
i
is a projection on to the j
th
row of P. This is
in fact the very form of an equation where y
i
is a projection
on to the basis of {p
1
, . . . , p
m
}. Therefore, the rows of P are a
new set of basis vectors for representing of columns of X.
C. Questions Remaining
By assuming linearity the problem reduces to nding the ap-
propriate change of basis. The row vectors {p
1
, . . . , p
m
} in
this transformation will become the principal components of
X. Several questions now arise.
What is the best way to re-express X?
What is a good choice of basis P?
These questions must be answered by next asking ourselves
what features we would like Y to exhibit. Evidently, addi-
tional assumptions beyond linearity are required to arrive at
a reasonable result. The selection of these assumptions is the
subject of the next section.
IV. VARIANCE AND THE GOAL
Now comes the most important question: what does best ex-
press the data mean? This section will build up an intuitive
answer to this question and along the way tack on additional
assumptions.
A. Noise and Rotation
Measurement noise in any data set must be low or else, no
matter the analysis technique, no information about a signal
4
!
2
signal
!
2
noise
x
y
FIG. 2 Simulated data of (x, y) for camera A. The signal and noise
variances
2
signal
and
2
noise
are graphically represented by the two
lines subtending the cloud of data. Note that the largest direction
of variance does not lie along the basis of the recording (x
A
, y
A
) but
rather along the best-t line.
can be extracted. There exists no absolute scale for noise but
rather all noise is quantied relative to the signal strength. A
common measure is the signal-to-noise ratio (SNR), or a ratio
of variances
2
,
SNR =

2
signal
2
noise
.
A high SNR (1) indicates a high precision measurement,
while a low SNR indicates very noisy data.
Lets take a closer examination of the data from camera
A in Figure 2. Remembering that the spring travels in a
straight line, every individual camera should record motion in
a straight line as well. Therefore, any spread deviating from
straight-line motion is noise. The variance due to the signal
and noise are indicated by each line in the diagram. The ratio
of the two lengths measures how skinny the cloud is: possibil-
ities include a thin line (SNR 1), a circle (SNR =1) or even
worse. By positing reasonably good measurements, quantita-
tively we assume that directions with largest variances in our
measurement space contain the dynamics of interest. In Fig-
ure 2 the direction with the largest variance is not x
A
= (1, 0)
nor y
A
= (0, 1), but the direction along the long axis of the
cloud. Thus, by assumption the dynamics of interest exist
along directions with largest variance and presumably high-
est SNR.
Our assumption suggests that the basis for which we are
searching is not the naive basis because these directions (i.e.
(x
A
, y
A
)) do not correspond to the directions of largest vari-
ance. Maximizing the variance (and by assumption the SNR)
corresponds to nding the appropriate rotation of the naive
basis. This intuition corresponds to nding the direction indi-
cated by the line
2
signal
in Figure 2. In the 2-dimensional case
of Figure 2 the direction of largest variance corresponds to the
best-t line for the data cloud. Thus, rotating the naive basis
to lie parallel to the best-t line would reveal the direction of
motion of the spring for the 2-D case. How do we generalize
this notion to an arbitrary number of dimensions? Before we
approach this question we need to examine this issue from a
second perspective.
low redundancy high redundancy
r
1
r
2
r
1
r
2
r
1
r
2
FIG. 3 A spectrum of possible redundancies in data from the two
separate measurements r
1
and r
2
. The two measurements on the
left are uncorrelated because one can not predict one from the other.
Conversely, the two measurements on the right are highly correlated
indicating highly redundant measurements.
B. Redundancy
Figure 2 hints at an additional confounding factor in our data
- redundancy. This issue is particularly evident in the example
of the spring. In this case multiple sensors record the same
dynamic information. Reexamine Figure 2 and ask whether
it was really necessary to record 2 variables. Figure 3 might
reect a range of possibile plots between two arbitrary mea-
surement types r
1
and r
2
. The left-hand panel depicts two
recordings with no apparent relationship. Because one can not
predict r
1
from r
2
, one says that r
1
and r
2
are uncorrelated.
On the other extreme, the right-hand panel of Figure 3 de-
picts highly correlated recordings. This extremity might be
achieved by several means:
A plot of (x
A
, x
B
) if cameras A and B are very nearby.
A plot of (x
A
, x
A
) where x
A
is in meters and x
A
is in
inches.
Clearly in the right panel of Figure 3 it would be more mean-
ingful to just have recorded a single variable, not both. Why?
Because one can calculate r
1
from r
2
(or vice versa) using the
best-t line. Recording solely one response would express the
data more concisely and reduce the number of sensor record-
ings (2 1 variables). Indeed, this is the central idea behind
dimensional reduction.
C. Covariance Matrix
In a 2 variable case it is simple to identify redundant cases by
nding the slope of the best-t line and judging the quality of
the t. How do we quantify and generalize these notions to
arbitrarily higher dimensions? Consider two sets of measure-
ments with zero means
A ={a
1
, a
2
, . . . , a
n
} , B ={b
1
, b
2
, . . . , b
n
}
5
where the subscript denotes the sample number. The variance
of A and B are individually dened as,
2
A
=
1
n
i
a
2
i
,
2
B
=
1
n
i
b
2
i
The covariance between A and B is a straight-forward gener-
alization.
covariance o f A and B
2
AB
=
1
n
i
a
i
b
i
The covariance measures the degree of the linear relationship
between two variables. A large positive value indicates pos-
itively correlated data. Likewise, a large negative value de-
notes negatively correlated data. The absolute magnitude of
the covariance measures the degree of redundancy. Some ad-
ditional facts about the covariance.

AB
is zero if and only if A and B are uncorrelated (e.g.
Figure 2, left panel).

2
AB
=
2
A
if A = B.
We can equivalently convert A and B into corresponding row
vectors.
a = [a
1
a
2
. . . a
n
]
b = [b
1
b
2
. . . b
n
]
so that we may express the covariance as a dot product matrix
computation.
2
2
ab
1
n
ab
T
(2)
Finally, we can generalize from two vectors to an arbitrary
number. Rename the row vectors a and b as x
1
and x
2
, respec-
tively, and consider additional indexed row vectors x
3
, . . . , x
m
.
Dene a new mn matrix X.
X =
x
1
.
.
.
x
m
One interpretation of X is the following. Each row of X corre-

sponds to all measurements of a particular type. Each column
of X corresponds to a set of measurements from one particular
trial (this is

X from section 3.1). We now arrive at a denition
for the covariance matrix C
X
.
C
X
1
n
XX
T
.
2
Note that in practice, the covariance
2
AB
is calculated as
1
n1

i
a
i
b
i
. The
slight change in normalization constant arises from estimation theory, but
that is beyond the scope of this tutorial.
Consider the matrix C
X
=
1
n
XX
T
. The i j
th
element of C
X
is the dot product between the vector of the i
th
measurement
type with the vector of the j
th
measurement type. We can
summarize several properties of C
X
:
C
X
is a square symmetric mm matrix (Theorem 2 of
Appendix A)
The diagonal terms of C
X
are the variance of particular
measurement types.
The off-diagonal terms of C
X
are the covariance be-
tween measurement types.
C
X
captures the covariance between all possible pairs of mea-
surements. The covariance values reect the noise and redun-
dancy in our measurements.
In the diagonal terms, by assumption, large values cor-
respond to interesting structure.
In the off-diagonal terms large magnitudes correspond
to high redundancy.
Pretend we have the option of manipulating C
X
. We will sug-
gestively dene our manipulated covariance matrix C
Y
. What
features do we want to optimize in C
Y
?
D. Diagonalize the Covariance Matrix
We can summarize the last two sections by stating that our
goals are (1) to minimize redundancy, measured by the mag-
nitude of the covariance, and (2) maximize the signal, mea-
sured by the variance. What would the optimized covariance
matrix C
Y
look like?
All off-diagonal terms in C
Y
should be zero. Thus, C
Y
must be a diagonal matrix. Or, said another way, Y is
decorrelated.
Each successive dimension in Y should be rank-ordered
according to variance.
There are many methods for diagonalizing C
Y
. It is curious to
note that PCA arguably selects the easiest method: PCA as-
sumes that all basis vectors {p
1
, . . . , p
m
} are orthonormal, i.e.
P is an orthonormal matrix. Why is this assumption easiest?
Envision how PCA works. In our simple example in Figure 2,
P acts as a generalized rotation to align a basis with the axis
of maximal variance. In multiple dimensions this could be
performed by a simple algorithm:
1. Select a normalized direction in m-dimensional space
along which the variance in X is maximized. Save this
vector as p
1
.
6
2. Find another direction along which variance is maxi-
mized, however, because of the orthonormality condi-
tion, restrict the search to all directions orthogonal to
all previous selected directions. Save this vector as p
i
3. Repeat this procedure until m vectors are selected.
The resulting ordered set of ps are the principal components.
In principle this simple algorithm works, however that would
bely the true reason why the orthonormality assumption is ju-
dicious. The true benet to this assumption is that there exists
an efcient, analytical solution to this problem. We will dis-
cuss two solutions in the following sections.
Notice what we gained with the stipulation of rank-ordered
variance. We have a method for judging the importance of
the principal direction. Namely, the variances associated with
each direction p
i
quantify how principal each direction is
by rank-ordering each basis vector p
i
according to the corre-
sponding variances.We will now pause to review the implica-
tions of all the assumptions made to arrive at this mathemati-
cal goal.
E. Summary of Assumptions
This section provides a summary of the assumptions be-
hind PCA and hint at when these assumptions might perform
poorly.
I. Linearity
Linearity frames the problem as a change of ba-
sis. Several areas of research have explored how
extending these notions to nonlinear regimes (see
Discussion).
II. Large variances have important structure.
This assumption also encompasses the belief that
the data has a high SNR. Hence, principal compo-
nents with larger associated variances represent
interesting structure, while those with lower vari-
ances represent noise. Note that this is a strong,
and sometimes, incorrect assumption (see Dis-
cussion).
III. The principal components are orthogonal.
This assumption provides an intuitive simplica-
tion that makes PCA soluble with linear algebra
decomposition techniques. These techniques are
highlighted in the two following sections.
We have discussed all aspects of deriving PCA - what remain
are the linear algebra solutions. The rst solution is some-
what straightforward while the second solution involves un-
derstanding an important algebraic decomposition.
V. SOLVING PCA USING EIGENVECTOR DECOMPOSITION
We derive our rst algebraic solution to PCA based on an im-
portant property of eigenvector decomposition. Once again,
the data set is X, an mn matrix, where m is the number of
measurement types and n is the number of samples. The goal
is summarized as follows.
Find some orthonormal matrix P in Y = PX such
that C
Y
1
n
YY
T
is a diagonal matrix. The rows
of P are the principal components of X.
We begin by rewriting C
Y
in terms of the unknown variable.
C
Y
=
1
n
YY
T
=
1
n
(PX)(PX)
T
=
1
n
PXX
T
P
T
= P(
1
n
XX
T
)P
T
C
Y
= PC
X
P
T
Note that we have identied the covariance matrix of X in the
last line.
Our plan is to recognize that any symmetric matrix A is diag-
onalized by an orthogonal matrix of its eigenvectors (by The-
orems 3 and 4 from Appendix A). For a symmetric matrix A
Theorem 4 provides A=EDE
T
, where D is a diagonal matrix
and E is a matrix of eigenvectors of A arranged as columns.
3
Now comes the trick. We select the matrix P to be a matrix
where each row p
i
is an eigenvector of
1
n
XX
T
. By this selec-
tion, P E
T
. With this relation and Theorem 1 of Appendix
A (P
1
= P
T
) we can nish evaluating C
Y
.
C
Y
= PC
X
P
T
= P(E
T
DE)P
T
= P(P
T
DP)P
T
= (PP
T
)D(PP
T
)
= (PP
1
)D(PP
1
)
C
Y
= D
It is evident that the choice of P diagonalizes C
Y
. This was
the goal for PCA. We can summarize the results of PCA in the
matrices P and C
Y
.
3
The matrix A might have r m orthonormal eigenvectors where r is the
rank of the matrix. When the rank of A is less than m, A is degenerate or all
data occupy a subspace of dimension r m. Maintaining the constraint of
orthogonality, we can remedy this situation by selecting (mr) additional
orthonormal vectors to ll up the matrix E. These additional vectors
do not effect the nal solution because the variances associated with these
directions are zero.
7
The principal components of X are the eigenvectors of
C
X
=
1
n
XX
T
.
The i
th
diagonal value of C
Y
is the variance of X along
p
i
.
In practice computing PCA of a data set X entails (1) subtract-
ing off the mean of each measurement type and (2) computing
the eigenvectors of C
X
. This solution is demonstrated in Mat-
lab code included in Appendix B.
VI. A MORE GENERAL SOLUTION USING SVD
This section is the most mathematically involved and can be
skipped without much loss of continuity. It is presented solely
for completeness. We derive another algebraic solution for
PCA and in the process, nd that PCA is closely related to
singular value decomposition (SVD). In fact, the two are so
intimately related that the names are often used interchange-
ably. What we will see though is that SVD is a more general
method of understanding change of basis.
We begin by quickly deriving the decomposition. In the fol-
lowing section we interpret the decomposition and in the last
section we relate these results to PCA.
A. Singular Value Decomposition
Let X be an arbitrary n m matrix
4
and X
T
X be a rank r,
square, symmetric mm matrix. In a seemingly unmotivated
fashion, let us dene all of the quantities of interest.
{ v
1
, v
2
, . . . , v
r
} is the set of orthonormal m1 eigen-
vectors with associated eigenvalues {
1
,
2
, . . . ,
r
} for
the symmetric matrix X
T
X.
(X
T
X) v
i
=
i
v
i

i
i
are positive real and termed the singular val-
ues.
{ u
1
, u
2
, . . . , u
r
} is the set of n 1 vectors dened by
u
i
i
X v
i
.
The nal denition includes two new and unexpected proper-
ties.
u
i
u
j
=
1 if i = j
0 otherwise
4
Notice that in this section only we are reversing convention from mn to
nm. The reason for this derivation will become clear in section 6.3.
X v
i
=
i
These properties are both proven in Theorem 5. We now have
all of the pieces to construct the decomposition. The scalar
version of singular value decomposition is just a restatement
of the third denition.
X v
i
=
i
u
i
(3)
This result says a quite a bit. X multiplied by an eigen-
vector of X
T
X is equal to a scalar times another vector.
The set of eigenvectors { v
1
, v
2
, . . . , v
r
} and the set of vec-
tors { u
1
, u
2
, . . . , u
r
} are both orthonormal sets or bases in r-
dimensional space.
We can summarize this result for all vectors in one matrix
multiplication by following the prescribed construction in Fig-
ure 4. We start by constructing a new diagonal matrix .

1
.
.
.
0
r
0
0
.
.
.
0
where
2
. . .
r
are the rank-ordered set of singu-
lar values. Likewise we construct accompanying orthogonal
matrices,
V =

v
1
v
2
. . . v
m
U =

u
1
u
2
. . . u
n
where we have appended an additional (mr) and (nr) or-

thonormal vectors to ll up the matrices for V and U respec-
tively (i.e. to deal with degeneracy issues). Figure 4 provides
a graphical representation of how all of the pieces t together
to form the matrix version of SVD.
XV = U
where each column of V and U perform the scalar version of
the decomposition (Equation 3). Because V is orthogonal, we
can multiply both sides by V
1
=V
T
to arrive at the nal form
of the decomposition.
X = UV
T
(4)
Although derived without motivation, this decomposition is
quite powerful. Equation 4 states that any arbitrary matrix X
can be converted to an orthogonal matrix, a diagonal matrix
and another orthogonal matrix (or a rotation, a stretch and a
second rotation). Making sense of Equation 4 is the subject of
the next section.
B. Interpreting SVD
The nal form of SVD is a concise but thick statement. In-
stead let us reinterpret Equation 3 as
Xa = kb
8
The scalar form of SVD is expressed in equation 3.
X v
i
=
i
u
i
The mathematical intuition behind the construction of the matrix form is that we want to express all n scalar equations in just one
equation. It is easiest to understand this process graphically. Drawing the matrices of equation 3 looks likes the following.
We can construct three new matrices V, U and . All singular values are rst rank-ordered
2
. . .
r
, and the corre-
sponding vectors are indexed in the same rank order. Each pair of associated vectors v
i
and u
i
is stacked in the i
th
column along
their respective matrices. The corresponding singular value
i
is placed along the diagonal (the ii
th
position) of . This generates
the equation XV = U, which looks like the following.
The matrices V and U are mm and n n matrices respectively and is a diagonal matrix with a few non-zero values (repre-
sented by the checkerboard) along its diagonal. Solving this single matrix equation solves all n value form equations.
FIG. 4 Construction of the matrix form of SVD (Equation 4) from the scalar form (Equation 3).
where a and b are column vectors and k is a scalar con-
stant. The set { v
1
, v
2
, . . . , v
m
} is analogous to a and the set
{ u
1
, u
2
, . . . , u
n
} is analogous to b. What is unique though is
that { v
1
, v
2
, . . . , v
m
} and { u
1
, u
2
, . . . , u
n
} are orthonormal sets
of vectors which span an m or n dimensional space, respec-
tively. In particular, loosely speaking these sets appear to span
all possible inputs (i.e. a) and outputs (i.e. b). Can we
formalize the view that { v
1
, v
2
, . . . , v
n
} and { u
1
, u
2
, . . . , u
n
}
span all possible inputs and outputs?
We can manipulate Equation 4 to make this fuzzy hypothesis
more precise.
X = UV
T
U
T
X = V
T
U
T
X = Z
where we have dened Z V
T
. Note that the previous
columns { u
1
, u
2
, . . . , u
n
} are now rows in U
T
. Comparing this
equation to Equation 1, { u
1
, u
2
, . . . , u
n
} perform the same role
as { p
1
, p
2
, . . . , p
m
}. Hence, U
T
is a change of basis from X to
Z. Just as before, we were transforming column vectors, we
can again infer that we are transforming column vectors. The
fact that the orthonormal basis U
T
(or P) transforms column
vectors means that U
T
is a basis that spans the columns of X.
Bases that span the columns are termed the column space of
X. The column space formalizes the notion of what are the
possible outputs of any matrix.
There is a funny symmetry to SVD such that we can dene a
similar quantity - the row space.
XV = U
(XV)
T
= (U)
T
V
T
X
T
= U
T
V
T
X
T
= Z
where we have dened Z U
T
. Again the rows of V
T
(or
the columns of V) are an orthonormal basis for transforming
X
T
into Z. Because of the transpose on X, it follows that V
is an orthonormal basis spanning the row space of X. The
row space likewise formalizes the notion of what are possible
inputs into an arbitrary matrix.
We are only scratching the surface for understanding the full
implications of SVD. For the purposes of this tutorial though,
we have enough information to understand how PCA will fall
within this framework.
C. SVD and PCA
It is evident that PCA and SVD are intimately related. Let us
return to the original mn data matrix X. We can dene a
9
Quick Summary of PCA
1. Organize data as an mn matrix, where m is the number
of measurement types and n is the number of samples.
2. Subtract off the mean for each measurement type.
3. Calculate the SVD or the eigenvectors of the covariance.
FIG. 5 A step-by-step instruction list on how to perform principal
component analysis
new matrix Y as an nm matrix.
5
Y
1
n
X
T
where each column of Y has zero mean. The choice of Y
becomes clear by analyzing Y
T
Y.
Y
T
Y =
n
X
T
n
X
T
=
1
n
XX
T
Y
T
Y = C
X
By construction Y
T
Yequals the covariance matrix of X. From
section 5 we know that the principal components of X are
the eigenvectors of C
X
. If we calculate the SVD of Y, the
columns of matrix V contain the eigenvectors of Y
T
Y = C
X
.
Therefore, the columns of V are the principal components of
X. This second algorithm is encapsulated in Matlab code in-
cluded in Appendix B.
What does this mean? V spans the row space of Y
1
n
X
T
.
Therefore, V must also span the column space of
1
n
X. We
can conclude that nding the principal components amounts
to nding an orthonormal basis that spans the column space
of X.
6
VII. DISCUSSION
Principal component analysis (PCA) has widespread applica-
tions because it reveals simple underlying structures in com-
plex data sets using analytical solutions from linear algebra.
Figure 5 provides a brief summary for implementing PCA.
A primary benet of PCA arises from quantifying the impor-
tance of each dimension for describing the variability of a data
set. In particular, the measurement of the variance along each
5
Yis of the appropriate nmdimensions laid out in the derivation of section
6.1. This is the reason for the ipping of dimensions in 6.1 and Figure 4.
6
If the nal goal is to nd an orthonormal basis for the coulmn space of
X then we can calculate it directly without constructing Y. By symmetry
the columns of U produced by the SVD of
1
n
X must also be the principal
components.
A B
x
y
x
y
z
!
FIG. 6 Example of when PCA fails (red lines). (a) Tracking a per-
son on a ferris wheel (black dots). All dynamics can be described
by the phase of the wheel , a non-linear combination of the naive
basis. (b) In this example data set, non-Gaussian distributed data and
non-orthogonal axes causes PCA to fail. The axes with the largest
variance do not correspond to the appropriate answer.
principle component provides a means for comparing the rel-
ative importance of each dimension. An implicit hope behind
employing this method is that the variance along a small num-
ber of principal components (i.e. less than the number of mea-
surement types) provides a reasonable characterization of the
complete data set. This statement is the precise intuition be-
hind any method of dimensional reduction a vast arena of
active research. In the example of the spring, PCA identi-
es that a majority of variation exists along a single dimen-
sion (the direction of motion x), eventhough 6 dimensions are
recorded.
Although PCA works on a multitude of real world prob-
lems, any diligent scientist or engineer must ask when does
PCA fail? Before we answer this question, let us note a re-
markable feature of this algorithm. PCA is completely non-
parametric: any data set can be plugged in and an answer
comes out, requiring no parameters to tweak and no regard for
how the data was recorded. From one perspective, the fact that
PCA is non-parametric (or plug-and-play) can be considered
a positive feature because the answer is unique and indepen-
dent of the user. From another perspective the fact that PCA
is agnostic to the source of the data is also a weakness. For
instance, consider tracking a person on a ferris wheel in Fig-
ure 6a. The data points can be cleanly described by a single
variable, the precession angle of the wheel , however PCA
would fail to recover this variable.
A. Limits and Statistics of Dimensional Reduction
A deeper appreciation of the limits of PCA requires some con-
sideration about the underlying assumptions and in tandem,
a more rigorous description of the source of data. Gener-
ally speaking, the primary motivation behind this method is
to decorrelate the data set, i.e. remove second-order depen-
dencies. The manner of approaching this goal is loosely akin
to how one might explore a town in the Western United States:
drive down the longest road running through the town. When
10
one sees another big road, turn left or right and drive down
this road, and so forth. In this analogy, PCA requires that each
new road explored must be perpendicular to the previous, but
clearly this requirement is overly stringent and the data (or
town) might be arranged along non-orthogonal axes, such as
Figure 6b. Figure 6 provides two examples of this type of data
where PCA provides unsatisfying results.
To address these problems, we must dene what we consider
optimal results. In the context of dimensional reduction, one
measure of success is the degree to which a reduced repre-
sentation can predict the original data. In statistical terms,
we must dene an error function (or loss function). It can
be proved that under a common loss function, mean squared
error (i.e. L
2
norm), PCA provides the optimal reduced rep-
resentation of the data. This means that selecting orthogonal
directions for principal components is the best solution to pre-
dicting the original data. Given the examples of Figure 6, how
could this statement be true? Our intuitions from Figure 6
suggest that this result is somehow misleading.
The solution to this paradox lies in the goal we selected for the
analysis. The goal of the analysis is to decorrelate the data, or
said in other terms, the goal is to remove second-order depen-
dencies in the data. In the data sets of Figure 6, higher order
dependencies exist between the variables. Therefore, remov-
ing second-order dependencies is insufcient at revealing all
structure in the data.
7
Multiple solutions exist for removing higher-order dependen-
cies. For instance, if prior knowledge is known about the
problem, then a nonlinearity (i.e. kernel) might be applied
to the data to transform the data to a more appropriate naive
basis. For instance, in Figure 6a, one might examine the po-
lar coordinate representation of the data. This parametric ap-
proach is often termed kernel PCA.
Another direction is to impose more general statistical deni-
tions of dependency within a data set, e.g. requiring that data
along reduced dimensions be statistically independent. This
class of algorithms, termed, independent component analysis
(ICA), has been demonstrated to succeed in many domains
where PCA fails. ICA has been applied to many areas of sig-
nal and image processing, but suffers from the fact that solu-
tions are (sometimes) difcult to compute.
Writing this paper has been an extremely instructional expe-
rience for me. I hope that this paper helps to demystify the
motivation and results of PCA, and the underlying assump-
tions behind this important analysis technique. Please send
me a note if this has been useful to you as it inspires me to
keep writing!
7
When are second order dependencies sufcient for revealing all dependen-
cies in a data set? This statistical condition is met when the rst and second
order statistics are sufcient statistics of the data. This occurs, for instance,
when a data set is Gaussian distributed.
APPENDIX A: Linear Algebra
This section proves a few unapparent theorems in linear
algebra, which are crucial to this paper.
1. The inverse of an orthogonal matrix is its transpose.
Let A be an mn orthogonal matrix where a
i
is the i
th
column
vector. The i j
th
element of A
T
A is
(A
T
A)
i j
= a
i
T
a
j
=
1 i f i = j
0 otherwise
Therefore, because A
T
A = I, it follows that A
1
= A
T
.
2. For any matrix A, A
T
A and AA
T
are symmetric.
(AA
T
)
T
= A
TT
A
T
= AA
T
(A
T
A)
T
= A
T
A
TT
= A
T
A
3. A matrix is symmetric if and only if it is orthogonally
diagonalizable.
Because this statement is bi-directional, it requires a two-part
if-and-only-if proof. One needs to prove the forward and
the backwards if-then cases.
Let us start with the forward case. If A is orthogonally di-
agonalizable, then A is a symmetric matrix. By hypothesis,
orthogonally diagonalizable means that there exists some E
such that A = EDE
T
, where D is a diagonal matrix and E is
some special matrix which diagonalizes A. Let us compute
A
T
.
A
T
= (EDE
T
)
T
= E
TT
D
T
E
T
= EDE
T
= A
Evidently, if A is orthogonally diagonalizable, it must also be
symmetric.
The reverse case is more involved and less clean so it will be
left to the reader. In lieu of this, hopefully the forward case
is suggestive if not somewhat convincing.
4. A symmetric matrix is diagonalized by a matrix of its
orthonormal eigenvectors.
Let A be a square nn symmetric matrix with associated
eigenvectors {e
1
, e
2
, . . . , e
n
}. Let E = [e
1
e
2
. . . e
n
] where the
i
th
column of E is the eigenvector e
i
. This theorem asserts that
there exists a diagonal matrix D such that A = EDE
T
.
This proof is in two parts. In the rst part, we see that the
any matrix can be orthogonally diagonalized if and only if
it that matrixs eigenvectors are all linearly independent. In
the second part of the proof, we see that a symmetric matrix
11
has the special property that all of its eigenvectors are not just
linearly independent but also orthogonal, thus completing our
proof.
In the rst part of the proof, let A be just some matrix, not
necessarily symmetric, and let it have independent eigenvec-
tors (i.e. no degeneracy). Furthermore, let E = [e
1
e
2
. . . e
n
]
be the matrix of eigenvectors placed in the columns. Let D be
a diagonal matrix where the i
th
eigenvalue is placed in the ii
th
position.
We will now show that AE = ED. We can examine the
columns of the right-hand and left-hand sides of the equation.
Left hand side : AE = [Ae
1
Ae
2
. . . Ae
n
]
Right hand side : ED = [
1
e
1
2
e
2
. . .
n
e
n
]
Evidently, if AE = ED then Ae
i
=
i
e
i
for all i. This equa-
tion is the denition of the eigenvalue equation. Therefore,
it must be that AE = ED. A little rearrangement provides
A = EDE
1
, completing the rst part the proof.
For the second part of the proof, we show that a symmetric
matrix always has orthogonal eigenvectors. For some sym-
metric matrix, let
1
and
2
be distinct eigenvalues for eigen-
vectors e
1
and e
2
.
1
e
1
e
2
= (
1
e
1
)
T
e
2
= (Ae
1
)
T
e
2
= e
1
T
A
T
e
2
= e
1
T
Ae
2
= e
1
T
(
2
e
2
)
1
e
1
e
2
=
2
e
1
e
2
By the last relation we can equate that (
1
2
)e
1
e
2
= 0.
Since we have conjectured that the eigenvalues are in fact
unique, it must be the case that e
1
e
2
= 0. Therefore, the
eigenvectors of a symmetric matrix are orthogonal.
Let us back up now to our original postulate that A is a sym-
metric matrix. By the second part of the proof, we know
that the eigenvectors of A are all orthonormal (we choose
the eigenvectors to be normalized). This means that E is an
orthogonal matrix so by theorem 1, E
T
= E
1
and we can
rewrite the nal result.
A = EDE
T
. Thus, a symmetric matrix is diagonalized by a matrix of its
eigenvectors.
5. For any arbitrary m n matrix X, the symmetric
matrix X
T
X has a set of orthonormal eigenvectors
of { v
1
, v
2
, . . . , v
n
} and a set of associated eigenvalues
{
1
,
2
, . . . ,
n
}. The set of vectors {X v
1
, X v
2
, . . . , X v
n
}
then form an orthogonal basis, where each vector X v
i
is of
length
i
.
All of these properties arise from the dot product of any two
vectors from this set.
(X v
i
) (X v
j
) = (X v
i
)
T
(X v
j
)
= v
T
i
X
T
X v
j
= v
T
i
(
j
v
j
)
=
j
v
i
v
j
(X v
i
) (X v
j
) =
j
i j
The last relation arises because the set of eigenvectors of X is
orthogonal resulting in the Kronecker delta. In more simpler
terms the last relation states:
(X v
i
) (X v
j
) =

j
i = j
0 i = j
This equation states that any two vectors in the set are orthog-
onal.
The second property arises from the above equation by realiz-
ing that the length squared of each vector is dened as:
X v
i
2
= (X v
i
) (X v
i
) =
i
APPENDIX B: Code
This code is written for Matlab 6.5 (Release 13) from
Mathworks
8
. The code is not computationally ef-
cient but explanatory (terse comments begin with a %).
This rst version follows Section 5 by examining the
covariance of the data set.
function [signals,PC,V] = pca1(data)
% PCA1: Perform PCA using covariance.
% data - MxN matrix of input data
% (M dimensions, N trials)
% signals - MxN matrix of projected data
% PC - each column is a PC
% V - Mx1 matrix of variances
[M,N] = size(data);
% subtract off the mean for each dimension
mn = mean(data,2);
data = data - repmat(mn,1,N);
% calculate the covariance matrix
covariance = 1 / (N-1) * data * data;
% find the eigenvectors and eigenvalues
8
http://www.mathworks.com
12
[PC, V] = eig(covariance);
% extract diagonal of matrix as vector
V = diag(V);
% sort the variances in decreasing order
[junk, rindices] = sort(-1*V);
V = V(rindices);
PC = PC(:,rindices);
% project the original data set
signals = PC * data;
This second version follows section 6 computing PCA
through SVD.
function [signals,PC,V] = pca2(data)
% PCA2: Perform PCA using SVD.
% data - MxN matrix of input data
% (M dimensions, N trials)
% signals - MxN matrix of projected data
% PC - each column is a PC
% V - Mx1 matrix of variances
[M,N] = size(data);
% subtract off the mean for each dimension
mn = mean(data,2);
data = data - repmat(mn,1,N);
% construct the matrix Y
Y = data / sqrt(N-1);
% SVD does it all
[u,S,PC] = svd(Y);
% calculate the variances
S = diag(S);
V = S .* S;
% project the original data
signals = PC * data;
Eigenvalues and eigenvectors - Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors[06/05/2013 11:28:46]
Eigenvalues and eigenvectors
From Wikipedia, the free encyclopedia
An ei genvec t or of a square matrix is a non-zero vector
that, when multiplied by , yields the original vector multiplied by
a single number ; that is:
The number is called the ei genval ue of corresponding to
.
[1]
In analytic geometry, for example, a three-element vector may be
seen as an arrow in three-dimensional space starting at the origin.
In that case, an eigenvector of a 33 matrix is an arrow whose
direction is either preserved or exactly reversed after multiplication
by . The corresponding eigenvalue determines how the length of
the arrow is changed by the operation, and whether its direction is
reversed or not.
In abstract linear algebra, these concepts are naturally extended to
more general situations, where the set of real scale factors is replaced by any field of scalars (such as algebraic or
complex numbers); the set of Cartesian vectors is replaced by any vector space (such as the continuous
functions, the polynomials or the trigonometric series), and matrix multiplication is replaced by any linear operator that
maps vectors to vectors (such as the derivative from calculus). In such cases, the "vector" in "eigenvector" may be
replaced by a more specific term, such as "eigenfunction", "eigenmode", "eigenface", or "eigenstate". Thus, for
example, the exponential function is an eigenfunction of the derivative operator " ", with eigenvalue
, since its derivative is .
The set of all eigenvectors of a matrix (or linear operator), each paired with its corresponding eigenvalue, is called the
ei gensyst em of that matrix.
[2]
An ei genspac e of a matrix is the set of all eigenvectors with the same
eigenvalue, together with the zero vector.
[1]
An ei genbasi s for is any basis for the set of all vectors that
consists of linearly independent eigenvectors of . Not every real matrix has real eigenvalues, but every complex
matrix has at least one complex eigenvalue.
The terms c har ac t er i st i c vec t or , c har ac t er i st i c val ue, and c har ac t er i st i c spac e are also used for these
concepts. The prefix ei gen - is adopted from the German word eigen for "self" or "proper".
Eigenvalues and eigenvectors have many applications in both pure and applied mathematics. They are used in matrix
factorization, in quantum mechanics, and in many other areas.
Cont ent s [hide]
1 Definition
1.1 Eigenvectors and eigenvalues of a real matrix
1.1.1 An example
1.1.2 Another example
1.1.3 Trivial cases
1.2 General definition
1.3 Eigenspace and spectrum
1.4 Eigenbasis
2 Generalizations to infinite-dimensional spaces
2.1 Eigenfunctions
2.2 Spectral theory
2.3 Associative algebras and representation theory
3 Eigenvalues and eigenvectors of matrices
In this shear mapping the red arrow changes
direction but the blue arrow does not. The blue arrow
is an eigenvector of this shear mapping, and since its
length is unchanged its eigenvalue is 1.
Read Edit View history Article Talk
Main page
Contents
Featured content
Current events
Random article
Donate to Wikipedia
Help
About Wikipedia
Community portal
Recent changes
Contact Wikipedia
()
Catal
esky
Dansk
Deutsch
Espaol
Esperanto
Franais
Italiano
Latvieu
Lietuvi
Magyar
Nederlands
Norsk bokml
Norsk nynorsk
Polski
Portugus
Romn
Simple English
Slovenina
Suomi
Svenska
Interaction
Toolbox
Print/export
Languages
Create account Log in
3.1 Characteristic polynomial
3.1.1 In the real domain
3.1.2 In the complex domain
3.2 Algebraic multiplicities
3.2.1 Example
3.3 Diagonalization and eigendecomposition
3.4 Further properties
3.5 Left and right eigenvectors
4 Calculation
4.1 Computing the eigenvalues
4.2 Computing the eigenvectors
5 History
6 Applications
6.1 Eigenvalues of geometric transformations
6.2 Schrdinger equation
6.3 Molecular orbitals
6.4 Geology and glaciology
6.5 Principal components analysis
6.6 Vibration analysis
6.7 Eigenfaces
6.8 Tensor of moment of inertia
6.9 Stress tensor
6.10 Eigenvalues of a graph
6.11 Basic reproduction number
7 See also
8 Notes
9 References
10 External links
See also: Euclidean vector and Matrix (mathematics)
In many contexts, a vector can be assumed to be a list of real
numbers (called elements), written vertically with brackets around the
entire list, such as the vectors u and v below. Two vectors are said to
be scalar multiples of each other (also called parallel or collinear) if
they have the same number of elements, and if every element of one
vector is obtained by multiplying each corresponding element in the
other vector by the same number (known as a scaling factor, or a
scalar). For example, the vectors
and
are scalar multiples of each other, because each element of is 20
times the corresponding element of .
A vector with three elements, like or above, may represent a point in three-dimensional space, relative to some
Cartesian coordinate system. It helps to think of such a vector as the tip of an arrow whose tail is at the origin of the
coordinate system. In this case, the condition " is parallel to " means that the two arrows lie on the same straight
line, and may differ only in length and direction along that line.
If we multiply any square matrix with rows and columns by such a vector , the result will be another vector
, also with rows and one column. That is,
[edit]
Definition
[edit] Ei genvec t or s and ei genval ues of a r eal mat r i x
Matrix acts by stretching the vector , not
changing its direction, so is an eigenvector of
.
Ting Vit
Edit links
is mapped to
where, for each index ,
In general, if is not all zeros, the vectors and will not be parallel. When they are parallel (that is, when there
is some real number such that ) we say that is an ei genvec t or of . In that case, the scale factor
is said to be the ei genval ue corresponding to that eigenvector.
In particular, multiplication by a 33 matrix may change both the direction and the magnitude of an arrow in
three-dimensional space. However, if is an eigenvector of with eigenvalue , the operation may only change its
length, and either keep its direction or flip it (make the arrow point in the exact opposite direction). Specifically, the
length of the arrow will increase if , remain the same if , and decrease it if . Moreover,
the direction will be precisely the same if , and flipped if . If , then the length of the arrow
becomes zero.
For the transformation matrix
the vector
is an eigenvector with eigenvalue 2. Indeed,
On the other hand the vector
is not an eigenvector, since
and this vector is not a multiple of the original vector .
For the matrix
we have
[edit] An ex ampl e
The transformation matrix preserves the direction of
vectors parallel to (in blue) and (in violet). The
points that lie on the line through the origin, parallel to an
eigenvector, remain on the line after the transformation.
The vectors in red are not eigenvectors, therefore their
direction is altered by the transformation. See also: An
extended version, showing all four quadrants.
[edit] Anot her ex ampl e
and
Therefore, the vectors , and are eigenvectors of corresponding to the
eigenvalues 0, 3, and 2, respectively. (Here the symbol indicates matrix transposition, in this case turning the row
vectors into column vectors.)
The identity matrix (whose general element is 1 if , and 0 otherwise) maps every vector to itself.
Therefore, every vector is an eigenvector of , with eigenvalue 1.
More generally, if is a diagonal matrix (with whenever ), and is a vector parallel to axis (that
is, , and if ), then where . That is, the eigenvalues of a diagonal
matrix are the elements of its main diagonal. This is trivially the case of any 1 1 matrix.
The concept of eigenvectors and eigenvalues extends naturally to abstract linear transformations on abstract vector
spaces. Namely, let be any vector space over some field of scalars, and let be a linear transformation
mapping into . We say that a non-zero vector of is an ei genvec t or of if (and only if) there is a scalar
in such that
.
This equation is called the eigenvalue equation for , and the scalar is the ei genval ue of corresponding to
the eigenvector . Note that means the result of applying the operator to the vector , while means
the product of the scalar by .
[3]
The matrix-specific definition is a special case of this abstract definition. Namely, the vector space is the set of all
column vectors of a certain size 1, and is the linear transformation that consists in multiplying a vector by the
given matrix .
Some authors allow to be the zero vector in the definition of eigenvector.
[4]
This is reasonable as long as we
define eigenvalues and eigenvectors carefully: If we would like the zero vector to be an eigenvector, then we must
first define an eigenvalue of as a scalar in such that there is a nonzero vector in with . We
then define an eigenvector to be a vector in such that there is an eigenvalue in with . This
way, we ensure that it is not the case that every scalar is an eigenvalue corresponding to the zero vector.
If is an eigenvector of , with eigenvalue , then any scalar multiple of with nonzero is also an
eigenvector with eigenvalue , since . Moreover, if and are
eigenvectors with the same eigenvalue , then is also an eigenvector with the same eigenvalue .
Therefore, the set of all eigenvectors with the same eigenvalue , together with the zero vector, is a linear subspace
of , called the ei genspac e of associated to .
[5][6]
If that subspace has dimension 1, it is sometimes called
an ei genl i ne .
[7]
The geometric multiplicity of an eigenvalue is the dimension of the eigenspace associated to , i.e.
number of linearly independent eigenvectors with that eigenvalue. These eigenvectors can be chosen so that they are
pairwise orthogonal and have unit length under some arbitrary inner product defined on . In other words, every
eigenspace has an orthonormal basis of eigenvectors.
Conversely, any eigenvector with eigenvalue must be linearly independent from all eigenvectors that are associated
to a different eigenvalue . Therefore a linear transformation that operates on an -dimensional space cannot
[edit] Tr i vi al c ases
[edit] Gener al def i ni t i on
[edit] Ei genspac e and spec t r um
have more than distinct eigenvalues (or eigenspaces).
[8]
Any subspace spanned by eigenvectors of is an invariant subspace of .
The list of eigenvalues of is sometimes called the spec t r um of . The order of this list is arbitrary, but the
number of times that an eigenvalue appears is important.
There is no unique way to choose a basis for an eigenspace of an abstract linear operator based only on itself,
without some additional data such as a choice of coordinate basis for . Even for an eigenline, the basis vector is
indeterminate in both magnitude and orientation. If the scalar field is the real numbers , one can order the
eigenspaces by their eigenvalues. Since the modulus of an eigenvalue is important in many applications, the
eigenspaces are often ordered by that criterion.
An ei genbasi s for a linear operator that operates on a vector space is a basis for that consists entirely of
eigenvectors of (possibly with different eigenvalues). Such a basis may not exist.
Suppose has finite dimension , and let be the sum of the geometric multiplicities over all distinct
eigenvalues of . This integer is the maximum number of linearly independent eigenvectors of , and therefore
cannot exceed . If is exactly , then admits an eigenbasis; that is, there exists a basis for that consists
of eigenvectors. The matrix that represents relative to this basis is a diagonal matrix, whose diagonal
elements are the eigenvalues associated to each basis vector.
Conversely, if the sum is less than , then admits no eigenbasis, and there is no choice of coordinates that
will allow to be represented by a diagonal matrix.
Note that is at least equal to the number of distinct eigenvalues of , but may be larger than that.
[9]
For
example, the identity operator on has , and any basis of is an eigenbasis of ; but its only
eigenvalue is 1, with .
For more details on this topic, see Spectral theorem.
The definition of eigenvalue of a linear transformation remains valid even if the underlying space is an infinite
dimensional Hilbert or Banach space. Namely, a scalar is an eigenvalue if and only if there is some nonzero vector
such that .
A widely used class of linear operators acting on infinite dimensional spaces are the differential operators on function
spaces. Let be a linear differential operator in on the space of infinitely differentiable real functions of a real
argument . The eigenvalue equation for is the differential equation
The functions that satisfy this equation are commonly called ei genf unc t i ons . For the derivative operator , an
eigenfunction is a function that, when differentiated, yields a constant times the original function. If is zero, the
generic solution is a constant function. If is non-zero, the solution is an exponential function
Eigenfunctions are an essential tool in the solution of differential equations and many other applied and theoretical
fields. For instance, the exponential functions are eigenfunctions of any shift invariant linear operator. This fact is the
basis of powerful Fourier transform methods for solving all sorts of problems.
If is an eigenvalue of , then the operator is not one-to-one, and therefore its inverse
is not defined. The converse is true for finite-dimensional vector spaces, but not for infinite-dimensional ones. In
general, the operator may not have an inverse, even if is not an eigenvalue.
For this reason, in functional analysis one defines the spectrum of a linear operator as the set of all scalars for
which the operator has no bounded inverse. Thus the spectrum of an operator always contains all its
eigenvalues, but is not limited to them.
[edit] Ei genbasi s
[edit]
Generalizations to infinite-dimensional spaces
[edit] Ei genf unc t i ons
[edit] Spec t r al t heor y
More algebraically, rather than generalizing the vector space to an infinite dimensional space, one can generalize the
algebraic object that is acting on the space, replacing a single operator acting on a vector space with an algebra
representation an associative algebra acting on a module. The study of such actions is the field of representation
theory.
A closer analog of eigenvalues is given by the representation-theoretical concept of weight, with the analogs of
eigenvectors and eigenspaces being weight vectors and weight spaces.
The eigenvalue equation for a matrix is
which is equivalent to
where is the identity matrix. It is a fundamental result of linear algebra that an equation has a
non-zero solution if and only if the determinant of the matrix is zero. It follows that the eigenvalues
of are precisely the real numbers that satisfy the equation
The left-hand side of this equation can be seen (using Leibniz' rule for the determinant) to be a polynomial function of
the variable . The degree of this polynomial is , the order of the matrix. Its coefficients depend on the entries of
, except that its term of degree is always . This polynomial is called the characteristic polynomial of
; and the above equation is called the characteristic equation (or, less often, the secular equation) of .
For example, let be the matrix
The characteristic polynomial of is
which is
The roots of this polynomial are 2, 1, and 11. Indeed these are the only three eigenvalues of , corresponding to the
eigenvectors and (or any non-zero multiples thereof).
Since the eigenvalues are roots of the characteristic polynomial, an matrix has at most eigenvalues. If the
matrix has real entries, the coefficients of the characteristic polynomial are all real; but it may have fewer than real
roots, or no real roots at all.
For example, consider the cyclic permutation matrix
This matrix shifts the coordinates of the vector up by one position, and moves the first coordinate to the bottom. Its
characteristic polynomial is which has one real root . Any vector with three equal non-zero
elements is an eigenvector for this eigenvalue. For example,
[edit] Assoc i at i ve al gebr as and r epr esent at i on t heor y
[edit]
Eigenvalues and eigenvectors of matrices
[edit] Char ac t er i st i c pol ynomi al
[edit] I n t he r eal domai n
The fundamental theorem of algebra implies that the characteristic polynomial of an matrix , being a
polynomial of degree , has exactly complex roots. More precisely, it can be factored into the product of linear
terms,
where each is a complex number. The numbers , , ... , (which may not be all distinct) are roots of the
polynomial, and are precisely the eigenvalues of .
Even if the entries of are all real numbers, the eigenvalues may still have non-zero imaginary parts (and the
elements of the corresponding eigenvectors will therefore also have non-zero imaginary parts). Also, the eigenvalues
may be irrational numbers even if all the entries of are rational numbers, or all are integers. However, if the
entries of are algebraic numbers (which include the rationals), the eigenvalues will be (complex) algebraic
numbers too.
The non-real roots of a real polynomial with real coefficients can be grouped into pairs of complex conjugate values,
namely with the two members of each pair having the same real part and imaginary parts that differ only in sign. If
the degree is odd, then by the intermediate value theorem at least one of the roots will be real. Therefore, any real
matrix with odd order will have at least one real eigenvalue; whereas a real matrix with even order may have no real
eigenvalues.
In the example of the 33 cyclic permutation matrix , above, the characteristic polynomial has two
additional non-real roots, namely
and ,
where is the imaginary unit. Note that , , and . Then
and
Therefore, the vectors and are eigenvectors of , with eigenvalues 1, , and ,
respectively.
Let be an eigenvalue of an matrix . The algebraic multiplicity of is its multiplicity as a root
of the characteristic polynomial, that is, the largest integer such that divides evenly that polynomial.
Like the geometric multiplicity , the algebraic multiplicity is an integer between 1 and ; and the sum of
over all distinct eigenvalues also cannot exceed . If complex eigenvalues are considered, is exactly
.
It can be proved that the geometric multiplicity of an eigenvalue never exceeds its algebraic multiplicity
. Therefore, is at most .
For the matrix:
the characteristic polynomial of is
,
[edit] I n t he c ompl ex domai n
[edit] Al gebr ai c mul t i pl i c i t i es
[edit] Ex ampl e
being the product of the diagonal with a lower triangular matrix.
The roots of this polynomial, and hence the eigenvalues, are 2 and 3. The algebraic multiplicity of each eigenvalue is
2; in other words they are both double roots. On the other hand, the geometric multiplicity of the eigenvalue 2 is only
1, because its eigenspace is spanned by the vector , and is therefore 1 dimensional. Similarly, the
geometric multiplicity of the eigenvalue 3 is 1 because its eigenspace is spanned by . Hence, the total
algebraic multiplicity of A, denoted , is 4, which is the most it could be for a 4 by 4 matrix. The geometric
multiplicity is 2, which is the smallest it could be for a matrix which has two distinct eigenvalues.
If the sum of the geometric multiplicities of all eigenvalues is exactly , then has a set of linearly
independent eigenvectors. Let be a square matrix whose columns are those eigenvectors, in any order. Then we
will have , where is the diagonal matrix such that is the eigenvalue associated to column of
. Since the columns of are linearly independent, the matrix is invertible. Premultiplying both sides by
we get . By definition, therefore, the matrix is diagonalizable.
Conversely, if is diagonalizable, let be a non-singular square matrix such that is some diagonal
matrix . Multiplying both sides on the left by we get . Therefore each column of must be an
eigenvector of , whose eigenvalue is the corresponding element on the diagonal of . Since the columns of
must be linearly independent, it follows that . Thus is equal to if and only if is diagonalizable.
If is diagonalizable, the space of all -element vectors can be decomposed into the direct sum of the eigenspaces
of . This decomposition is called the eigendecomposition of , and it is the preserved under change of
coordinates.
A matrix that is not diagonalizable is said to be defective. For defective matrices, the notion of eigenvector can be
generalized to generalized eigenvectors, and that of diagonal matrix to a J ordan form matrix. Over an algebraically
closed field, any matrix has a J ordan form and therefore admits a basis of generalized eigenvectors, and a
decomposition into generalized eigenspaces
Let be an arbitrary matrix of complex numbers with eigenvalues , , ... . (Here it is understood
that an eigenvalue with algebraic multiplicity occurs times in this list.) Then
The trace of , defined as the sum of its diagonal elements, is also the sum of all eigenvalues:
.
The determinant of is the product of all eigenvalues:
.
The eigenvalues of the th power of , i.e. the eigenvalues of , for any positive integer , are
The matrix is invertible if and only if all the eigenvalues are nonzero.
If is invertible, then the eigenvalues of are
If is equal to its conjugate transpose (in other words, if is Hermitian), then every eigenvalue is real. The
same is true of any a symmetric real matrix. If is also positive-definite, positive-semidefinite, negative-definite,
or negative-semidefinite every eigenvalue is positive, non-negative, negative, or non-positive respectively.
Every eigenvalue of a unitary matrix has absolute value .
See also: left and right (algebra)
[edit] Di agonal i zat i on and ei gendec omposi t i on
[edit] Fur t her pr oper t i es
[edit] Lef t and r i ght ei genvec t or s
The use of matrices with a single column (rather than a single row) to represent vectors is traditional in many
disciplines. For that reason, the word "eigenvector" almost always means a r i ght ei genvec t or , namely a column
vector that must placed to the right of the matrix in the defining equation
.
There may be also single-row vectors that are unchanged when they occur on the left side of a product with a square
matrix ; that is, which satisfy the equation
Any such row vector is called a l ef t ei genvec t or of .
The left eigenvectors of are transposes of the right eigenvectors of the transposed matrix , since their defining
equation is equivalent to
It follows that, if is Hermitian, its left and right eigenvectors are complex conjugates. In particular if is a real
symmetric matrix, they are the same except for transposition.
Main article: Eigenvalue algorithm
The eigenvalues of a matrix can be determined by finding the roots of the characteristic polynomial. Explicit
algebraic formulas for the roots of a polynomial exist only if the degree is 4 or less. According to the AbelRuffini
theorem there is no general, explicit and exact algebraic formula for the roots of a polynomial with degree 5 or more.
It turns out that any polynomial with degree is the characteristic polynomial of some companion matrix of order .
Therefore, for matrices of order 5 or more, the eigenvalues and eigenvectors cannot be obtained by an explicit
algebraic formula, and must therefore be computed by approximate numerical methods.
In theory, the coefficients of the characteristic polynomial can be computed exactly, since they are sums of products
of matrix elements; and there are algorithms that can find all the roots of a polynomial of arbitrary degree to any
required accuracy.
[10]
However, this approach is not viable in practice because the coefficients would be
contaminated by unavoidable round-off errors, and the roots of a polynomial can be an extremely sensitive function of
the coefficients (as exemplified by Wilkinson's polynomial).
[10]
Efficient, accurate methods to compute eigenvalues and eigenvectors of arbitrary matrices were not known until the
advent of the QR algorithm in 1961.
[10]
Combining the Householder transformation with the LU decomposition
results in an algorithm with better convergence than the QR algorithm.
[citation needed]
For large Hermitian sparse
matrices, the Lanczos algorithm is one example of an efficient iterative method to compute eigenvalues and
eigenvectors, among several other possibilities.
[10]
Once the (exact) value of an eigenvalue is known, the corresponding eigenvectors can be found by finding non-zero
solutions of the eigenvalue equation, that becomes a system of linear equations with known coefficients. For example,
once it is known that 6 is an eigenvalue of the matrix
we can find its eigenvectors by solving the equation , that is
This matrix equation is equivalent to two linear equations
that is
Both equations reduce to the single linear equation . Therefore, any vector of the form , for any
non-zero real number , is an eigenvector of with eigenvalue .
[edit]
Calculation
[edit] Comput i ng t he ei genval ues
[edit] Comput i ng t he ei genvec t or s
The matrix above has another eigenvalue . A similar calculation shows that the corresponding eigenvectors
are the non-zero solutions of , that is, any vector of the form , for any non-zero real number
.
Some numeric methods that compute the eigenvalues of a matrix also determine a set of corresponding eigenvectors
as a by-product of the computation.
Eigenvalues are often introduced in the context of linear algebra or matrix theory. Historically, however, they arose in
the study of quadratic forms and differential equations.
Euler studied the rotational motion of a rigid body and discovered the importance of the principal axes. Lagrange
realized that the principal axes are the eigenvectors of the inertia matrix.
[11]
In the early 19th century, Cauchy saw
how their work could be used to classify the quadric surfaces, and generalized it to arbitrary dimensions.
[12]
Cauchy
also coined the term racine caractristique (characteristic root) for what is now called eigenvalue; his term survives in
characteristic equation.
[13]
Fourier used the work of Laplace and Lagrange to solve the heat equation by separation of variables in his famous
1822 book Thorie analytique de la chaleur.
[14]
Sturm developed Fourier's ideas further and brought them to the
attention of Cauchy, who combined them with his own ideas and arrived at the fact that real symmetric matrices have
real eigenvalues.
[12]
This was extended by Hermite in 1855 to what are now called Hermitian matrices.
[13]
Around
the same time, Brioschi proved that the eigenvalues of orthogonal matrices lie on the unit circle,
[12]
and Clebsch
found the corresponding result for skew-symmetric matrices.
[13]
Finally, Weierstrass clarified an important aspect in
the stability theory started by Laplace by realizing that defective matrices can cause instability.
[12]
In the meantime, Liouville studied eigenvalue problems similar to those of Sturm; the discipline that grew out of their
work is now called SturmLiouville theory.
[15]
Schwarz studied the first eigenvalue of Laplace's equation on general
domains towards the end of the 19th century, while Poincar studied Poisson's equation a few years later.
[16]
At the start of the 20th century, Hilbert studied the eigenvalues of integral operators by viewing the operators as
infinite matrices.
[17]
He was the first to use the German word eigen to denote eigenvalues and eigenvectors in 1904,
though he may have been following a related usage by Helmholtz. For some time, the standard term in English was
"proper value", but the more distinctive term "eigenvalue" is standard today.
[18]
The first numerical algorithm for computing eigenvalues and eigenvectors appeared in 1929, when Von Mises
published the power method. One of the most popular methods today, the QR algorithm, was proposed
independently by J ohn G.F. Francis
[19]
and Vera Kublanovskaya
[20]
in 1961.
[21]
The following table presents some example transformations in the plane along with their 22 matrices, eigenvalues,
and eigenvectors.
scaling unequal scaling rotation horizontal shear
hyperbolic
rotation
illustration
matrix

characteristic
polynomial
eigenvalues
[edit]
History
[edit]
Applications
[edit] Ei genval ues of geomet r i c t r ansf or mat i ons
,
algebraic
multipl.
geometric
multipl.
eigenvectors
All non-zero
vectors
Note that the characteristic equation for a rotation is a quadratic equation with discriminant ,
which is a negative number whenever is not an integer multiple of 180. Therefore, except for these special cases,
the two eigenvalues are complex numbers, ; and all eigenvectors have non-real entries. Indeed,
except for those special cases, a rotation changes the direction of every nonzero vector in the plane.
An example of an eigenvalue equation where the transformation
is represented in terms of a differential operator is the time-
independent Schrdinger equation in quantum mechanics:
where , the Hamiltonian, is a second-order differential operator
and , the wavefunction, is one of its eigenfunctions
corresponding to the eigenvalue , interpreted as its energy.
However, in the case where one is interested only in the bound
state solutions of the Schrdinger equation, one looks for
within the space of square integrable functions. Since this space is
a Hilbert space with a well-defined scalar product, one can
introduce a basis set in which and can be represented as
a one-dimensional array and a matrix respectively. This allows one
to represent the Schrdinger equation in a matrix form.
Bra-ket notation is often used in this context. A vector, which
represents a state of the system, in the Hilbert space of square
integrable functions is represented by . In this notation, the
Schrdinger equation is:
where is an ei genst at e of . It is a self adjoint operator,
the infinite dimensional analog of Hermitian matrices (see
Observable). As in the matrix case, in the equation above
is understood to be the vector obtained by application of
the transformation to .
In quantum mechanics, and in particular in atomic and molecular physics, within the HartreeFock theory, the atomic
and molecular orbitals can be defined by the eigenvectors of the Fock operator. The corresponding eigenvalues are
interpreted as ionization potentials via Koopmans' theorem. In this case, the term eigenvector is used in a somewhat
more general meaning, since the Fock operator is explicitly dependent on the orbitals and their eigenvalues. If one
wants to underline this aspect one speaks of nonlinear eigenvalue problem. Such equations are usually solved by an
iteration procedure, called in this case self-consistent field method. In quantum chemistry, one often represents the
HartreeFock equation in a non-orthogonal basis set. This particular representation is a generalized eigenvalue
problem called Roothaan equations.
[edit] Sc hr di nger equat i on
The wavefunctions associated with the bound
states of an electron in a hydrogen atom can be seen
as the eigenvectors of the hydrogen atom
Hamiltonian as well as of the angular momentum
operator. They are associated with eigenvalues
interpreted as their energies (increasing downward:
) and angular momentum
(increasing across: s, p, d, ...). The illustration shows
the square of the absolute value of the
wavefunctions. Brighter areas correspond to higher
probability density for a position measurement. The
center of each figure is the atomic nucleus, a proton.
[edit] Mol ec ul ar or bi t al s
In geology, especially in the study of glacial till, eigenvectors and eigenvalues are used as a method by which a
mass of information of a clast fabric's constituents' orientation and dip can be summarized in a 3-D space by six
numbers. In the field, a geologist may collect such data for hundreds or thousands of clasts in a soil sample, which
can only be compared graphically such as in a Tri-Plot (Sneed and Folk) diagram,
[22][23]
or as a Stereonet on a
Wulff Net.
[24]
The output for the orientation tensor is in the three orthogonal (perpendicular) axes of space. The three eigenvectors
are ordered by their eigenvalues ;
[25]
then is the primary orientation/dip of clast,
is the secondary and is the tertiary, in terms of strength. The clast orientation is defined as the direction of the
eigenvector, on a compass rose of 360. Dip is measured as the eigenvalue, the modulus of the tensor: this is valued
from 0 (no dip) to 90 (vertical). The relative values of , , and are dictated by the nature of the sediment's
fabric. If , the fabric is said to be isotropic. If , the fabric is said to be planar.
If , the fabric is said to be linear.
[26]
Main article: Principal component analysis
See also: Positive semidefinite matrix and Factor analysis
The eigendecomposition of a symmetric positive semidefinite (PSD) matrix
yields an orthogonal basis of eigenvectors, each of which has a
nonnegative eigenvalue. The orthogonal decomposition of a PSD matrix is
used in multivariate analysis, where the sample covariance matrices are
PSD. This orthogonal decomposition is called principal components analysis
(PCA) in statistics. PCA studies linear relations among variables. PCA is
performed on the covariance matrix or the correlation matrix (in which each
variable is scaled to have its sample variance equal to one). For the
covariance or correlation matrix, the eigenvectors correspond to principal
components and the eigenvalues to the variance explained by the principal
components. Principal component analysis of the correlation matrix
provides an orthonormal eigen-basis for the space of the observed data: In
this basis, the largest eigenvalues correspond to the principal-components
that are associated with most of the covariability among a number of
observed data.
Principal component analysis is used to study large data sets, such as
those encountered in data mining, chemical research, psychology, and in
marketing. PCA is popular especially in psychology, in the field of
psychometrics. In Q methodology, the eigenvalues of the correlation matrix
determine the Q-methodologist's judgment of practical significance (which
differs from the statistical significance of hypothesis testing): The factors with eigenvalues greater than 1.00 are
considered practically significant, that is, as explaining an important amount of the variability in the data, while
eigenvalues less than 1.00 are considered practically insignificant, as explaining only a negligible portion of the data
variability. More generally, principal component analysis can be used as a method of factor analysis in structural
equation modeling.
Main article: Vibration
Eigenvalue problems occur naturally in the vibration analysis of
mechanical structures with many degrees of freedom. The eigenvalues are
used to determine the natural frequencies (or ei genf r equenc i es) of
vibration, and the eigenvectors determine the shapes of these vibrational
modes. In particular, undamped vibration is governed by
or
[edit] Geol ogy and gl ac i ol ogy
[edit] Pr i nc i pal c omponent s anal ysi s
PCA of the multivariate Gaussian
distribution centered at with a
standard deviation of 3 in roughly the
direction and of 1 in
the orthogonal direction. The vectors shown
are unit eigenvectors of the (symmetric,
positive-semidefinite) covariance matrix
scaled by the square root of the
corresponding eigenvalue. (J ust as in the
one-dimensional case, the square root is
taken because the standard deviation is
more readily visualized than the variance.
[edit] Vi br at i on anal ysi s
1st lateral bending (See vibration for
more types of vibration)
that is, acceleration is proportional to position (i.e., we expect to be sinusoidal in time). In dimensions,
becomes a mass matrix and a stiffness matrix. Admissible solutions are then a linear combination of solutions to
the generalized eigenvalue problem
where is the eigenvalue and is the angular frequency. Note that the principal vibration modes are different
from the principal compliance modes, which are the eigenvectors of alone. Furthermore, damped vibration,
governed by
leads to what is called a so-called quadratic eigenvalue problem,
This can be reduced to a generalized eigenvalue problem by clever algebra at the cost of solving a larger system.
The orthogonality properties of the eigenvectors allows decoupling of the differential equations so that the system can
be represented as linear summation of the eigenvectors. The eigenvalue problem of complex structures is often
solved using finite element analysis, but neatly generalize the solution to scalar-valued vibration problems.
Main article: Eigenface
In image processing, processed images of faces can be seen as vectors
whose components are the brightnesses of each pixel.
[27]
The dimension of
this vector space is the number of pixels. The eigenvectors of the covariance
matrix associated with a large set of normalized pictures of faces are called
ei genf ac es ; this is an example of principal components analysis. They are
very useful for expressing any face image as a linear combination of some of
them. In the facial recognition branch of biometrics, eigenfaces provide a
means of applying data compression to faces for identification purposes.
Research related to eigen vision systems determining hand gestures has also
been made.
Similar to this concept, ei genvoi c es represent the general direction of
variability in human pronunciations of a particular utterance, such as a word in
a language. Based on a linear combination of such eigenvoices, a new voice
pronunciation of the word can be constructed. These concepts have been
found useful in automatic speech recognition systems, for speaker adaptation.
In mechanics, the eigenvectors of the moment of inertia tensor define the principal axes of a rigid body. The tensor of
moment of inertia is a key quantity required to determine the rotation of a rigid body around its center of mass.
In solid mechanics, the stress tensor is symmetric and so can be decomposed into a diagonal tensor with the
eigenvalues on the diagonal and eigenvectors as a basis. Because it is diagonal, in this orientation, the stress tensor
has no shear components; the components it does have are the principal components.
In spectral graph theory, an eigenvalue of a graph is defined as an eigenvalue of the graph's adjacency matrix , or
(increasingly) of the graph's Laplacian matrix (see also Discrete Laplace operator), which is either
(sometimes called the combinatorial Laplacian) or (sometimes called the normalized
Laplacian), where is a diagonal matrix with equal to the degree of vertex , and in , the th diagonal
entry is . The th principal eigenvector of a graph is defined as either the eigenvector corresponding to
the th largest or th smallest eigenvalue of the Laplacian. The first principal eigenvector of the graph is also
referred to merely as the principal eigenvector.
The principal eigenvector is used to measure the centrality of its vertices. An example is Google's PageRank
algorithm. The principal eigenvector of a modified adjacency matrix of the World Wide Web graph gives the page
ranks as its components. This vector corresponds to the stationary distribution of the Markov chain represented by
[edit] Ei genf ac es
Eigenfaces as examples of
eigenvectors
[edit] Tensor of moment of i ner t i a
[edit] St r ess t ensor
[edit] Ei genval ues of a gr aph
the row-normalized adjacency matrix; however, the adjacency matrix must first be modified to ensure a stationary
distribution exists. The second smallest eigenvector can be used to partition the graph into clusters, via spectral
clustering. Other methods are also available for clustering.
See Basic reproduction number
The basic reproduction number ( ) is a fundamental number in the study of how infectious diseases spread. If one
infectious person is put into a population of completely susceptible people, then is the average number of people
that one infectious person will infect. The generation time of an infection is the time, , from one person becoming
infected to the next person becoming infected. In a heterogenous population, the next generation matrix defines how
many people in the population will become infected after time has passed. is then the largest eigenvalue of
the next generation matrix.
[28][29]
Nonlinear eigenproblem
Quadratic eigenvalue problem
Introduction to eigenstates
Eigenplane
J ordan normal form
List of numerical analysis software
Antieigenvalue theory
1. ^
a

b
Wolfram Research, Inc. (2010) Eigenvector .
Accessed on 2010-01-29.
2. ^ William H. Press, Saul A. Teukolsky, William T.
Vetterling, Brian P. Flannery (2007), [[Numerical
Recipes : The Art of Scientific Computing], Chapter
11: Eigensystems., pages=563597. Third edition,
Cambridge University Press. ISBN 9780521880688
3. ^ See Korn & Korn 2000, Section 14.3.5a; Friedberg,
Insel & Spence 1989, p. 217
4. ^ Axler, Sheldon, "Ch. 5", Linear Algebra Done Right
(2nd ed.), p. 77
5. ^ Shilov 1977, p. 109
6. ^ Lemma for the eigenspace
7. ^ Schaum's Easy Outline of Linear Algebra , p. 111
8. ^ For a proof of this lemma, see Roman 2008, Theorem
8.2 on p. 186; Shilov 1977, p. 109; Hefferon 2001, p.
364; Beezer 2006, Theorem EDELI on p. 469; and
Lemma for linear independence of eigenvectors
9. ^ Strang, Gilbert (1988), Linear Algebra and Its
Applications (3rd ed.), San Diego: Harcourt
10. ^
a

b

c

d
Trefethen, Lloyd N.; Bau, David (1997),
Numerical Linear Algebra, SIAM
11. ^ See Hawkins 1975, 2
12. ^
a

b

c

d
See Hawkins 1975, 3
13. ^
a

b

c
See Kline 1972, pp. 807808
14. ^ See Kline 1972, p. 673
15. ^ See Kline 1972, pp. 715716
16. ^ See Kline 1972, pp. 706707
17. ^ See Kline 1972, p. 1063
18. ^ See Aldrich 2006
19. ^ Francis, J . G. F. (1961), "The QR Transformation, I
(part 1)", The Computer Journal 4 (3): 265271,
doi:10.1093/comjnl/4.3.265 and Francis, J . G. F.
(1962), "The QR Transformation, II (part 2)", The
USSR Computational Mathematics and Mathematical
Physics 3 : 637657. Also published in: Zhurnal
Vychislitel'noi Matematiki i Matematicheskoi Fiziki 1 (4),
1961: 555570
21. ^ See Golub & van Loan 1996, 7.3; Meyer 2000, 7.3
22. ^ Graham, D.; Midgley, N. (2000), "Graphical
representation of particle shape using triangular
diagrams: an Excel spreadsheet method", Earth Surface
Processes and Landforms 25 (13): 14731477,
doi:10.1002/1096-9837(200012)25:13<1473::AID-
ESP158>3.0.CO;2-C
23. ^ Sneed, E. D.; Folk, R. L. (1958), "Pebbles in the lower
Colorado River, Texas, a study of particle
morphogenesis", Journal of Geology 66 (2): 114150,
doi:10.1086/626490
24. ^ Knox-Robinson, C; Gardoll, Stephen J (1998), "GIS-
stereoplot: an interactive stereonet plotting module for
ArcView 3.0 geographic information system", Computers
& Geosciences 24 (3): 243, doi:10.1016/S0098-
3004(97)00122-2
25. ^ Stereo32 software
26. ^ Benn, D.; Evans, D. (2004), A Practical Guide to the
study of Glacial Sediments, London: Arnold, pp. 103
107
27. ^ Xirouhakis, A.; Votsis, G.; Delopoulus, A. (2004),
Estimation of 3D motion and structure of human
faces (PDF), Online paper in PDF format, National
Technical University of Athens
28. ^ Diekmann O, Heesterbeek J AP, Metz J AJ (1990), "On
the definition and the computation of the basic
reproduction ratio R0 in models for infectious diseases in
heterogeneous populations", Journal of Mathematical
Biology 28 (4): 365382, doi:10.1007/BF00178324 ,
PMID 2117040
29. ^ Odo Diekmann and J . A. P. Heesterbeek (2000),
[edit] Basi c r epr oduc t i on number
[edit]
See also
[edit]
Notes
Computer Journal 4 (4): 332345,
doi:10.1093/comjnl/4.4.332
20. ^ Kublanovskaya, Vera N. (1961), "On some algorithms
for the solution of the complete eigenvalue problem",

Mathematical epidemiology of infectious diseases, Wiley
series in mathematical and computational biology, West
Sussex, England: J ohn Wiley & Sons
Korn, Granino A.; Korn, Theresa M. (2000), "Mathematical Handbook for Scientists and Engineers: Definitions,
Theorems, and Formulas for Reference and Review", New York: McGraw-Hill (1152 p., Dover Publications, 2
Revised edition), Bibcode:1968mhse.book.....K , ISBN 0-486-41147-8.
Lipschutz, Seymour (1991), Schaum's outline of theory and problems of linear algebra, Schaum's outline series
(2nd ed.), New York, NY: McGraw-Hill Companies, ISBN 0-07-038007-4.
Friedberg, Stephen H.; Insel, Arnold J .; Spence, Lawrence E. (1989), Linear algebra (2nd ed.), Englewood Cliffs,
NJ 07632: Prentice Hall, ISBN 0-13-537102-3.
Aldrich, J ohn (2006), "Eigenvalue, eigenfunction, eigenvector, and related terms" , in J eff Miller (Editor), Earliest
Known Uses of Some of the Words of Mathematics , retrieved 2006-08-22
Strang, Gilbert (1993), Introduction to linear algebra, Wellesley-Cambridge Press, Wellesley, MA, ISBN 0-
9614088-5-5.
Strang, Gilbert (2006), Linear algebra and its applications, Thomson, Brooks/Cole, Belmont, CA, ISBN 0-03-
010567-6.
Bowen, Ray M.; Wang, Chao-Cheng (1980), Linear and multilinear algebra, Plenum Press, New York, NY,
ISBN 0-306-37508-7.
Cohen-Tannoudji, Claude (1977), "Chapter II. The mathematical tools of quantum mechanics", Quantum
mechanics, J ohn Wiley & Sons, ISBN 0-471-16432-1.
Fraleigh, J ohn B.; Beauregard, Raymond A. (1995), Linear algebra (3rd ed.), Addison-Wesley Publishing
Company, ISBN 0-201-83999-7 (international edition).
Golub, Gene H.; Van Loan, Charles F. (1996), Matrix computations (3rd Edition), J ohns Hopkins University Press,
Baltimore, MD, ISBN 978-0-8018-5414-9.
Hawkins, T. (1975), "Cauchy and the spectral theory of matrices", Historia Mathematica 2: 129,
doi:10.1016/0315-0860(75)90032-4 .
Horn, Roger A.; J ohnson, Charles F. (1985), Matrix analysis, Cambridge University Press, ISBN 0-521-30586-1
(hardback), ISBN 0-521-38632-2 (paperback) Check | i sbn= value (help).
Kline, Morris (1972), Mathematical thought from ancient to modern times, Oxford University Press, ISBN 0-19-
501496-0.
Meyer, Carl D. (2000), Matrix analysis and applied linear algebra, Society for Industrial and Applied Mathematics
(SIAM), Philadelphia, ISBN 978-0-89871-454-8.
Brown, Maureen (October 2004), Illuminating Patterns of Perception: An Overview of Q Methodology.
Golub, Gene F.; van der Vorst, Henk A. (2000), "Eigenvalue computation in the 20th century", Journal of
Computational and Applied Mathematics 123 : 3565, doi:10.1016/S0377-0427(00)00413-1 .
Akivis, Max A.; Vladislav V. Goldberg (1969), Tensor calculus, Russian, Science Publishers, Moscow.
Gelfand, I. M. (1971), Lecture notes in linear algebra, Russian, Science Publishers, Moscow.
Alexandrov, Pavel S. (1968), Lecture notes in analytical geometry, Russian, Science Publishers, Moscow.
Carter, Tamara A.; Tapia, Richard A.; Papaconstantinou, Anne, Linear Algebra: An Introduction to Linear Algebra
for Pre-Calculus Students , Rice University, Online Edition, retrieved 2008-02-19.
Roman, Steven (2008), Advanced linear algebra (3rd ed.), New York, NY: Springer Science +Business Media,
LLC, ISBN 978-0-387-72828-5.
Shilov, Georgi E. (1977), Linear algebra (translated and edited by Richard A. Silverman ed.), New York: Dover
Publications, ISBN 0-486-63518-X.
Hefferon, J im (2001), Linear Algebra , Online book, St Michael's College, Colchester, Vermont, USA.
Kuttler, Kenneth (2007), An introduction to linear algebra (PDF), Online e-book in PDF format, Brigham Young
University.
Demmel, J ames W. (1997), Applied numerical linear algebra, SIAM, ISBN 0-89871-389-7.
Beezer, Robert A. (2006), A first course in linear algebra , Free online book under GNU licence, University of
Puget Sound.
Lancaster, P. (1973), Matrix theory, Russian, Moscow, Russia: Science Publishers.
Halmos, Paul R. (1987), Finite-dimensional vector spaces (8th ed.), New York, NY: Springer-Verlag, ISBN 0-387-
[edit]
References
[show] V T E
[hide] V T E
The Wikibook Linear Algebra
has a page on the topic of:
Eigenvalues and
Eigenvectors
The Wikibook The Book of
Mathematical Proofs has a page
on the topic of: Algebra/Linear
Transformations
90093-4.
Pigolkina, T. S. and Shulman, V. S., Eigenvalue (in Russian), In:Vinogradov, I. M. (Ed.), Mathematical
Encyclopedia, Vol. 5, Soviet Encyclopedia, Moscow, 1977.
Greub, Werner H. (1975), Linear Algebra (4th Edition), Springer-Verlag, New York, NY, ISBN 0-387-90110-8.
Larson, Ron; Edwards, Bruce H. (2003), Elementary linear algebra (5th ed.), Houghton Mifflin Company, ISBN 0-
618-33567-6.
Curtis, Charles W., Linear Algebra: An Introductory Approach, 347 p., Springer; 4th ed. 1984. Corr. 7th printing
edition (August 19, 1999), ISBN 0-387-90992-3.
Shores, Thomas S. (2007), Applied linear algebra and matrix analysis, Springer Science+Business Media, LLC,
ISBN 0-387-33194-8.
Sharipov, Ruslan A. (1996), Course of Linear Algebra and Multidimensional Geometry: the textbook,
arXiv:math/0405323 , ISBN 5-7477-0099-5.
Gohberg, Israel; Lancaster, Peter; Rodman, Leiba (2005), Indefinite linear algebra and applications, Basel-Boston-
Berlin: Birkhuser Verlag, ISBN 3-7643-7349-0.
What are Eigen Values? non-technical introduction from
PhysLink.com's "Ask the Experts"
Eigen Values and Eigen Vectors Numerical Examples Tutorial and
Interactive Program from Revoledu.
Introduction to Eigen Vectors and Eigen Values lecture from Khan
Academy
Hill, Roger (2009). " Eigenvalues" . Sixty Symbols. Brady Haran for
the University of Nottingham.
Theor y
Hazewinkel, Michiel, ed. (2001), "Eigen value" , Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-
010-4
Hazewinkel, Michiel, ed. (2001), "Eigen vector" , Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-
010-4
Eigenvalue (of a matrix) , PlanetMath.org.
Eigenvector Wolfram MathWorld
Eigen Vector Examination working applet
Same Eigen Vector Examination as above in a Flash demo with sound
Computation of Eigenvalues
Numerical solution of eigenvalue problems Edited by Zhaojun Bai, J ames Demmel, J ack Dongarra, Axel Ruhe,
and Henk van der Vorst
Eigenvalues and Eigenvectors on the Ask Dr. Math forums: [1] , [2]
Onl i ne c al c ul at or s
arndt-bruenner.de
bluebit.gr
wims.unice.fr
Demonst r at i on appl et s
J ava applet about eigenvectors in the real plane
Topi c s r el at ed t o l i near al gebr a
Ar eas of mat hemat i c s
Ar eas
Arithmetic Algebra (elementary linear multilinear abstract Geometry (discrete algebraic differential finite
Trigonometry Calculus/Analysis Functional analysis Set theory Logic Category theory Number theory
Combinatorics Graph theory Topology Lie theory Differential equations/Dynamical systems
Mathematical physics Numerical analysis Computation Information theory Probability Mathematical statistics
Mathematical optimization Control theory Game theory Representation theory
Di vi si ons Pure mathematics Applied mathematics Discrete mathematics Computational mathematics
Cat egor y Mathematics portal Outline Lists
[edit]
External links
Privacy policy About Wikipedia Disclaimers Contact Wikipedia Mobile view
This page was last modified on 3 May 2013 at 17:49.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the
Terms of Use and Privacy Policy.
Wikipediais a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.
Categories: Mathematical physics Abstract algebra Linear algebra Matrix theory
Singular value decomposition

A Tutorial On Principal Component Analysis

Caricato da

Informazioni sul documento

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

A Tutorial On Principal Component Analysis

Caricato da

Copyright:

Formati disponibili

A Tutorial on Principal Component Analysis

Center for Neural Science, New York University

Electronic address: shlens@salk.edu

where each camera contributes a 2-dimensional projection of

X. If we record the balls

We can note the form of each column of Y.

We recognize that each coefcient of y

One interpretation of X is the following. Each row of X corre-

where we have appended an additional (mr) and (nr) or-

Potrebbero piacerti anche