A Tutorial on Principal Component Analysis

Jonathon Shlens

Center for Neural Science, New York University, New York City, NY 10003-6603 and

Systems Neurobiology Laboratory, Salk Institute for Biological Studies

La Jolla, CA 92037

(Dated: April 22, 2009; Version 3.01)

Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but (sometimes) poorly understood. The goal of this paper is to dispel the magic behind this black box. This manuscript focuses on building a solid intuition for how and why principal component analysis works. This manuscript crystallizes this knowledge by deriving from simple intuitions the mathematics behind PCA. This tutorial does not shy away from explaining the ideas informally, nor does it shy away from the mathematics. The hope is that by addressing both aspects, readers of all levels will be able to gain a better understanding of PCA as well as the when, the how and the why of applying this technique.

I. INTRODUCTION

Principal component analysis (PCA) is a standard tool in modern data analysis - in diverse fields from neuroscience to computer graphics - because it is a simple, non-parametric method for extracting relevant information from confusing data sets. With minimal effort PCA provides a roadmap for how to reduce a complex data set to a lower dimension to reveal the sometimes hidden, simplified structures that often underlie it.

The goal of this tutorial is to provide both an intuitive feel for PCA, and a thorough discussion of this topic. We will begin with a simple example and provide an intuitive explanation of the goal of PCA. We will continue by adding mathematical rigor to place it within the framework of linear algebra to provide an explicit solution. We will see how and why PCA is intimately related to the mathematical technique of singular value decomposition (SVD). This understanding will lead us to a prescription for how to apply PCA in the real world and an appreciation for the underlying assumptions. My hope is that a thorough understanding of PCA provides a foundation for approaching the fields of machine learning and dimensional reduction.

The discussion and explanations in this paper are informal in the spirit of a tutorial. The goal of this paper is to educate. Occasionally, rigorous mathematical proofs are necessary although relegated to the Appendix. Although not as vital to the tutorial, the proofs are presented for the adventurous reader who desires a more complete understanding of the math. My only assumption is that the reader has a working knowledge of linear algebra. My goal is to provide a thorough discussion by largely building on ideas from linear algebra and avoiding challenging topics in statistics and optimization theory (but see Discussion). Please feel free to contact me with any suggestions, corrections or comments.

II. MOTIVATION: A TOY EXAMPLE

Here is the perspective: we are an experimenter. We are trying to understand some phenomenon by measuring various quantities (e.g. spectra, voltages, velocities, etc.) in our system. Unfortunately, we cannot figure out what is happening because the data appears clouded, unclear and even redundant. This is not a trivial problem, but rather a fundamental obstacle in empirical science. Examples abound from complex systems such as neuroscience, web indexing, meteorology and oceanography - the number of variables to measure can be unwieldy and at times even deceptive, because the underlying relationships can often be quite simple.

Take for example a simple toy problem from physics diagrammed in Figure 1. Pretend we are studying the motion of the physicist's ideal spring. This system consists of a ball of mass m attached to a massless, frictionless spring. The ball is released a small distance away from equilibrium (i.e. the spring is stretched). Because the spring is ideal, it oscillates indefinitely along the x-axis about its equilibrium at a set frequency.

This is a standard problem in physics in which the motion along the x direction is solved by an explicit function of time. In other words, the underlying dynamics can be expressed as a function of a single variable x.

However, being ignorant experimenters we do not know any of this. We do not know which, let alone how many, axes and dimensions are important to measure. Thus, we decide to measure the ball's position in a three-dimensional space (since we live in a three dimensional world). Specifically, we place three movie cameras around our system of interest. At 120 Hz each movie camera records an image indicating a two dimensional position of the ball (a projection). Unfortunately, because of our ignorance, we do not even know what are the real x, y and z axes, so we choose three camera positions a, b and c at some arbitrary angles with respect to the system. The angles between our measurements might not even be 90 degrees! Now, we record with the cameras for several minutes. The big question remains: how do we get from this data set to a simple equation of x?

FIG. 1 A toy example. The position of a ball attached to an oscillating spring is recorded using three cameras A, B and C. The position of the ball tracked by each camera is depicted in each panel below.

We know a-priori that if we were smart experimenters, we would have just measured the position along the x-axis with one camera. But this is not what happens in the real world. We often do not know which measurements best reflect the dynamics of our system in question. Furthermore, we sometimes record more dimensions than we actually need.

Also, we have to deal with that pesky, real-world problem of noise. In the toy example this means that we need to deal with air, imperfect cameras or even friction in a less-than-ideal spring. Noise contaminates our data set only serving to obfuscate the dynamics further. This toy example is the challenge experimenters face everyday. Keep this example in mind as we delve further into abstract concepts. Hopefully, by the end of this paper we will have a good understanding of how to systematically extract x using principal component analysis.

III. FRAMEWORK: CHANGE OF BASIS

The goal of principal component analysis is to identify the most meaningful basis to re-express a data set. The hope is that this new basis will filter out the noise and reveal hidden structure. In the example of the spring, the explicit goal of PCA is to determine: "the dynamics are along the x-axis." In other words, the goal of PCA is to determine that x̂, i.e. the unit basis vector along the x-axis, is the important dimension. Determining this fact allows an experimenter to discern which dynamics are important, redundant or noise.

A. A Naive Basis

With a more precise definition of our goal, we need a more precise definition of our data as well. We treat every time sample (or experimental trial) as an individual sample in our data set. At each time sample we record a set of data consisting of multiple measurements (e.g. voltage, position, etc.). In our data set, at one point in time, camera A records a corresponding ball position (x_A, y_A). One sample or trial can then be expressed as a 6 dimensional column vector

\vec{X} = \begin{bmatrix} x_A \\ y_A \\ x_B \\ y_B \\ x_C \\ y_C \end{bmatrix}

where each camera contributes a two-dimensional projection of the ball's position to the entire vector \vec{X}. If we record the ball's position for 10 minutes at 120 Hz, then we have recorded 10 × 60 × 120 = 72000 of these vectors.

With this concrete example, let us recast this problem in abstract terms. Each sample \vec{X} is an m-dimensional vector, where m is the number of measurement types. Equivalently, every sample is a vector that lies in an m-dimensional vector space spanned by some orthonormal basis. From linear algebra we know that all measurement vectors form a linear combination of this set of unit length basis vectors. What is this orthonormal basis?

This question is usually a tacit assumption often overlooked. Pretend we gathered our toy example data above, but only looked at camera A. What is an orthonormal basis for (x_A, y_A)?

A naive choice would be {(1, 0), (0, 1)}, but why select this basis over {(√2/2, √2/2), (−√2/2, √2/2)} or any other arbitrary rotation? The reason is that the naive basis reflects the method we gathered the data. Pretend we record the position (2, 2). We did not record 2√2 in the (√2/2, √2/2) direction and 0 in the perpendicular direction. Rather, we recorded the position (2, 2) on our camera meaning 2 units up and 2 units to the left in our camera window. Thus our original basis reflects the method we measured our data.

How do we express this naive basis in linear algebra? In the two dimensional case, {(1, 0), (0, 1)} can be recast as individual row vectors. A matrix constructed out of these row vectors is the 2×2 identity matrix I. We can generalize this to the m-dimensional case by constructing an m×m identity matrix

B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I

where each row is an orthonormal basis vector b_i with m components. We can consider our naive basis as the effective starting point. All of our data has been recorded in this basis and thus it can be trivially expressed as a linear combination of {b_i}.

B. Change of Basis

With this rigor we may now state more precisely what PCA asks: Is there another basis, which is a linear combination of the original basis, that best re-expresses our data set?

A close reader might have noticed the conspicuous addition of the word linear. Indeed, PCA makes one stringent but powerful assumption: linearity. Linearity vastly simplifies the problem by restricting the set of potential bases. With this assumption PCA is now limited to re-expressing the data as a linear combination of its basis vectors.

Let X be the original data set, where each column is a single sample (or moment in time) of our data set (i.e. \vec{X}). In the toy example X is an m×n matrix where m = 6 and n = 72000. Let Y be another m×n matrix related by a linear transformation P. X is the original recorded data set and Y is a new representation of that data set.

PX = Y     (1)

Also let us define the following quantities [1]:

• p_i are the rows of P
• x_i are the columns of X (or individual \vec{X})
• y_i are the columns of Y

[1] In this section x_i and y_i are column vectors, but be forewarned. In all other sections x_i and y_i are row vectors.

Equation 1 represents a change of basis and thus can have many interpretations.

1. P is a matrix that transforms X into Y.
2. Geometrically, P is a rotation and a stretch which again transforms X into Y.
3. The rows of P, {p_1, ..., p_m}, are a set of new basis vectors for expressing the columns of X.

The latter interpretation is not obvious but can be seen by writing out the explicit dot products of PX.

PX = \begin{bmatrix} p_1 \\ \vdots \\ p_m \end{bmatrix} \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}

Y = \begin{bmatrix} p_1 \cdot x_1 & \cdots & p_1 \cdot x_n \\ \vdots & \ddots & \vdots \\ p_m \cdot x_1 & \cdots & p_m \cdot x_n \end{bmatrix}

We can note the form of each column of Y,

y_i = \begin{bmatrix} p_1 \cdot x_i \\ \vdots \\ p_m \cdot x_i \end{bmatrix}

and recognize that each coefficient of y_i is a dot-product of x_i with the corresponding row in P. In other words, the j-th coefficient of y_i is a projection on to the j-th row of P. This is in fact the very form of an equation where y_i is a projection on to the basis of {p_1, ..., p_m}. Therefore, the rows of P are a new set of basis vectors for representing the columns of X.
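As a concrete illustration of Equation 1 (a minimal sketch, not from the paper; the 45-degree rotation and the synthetic data are illustrative assumptions), the following Matlab lines re-express a two-dimensional data set in a rotated basis whose rows form P:

% Illustrative sketch of Y = PX for a 2-D data set; the rotation angle
% and the data are arbitrary choices, not taken from the paper.
n = 1000;
theta = pi/4;                          % rotate the naive basis by 45 degrees
X = [randn(1,n)*3; randn(1,n)*0.3];    % data elongated along the first axis
P = [ cos(theta)  sin(theta); ...
     -sin(theta)  cos(theta)];         % rows of P are the new basis vectors
Y = P * X;                             % each column of Y holds the new coordinates

Each column of Y contains the same sample as the corresponding column of X, simply described in the coordinates of the rotated basis.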

C. Questions Remaining

By assuming linearity the problem reduces to finding the appropriate change of basis. The row vectors {p_1, ..., p_m} in this transformation will become the principal components of X. Several questions now arise.

• What is the best way to re-express X?
• What is a good choice of basis P?

These questions must be answered by next asking ourselves what features we would like Y to exhibit. Evidently, additional assumptions beyond linearity are required to arrive at a reasonable result. The selection of these assumptions is the subject of the next section.

IV. VARIANCE AND THE GOAL

Now comes the most important question: what does "best express" the data mean? This section will build up an intuitive answer to this question and along the way tack on additional assumptions.

A. Noise and Rotation

Measurement noise in any data set must be low or else, no matter the analysis technique, no information about a signal can be extracted. There exists no absolute scale for noise but rather all noise is quantified relative to the signal strength. A common measure is the signal-to-noise ratio (SNR), or a ratio of variances σ²,

SNR = \frac{\sigma^2_{signal}}{\sigma^2_{noise}}.

A high SNR (≫ 1) indicates a high precision measurement, while a low SNR indicates very noisy data.

FIG. 2 Simulated data of (x, y) for camera A. The signal and noise variances σ²_signal and σ²_noise are graphically represented by the two lines subtending the cloud of data. Note that the largest direction of variance does not lie along the basis of the recording (x_A, y_A) but rather along the best-fit line.

Let's take a closer examination of the data from camera A in Figure 2. Remembering that the spring travels in a straight line, every individual camera should record motion in a straight line as well. Therefore, any spread deviating from straight-line motion is noise. The variance due to the signal and noise are indicated by each line in the diagram. The ratio of the two lengths measures how skinny the cloud is: possibilities include a thin line (SNR ≫ 1), a circle (SNR = 1) or even worse. By positing reasonably good measurements, quantitatively we assume that directions with largest variances in our measurement space contain the dynamics of interest. In Figure 2 the direction with the largest variance is not x̂_A = (1, 0) nor ŷ_A = (0, 1), but the direction along the long axis of the cloud. Thus, by assumption the dynamics of interest exist along directions with largest variance and presumably highest SNR.

Our assumption suggests that the basis for which we are searching is not the naive basis because these directions (i.e. (x_A, y_A)) do not correspond to the directions of largest variance. Maximizing the variance (and by assumption the SNR) corresponds to finding the appropriate rotation of the naive basis. This intuition corresponds to finding the direction indicated by the line σ²_signal in Figure 2. In the 2-dimensional case of Figure 2 the direction of largest variance corresponds to the best-fit line for the data cloud. Thus, rotating the naive basis to lie parallel to the best-fit line would reveal the direction of motion of the spring for the 2-D case. How do we generalize this notion to an arbitrary number of dimensions? Before we approach this question we need to examine this issue from a second perspective.
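As a rough numerical illustration of this idea (a sketch with synthetic data; the signal and noise levels and the brute-force angular search are assumptions for illustration, not the method derived later in the paper):

% Sketch: find the direction of maximal variance for simulated camera data.
n = 2000;
s = randn(1,n) * 2;                          % signal along a hidden direction
d = [cos(0.6); sin(0.6)];                    % hidden direction (arbitrary angle)
data = d*s + 0.2*randn(2,n);                 % add isotropic measurement noise
data = data - repmat(mean(data,2), 1, n);    % zero-mean each coordinate
angles = linspace(0, pi, 180);
v = zeros(size(angles));
for k = 1:length(angles)
    p = [cos(angles(k)); sin(angles(k))];    % candidate unit direction
    v(k) = var(p' * data);                   % variance of data projected onto p
end
[vmax, kmax] = max(v);
best_direction = [cos(angles(kmax)); sin(angles(kmax))]

The recovered best_direction lies (up to sign) along the best-fit line of the cloud, i.e. along the hidden signal direction rather than along the naive axes.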

FIG. 3 A spectrum of possible redundancies in data from the two separate measurements r_1 and r_2 (low redundancy on the left, high redundancy on the right). The two measurements on the left are uncorrelated because one can not predict one from the other. Conversely, the two measurements on the right are highly correlated indicating highly redundant measurements.

B. Redundancy

Figure 2 hints at an additional confounding factor in our data - redundancy. This issue is particularly evident in the example of the spring. In this case multiple sensors record the same dynamic information. Reexamine Figure 2 and ask whether it was really necessary to record 2 variables. Figure 3 might reflect a range of possible plots between two arbitrary measurement types r_1 and r_2. The left-hand panel depicts two recordings with no apparent relationship. Because one can not predict r_1 from r_2, one says that r_1 and r_2 are uncorrelated.

On the other extreme, the right-hand panel of Figure 3 depicts highly correlated recordings. This extremity might be achieved by several means:

• A plot of (x_A, x_B) if cameras A and B are very nearby.
• A plot of (x_A, x̃_A) where x_A is in meters and x̃_A is in inches.

Clearly in the right panel of Figure 3 it would be more meaningful to just have recorded a single variable, not both. Why? Because one can calculate r_1 from r_2 (or vice versa) using the best-fit line. Recording solely one response would express the data more concisely and reduce the number of sensor recordings (2 → 1 variables). Indeed, this is the central idea behind dimensional reduction.
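For instance (an illustrative sketch, not from the paper; the data values are arbitrary), the meters-versus-inches case produces a perfectly redundant pair of recordings:

% Sketch of perfectly redundant measurements: the same position recorded
% in meters and in inches.
xa_meters = randn(1, 500);           % position of the ball in meters
xa_inches = 39.37 * xa_meters;       % identical measurement expressed in inches
corrcoef(xa_meters, xa_inches)       % off-diagonal correlation is exactly 1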

C. Covariance Matrix

In a 2 variable case it is simple to identify redundant cases by finding the slope of the best-fit line and judging the quality of the fit. How do we quantify and generalize these notions to arbitrarily higher dimensions? Consider two sets of measurements with zero means

A = {a_1, a_2, ..., a_n} ,  B = {b_1, b_2, ..., b_n}

where the subscript denotes the sample number. The variance of A and B are individually defined as,

\sigma^2_A = \frac{1}{n}\sum_i a_i^2 , \qquad \sigma^2_B = \frac{1}{n}\sum_i b_i^2

The covariance between A and B is a straight-forward generalization.

covariance of A and B \equiv \sigma^2_{AB} = \frac{1}{n}\sum_i a_i b_i

The covariance measures the degree of the linear relationship between two variables. A large positive value indicates positively correlated data. Likewise, a large negative value denotes negatively correlated data. The absolute magnitude of the covariance measures the degree of redundancy. Some additional facts about the covariance.

• σ_AB is zero if and only if A and B are uncorrelated (e.g. Figure 3, left panel).
• σ²_AB = σ²_A if A = B.

We can equivalently convert A and B into corresponding row vectors,

a = [a_1 a_2 ... a_n]
b = [b_1 b_2 ... b_n]

so that we may express the covariance as a dot product matrix computation.

\sigma^2_{ab} \equiv \frac{1}{n} a b^T     (2)

Finally, we can generalize from two vectors to an arbitrary number. Rename the row vectors a and b as x_1 and x_2, respectively, and consider additional indexed row vectors x_3, ..., x_m. Define a new m×n matrix X.

X = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}

One interpretation of X is the following. Each row of X corresponds to all measurements of a particular type. Each column of X corresponds to a set of measurements from one particular trial (this is \vec{X} from section 3.1). We now arrive at a definition for the covariance matrix C_X.

C_X \equiv \frac{1}{n} X X^T . [2]

[2] Note that in practice, the covariance σ²_AB is calculated as \frac{1}{n-1}\sum_i a_i b_i. The slight change in normalization constant arises from estimation theory, but that is beyond the scope of this tutorial.

Consider the matrix C_X = \frac{1}{n} X X^T. The ij-th element of C_X is the dot product between the vector of the i-th measurement type with the vector of the j-th measurement type. We can summarize several properties of C_X:

• C_X is a square symmetric m×m matrix (Theorem 2 of Appendix A).
• The diagonal terms of C_X are the variance of particular measurement types.
• The off-diagonal terms of C_X are the covariance between measurement types.

C_X captures the covariance between all possible pairs of measurements. The covariance values reflect the noise and redundancy in our measurements.

• In the diagonal terms, by assumption, large values correspond to interesting structure.
• In the off-diagonal terms large magnitudes correspond to high redundancy.

Pretend we have the option of manipulating C_X. We will suggestively define our manipulated covariance matrix C_Y. What features do we want to optimize in C_Y?
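As an aside, the covariance matrix just defined is easy to compute and inspect directly; the following lines are an illustrative sketch (the synthetic data and the injected redundancy are assumptions, not part of the paper):

% Sketch: form the covariance matrix of an m x n data set and inspect it.
m = 6; n = 1000;
X = randn(m, n);
X(2,:) = X(1,:) + 0.05*randn(1,n);       % make rows 1 and 2 highly redundant
X = X - repmat(mean(X,2), 1, n);         % zero-mean each measurement type
Cx = (1/n) * (X * X');                   % covariance matrix, as defined above
diag(Cx)'                                % variances of each measurement type
Cx(1,2)                                  % large off-diagonal value: redundancy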

D. Diagonalize the Covariance Matrix

We can summarize the last two sections by stating that our goals are (1) to minimize redundancy, measured by the magnitude of the covariance, and (2) maximize the signal, measured by the variance. What would the optimized covariance matrix C_Y look like?

• All off-diagonal terms in C_Y should be zero. Thus, C_Y must be a diagonal matrix. Or, said another way, Y is decorrelated.
• Each successive dimension in Y should be rank-ordered according to variance.

There are many methods for diagonalizing C_Y. It is curious to note that PCA arguably selects the easiest method: PCA assumes that all basis vectors {p_1, ..., p_m} are orthonormal, i.e. P is an orthonormal matrix. Why is this assumption easiest?

Envision how PCA works. In our simple example in Figure 2, P acts as a generalized rotation to align a basis with the axis of maximal variance. In multiple dimensions this could be performed by a simple algorithm:

1. Select a normalized direction in m-dimensional space along which the variance in X is maximized. Save this vector as p_1.
2. Find another direction along which variance is maximized, however, because of the orthonormality condition, restrict the search to all directions orthogonal to all previously selected directions. Save this vector as p_i.
3. Repeat this procedure until m vectors are selected.

The resulting ordered set of p's are the principal components.
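The following Matlab sketch (not from the paper) mimics this greedy procedure, using power iteration to find the direction of maximal remaining variance and deflation to enforce orthogonality to the previously selected directions; the function name, iteration count and use of power iteration are illustrative assumptions:

function P = greedy_pca(X)
% GREEDY_PCA: illustrative sketch of the greedy procedure described above.
% X - m x n zero-mean data matrix; rows of P are the selected directions.
[m, n] = size(X);
C = (1/n) * (X * X');                  % covariance matrix of the data
P = zeros(m, m);
for k = 1:m
    p = randn(m, 1);
    p = p / norm(p);
    for it = 1:500                     % power iteration converges to the
        p = C * p;                     % direction of maximal remaining variance
        p = p / norm(p);
    end
    P(k, :) = p';
    C = C - (p' * C * p) * (p * p');   % deflate: remove variance along p
end

On generic data the rows returned by this sketch should agree, up to sign and the ordering of degenerate directions, with the analytical eigenvector solution derived in Section V, which is the route used in practice.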

In principle this simple algorithm works, however that would bely the true reason why the orthonormality assumption is judicious. The true benefit to this assumption is that there exists an efficient, analytical solution to this problem. We will discuss two solutions in the following sections.

Notice what we gained with the stipulation of rank-ordered variance. We have a method for judging the importance of the principal direction. Namely, the variances associated with each direction p_i quantify how "principal" each direction is by rank-ordering each basis vector p_i according to the corresponding variances. We will now pause to review the implications of all the assumptions made to arrive at this mathematical goal.

E. Summary of Assumptions

This section provides a summary of the assumptions behind PCA and hints at when these assumptions might perform poorly.

I. Linearity.
Linearity frames the problem as a change of basis. Several areas of research have explored extending these notions to nonlinear regimes (see Discussion).

II. Large variances have important structure.
This assumption also encompasses the belief that the data has a high SNR. Hence, principal components with larger associated variances represent interesting structure, while those with lower variances represent noise. Note that this is a strong, and sometimes, incorrect assumption (see Discussion).

III. The principal components are orthogonal.
This assumption provides an intuitive simplification that makes PCA soluble with linear algebra decomposition techniques. These techniques are highlighted in the two following sections.

We have discussed all aspects of deriving PCA - what remain are the linear algebra solutions. The first solution is somewhat straightforward while the second solution involves understanding an important algebraic decomposition.

V. SOLVING PCA USING EIGENVECTOR DECOMPOSITION

We derive our first algebraic solution to PCA based on an important property of eigenvector decomposition. Once again, the data set is X, an m×n matrix, where m is the number of measurement types and n is the number of samples. The goal is summarized as follows.

Find some orthonormal matrix P in Y = PX such that C_Y \equiv \frac{1}{n} Y Y^T is a diagonal matrix. The rows of P are the principal components of X.

We begin by rewriting C_Y in terms of the unknown variable.

C_Y = \frac{1}{n} Y Y^T
    = \frac{1}{n} (PX)(PX)^T
    = \frac{1}{n} P X X^T P^T
    = P \left( \frac{1}{n} X X^T \right) P^T
C_Y = P C_X P^T

Note that we have identified the covariance matrix of X in the last line.

Our plan is to recognize that any symmetric matrix A is diagonalized by an orthogonal matrix of its eigenvectors (by Theorems 3 and 4 from Appendix A). For a symmetric matrix A Theorem 4 provides A = E D E^T, where D is a diagonal matrix and E is a matrix of eigenvectors of A arranged as columns. [3]

Now comes the trick. We select the matrix P to be a matrix where each row p_i is an eigenvector of \frac{1}{n} X X^T. By this selection, P ≡ E^T. With this relation and Theorem 1 of Appendix A (P^{-1} = P^T) we can finish evaluating C_Y.

C_Y = P C_X P^T
    = P (E D E^T) P^T
    = P (P^T D P) P^T
    = (P P^T) D (P P^T)
    = (P P^{-1}) D (P P^{-1})
C_Y = D

It is evident that the choice of P diagonalizes C_Y. This was the goal for PCA. We can summarize the results of PCA in the matrices P and C_Y.

[3] The matrix A might have r ≤ m orthonormal eigenvectors where r is the rank of the matrix. When the rank of A is less than m, A is degenerate or all data occupy a subspace of dimension r ≤ m. Maintaining the constraint of orthogonality, we can remedy this situation by selecting (m − r) additional orthonormal vectors to "fill up" the matrix E. These additional vectors do not affect the final solution because the variances associated with these directions are zero.

• The principal components of X are the eigenvectors of C_X = \frac{1}{n} X X^T.
• The i-th diagonal value of C_Y is the variance of X along p_i.

In practice computing PCA of a data set X entails (1) subtracting off the mean of each measurement type and (2) computing the eigenvectors of C_X. This solution is demonstrated in Matlab code included in Appendix B.
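For example (an illustrative usage sketch, assuming the pca1 function of Appendix B is on the Matlab path; the synthetic data merely stands in for the six camera coordinates of the toy example):

% Sketch: apply the covariance-based PCA of Appendix B to synthetic data.
data = randn(6, 7200);              % stand-in for (xA, yA, xB, yB, xC, yC) samples
[signals, PC, V] = pca1(data);      % columns of PC are the principal components
V'                                  % variances, already sorted in decreasing order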

VI. A MORE GENERAL SOLUTION USING SVD

This section is the most mathematically involved and can be skipped without much loss of continuity. It is presented solely for completeness. We derive another algebraic solution for PCA and in the process, find that PCA is closely related to singular value decomposition (SVD). In fact, the two are so intimately related that the names are often used interchangeably. What we will see though is that SVD is a more general method of understanding change of basis.

We begin by quickly deriving the decomposition. In the following section we interpret the decomposition and in the last section we relate these results to PCA.

A. Singular Value Decomposition

Let X be an arbitrary n×m matrix [4] and X^T X be a rank r, square, symmetric m×m matrix. In a seemingly unmotivated fashion, let us define all of the quantities of interest.

• {v̂_1, v̂_2, ..., v̂_r} is the set of orthonormal m×1 eigenvectors with associated eigenvalues {λ_1, λ_2, ..., λ_r} for the symmetric matrix X^T X:

(X^T X) v̂_i = λ_i v̂_i

• σ_i ≡ √λ_i are positive real and termed the singular values.
• {û_1, û_2, ..., û_r} is the set of n×1 vectors defined by û_i ≡ \frac{1}{\sigma_i} X v̂_i.

The final definition includes two new and unexpected properties.

• û_i · û_j = 1 if i = j, and 0 otherwise.
• ||X v̂_i|| = σ_i

[4] Notice that in this section only we are reversing convention from m×n to n×m. The reason for this derivation will become clear in section 6.3.

These properties are both proven in Theorem 5. We now have all of the pieces to construct the decomposition. The scalar version of singular value decomposition is just a restatement of the third definition.

X v̂_i = σ_i û_i     (3)

This result says quite a bit. X multiplied by an eigenvector of X^T X is equal to a scalar times another vector. The set of eigenvectors {v̂_1, v̂_2, ..., v̂_r} and the set of vectors {û_1, û_2, ..., û_r} are both orthonormal sets or bases in r-dimensional space.

We can summarize this result for all vectors in one matrix multiplication by following the prescribed construction in Figure 4. We start by constructing a new diagonal matrix Σ,

\Sigma \equiv \begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \mathbf{0} \\ & & \sigma_r & \\ & \mathbf{0} & & \mathbf{0} \end{bmatrix}

where σ_1 ≥ σ_2 ≥ ... ≥ σ_r are the rank-ordered set of singular values. Likewise we construct accompanying orthogonal matrices,

V = [ v̂_1  v̂_2  ...  v̂_m ]
U = [ û_1  û_2  ...  û_n ]

where we have appended an additional (m − r) and (n − r) orthonormal vectors to "fill up" the matrices for V and U respectively (i.e. to deal with degeneracy issues). Figure 4 provides a graphical representation of how all of the pieces fit together to form the matrix version of SVD.

XV = UΣ

where each column of V and U perform the scalar version of the decomposition (Equation 3). Because V is orthogonal, we can multiply both sides by V^{-1} = V^T to arrive at the final form of the decomposition.

X = U Σ V^T     (4)

Although derived without motivation, this decomposition is quite powerful. Equation 4 states that any arbitrary matrix X can be converted to an orthogonal matrix, a diagonal matrix and another orthogonal matrix (or a rotation, a stretch and a second rotation). Making sense of Equation 4 is the subject of the next section.
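As a quick numerical sanity check (an illustrative sketch, not from the paper), Matlab's built-in svd reproduces Equation 4 for an arbitrary matrix:

% Sketch: verify X = U*S*V' numerically for a random n x m matrix.
n = 8; m = 5;
X = randn(n, m);
[U, S, V] = svd(X);                  % U is n x n, S is n x m, V is m x m
max(max(abs(X - U*S*V')))            % reconstruction error ~ machine precision
diag(S)'                             % rank-ordered singular values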

B. Interpreting SVD

FIG. 4 Construction of the matrix form of SVD (Equation 4) from the scalar form (Equation 3). The scalar form of SVD is expressed in Equation 3, X v̂_i = σ_i û_i. The mathematical intuition behind the construction of the matrix form is that we want to express all n scalar equations in just one equation, and it is easiest to understand this process graphically by drawing the matrices of Equation 3. We can construct three new matrices V, U and Σ. All singular values are first rank-ordered σ_1 ≥ σ_2 ≥ ... ≥ σ_r, and the corresponding vectors are indexed in the same rank order. Each pair of associated vectors v̂_i and û_i is stacked in the i-th column along their respective matrices. The corresponding singular value σ_i is placed along the diagonal (the ii-th position) of Σ. This generates the equation XV = UΣ. The matrices V and U are m×m and n×n matrices respectively, and Σ is a diagonal matrix with a few non-zero values (represented by the checkerboard in the figure) along its diagonal. Solving this single matrix equation solves all n value-form equations.

The final form of SVD is a concise but thick statement. Instead let us reinterpret Equation 3 as

X a = k b

where a and b are column vectors and k is a scalar constant. The set {v̂_1, v̂_2, ..., v̂_m} is analogous to a and the set {û_1, û_2, ..., û_n} is analogous to b. What is unique though is that {v̂_1, v̂_2, ..., v̂_m} and {û_1, û_2, ..., û_n} are orthonormal sets of vectors which span an m or n dimensional space, respectively. In particular, loosely speaking these sets appear to span all possible "inputs" (i.e. a) and "outputs" (i.e. b). Can we formalize the view that {v̂_1, v̂_2, ..., v̂_m} and {û_1, û_2, ..., û_n} span all possible "inputs" and "outputs"?

We can manipulate Equation 4 to make this fuzzy hypothesis more precise.

X = U Σ V^T
U^T X = Σ V^T
U^T X = Z

where we have defined Z ≡ Σ V^T. Note that the previous columns {û_1, û_2, ..., û_n} are now rows in U^T. Comparing this equation to Equation 1, {û_1, û_2, ..., û_n} perform the same role as {p̂_1, p̂_2, ..., p̂_m}. Hence, U^T is a change of basis from X to Z. Just as before, we were transforming column vectors, so we can again infer that we are transforming column vectors. The fact that the orthonormal basis U^T (or P) transforms column vectors means that U^T is a basis that spans the columns of X. Bases that span the columns are termed the column space of X. The column space formalizes the notion of what are the possible "outputs" of any matrix.

There is a funny symmetry to SVD such that we can define a similar quantity - the row space.

XV = U Σ
(XV)^T = (U Σ)^T
V^T X^T = Σ^T U^T
V^T X^T = Z

where we have defined Z ≡ Σ^T U^T. Again the rows of V^T (or the columns of V) are an orthonormal basis for transforming X^T into Z. Because of the transpose on X, it follows that V is an orthonormal basis spanning the row space of X. The row space likewise formalizes the notion of what are possible "inputs" into an arbitrary matrix.

We are only scratching the surface for understanding the full implications of SVD. For the purposes of this tutorial though, we have enough information to understand how PCA will fall within this framework.

C. SVD and PCA

It is evident that PCA and SVD are intimately related. Let us return to the original m×n data matrix X. We can define a new matrix Y as an n×m matrix [5]

Y \equiv \frac{1}{\sqrt{n}} X^T

where each column of Y has zero mean. The choice of Y becomes clear by analyzing Y^T Y.

Y^T Y = \left( \frac{1}{\sqrt{n}} X^T \right)^T \left( \frac{1}{\sqrt{n}} X^T \right)
      = \frac{1}{n} X X^T
Y^T Y = C_X

By construction Y^T Y equals the covariance matrix of X. From section 5 we know that the principal components of X are the eigenvectors of C_X. If we calculate the SVD of Y, the columns of matrix V contain the eigenvectors of Y^T Y = C_X. Therefore, the columns of V are the principal components of X. This second algorithm is encapsulated in Matlab code included in Appendix B.

FIG. 5 A step-by-step instruction list on how to perform principal component analysis (Quick Summary of PCA):
1. Organize data as an m×n matrix, where m is the number of measurement types and n is the number of samples.
2. Subtract off the mean for each measurement type.
3. Calculate the SVD or the eigenvectors of the covariance.

[5] Y is of the appropriate n×m dimensions laid out in the derivation of section 6.1. This is the reason for the flipping of dimensions in 6.1 and Figure 4.

What does this mean? V spans the row space of Y ≡ \frac{1}{\sqrt{n}} X^T. Therefore, V must also span the column space of \frac{1}{\sqrt{n}} X. We can conclude that finding the principal components amounts to finding an orthonormal basis that spans the column space of X. [6]

[6] If the final goal is to find an orthonormal basis for the column space of X then we can calculate it directly without constructing Y. By symmetry the columns of U produced by the SVD of \frac{1}{\sqrt{n}} X must also be the principal components.
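The following short Matlab sketch (illustrative, not from the paper; the synthetic data is an assumption) checks this correspondence numerically: the columns of V returned by the SVD of Y agree, up to ordering and sign, with the eigenvectors of the covariance matrix of X.

% Sketch: compare the SVD route and the eigenvector route on synthetic data.
m = 6; n = 500;
X = randn(m, n);
X = X - repmat(mean(X,2), 1, n);     % zero-mean each measurement type
Y = X' / sqrt(n);                    % Y as defined above (n x m)
[U1, S1, V1] = svd(Y, 0);            % columns of V1 are the principal components
[E, D] = eig((1/n) * (X * X'));      % eigenvectors of the covariance matrix
% The columns of V1 and E agree up to ordering and sign.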

VII. DISCUSSION

Principal component analysis (PCA) has widespread applications because it reveals simple underlying structures in complex data sets using analytical solutions from linear algebra. Figure 5 provides a brief summary for implementing PCA.

A primary benefit of PCA arises from quantifying the importance of each dimension for describing the variability of a data set. In particular, the measurement of the variance along each principal component provides a means for comparing the relative importance of each dimension. An implicit hope behind employing this method is that the variance along a small number of principal components (i.e. less than the number of measurement types) provides a reasonable characterization of the complete data set. This statement is the precise intuition behind any method of dimensional reduction - a vast arena of active research. In the example of the spring, PCA identifies that a majority of variation exists along a single dimension (the direction of motion x̂), even though 6 dimensions are recorded.

Although PCA "works" on a multitude of real world problems, any diligent scientist or engineer must ask when does PCA fail? Before we answer this question, let us note a remarkable feature of this algorithm. PCA is completely non-parametric: any data set can be plugged in and an answer comes out, requiring no parameters to tweak and no regard for how the data was recorded. From one perspective, the fact that PCA is non-parametric (or plug-and-play) can be considered a positive feature because the answer is unique and independent of the user. From another perspective the fact that PCA is agnostic to the source of the data is also a weakness. For instance, consider tracking a person on a ferris wheel in Figure 6a. The data points can be cleanly described by a single variable, the precession angle of the wheel θ, however PCA would fail to recover this variable.

FIG. 6 Example of when PCA fails (red lines). (a) Tracking a person on a ferris wheel (black dots). All dynamics can be described by the phase of the wheel θ, a non-linear combination of the naive basis. (b) In this example data set, non-Gaussian distributed data and non-orthogonal axes cause PCA to fail. The axes with the largest variance do not correspond to the appropriate answer.

A. Limits and Statistics of Dimensional Reduction

A deeper appreciation of the limits of PCA requires some consideration about the underlying assumptions and in tandem, a more rigorous description of the source of data. Generally speaking, the primary motivation behind this method is to decorrelate the data set, i.e. remove second-order dependencies. The manner of approaching this goal is loosely akin to how one might explore a town in the Western United States: drive down the longest road running through the town. When one sees another big road, turn left or right and drive down this road, and so forth. In this analogy, PCA requires that each new road explored must be perpendicular to the previous, but clearly this requirement is overly stringent and the data (or town) might be arranged along non-orthogonal axes, such as Figure 6b. Figure 6 provides two examples of this type of data where PCA provides unsatisfying results.

To address these problems, we must define what we consider optimal results. In the context of dimensional reduction, one measure of success is the degree to which a reduced representation can predict the original data. In statistical terms, we must define an error function (or loss function). It can be proved that under a common loss function, mean squared error (i.e. L2 norm), PCA provides the optimal reduced representation of the data. This means that selecting orthogonal directions for principal components is the best solution to predicting the original data. Given the examples of Figure 6, how could this statement be true? Our intuitions from Figure 6 suggest that this result is somehow misleading.

The solution to this paradox lies in the goal we selected for the analysis. The goal of the analysis is to decorrelate the data, or said in other terms, the goal is to remove second-order dependencies in the data. In the data sets of Figure 6, higher order dependencies exist between the variables. Therefore, removing second-order dependencies is insufficient at revealing all structure in the data. [7]

[7] When are second order dependencies sufficient for revealing all dependencies in a data set? This statistical condition is met when the first and second order statistics are sufficient statistics of the data. This occurs, for instance, when a data set is Gaussian distributed.

Multiple solutions exist for removing higher-order dependencies. For instance, if prior knowledge is known about the problem, then a nonlinearity (i.e. kernel) might be applied to the data to transform the data to a more appropriate naive basis. For instance, in Figure 6a, one might examine the polar coordinate representation of the data, as sketched below. This parametric approach is often termed kernel PCA.
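A minimal sketch of that idea (illustrative assumptions throughout: the synthetic ferris-wheel data, the use of cart2pol, and applying ordinary PCA after the transform are not prescriptions from the paper):

% Sketch: a nonlinear (polar) re-representation makes the ferris-wheel
% data essentially one-dimensional, after which ordinary PCA suffices.
n = 1000;
phase = 2*pi*rand(1, n);                 % hidden phase of the wheel
x = cos(phase) + 0.02*randn(1, n);       % noisy Cartesian recordings
y = sin(phase) + 0.02*randn(1, n);
[theta, r] = cart2pol(x, y);             % polar coordinates: (angle, radius)
data = [theta; r];
data = data - repmat(mean(data,2), 1, n);
C = (1/n) * (data * data');              % nearly all variance lies along theta
diag(C)'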

Another direction is to impose more general statistical definitions of dependency within a data set, e.g. requiring that data along reduced dimensions be statistically independent. This class of algorithms, termed independent component analysis (ICA), has been demonstrated to succeed in many domains where PCA fails. ICA has been applied to many areas of signal and image processing, but suffers from the fact that solutions are (sometimes) difficult to compute.

Writing this paper has been an extremely instructional experience for me. I hope that this paper helps to demystify the motivation and results of PCA, and the underlying assumptions behind this important analysis technique. Please send me a note if this has been useful to you as it inspires me to keep writing!

APPENDIX A: Linear Algebra

This section proves a few unapparent theorems in linear

algebra, which are crucial to this paper.

1. The inverse of an orthogonal matrix is its transpose.

Let A be an m×n orthogonal matrix where a_i is the i-th column vector. The ij-th element of A^T A is

(A^T A)_{ij} = a_i^T a_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}

Therefore, because A^T A = I, it follows that A^{-1} = A^T.
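A quick numerical illustration (not part of the proof; the QR factorization is just a convenient way to obtain an orthogonal matrix):

% Sketch: for an orthogonal matrix, the transpose acts as the inverse.
[A, R] = qr(randn(4));               % A is a random 4 x 4 orthogonal matrix
max(max(abs(A'*A - eye(4))))         % essentially zero (machine precision)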

2. For any matrix A, A^T A and A A^T are symmetric.

(A A^T)^T = A^{TT} A^T = A A^T
(A^T A)^T = A^T A^{TT} = A^T A

3. A matrix is symmetric if and only if it is orthogonally diagonalizable.

Because this statement is bi-directional, it requires a two-part "if-and-only-if" proof. One needs to prove the forward and the backwards "if-then" cases.

Let us start with the forward case. If A is orthogonally diagonalizable, then A is a symmetric matrix. By hypothesis, orthogonally diagonalizable means that there exists some E such that A = E D E^T, where D is a diagonal matrix and E is some special matrix which diagonalizes A. Let us compute A^T.

A^T = (E D E^T)^T = E^{TT} D^T E^T = E D E^T = A

Evidently, if A is orthogonally diagonalizable, it must also be symmetric.

The reverse case is more involved and less clean so it will be left to the reader. In lieu of this, hopefully the "forward" case is suggestive if not somewhat convincing.

4. A symmetric matrix is diagonalized by a matrix of its orthonormal eigenvectors.

Let A be a square n×n symmetric matrix with associated eigenvectors {e_1, e_2, ..., e_n}. Let E = [e_1 e_2 ... e_n] where the i-th column of E is the eigenvector e_i. This theorem asserts that there exists a diagonal matrix D such that A = E D E^T.

This proof is in two parts. In the first part, we see that any matrix can be orthogonally diagonalized if and only if that matrix's eigenvectors are all linearly independent. In the second part of the proof, we see that a symmetric matrix has the special property that all of its eigenvectors are not just linearly independent but also orthogonal, thus completing our proof.

In the first part of the proof, let A be just some matrix, not necessarily symmetric, and let it have independent eigenvectors (i.e. no degeneracy). Furthermore, let E = [e_1 e_2 ... e_n] be the matrix of eigenvectors placed in the columns. Let D be a diagonal matrix where the i-th eigenvalue is placed in the ii-th position.

We will now show that AE = ED. We can examine the columns of the right-hand and left-hand sides of the equation.

Left hand side:  AE = [A e_1  A e_2  ...  A e_n]
Right hand side: ED = [λ_1 e_1  λ_2 e_2  ...  λ_n e_n]

Evidently, if AE = ED then A e_i = λ_i e_i for all i. This equation is the definition of the eigenvalue equation. Therefore, it must be that AE = ED. A little rearrangement provides A = E D E^{-1}, completing the first part of the proof.

For the second part of the proof, we show that a symmetric matrix always has orthogonal eigenvectors. For some symmetric matrix, let λ_1 and λ_2 be distinct eigenvalues for eigenvectors e_1 and e_2.

λ_1 e_1 · e_2 = (λ_1 e_1)^T e_2
             = (A e_1)^T e_2
             = e_1^T A^T e_2
             = e_1^T A e_2
             = e_1^T (λ_2 e_2)
λ_1 e_1 · e_2 = λ_2 e_1 · e_2

By the last relation we can equate that (λ_1 − λ_2) e_1 · e_2 = 0. Since we have conjectured that the eigenvalues are in fact unique, it must be the case that e_1 · e_2 = 0. Therefore, the eigenvectors of a symmetric matrix are orthogonal.

Let us back up now to our original postulate that A is a symmetric matrix. By the second part of the proof, we know that the eigenvectors of A are all orthonormal (we choose the eigenvectors to be normalized). This means that E is an orthogonal matrix so by Theorem 1, E^T = E^{-1} and we can rewrite the final result.

A = E D E^T

Thus, a symmetric matrix is diagonalized by a matrix of its eigenvectors.
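A small numerical illustration of this theorem (not part of the proof; eig is used here purely as a convenience):

% Sketch: a random symmetric matrix is diagonalized by its eigenvectors.
A = randn(4); A = A + A';            % construct a symmetric matrix
[E, D] = eig(A);                     % columns of E are orthonormal eigenvectors
max(max(abs(A - E*D*E')))            % essentially zero: A = E D E'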

5. For any arbitrary m×n matrix X, the symmetric matrix X^T X has a set of orthonormal eigenvectors {v̂_1, v̂_2, ..., v̂_n} and a set of associated eigenvalues {λ_1, λ_2, ..., λ_n}. The set of vectors {X v̂_1, X v̂_2, ..., X v̂_n} then form an orthogonal basis, where each vector X v̂_i is of length √λ_i.

All of these properties arise from the dot product of any two vectors from this set.

(X v̂_i) · (X v̂_j) = (X v̂_i)^T (X v̂_j)
                  = v̂_i^T X^T X v̂_j
                  = v̂_i^T (λ_j v̂_j)
                  = λ_j v̂_i · v̂_j
(X v̂_i) · (X v̂_j) = λ_j δ_{ij}

The last relation arises because the set of eigenvectors of X^T X is orthogonal, resulting in the Kronecker delta. In simpler terms the last relation states:

(X v̂_i) · (X v̂_j) = λ_j if i = j, and 0 if i ≠ j.

This equation states that any two vectors in the set are orthogonal.

The second property arises from the above equation by realizing that the length squared of each vector is defined as:

||X v̂_i||^2 = (X v̂_i) · (X v̂_i) = λ_i

APPENDIX B: Code

This code is written for Matlab 6.5 (Release 13) from Mathworks [8]. The code is not computationally efficient but explanatory (terse comments begin with a %).

[8] http://www.mathworks.com

This first version follows Section 5 by examining the covariance of the data set.

function [signals,PC,V] = pca1(data)
% PCA1: Perform PCA using covariance.
% data - MxN matrix of input data
% (M dimensions, N trials)
% signals - MxN matrix of projected data
% PC - each column is a PC
% V - Mx1 matrix of variances
[M,N] = size(data);
% subtract off the mean for each dimension
mn = mean(data,2);
data = data - repmat(mn,1,N);
% calculate the covariance matrix
covariance = 1 / (N-1) * data * data';
% find the eigenvectors and eigenvalues
[PC, V] = eig(covariance);
% extract diagonal of matrix as vector
V = diag(V);
% sort the variances in decreasing order
[junk, rindices] = sort(-1*V);
V = V(rindices);
PC = PC(:,rindices);
% project the original data set
signals = PC' * data;

This second version follows section 6 computing PCA

through SVD.

function [signals,PC,V] = pca2(data)
% PCA2: Perform PCA using SVD.
% data - MxN matrix of input data
% (M dimensions, N trials)
% signals - MxN matrix of projected data
% PC - each column is a PC
% V - Mx1 matrix of variances
[M,N] = size(data);
% subtract off the mean for each dimension
mn = mean(data,2);
data = data - repmat(mn,1,N);
% construct the matrix Y
Y = data' / sqrt(N-1);
% SVD does it all
[u,S,PC] = svd(Y);
% calculate the variances
S = diag(S);
V = S .* S;
% project the original data
signals = PC' * data;

Eigenvalues and eigenvectors
From Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors)

An eigenvector of a square matrix A is a non-zero vector v that, when multiplied by A, yields the original vector multiplied by a single number λ; that is:

A v = λ v.

The number λ is called the eigenvalue of A corresponding to v.[1]

In analytic geometry, for example, a three-element vector may be seen as an arrow in three-dimensional space starting at the origin. In that case, an eigenvector of a 3×3 matrix A is an arrow whose direction is either preserved or exactly reversed after multiplication by A. The corresponding eigenvalue determines how the length of the arrow is changed by the operation, and whether its direction is reversed or not.

In abstract linear algebra, these concepts are naturally extended to more general situations, where the set of real scale factors is replaced by any field of scalars (such as algebraic or complex numbers); the set of Cartesian vectors is replaced by any vector space (such as the continuous functions, the polynomials or the trigonometric series), and matrix multiplication is replaced by any linear operator that maps vectors to vectors (such as the derivative from calculus). In such cases, the "vector" in "eigenvector" may be replaced by a more specific term, such as "eigenfunction", "eigenmode", "eigenface", or "eigenstate". Thus, for example, the exponential function is an eigenfunction of the derivative operator, since its derivative is proportional to itself.

The set of all eigenvectors of a matrix (or linear operator), each paired with its corresponding eigenvalue, is called the eigensystem of that matrix.[2] An eigenspace of a matrix is the set of all eigenvectors with the same eigenvalue, together with the zero vector.[1] An eigenbasis for a matrix is any basis for the set of all vectors that consists of linearly independent eigenvectors of that matrix. Not every real matrix has real eigenvalues, but every complex matrix has at least one complex eigenvalue.

The terms characteristic vector, characteristic value, and characteristic space are also used for these concepts. The prefix eigen- is adopted from the German word eigen for "self" or "proper".

Eigenvalues and eigenvectors have many applications in both pure and applied mathematics. They are used in matrix factorization, in quantum mechanics, and in many other areas.

(Figure: In this shear mapping the red arrow changes direction but the blue arrow does not. The blue arrow is an eigenvector of this shear mapping, and since its length is unchanged its eigenvalue is 1.)

Definition

Eigenvectors and eigenvalues of a real matrix

See also: Euclidean vector and Matrix (mathematics)

(Figure: Matrix A acts by stretching the vector x, not changing its direction, so x is an eigenvector of A.)

In many contexts, a vector can be assumed to be a list of real numbers (called elements), written vertically with brackets around the entire list, such as the vectors u and v below. Two vectors are said to be scalar multiples of each other (also called parallel or collinear) if they have the same number of elements, and if every element of one vector is obtained by multiplying each corresponding element in the other vector by the same number (known as a scaling factor, or a scalar). For example, the vectors u and v are scalar multiples of each other, because each element of v is 20 times the corresponding element of u.

A vector with three elements, like u or v above, may represent a point in three-dimensional space, relative to some Cartesian coordinate system. It helps to think of such a vector as the tip of an arrow whose tail is at the origin of the coordinate system. In this case, the condition "u is parallel to v" means that the two arrows lie on the same straight line, and may differ only in length and direction along that line.

If we multiply any square matrix A with n rows and n columns by such a vector v, the result will be another vector w = A v, also with n rows and one column. That is, v is mapped to w, where, for each index i,

w_i = \sum_{j=1}^{n} A_{ij} v_j.

In general, if v is not all zeros, the vectors v and A v will not be parallel. When they are parallel (that is, when there is some real number λ such that A v = λ v) we say that v is an eigenvector of A. In that case, the scale factor λ is said to be the eigenvalue corresponding to that eigenvector.

In particular, multiplication by a 3×3 matrix A may change both the direction and the magnitude of an arrow v in three-dimensional space. However, if v is an eigenvector of A with eigenvalue λ, the operation may only change its length, and either keep its direction or flip it (make the arrow point in the exact opposite direction). Specifically, the length of the arrow will increase if |λ| > 1, remain the same if |λ| = 1, and decrease if |λ| < 1. Moreover, the direction will be precisely the same if λ > 0, and flipped if λ < 0. If λ = 0, then the length of the arrow becomes zero.

An example

For the transformation matrix of this example, one particular vector is an eigenvector with eigenvalue 2: multiplying the matrix by that vector indeed yields 2 times the vector. On the other hand, another vector is not an eigenvector, since multiplying the matrix by it does not yield a multiple of the original vector. (The specific matrix and vectors of this example were given as images in the source and are not reproduced here.)

(Figure: The transformation matrix preserves the direction of vectors parallel to its two eigenvectors, shown in blue and violet. The points that lie on the line through the origin, parallel to an eigenvector, remain on the line after the transformation. The vectors in red are not eigenvectors, therefore their direction is altered by the transformation.)

Another example

For a second matrix (again given as an image in the source), three particular vectors are mapped to 0, 3 and 2 times themselves; therefore, those vectors are eigenvectors of the matrix corresponding to the eigenvalues 0, 3, and 2, respectively. (Here the symbol T indicates matrix transposition, in this case turning the row vectors into column vectors.)

Trivial cases

The identity matrix I (whose general element I_ij is 1 if i = j, and 0 otherwise) maps every vector to itself. Therefore, every vector is an eigenvector of I, with eigenvalue 1.

More generally, if A is a diagonal matrix (with A_ij = 0 whenever i ≠ j), and v is a vector parallel to axis i (that is, v_i ≠ 0, and v_j = 0 if j ≠ i), then A v = λ v where λ = A_ii. That is, the eigenvalues of a diagonal matrix are the elements of its main diagonal. This is trivially the case of any 1×1 matrix.

General definition

The concept of eigenvectors and eigenvalues extends naturally to abstract linear transformations on abstract vector spaces. Namely, let V be any vector space over some field K of scalars, and let T be a linear transformation mapping V into V. We say that a non-zero vector v of V is an eigenvector of T if (and only if) there is a scalar λ in K such that

T(v) = λ v.

This equation is called the eigenvalue equation for T, and the scalar λ is the eigenvalue of T corresponding to the eigenvector v. Note that T(v) means the result of applying the operator T to the vector v, while λ v means the product of the scalar λ by v.[3]

The matrix-specific definition is a special case of this abstract definition. Namely, the vector space V is the set of all column vectors of a certain size n×1, and T is the linear transformation that consists in multiplying a vector by the given n×n matrix A.

Some authors allow v to be the zero vector in the definition of eigenvector.[4] This is reasonable as long as we define eigenvalues and eigenvectors carefully: If we would like the zero vector to be an eigenvector, then we must first define an eigenvalue of T as a scalar λ in K such that there is a nonzero vector v in V with T(v) = λ v. We then define an eigenvector to be a vector v in V such that there is an eigenvalue λ in K with T(v) = λ v. This way, we ensure that it is not the case that every scalar is an eigenvalue corresponding to the zero vector.

Eigenspace and spectrum

If v is an eigenvector of T, with eigenvalue λ, then any scalar multiple αv of v with nonzero α is also an eigenvector with eigenvalue λ, since T(αv) = αT(v) = αλv = λ(αv). Moreover, if u and v are eigenvectors with the same eigenvalue λ, then u + v is also an eigenvector with the same eigenvalue λ. Therefore, the set of all eigenvectors with the same eigenvalue λ, together with the zero vector, is a linear subspace of V, called the eigenspace of T associated to λ.[5][6] If that subspace has dimension 1, it is sometimes called an eigenline.[7]

The geometric multiplicity γ_T(λ) of an eigenvalue λ is the dimension of the eigenspace associated to λ, i.e. the number of linearly independent eigenvectors with that eigenvalue. These eigenvectors can be chosen so that they are pairwise orthogonal and have unit length under some arbitrary inner product defined on V. In other words, every eigenspace has an orthonormal basis of eigenvectors.

Conversely, any eigenvector with eigenvalue λ must be linearly independent from all eigenvectors that are associated to a different eigenvalue μ. Therefore a linear transformation T that operates on an n-dimensional space cannot have more than n distinct eigenvalues (or eigenspaces).[8] Any subspace spanned by eigenvectors of T is an invariant subspace of T.

The list of eigenvalues of T is sometimes called the spectrum of T. The order of this list is arbitrary, but the number of times that an eigenvalue appears is important.

There is no unique way to choose a basis for an eigenspace of an abstract linear operator T based only on T itself, without some additional data such as a choice of coordinate basis for V. Even for an eigenline, the basis vector is indeterminate in both magnitude and orientation. If the scalar field K is the real numbers R, one can order the eigenspaces by their eigenvalues. Since the modulus |λ| of an eigenvalue is important in many applications, the eigenspaces are often ordered by that criterion.

Eigenbasis

An eigenbasis for a linear operator T that operates on a vector space V is a basis for V that consists entirely of eigenvectors of T (possibly with different eigenvalues). Such a basis may not exist.

Suppose V has finite dimension n, and let γ be the sum of the geometric multiplicities γ_T(λ) over all distinct eigenvalues λ of T. This integer is the maximum number of linearly independent eigenvectors of T, and therefore cannot exceed n. If γ is exactly n, then T admits an eigenbasis; that is, there exists a basis for V that consists of n eigenvectors. The matrix that represents T relative to this basis is a diagonal matrix, whose diagonal elements are the eigenvalues associated to each basis vector.

Conversely, if the sum γ is less than n, then T admits no eigenbasis, and there is no choice of coordinates that will allow T to be represented by a diagonal matrix.

Note that γ is at least equal to the number of distinct eigenvalues of T, but may be larger than that.[9] For example, the identity operator I on V has γ = n, and any basis of V is an eigenbasis of I; but its only eigenvalue is 1, with γ_I(1) = n.
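For a concrete matrix, the geometric multiplicity of each eigenvalue can be estimated as the dimension of the null space of A − λI, and their sum compared with n. A minimal NumPy sketch, using a deliberately defective (Jordan-block) matrix chosen only for illustration:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 2.0]])          # one eigenvalue, too few eigenvectors
    n = A.shape[0]

    eigenvalues = np.unique(np.round(np.linalg.eigvals(A), 10))

    # Geometric multiplicity of lambda = dim null(A - lambda * I).
    gamma = sum(n - np.linalg.matrix_rank(A - lam * np.eye(n)) for lam in eigenvalues)

    print(gamma, "eigenbasis exists" if gamma == n else "no eigenbasis (defective)")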

Generalizations to infinite-dimensional spaces

For more details on this topic, see Spectral theorem.

The definition of eigenvalue of a linear transformation T remains valid even if the underlying space V is an infinite dimensional Hilbert or Banach space. Namely, a scalar λ is an eigenvalue if and only if there is some nonzero vector v such that T(v) = λv.

Eigenfunctions

A widely used class of linear operators acting on infinite dimensional spaces are the differential operators on function spaces. Let D be a linear differential operator in the variable x on the space of infinitely differentiable real functions of a real argument x. The eigenvalue equation for D is the differential equation

D f = λ f.

The functions that satisfy this equation are commonly called eigenfunctions. For the derivative operator d/dx, an eigenfunction is a function that, when differentiated, yields a constant times the original function. If λ is zero, the generic solution is a constant function. If λ is non-zero, the solution is an exponential function f(x) = A exp(λx).

Eigenfunctions are an essential tool in the solution of differential equations and many other applied and theoretical fields. For instance, the exponential functions are eigenfunctions of any shift invariant linear operator. This fact is the basis of powerful Fourier transform methods for solving all sorts of problems.
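A one-line symbolic check (a sketch using SymPy; the symbol names are arbitrary) that f(x) = exp(λx) satisfies D f = λ f for the derivative operator:

    import sympy as sp

    x, lam = sp.symbols('x lam')
    f = sp.exp(lam * x)                            # candidate eigenfunction of d/dx
    print(sp.simplify(sp.diff(f, x) - lam * f))    # prints 0, i.e. D f = lam * f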

Spectral theory

If λ is an eigenvalue of T, then the operator T − λI is not one-to-one, and therefore its inverse (T − λI)^(-1) is not defined. The converse is true for finite-dimensional vector spaces, but not for infinite-dimensional ones. In general, the operator T − λI may not have an inverse, even if λ is not an eigenvalue.

For this reason, in functional analysis one defines the spectrum of a linear operator T as the set of all scalars λ for which the operator T − λI has no bounded inverse. Thus the spectrum of an operator always contains all its eigenvalues, but is not limited to them.


Associative algebras and representation theory

More algebraically, rather than generalizing the vector space to an infinite dimensional space, one can generalize the algebraic object that is acting on the space, replacing a single operator acting on a vector space with an algebra representation, that is, an associative algebra acting on a module. The study of such actions is the field of representation theory.

A closer analog of eigenvalues is given by the representation-theoretical concept of weight, with the analogs of eigenvectors and eigenspaces being weight vectors and weight spaces.

Eigenvalues and eigenvectors of matrices

Characteristic polynomial

The eigenvalue equation for a matrix A is

A v = λ v,

which is equivalent to

(A − λ I) v = 0,

where I is the identity matrix. It is a fundamental result of linear algebra that an equation M v = 0 has a non-zero solution v if and only if the determinant det(M) of the matrix M is zero. It follows that the eigenvalues of A are precisely the real numbers λ that satisfy the equation

det(A − λ I) = 0.

The left-hand side of this equation can be seen (using Leibniz' rule for the determinant) to be a polynomial function of the variable λ. The degree of this polynomial is n, the order of the matrix. Its coefficients depend on the entries of A, except that its term of degree n is always (−1)^n λ^n. This polynomial is called the characteristic polynomial of A; and the above equation is called the characteristic equation (or, less often, the secular equation) of A.

For example, let A be the matrix

A = [ 2  0  0 ]
    [ 0  3  4 ]
    [ 0  4  9 ]

The characteristic polynomial of A is

det(A − λ I) = (2 − λ)[(3 − λ)(9 − λ) − 16],

which is

−λ^3 + 14 λ^2 − 35 λ + 22.

The roots of this polynomial are 2, 1, and 11. Indeed these are the only three eigenvalues of A, corresponding to the eigenvectors [1, 0, 0]^T, [0, 2, −1]^T and [0, 1, 2]^T (or any non-zero multiples thereof).
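Both routes to the eigenvalues, the roots of the characteristic polynomial and a standard eigensolver, can be cross-checked numerically. A short sketch using the 3×3 matrix written out above:

    import numpy as np

    A = np.array([[2.0, 0.0, 0.0],
                  [0.0, 3.0, 4.0],
                  [0.0, 4.0, 9.0]])

    coeffs = np.poly(A)            # coefficients of det(A - lambda*I), highest degree first
    print(np.roots(coeffs))        # approximately 11, 2, 1 (in some order)

    w, V = np.linalg.eig(A)        # the same eigenvalues, plus unit eigenvectors
    print(w)                       # approximately 2, 1, 11 (in some order)
    print(V)                       # columns are the corresponding eigenvectors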

In the real domain

Since the eigenvalues are roots of the characteristic polynomial, an n×n matrix has at most n eigenvalues. If the matrix has real entries, the coefficients of the characteristic polynomial are all real; but it may have fewer than n real roots, or no real roots at all.

For example, consider the cyclic permutation matrix

A = [ 0  1  0 ]
    [ 0  0  1 ]
    [ 1  0  0 ]

This matrix shifts the coordinates of the vector up by one position, and moves the first coordinate to the bottom. Its characteristic polynomial is 1 − λ^3, which has only one real root, λ = 1. Any vector with three equal non-zero elements is an eigenvector for this eigenvalue; for example, any non-zero multiple of [1, 1, 1]^T.


In the complex domain

The fundamental theorem of algebra implies that the characteristic polynomial of an n×n matrix A, being a polynomial of degree n, has exactly n complex roots. More precisely, it can be factored into the product of n linear terms,

det(A − λ I) = (λ_1 − λ)(λ_2 − λ) ... (λ_n − λ),

where each λ_i is a complex number. The numbers λ_1, λ_2, ..., λ_n (which may not be all distinct) are roots of the polynomial, and are precisely the eigenvalues of A.

Even if the entries of A are all real numbers, the eigenvalues may still have non-zero imaginary parts (and the elements of the corresponding eigenvectors will therefore also have non-zero imaginary parts). Also, the eigenvalues may be irrational numbers even if all the entries of A are rational numbers, or all are integers. However, if the entries of A are algebraic numbers (which include the rationals), the eigenvalues will be (complex) algebraic numbers too.

The non-real roots of a real polynomial with real coefficients can be grouped into pairs of complex conjugate values, namely with the two members of each pair having the same real part and imaginary parts that differ only in sign. If the degree is odd, then by the intermediate value theorem at least one of the roots will be real. Therefore, any real matrix with odd order will have at least one real eigenvalue; whereas a real matrix with even order may have no real eigenvalues.

In the example of the 3×3 cyclic permutation matrix A, above, the characteristic polynomial 1 − λ^3 has two additional non-real roots, namely

λ_2 = −1/2 + i√3/2 and λ_3 = −1/2 − i√3/2,

where i is the imaginary unit. Note that λ_2 λ_3 = 1, λ_2^2 = λ_3, and λ_3^2 = λ_2. Then

A [1, λ_2, λ_3]^T = [λ_2, λ_3, 1]^T = λ_2 [1, λ_2, λ_3]^T

and

A [1, λ_3, λ_2]^T = [λ_3, λ_2, 1]^T = λ_3 [1, λ_3, λ_2]^T.

Therefore, the vectors [1, 1, 1]^T, [1, λ_2, λ_3]^T and [1, λ_3, λ_2]^T are eigenvectors of A, with eigenvalues 1, λ_2, and λ_3, respectively.
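A quick numerical confirmation that a real matrix can have non-real eigenvalues (the matrix below follows the verbal description of the cyclic permutation above):

    import numpy as np

    # Cyclic permutation of coordinates: (v1, v2, v3) -> (v2, v3, v1).
    P = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]], dtype=float)

    print(np.linalg.eigvals(P))   # 1 and the two non-real cube roots of unity,
                                  # -1/2 + i*sqrt(3)/2 and -1/2 - i*sqrt(3)/2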

Algebraic multiplicities

Let λ_i be an eigenvalue of an n×n matrix A. The algebraic multiplicity μ_A(λ_i) of λ_i is its multiplicity as a root of the characteristic polynomial, that is, the largest integer k such that (λ − λ_i)^k divides evenly that polynomial.

Like the geometric multiplicity γ_A(λ_i), the algebraic multiplicity is an integer between 1 and n; and the sum of μ_A(λ_i) over all distinct eigenvalues also cannot exceed n. If complex eigenvalues are considered, the sum is exactly n.

It can be proved that the geometric multiplicity γ_A(λ_i) of an eigenvalue never exceeds its algebraic multiplicity μ_A(λ_i). Therefore, γ_A(λ_i) is at most μ_A(λ_i), and the total geometric multiplicity is at most n.

Example

For the matrix

the characteristic polynomial of A is

det(A − λ I) = (2 − λ)^2 (3 − λ)^2,


this being the product of the diagonal entries of A − λI, since the matrix is lower triangular.

The roots of this polynomial, and hence the eigenvalues, are 2 and 3. The algebraic multiplicity of each eigenvalue is 2; in other words, they are both double roots. On the other hand, the geometric multiplicity of the eigenvalue 2 is only 1, because its eigenspace is spanned by a single vector, and is therefore 1-dimensional. Similarly, the geometric multiplicity of the eigenvalue 3 is 1 because its eigenspace is spanned by a single vector. Hence, the total algebraic multiplicity of A, denoted μ_A, is 4, which is the most it could be for a 4-by-4 matrix. The total geometric multiplicity γ_A is 2, which is the smallest it could be for a matrix which has two distinct eigenvalues.

Diagonalization and eigendecomposition

If the sum γ of the geometric multiplicities of all eigenvalues is exactly n, then A has a set of n linearly independent eigenvectors. Let Q be a square matrix whose columns are those eigenvectors, in any order. Then we will have AQ = QΛ, where Λ is the diagonal matrix such that Λ_ii is the eigenvalue associated to column i of Q. Since the columns of Q are linearly independent, the matrix Q is invertible. Premultiplying both sides by Q^(-1) we get Q^(-1)AQ = Λ. By definition, therefore, the matrix A is diagonalizable.

Conversely, if A is diagonalizable, let Q be a non-singular square matrix such that Q^(-1)AQ is some diagonal matrix D. Multiplying both sides on the left by Q we get AQ = QD. Therefore each column of Q must be an eigenvector of A, whose eigenvalue is the corresponding element on the diagonal of D. Since the columns of Q must be linearly independent, it follows that γ = n. Thus γ is equal to n if and only if A is diagonalizable.

If A is diagonalizable, the space of all n-element vectors can be decomposed into the direct sum of the eigenspaces of A. This decomposition is called the eigendecomposition of A, and it is preserved under change of coordinates.

A matrix that is not diagonalizable is said to be defective. For defective matrices, the notion of eigenvector can be generalized to generalized eigenvectors, and that of diagonal matrix to a Jordan form matrix. Over an algebraically closed field, any matrix has a Jordan form and therefore admits a basis of generalized eigenvectors, and a decomposition into generalized eigenspaces.
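The factorization A = Q Λ Q^(-1) can be reconstructed directly from an eigensolver's output. A minimal sketch, assuming an arbitrary diagonalizable example matrix:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])                 # illustrative diagonalizable matrix

    w, Q = np.linalg.eig(A)                    # eigenvalues w, eigenvectors as columns of Q
    Lam = np.diag(w)

    # Verify the eigendecomposition A = Q * Lam * Q^{-1}.
    print(np.allclose(A, Q @ Lam @ np.linalg.inv(Q)))   # True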

Further properties

Let A be an arbitrary n×n matrix of complex numbers with eigenvalues λ_1, λ_2, ..., λ_n. (Here it is understood that an eigenvalue with algebraic multiplicity μ occurs μ times in this list.) Then:

The trace of A, defined as the sum of its diagonal elements, is also the sum of all eigenvalues: tr(A) = λ_1 + λ_2 + ... + λ_n.

The determinant of A is the product of all eigenvalues: det(A) = λ_1 λ_2 ... λ_n.

The eigenvalues of the k-th power of A, i.e. the eigenvalues of A^k, for any positive integer k, are λ_1^k, λ_2^k, ..., λ_n^k.

The matrix A is invertible if and only if all the eigenvalues are nonzero.

If A is invertible, then the eigenvalues of A^(-1) are 1/λ_1, 1/λ_2, ..., 1/λ_n.

If A is equal to its conjugate transpose A* (in other words, if A is Hermitian), then every eigenvalue is real. The same is true of any symmetric real matrix. If A is also positive-definite, positive-semidefinite, negative-definite, or negative-semidefinite, every eigenvalue is positive, non-negative, negative, or non-positive respectively.

Every eigenvalue of a unitary matrix has absolute value 1.
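The trace and determinant identities above are easy to spot-check numerically; a short sketch with a random example matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))                  # arbitrary example matrix
    w = np.linalg.eigvals(A)

    print(np.isclose(np.trace(A), w.sum()))          # True: trace = sum of eigenvalues
    print(np.isclose(np.linalg.det(A), w.prod()))    # True: det = product of eigenvalues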

Left and right eigenvectors

See also: left and right (algebra)


The use of matrices with a single column (rather than a single row) to represent vectors is traditional in many disciplines. For that reason, the word "eigenvector" almost always means a right eigenvector, namely a column vector that must be placed to the right of the matrix A in the defining equation

A v = λ v.

There may be also single-row vectors that are unchanged when they occur on the left side of a product with a square matrix A; that is, row vectors u which satisfy the equation

u A = λ u.

Any such row vector u is called a left eigenvector of A.

The left eigenvectors of A are transposes of the right eigenvectors of the transposed matrix A^T, since their defining equation is equivalent to

A^T u^T = λ u^T.

It follows that, if A is Hermitian, its left and right eigenvectors are complex conjugates. In particular, if A is a real symmetric matrix, they are the same except for transposition.

Calculation

Computing the eigenvalues

Main article: Eigenvalue algorithm

The eigenvalues of a matrix A can be determined by finding the roots of the characteristic polynomial. Explicit algebraic formulas for the roots of a polynomial exist only if the degree n is 4 or less. According to the Abel-Ruffini theorem there is no general, explicit and exact algebraic formula for the roots of a polynomial with degree 5 or more.

It turns out that any polynomial with degree n is the characteristic polynomial of some companion matrix of order n. Therefore, for matrices of order 5 or more, the eigenvalues and eigenvectors cannot be obtained by an explicit algebraic formula, and must therefore be computed by approximate numerical methods.

In theory, the coefficients of the characteristic polynomial can be computed exactly, since they are sums of products of matrix elements; and there are algorithms that can find all the roots of a polynomial of arbitrary degree to any required accuracy.[10] However, this approach is not viable in practice because the coefficients would be contaminated by unavoidable round-off errors, and the roots of a polynomial can be an extremely sensitive function of the coefficients (as exemplified by Wilkinson's polynomial).[10]

Efficient, accurate methods to compute eigenvalues and eigenvectors of arbitrary matrices were not known until the advent of the QR algorithm in 1961.[10] Combining the Householder transformation with the LU decomposition results in an algorithm with better convergence than the QR algorithm.[citation needed] For large Hermitian sparse matrices, the Lanczos algorithm is one example of an efficient iterative method to compute eigenvalues and eigenvectors, among several other possibilities.[10]

Computing the eigenvectors

Once the (exact) value of an eigenvalue is known, the corresponding eigenvectors can be found by finding non-zero solutions of the eigenvalue equation, which becomes a system of linear equations with known coefficients. For example, once it is known that 6 is an eigenvalue of the matrix

we can find its eigenvectors by solving the equation A v = 6 v, that is

This matrix equation is equivalent to two linear equations

that is

Both equations reduce to the single linear equation . Therefore, any vector of the form , for any

non-zero real number , is an eigenvector of with eigenvalue .


The matrix above has another eigenvalue . A similar calculation shows that the corresponding eigenvectors

are the non-zero solutions of , that is, any vector of the form , for any non-zero real number

.
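This "solve a linear system for a known eigenvalue" procedure amounts to computing a null space. A minimal sketch; the 2×2 matrix is a hypothetical stand-in chosen only because it happens to have eigenvalue 6:

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[4.0, 1.0],
                  [6.0, 3.0]])                # illustrative matrix with eigenvalue 6
    lam = 6.0

    # Eigenvectors for lam are the non-zero solutions of (A - lam*I) v = 0.
    v = null_space(A - lam * np.eye(2))
    print(v)                                  # one column, proportional to (1, 2)
    print(np.allclose(A @ v, lam * v))        # True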

Some numeric methods that compute the eigenvalues of a matrix also determine a set of corresponding eigenvectors

as a by-product of the computation.

History

Eigenvalues are often introduced in the context of linear algebra or matrix theory. Historically, however, they arose in the study of quadratic forms and differential equations.

Euler studied the rotational motion of a rigid body and discovered the importance of the principal axes. Lagrange realized that the principal axes are the eigenvectors of the inertia matrix.[11] In the early 19th century, Cauchy saw how their work could be used to classify the quadric surfaces, and generalized it to arbitrary dimensions.[12] Cauchy also coined the term racine caractéristique (characteristic root) for what is now called eigenvalue; his term survives in characteristic equation.[13]

Fourier used the work of Laplace and Lagrange to solve the heat equation by separation of variables in his famous 1822 book Théorie analytique de la chaleur.[14] Sturm developed Fourier's ideas further and brought them to the attention of Cauchy, who combined them with his own ideas and arrived at the fact that real symmetric matrices have real eigenvalues.[12] This was extended by Hermite in 1855 to what are now called Hermitian matrices.[13] Around the same time, Brioschi proved that the eigenvalues of orthogonal matrices lie on the unit circle,[12] and Clebsch found the corresponding result for skew-symmetric matrices.[13] Finally, Weierstrass clarified an important aspect in the stability theory started by Laplace by realizing that defective matrices can cause instability.[12]

In the meantime, Liouville studied eigenvalue problems similar to those of Sturm; the discipline that grew out of their work is now called Sturm-Liouville theory.[15] Schwarz studied the first eigenvalue of Laplace's equation on general domains towards the end of the 19th century, while Poincaré studied Poisson's equation a few years later.[16]

At the start of the 20th century, Hilbert studied the eigenvalues of integral operators by viewing the operators as infinite matrices.[17] He was the first to use the German word eigen to denote eigenvalues and eigenvectors in 1904, though he may have been following a related usage by Helmholtz. For some time, the standard term in English was "proper value", but the more distinctive term "eigenvalue" is standard today.[18]

The first numerical algorithm for computing eigenvalues and eigenvectors appeared in 1929, when Von Mises published the power method. One of the most popular methods today, the QR algorithm, was proposed independently by John G.F. Francis[19] and Vera Kublanovskaya[20] in 1961.[21]

Applications

Eigenvalues of geometric transformations

The following table presents some example transformations in the plane along with their 2×2 matrices, eigenvalues, and eigenvectors.

[Table: example transformations of the plane (scaling, unequal scaling, rotation, horizontal shear, hyperbolic rotation), giving for each an illustration, the 2×2 matrix, its characteristic polynomial, its eigenvalues, their algebraic and geometric multiplicities, and the eigenvectors. For a pure scaling, all non-zero vectors are eigenvectors.]

Note that the characteristic equation for a rotation by angle θ is a quadratic equation with discriminant −4 sin^2(θ), which is a negative number whenever θ is not an integer multiple of 180°. Therefore, except for these special cases, the two eigenvalues are complex numbers, cos θ ± i sin θ; and all eigenvectors have non-real entries. Indeed, except for those special cases, a rotation changes the direction of every nonzero vector in the plane.
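A short numerical illustration (angle chosen arbitrarily) that a rotation matrix has the complex eigenvalues exp(±iθ):

    import numpy as np

    theta = np.pi / 3                          # a 60 degree rotation
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    w = np.linalg.eigvals(R)
    print(w)                                   # approximately exp(+i*theta) and exp(-i*theta)
    print(np.allclose(np.sort_complex(w),
                      np.sort_complex(np.array([np.exp(1j * theta),
                                                np.exp(-1j * theta)]))))   # True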

Schrödinger equation

An example of an eigenvalue equation where the transformation T is represented in terms of a differential operator is the time-independent Schrödinger equation in quantum mechanics:

H ψ_E = E ψ_E

where H, the Hamiltonian, is a second-order differential operator and ψ_E, the wavefunction, is one of its eigenfunctions corresponding to the eigenvalue E, interpreted as its energy.

However, in the case where one is interested only in the bound state solutions of the Schrödinger equation, one looks for ψ_E within the space of square integrable functions. Since this space is a Hilbert space with a well-defined scalar product, one can introduce a basis set in which ψ_E and H can be represented as a one-dimensional array and a matrix respectively. This allows one to represent the Schrödinger equation in a matrix form.

Bra-ket notation is often used in this context. A vector, which represents a state of the system, in the Hilbert space of square integrable functions is represented by |Ψ_E⟩. In this notation, the Schrödinger equation is:

H |Ψ_E⟩ = E |Ψ_E⟩

where |Ψ_E⟩ is an eigenstate of H. H is a self-adjoint operator, the infinite dimensional analog of Hermitian matrices (see Observable). As in the matrix case, in the equation above H |Ψ_E⟩ is understood to be the vector obtained by application of the transformation H to |Ψ_E⟩.

Molecular orbitals

In quantum mechanics, and in particular in atomic and molecular physics, within the Hartree-Fock theory, the atomic and molecular orbitals can be defined by the eigenvectors of the Fock operator. The corresponding eigenvalues are interpreted as ionization potentials via Koopmans' theorem. In this case, the term eigenvector is used in a somewhat more general meaning, since the Fock operator is explicitly dependent on the orbitals and their eigenvalues. If one wants to underline this aspect, one speaks of a nonlinear eigenvalue problem. Such equations are usually solved by an iteration procedure, called in this case the self-consistent field method. In quantum chemistry, one often represents the Hartree-Fock equation in a non-orthogonal basis set. This particular representation is a generalized eigenvalue problem called the Roothaan equations.

[Figure: The wavefunctions associated with the bound states of an electron in a hydrogen atom can be seen as the eigenvectors of the hydrogen atom Hamiltonian as well as of the angular momentum operator. They are associated with eigenvalues interpreted as their energies (increasing downward) and angular momentum (increasing across: s, p, d, ...). The illustration shows the square of the absolute value of the wavefunctions; brighter areas correspond to higher probability density for a position measurement. The center of each figure is the atomic nucleus, a proton.]


Geology and glaciology

In geology, especially in the study of glacial till, eigenvectors and eigenvalues are used as a method by which a mass of information about a clast fabric's constituents' orientation and dip can be summarized in a 3-D space by six numbers. In the field, a geologist may collect such data for hundreds or thousands of clasts in a soil sample, which can only be compared graphically, such as in a Tri-Plot (Sneed and Folk) diagram,[22][23] or as a Stereonet on a Wulff Net.[24]

The output for the orientation tensor is in the three orthogonal (perpendicular) axes of space. The three eigenvectors v_1, v_2, v_3 are ordered by their eigenvalues E_1 ≥ E_2 ≥ E_3;[25] v_1 is then the primary orientation/dip of the clast, v_2 is the secondary and v_3 is the tertiary, in terms of strength. The clast orientation is defined as the direction of the eigenvector, on a compass rose of 360°. Dip is measured as the eigenvalue, the modulus of the tensor: this is valued from 0° (no dip) to 90° (vertical). The relative values of E_1, E_2, and E_3 are dictated by the nature of the sediment's fabric. If E_1 = E_2 = E_3, the fabric is said to be isotropic. If E_1 = E_2 > E_3, the fabric is said to be planar. If E_1 > E_2 > E_3, the fabric is said to be linear.[26]

Principal components analysis

Main article: Principal component analysis
See also: Positive semidefinite matrix and Factor analysis

[Figure: PCA of a multivariate Gaussian distribution with a standard deviation of 3 in roughly one direction and of 1 in the orthogonal direction. The vectors shown are unit eigenvectors of the (symmetric, positive-semidefinite) covariance matrix scaled by the square root of the corresponding eigenvalue. Just as in the one-dimensional case, the square root is taken because the standard deviation is more readily visualized than the variance.]

The eigendecomposition of a symmetric positive semidefinite (PSD) matrix

yields an orthogonal basis of eigenvectors, each of which has a

nonnegative eigenvalue. The orthogonal decomposition of a PSD matrix is

used in multivariate analysis, where the sample covariance matrices are

PSD. This orthogonal decomposition is called principal components analysis

(PCA) in statistics. PCA studies linear relations among variables. PCA is

performed on the covariance matrix or the correlation matrix (in which each

variable is scaled to have its sample variance equal to one). For the

covariance or correlation matrix, the eigenvectors correspond to principal

components and the eigenvalues to the variance explained by the principal

components. Principal component analysis of the correlation matrix

provides an orthonormal eigen-basis for the space of the observed data: In

this basis, the largest eigenvalues correspond to the principal-components

that are associated with most of the covariability among a number of

observed data.
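The relationship just described is short to express in code. A minimal PCA sketch via the eigendecomposition of a sample covariance matrix, using synthetic data as a stand-in for real measurements (the mixing matrix is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 3)) @ np.array([[3.0, 0.0, 0.0],
                                                  [1.0, 1.0, 0.0],
                                                  [0.0, 0.5, 0.2]])   # correlated toy data

    C = np.cov(X, rowvar=False)             # symmetric PSD covariance matrix
    evals, evecs = np.linalg.eigh(C)        # eigh: for symmetric matrices, ascending order

    order = np.argsort(evals)[::-1]         # sort components by explained variance
    evals, evecs = evals[order], evecs[:, order]

    explained = evals / evals.sum()         # fraction of variance per principal component
    scores = (X - X.mean(axis=0)) @ evecs   # project the data onto the principal components
    print(explained)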

Principal component analysis is used to study large data sets, such as

those encountered in data mining, chemical research, psychology, and in

marketing. PCA is popular especially in psychology, in the field of

psychometrics. In Q methodology, the eigenvalues of the correlation matrix

determine the Q-methodologist's judgment of practical significance (which

differs from the statistical significance of hypothesis testing): The factors with eigenvalues greater than 1.00 are

considered practically significant, that is, as explaining an important amount of the variability in the data, while

eigenvalues less than 1.00 are considered practically insignificant, as explaining only a negligible portion of the data

variability. More generally, principal component analysis can be used as a method of factor analysis in structural

equation modeling.

Vibration analysis

Main article: Vibration

[Figure: 1st lateral bending mode (see vibration for more types of vibration).]

Eigenvalue problems occur naturally in the vibration analysis of mechanical structures with many degrees of freedom. The eigenvalues are used to determine the natural frequencies (or eigenfrequencies) of vibration, and the eigenvectors determine the shapes of these vibrational modes. In particular, undamped vibration is governed by

m x'' + k x = 0

or

x'' = −(k/m) x,


that is, acceleration is proportional to position (i.e., we expect x to be sinusoidal in time). In n dimensions, m becomes a mass matrix M and k a stiffness matrix K. Admissible solutions are then a linear combination of solutions to the generalized eigenvalue problem

K x = ω^2 M x,

where ω^2 is the eigenvalue and ω is the angular frequency. Note that the principal vibration modes are different from the principal compliance modes, which are the eigenvectors of K alone. Furthermore, damped vibration, governed by

M x'' + C x' + K x = 0,

leads to a so-called quadratic eigenvalue problem,

(ω^2 M + ω C + K) x = 0.

This can be reduced to a generalized eigenvalue problem by clever algebra at the cost of solving a larger system. The orthogonality properties of the eigenvectors allow decoupling of the differential equations, so that the system can be represented as a linear summation of the eigenvectors. The eigenvalue problem of complex structures is often solved using finite element analysis, but the results neatly generalize the solution to scalar-valued vibration problems.
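The undamped generalized eigenvalue problem K x = ω^2 M x can be solved directly with SciPy. A toy sketch for a two-degree-of-freedom system; the mass and stiffness values are illustrative only:

    import numpy as np
    from scipy.linalg import eigh

    M = np.diag([2.0, 1.0])                  # mass matrix
    K = np.array([[ 4.0, -1.0],
                  [-1.0,  3.0]])             # stiffness matrix

    w2, modes = eigh(K, M)                   # generalized symmetric eigenproblem K x = w2 M x
    print(np.sqrt(w2))                       # natural angular frequencies
    print(modes)                             # columns are the corresponding mode shapes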

Eigenfaces

Main article: Eigenface

[Figure: Eigenfaces as examples of eigenvectors.]

In image processing, processed images of faces can be seen as vectors whose components are the brightnesses of each pixel.[27] The dimension of this vector space is the number of pixels. The eigenvectors of the covariance matrix associated with a large set of normalized pictures of faces are called eigenfaces; this is an example of principal components analysis. They are very useful for expressing any face image as a linear combination of some of them. In the facial recognition branch of biometrics, eigenfaces provide a means of applying data compression to faces for identification purposes. Research related to eigen vision systems determining hand gestures has also been made.

Similar to this concept, eigenvoices represent the general direction of variability in human pronunciations of a particular utterance, such as a word in a language. Based on a linear combination of such eigenvoices, a new voice pronunciation of the word can be constructed. These concepts have been found useful in automatic speech recognition systems for speaker adaptation.

Tensor of moment of inertia

In mechanics, the eigenvectors of the moment of inertia tensor define the principal axes of a rigid body. The tensor of moment of inertia is a key quantity required to determine the rotation of a rigid body around its center of mass.

Stress tensor

In solid mechanics, the stress tensor is symmetric and so can be decomposed into a diagonal tensor with the eigenvalues on the diagonal and eigenvectors as a basis. Because it is diagonal, in this orientation, the stress tensor has no shear components; the components it does have are the principal components.

Eigenvalues of a graph

In spectral graph theory, an eigenvalue of a graph is defined as an eigenvalue of the graph's adjacency matrix A, or (increasingly) of the graph's Laplacian matrix (see also Discrete Laplace operator), which is either D − A (sometimes called the combinatorial Laplacian) or I − D^(-1/2) A D^(-1/2) (sometimes called the normalized Laplacian), where D is a diagonal matrix with D_ii equal to the degree of vertex v_i, and in D^(-1/2), the i-th diagonal entry is 1/sqrt(deg(v_i)). The k-th principal eigenvector of a graph is defined as either the eigenvector corresponding to the k-th largest or k-th smallest eigenvalue of the Laplacian. The first principal eigenvector of the graph is also referred to merely as the principal eigenvector.

The principal eigenvector is used to measure the centrality of its vertices. An example is Google's PageRank algorithm. The principal eigenvector of a modified adjacency matrix of the World Wide Web graph gives the page ranks as its components. This vector corresponds to the stationary distribution of the Markov chain represented by


the row-normalized adjacency matrix; however, the adjacency matrix must first be modified to ensure a stationary

distribution exists. The second smallest eigenvector can be used to partition the graph into clusters, via spectral

clustering. Other methods are also available for clustering.
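The principal eigenvector of a stochastic matrix can be approximated by repeated multiplication (power iteration), which is the idea behind PageRank-style centrality scores. A minimal sketch on a tiny column-stochastic matrix with made-up entries, not a real web graph:

    import numpy as np

    P = np.array([[0.0, 0.3, 0.4],
                  [0.6, 0.0, 0.6],
                  [0.4, 0.7, 0.0]])       # columns sum to 1

    v = np.ones(3) / 3
    for _ in range(100):
        v = P @ v                         # repeated application of the matrix
        v /= np.linalg.norm(v, 1)         # keep v a probability vector

    print(v)                              # stationary distribution: eigenvector for eigenvalue 1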

Basic reproduction number

See Basic reproduction number

The basic reproduction number (R_0) is a fundamental number in the study of how infectious diseases spread. If one infectious person is put into a population of completely susceptible people, then R_0 is the average number of people that one infectious person will infect. The generation time of an infection is the time, t_G, from one person becoming infected to the next person becoming infected. In a heterogeneous population, the next generation matrix defines how many people in the population will become infected after time t_G has passed. R_0 is then the largest eigenvalue of the next generation matrix.[28][29]

See also

Nonlinear eigenproblem
Quadratic eigenvalue problem
Introduction to eigenstates
Eigenplane
Jordan normal form
List of numerical analysis software
Antieigenvalue theory

Notes

1. ^ a b Wolfram Research, Inc. (2010) Eigenvector. Accessed on 2010-01-29.
2. ^ William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery (2007), Numerical Recipes: The Art of Scientific Computing, Chapter 11: Eigensystems, pages 563-597. Third edition, Cambridge University Press. ISBN 9780521880688
3. ^ See Korn & Korn 2000, Section 14.3.5a; Friedberg, Insel & Spence 1989, p. 217
4. ^ Axler, Sheldon, "Ch. 5", Linear Algebra Done Right (2nd ed.), p. 77
5. ^ Shilov 1977, p. 109
6. ^ Lemma for the eigenspace
7. ^ Schaum's Easy Outline of Linear Algebra, p. 111
8. ^ For a proof of this lemma, see Roman 2008, Theorem 8.2 on p. 186; Shilov 1977, p. 109; Hefferon 2001, p. 364; Beezer 2006, Theorem EDELI on p. 469; and Lemma for linear independence of eigenvectors
9. ^ Strang, Gilbert (1988), Linear Algebra and Its Applications (3rd ed.), San Diego: Harcourt
10. ^ a b c d Trefethen, Lloyd N.; Bau, David (1997), Numerical Linear Algebra, SIAM
11. ^ See Hawkins 1975, §2
12. ^ a b c d See Hawkins 1975, §3
13. ^ a b c See Kline 1972, pp. 807-808
14. ^ See Kline 1972, p. 673
15. ^ See Kline 1972, pp. 715-716
16. ^ See Kline 1972, pp. 706-707
17. ^ See Kline 1972, p. 1063
18. ^ See Aldrich 2006
19. ^ Francis, J. G. F. (1961), "The QR Transformation, I (part 1)", The Computer Journal 4 (3): 265-271, doi:10.1093/comjnl/4.3.265; and Francis, J. G. F. (1962), "The QR Transformation, II (part 2)", The Computer Journal 4 (4): 332-345, doi:10.1093/comjnl/4.4.332
20. ^ Kublanovskaya, Vera N. (1961), "On some algorithms for the solution of the complete eigenvalue problem", USSR Computational Mathematics and Mathematical Physics 3: 637-657. Also published in: Zhurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki 1 (4), 1961: 555-570
21. ^ See Golub & van Loan 1996, §7.3; Meyer 2000, §7.3
22. ^ Graham, D.; Midgley, N. (2000), "Graphical representation of particle shape using triangular diagrams: an Excel spreadsheet method", Earth Surface Processes and Landforms 25 (13): 1473-1477, doi:10.1002/1096-9837(200012)25:13<1473::AID-ESP158>3.0.CO;2-C
23. ^ Sneed, E. D.; Folk, R. L. (1958), "Pebbles in the lower Colorado River, Texas, a study of particle morphogenesis", Journal of Geology 66 (2): 114-150, doi:10.1086/626490
24. ^ Knox-Robinson, C; Gardoll, Stephen J (1998), "GIS-stereoplot: an interactive stereonet plotting module for ArcView 3.0 geographic information system", Computers & Geosciences 24 (3): 243, doi:10.1016/S0098-3004(97)00122-2
25. ^ Stereo32 software
26. ^ Benn, D.; Evans, D. (2004), A Practical Guide to the study of Glacial Sediments, London: Arnold, pp. 103-107
27. ^ Xirouhakis, A.; Votsis, G.; Delopoulus, A. (2004), Estimation of 3D motion and structure of human faces (PDF), Online paper in PDF format, National Technical University of Athens
28. ^ Diekmann O, Heesterbeek JAP, Metz JAJ (1990), "On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations", Journal of Mathematical Biology 28 (4): 365-382, doi:10.1007/BF00178324, PMID 2117040
29. ^ Odo Diekmann and J. A. P. Heesterbeek (2000), Mathematical epidemiology of infectious diseases, Wiley series in mathematical and computational biology, West Sussex, England: John Wiley & Sons

References

Korn, Granino A.; Korn, Theresa M. (2000), Mathematical Handbook for Scientists and Engineers: Definitions, Theorems, and Formulas for Reference and Review, New York: McGraw-Hill (1152 p., Dover Publications, 2nd Revised edition), Bibcode:1968mhse.book.....K, ISBN 0-486-41147-8.
Lipschutz, Seymour (1991), Schaum's outline of theory and problems of linear algebra, Schaum's outline series (2nd ed.), New York, NY: McGraw-Hill Companies, ISBN 0-07-038007-4.
Friedberg, Stephen H.; Insel, Arnold J.; Spence, Lawrence E. (1989), Linear algebra (2nd ed.), Englewood Cliffs, NJ: Prentice Hall, ISBN 0-13-537102-3.
Aldrich, John (2006), "Eigenvalue, eigenfunction, eigenvector, and related terms", in Jeff Miller (Editor), Earliest Known Uses of Some of the Words of Mathematics, retrieved 2006-08-22.
Strang, Gilbert (1993), Introduction to linear algebra, Wellesley-Cambridge Press, Wellesley, MA, ISBN 0-9614088-5-5.
Strang, Gilbert (2006), Linear algebra and its applications, Thomson, Brooks/Cole, Belmont, CA, ISBN 0-03-010567-6.
Bowen, Ray M.; Wang, Chao-Cheng (1980), Linear and multilinear algebra, Plenum Press, New York, NY, ISBN 0-306-37508-7.
Cohen-Tannoudji, Claude (1977), "Chapter II. The mathematical tools of quantum mechanics", Quantum mechanics, John Wiley & Sons, ISBN 0-471-16432-1.
Fraleigh, John B.; Beauregard, Raymond A. (1995), Linear algebra (3rd ed.), Addison-Wesley Publishing Company, ISBN 0-201-83999-7 (international edition).
Golub, Gene H.; Van Loan, Charles F. (1996), Matrix computations (3rd ed.), Johns Hopkins University Press, Baltimore, MD, ISBN 978-0-8018-5414-9.
Hawkins, T. (1975), "Cauchy and the spectral theory of matrices", Historia Mathematica 2: 1-29, doi:10.1016/0315-0860(75)90032-4.
Horn, Roger A.; Johnson, Charles F. (1985), Matrix analysis, Cambridge University Press, ISBN 0-521-30586-1 (hardback), ISBN 0-521-38632-2 (paperback).
Kline, Morris (1972), Mathematical thought from ancient to modern times, Oxford University Press, ISBN 0-19-501496-0.
Meyer, Carl D. (2000), Matrix analysis and applied linear algebra, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, ISBN 978-0-89871-454-8.
Brown, Maureen (October 2004), Illuminating Patterns of Perception: An Overview of Q Methodology.
Golub, Gene F.; van der Vorst, Henk A. (2000), "Eigenvalue computation in the 20th century", Journal of Computational and Applied Mathematics 123: 35-65, doi:10.1016/S0377-0427(00)00413-1.
Akivis, Max A.; Vladislav V. Goldberg (1969), Tensor calculus, Russian, Science Publishers, Moscow.
Gelfand, I. M. (1971), Lecture notes in linear algebra, Russian, Science Publishers, Moscow.
Alexandrov, Pavel S. (1968), Lecture notes in analytical geometry, Russian, Science Publishers, Moscow.
Carter, Tamara A.; Tapia, Richard A.; Papaconstantinou, Anne, Linear Algebra: An Introduction to Linear Algebra for Pre-Calculus Students, Rice University, Online Edition, retrieved 2008-02-19.
Roman, Steven (2008), Advanced linear algebra (3rd ed.), New York, NY: Springer Science + Business Media, LLC, ISBN 978-0-387-72828-5.
Shilov, Georgi E. (1977), Linear algebra (translated and edited by Richard A. Silverman), New York: Dover Publications, ISBN 0-486-63518-X.
Hefferon, Jim (2001), Linear Algebra, Online book, St Michael's College, Colchester, Vermont, USA.
Kuttler, Kenneth (2007), An introduction to linear algebra (PDF), Online e-book in PDF format, Brigham Young University.
Demmel, James W. (1997), Applied numerical linear algebra, SIAM, ISBN 0-89871-389-7.
Beezer, Robert A. (2006), A first course in linear algebra, Free online book under GNU licence, University of Puget Sound.
Lancaster, P. (1973), Matrix theory, Russian, Moscow, Russia: Science Publishers.
Halmos, Paul R. (1987), Finite-dimensional vector spaces (8th ed.), New York, NY: Springer-Verlag, ISBN 0-387-90093-4.
Pigolkina, T. S. and Shulman, V. S., Eigenvalue (in Russian), in: Vinogradov, I. M. (Ed.), Mathematical Encyclopedia, Vol. 5, Soviet Encyclopedia, Moscow, 1977.
Greub, Werner H. (1975), Linear Algebra (4th ed.), Springer-Verlag, New York, NY, ISBN 0-387-90110-8.
Larson, Ron; Edwards, Bruce H. (2003), Elementary linear algebra (5th ed.), Houghton Mifflin Company, ISBN 0-618-33567-6.
Curtis, Charles W., Linear Algebra: An Introductory Approach, 347 p., Springer; 4th ed. 1984, Corr. 7th printing edition (August 19, 1999), ISBN 0-387-90992-3.
Shores, Thomas S. (2007), Applied linear algebra and matrix analysis, Springer Science+Business Media, LLC, ISBN 0-387-33194-8.
Sharipov, Ruslan A. (1996), Course of Linear Algebra and Multidimensional Geometry: the textbook, arXiv:math/0405323, ISBN 5-7477-0099-5.
Gohberg, Israel; Lancaster, Peter; Rodman, Leiba (2005), Indefinite linear algebra and applications, Basel-Boston-Berlin: Birkhäuser Verlag, ISBN 3-7643-7349-0.


Singular value decomposition