Maya Hristakeva
1 Linear Regression
Setup
Data matrix: $X = [x_1 \dots x_T] \in \mathbb{R}^{N \times T}$
Reference values: $y = [y_1 \dots y_T]^T \in \mathbb{R}^{T \times 1}$
Outline
1 Linear Regression
Linear Regression
Least Squares optimization problem:
$$L(w) = \min_w \frac{1}{2} \sum_{i=1}^{T} (x_i^T w - y_i)^2 \equiv \min_w \frac{1}{2} \|X^T w - y\|^2$$
Differentiate w.r.t. w:
$$\nabla_w L(w) = X(X^T w - y) = 0$$
$$X X^T w = X y$$
Exact solution:
$$w^\star = (X X^T)^{-1} X y$$
Note: XXT is not always invertible
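The closed-form solution above can be sketched in NumPy. The toy data, noise level, and true weights are illustrative assumptions; note the slides' convention that X is features-by-samples.

```python
import numpy as np

# Toy data in the slides' convention: X is N x T
# (N features, T samples), y has one entry per sample.
rng = np.random.default_rng(0)
N, T = 3, 50
X = rng.normal(size=(N, T))
w_true = np.array([1.0, -2.0, 0.5])           # illustrative ground truth
y = X.T @ w_true + 0.01 * rng.normal(size=T)  # small additive noise

# Normal equations: (X X^T) w = X y.
# Solving the linear system is preferred over forming the inverse.
w_star = np.linalg.solve(X @ X.T, X @ y)
```

With enough samples and small noise, `w_star` recovers `w_true` closely; the solve fails exactly when $XX^T$ is singular, which motivates the ridge penalty below.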
Ridge Regression
$$L(w) = \min_w \frac{1}{2} \|X^T w - y\|^2 + \frac{\lambda}{2} \|w\|^2$$
Differentiate w.r.t. w:
$$\nabla_w L(w) = X(X^T w - y) + \lambda w = 0$$
$$(X X^T + \lambda I) w = X y$$
Exact solution:
$$w^\star = (X X^T + \lambda I)^{-1} X y$$
Note: XXT + λI is always invertible for λ > 0
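A minimal sketch of the ridge solution, with the same illustrative toy data as before and an assumed regularization strength:

```python
import numpy as np

# With lam > 0 the system matrix X X^T + lam*I is positive definite,
# so the solve succeeds even when X is rank-deficient.
rng = np.random.default_rng(1)
N, T = 3, 50
X = rng.normal(size=(N, T))
w_true = np.array([1.0, -2.0, 0.5])           # illustrative ground truth
y = X.T @ w_true + 0.01 * rng.normal(size=T)

lam = 0.1  # regularization strength (an illustrative choice)
w_ridge = np.linalg.solve(X @ X.T + lam * np.eye(N), X @ y)
```

For small $\lambda$ the ridge estimate stays close to the least-squares one while the system remains well conditioned.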
Properties of P:
$$P^2 = P \in \mathbb{R}^{N \times N}$$
$$P = \sum_{i=1}^{k} p_i p_i^T = \tilde{P} \tilde{P}^T \quad \text{for } \tilde{P} = [p_1 \dots p_k] \in \mathbb{R}^{N \times k}$$
Variance Maximization
Find $k$ projection directions $\tilde{P} = [p_1 \dots p_k]$ for which the variance of the compressed data $\tilde{P}^T X$ is maximized:
$$\max_{\tilde{P}} \operatorname{tr} \operatorname{var}(\tilde{P}^T x_i) \equiv \max_{\tilde{P}} \operatorname{tr} \frac{1}{T} \sum_{i=1}^{T} (\tilde{P}^T x_i)(\tilde{P}^T x_i)^T$$
$$= \max_{\tilde{P}} \frac{1}{T} \sum_{i=1}^{T} \operatorname{tr}\big(x_i^T \underbrace{\tilde{P} \tilde{P}^T}_{P} x_i\big)$$
$$= \max_{P} \operatorname{tr}\Big(P \, \frac{1}{T} \sum_{i=1}^{T} x_i x_i^T\Big)$$
$$= \max_{P} \operatorname{tr}\big(P \, \underbrace{\tfrac{1}{T} X X^T}_{C}\big)$$
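The identity underlying this derivation can be checked numerically. This sketch (dimensions and random data are illustrative assumptions) verifies that for any orthonormal $\tilde{P}$, the total variance of the compressed data equals $\operatorname{tr}(PC)$:

```python
import numpy as np

# For orthonormal P~ (N x k), the total variance of P~^T X equals
# tr(P C) with P = P~ P~^T and C = (1/T) X X^T.
rng = np.random.default_rng(2)
N, T, k = 5, 200, 2
X = rng.normal(size=(N, T))
X = X - X.mean(axis=1, keepdims=True)               # center the data

P_tilde, _ = np.linalg.qr(rng.normal(size=(N, k)))  # random orthonormal basis
P = P_tilde @ P_tilde.T                             # projection: P @ P == P

C = (X @ X.T) / T
var_compressed = np.trace(P_tilde.T @ C @ P_tilde)  # tr var(P~^T x_i)
trace_form = np.trace(P @ C)
```

The equality follows from the cyclic property of the trace, which is exactly the step taken in the derivation above.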
PCA Solution
Let $C = \frac{1}{T} X X^T$, the covariance matrix of the (centered) data $X$, with eigendecomposition $C = \sum_i \gamma_i c_i c_i^T$. Then

$$\max_P \operatorname{tr}(PC) = \max_P \operatorname{tr}\Big(P \sum_i \gamma_i c_i c_i^T\Big)$$
$$= \max_P \sum_i \gamma_i \underbrace{c_i^T P c_i}_{c_i^T P c_i \le 1,\; \sum_i c_i^T P c_i = k}$$
$$\le \max_{0 \le \delta_i \le 1,\; \sum_i \delta_i = k} \sum_i \gamma_i \delta_i$$
$$= \max_{1 \le i_1 < i_2 < \dots < i_k \le N} \sum_{j=1}^{k} \gamma_{i_j} = \text{sum of the } k \text{ largest eigenvalues of } C$$
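This conclusion can be verified numerically: projecting onto the $k$ leading eigenvectors of $C$ attains $\operatorname{tr}(PC)$ equal to the sum of the $k$ largest eigenvalues, and a random projection does no better. The data and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, k = 5, 200, 2
X = rng.normal(size=(N, T))
X = X - X.mean(axis=1, keepdims=True)
C = (X @ X.T) / T

gammas, vecs = np.linalg.eigh(C)    # eigenvalues in ascending order
top_k = vecs[:, -k:]                # k leading eigenvectors of C
P_opt = top_k @ top_k.T
best = np.trace(P_opt @ C)          # equals gammas[-k:].sum()

Q, _ = np.linalg.qr(rng.normal(size=(N, k)))   # random rank-k competitor
other = np.trace(Q @ Q.T @ C)
```

This is the standard PCA recipe: eigendecompose the covariance and keep the top-$k$ eigenvectors as projection directions.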
Summary of PCA
Finding p1
PLS Regression
PLS Regression ≡ PLS Decomposition + Linear Regression
Use PLS to find the projection directions $p_i$
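A minimal sketch of this pipeline for a single response, assuming centered data: the first PLS direction is the direction of maximum covariance between $p^T X$ and $y$, i.e. $p_1 \propto X y$, and a linear regression on the resulting scores produces the prediction. The toy data are an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 5, 100
X = rng.normal(size=(N, T))
X = X - X.mean(axis=1, keepdims=True)       # center the data
y = X.T @ rng.normal(size=N) + 0.1 * rng.normal(size=T)
y = y - y.mean()

# First PLS direction: p1 proportional to X y, normalized.
p1 = X @ y
p1 = p1 / np.linalg.norm(p1)

# Linear regression on the 1-D scores t = p1^T X.
t = X.T @ p1
coef = (t @ y) / (t @ t)
y_hat = coef * t
```

Further components would be extracted the same way after deflating $X$ and $y$ by this first component.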
Summary