
Principal Component Analysis

What is PCA?
Principal Component Analysis (PCA) is a dimensionality-reduction technique that is often used to
transform a high-dimensional dataset into a lower-dimensional subspace before running a
machine learning algorithm on the data.

When should you use PCA?


It is often helpful to use a dimensionality-reduction technique such as PCA before performing
machine learning because:

- Reducing the dimensionality of the dataset shrinks the space over which k-nearest-neighbors
  (kNN) must calculate distances, which improves the performance of kNN (a pipeline sketch
  follows this list).
- Reducing the dimensionality of the dataset reduces the number of degrees of freedom of the
  hypothesis, which reduces the risk of overfitting.
- Most algorithms run significantly faster when they have fewer dimensions to consider.
- Reducing the dimensionality via PCA can simplify the dataset, facilitating description,
  visualization, and insight.
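
A minimal sketch of such a PCA-before-kNN pipeline, assuming scikit-learn is available (the digits
dataset, n_components=16, and n_neighbors=5 are illustrative choices, not taken from this notebook):

In [ ]: # Sketch: PCA as a preprocessing step before kNN (illustrative settings)
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

X_digits, y_digits = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X_digits, y_digits, random_state=0)

# reduce the 64 pixel features to 16 principal components, then classify with kNN
model = Pipeline([('pca', PCA(n_components=16)),
                  ('knn', KNeighborsClassifier(n_neighbors=5))])
model.fit(X_train, y_train)
print(model.score(X_test, y_test))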

What does PCA do?


Principal Component Analysis does just what it advertises: it finds the principal components of the
dataset. PCA transforms the data into a new, lower-dimensional coordinate system in which the first
axis corresponds to the first principal component, the component that explains the greatest amount
of the variance in the data.
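
As a quick preview before working through the math by hand, here is a minimal sketch using
scikit-learn's PCA on a small made-up 3-feature dataset (the array shape and n_components are
illustrative):

In [ ]: # Sketch: quick PCA preview on a made-up 3-feature dataset
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
data = rng.normal(size=(100, 3))        # 100 observations, 3 features
pca = PCA(n_components=2)
reduced = pca.fit_transform(data)       # rows re-expressed in the top-2 principal-component axes
print(reduced.shape)                    # (100, 2)
print(pca.explained_variance_ratio_)    # fraction of variance explained by each component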

Math of PCA
In [1]: import numpy as np


In [2]: # Create (6x2) array


A = np.array([
[ 3, 7],
[-4, -6],
[ 7, 8],
[ 1, -1],
[-4, -1],
[-3, -7]
])

m,n = A.shape # m-observations, n-features

print("Array:")
print(A) # our array

print("---")
print("Dimensions:")
print(A.shape) # shape

print("---")
print("Mean across Rows:")
print(np.mean(A,axis=0))

Array:
[[ 3 7]
[-4 -6]
[ 7 8]
[ 1 -1]
[-4 -1]
[-3 -7]]
---
Dimensions:
(6, 2)
---
Mean across Rows:
[ 0. 0.]

In [3]: # Note: you can convert this easily into a DataFrame ...
import pandas as pd
df = pd.DataFrame(A, columns = ['a0', 'a1'])
print(df)

a0 a1
0 3 7
1 -4 -6
2 7 8
3 1 -1
4 -4 -1
5 -3 -7


In [4]: # ... and can go from df back to np.array


df.values

Out[4]: array([[ 3, 7],


[-4, -6],
[ 7, 8],
[ 1, -1],
[-4, -1],
[-3, -7]])

Covariance

In [5]: import matplotlib


import matplotlib.pyplot as plt
%matplotlib inline

# makes charts pretty


import seaborn as sns
sns.set(color_codes=True)


In [6]: # plots
plt.scatter(A[:,0],A[:,1])

# annotations
for i in range(m):
plt.annotate('('+str(A[i,0])+','+str(A[i,1])+')',(A[i,0]+0.2,A[i,1]+0.2))

# axes
plt.plot([-6,8],[0,0],'grey') # x-axis
plt.plot([0,0],[-8,10],'grey') # y-axis
plt.axis([-6, 8, -8, 10])
plt.axes().set_aspect('equal')

# labels
plt.xlabel("$a_0$")
plt.ylabel("$a_1$")
plt.title("Dataset $A$")

Out[6]: <matplotlib.text.Text at 0x7fd0c6baa438>

Sample covariance between $a_0$ and $a_1$:

$$\mathrm{cov}_{a_0, a_1} = \frac{\sum_{i=0}^{m-1} (a_{i,0} - \bar{a}_0)(a_{i,1} - \bar{a}_1)}{m - 1}$$
In [7]: # Calculate covariance between a0 and a1
a0 = A[:,0]
a1 = A[:,1]
prod = a0*a1 # element-wise product, ignore means as zero already
print("Length of prod equals " + str(len(prod)))
print("---")
print("Covariance:")
print(np.sum(prod)/(m-1))

Length of prod equals 6


---
Covariance:
25.0


In [8]: # Get more stuff using NumPy's covariance method


np.cov(a0,a1)

Out[8]: array([[ 20., 25.],


[ 25., 40.]])

Linear Algebra way:

$$\Sigma = \frac{A^T A}{m - 1}$$

In [9]: # Aside: What does A.T do?


A.T # or np.transpose(A)

Out[9]: array([[ 3, -4, 7, 1, -4, -3],


[ 7, -6, 8, -1, -1, -7]])

In [10]: # Matrix Multiplication, note @ operator


A.T @ A # or np.dot(A.T,A)

Out[10]: array([[100, 125],


[125, 200]])

In [11]: # Need to divide by (m-1) to yield true Sample Covariance Matrix


# Let's call this Sigma
Sigma = (A.T @ A)/(m-1) # or np.cov(A.T)
Sigma

Out[11]: array([[ 20., 25.],


[ 25., 40.]])

4. Eigen-decomposition of Σ
According to the Wikipedia article on PCA
(https://en.m.wikipedia.org/wiki/Principal_component_analysis), "PCA can be done by eigenvalue
decomposition of a data covariance (or correlation) matrix or singular value decomposition of a data
matrix." I choose the first approach; an SVD sketch is shown below for comparison.

$\Sigma$ is a real, symmetric matrix; thus it has (1) real eigenvalues and (2) orthogonal eigenvectors.


In [12]: l, X = np.linalg.eig(Sigma)
print("Eigenvalues:")
print(l)
print("---")
print("Eigenvectors:")
print(X)

Eigenvalues:
[ 3.07417596 56.92582404]
---
Eigenvectors:
[[-0.82806723 -0.56062881]
[ 0.56062881 -0.82806723]]
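
For comparison, here is a minimal sketch of the SVD route mentioned in the quote, applied to the
(already zero-mean) array A defined above; it recovers the same principal directions and
eigenvalues, up to sign and ordering:

In [ ]: # Sketch: PCA via SVD of the centered data matrix instead of eig(Sigma)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("Principal directions (rows of Vt):")
print(Vt)                  # same directions as the eigenvectors of Sigma, up to sign
print("Eigenvalues recovered from singular values:")
print(s**2 / (m - 1))      # matches the eigenvalues of Sigma, sorted descending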

Recall from your Linear Algebra class that the following should hold:

$$\Sigma x_0 = \lambda_0 x_0$$
$$\Sigma x_1 = \lambda_1 x_1$$
In [13]: # let's check the first Eigenvalue, Eigenvector combination
print("Sigma times eigenvector:")
print(Sigma @ X[:,0]) # 2x2 times 2x1
print("Eigenvalue times eigenvector:")
print(l[0] * X[:,0]) # scalar times 2x1, ANNOYING - MUST USE * vs. @

Sigma times eigenvector:


[-2.54562438 1.72347161]
Eigenvalue times eigenvector:
[-2.54562438 1.72347161]

In [14]: # ... and the second


print("Sigma times eigenvector:")
print(Sigma @ X[:,1]) # 2x2 times 2x1
print("Eigenvalue times eigenvector:")
print(l[1] * X[:,1]) # scalar times 2x1, ANNOYING - MUST USE * vs. @

Sigma times eigenvector:


[-31.91425695 -47.13840945]
Eigenvalue times eigenvector:
[-31.91425695 -47.13840945]

In [15]: print("The first principal component is evector with largest evalue:")


print(X[:,1])
print("---")
print("Second principal component:")
print(X[:,0])

The first principal component is evector with largest evalue:


[-0.56062881 -0.82806723]
---
Second principal component:
[-0.82806723 0.56062881]


In [16]: # Orthogonal? A: Yes


X[:,1].T @ X[:,0]

Out[16]: 0.0

In [17]: # Length 1? A: Yes


print(np.sqrt(X[:,1].T @ X[:,1]))
print(np.sqrt(X[:,0].T @ X[:,0]))

1.0
1.0

In [18]: # plots
plt.scatter(A[:,0],A[:,1])
scale = 3 # increase this scaling factor to highlight these vectors
plt.plot([0,X[0,1]*scale],[0,X[1,1]*scale],'r') # First principal component
plt.plot([0,X[0,0]*scale],[0,X[1,0]*scale],'g') # Second principal component

# annotations
for i in range(m):
plt.annotate('('+str(A[i,0])+','+str(A[i,1])+')',(A[i,0]+0.2,A[i,1]+0.2))

# axes
plt.plot([-6,8],[0,0],'grey') # x-axis
plt.plot([0,0],[-8,10],'grey') # y-axis
plt.axis([-6, 8, -8, 10])
plt.axes().set_aspect('equal')

# labels
plt.xlabel("$a_0$")
plt.ylabel("$a_1$")
plt.title("Eigenvectors of $\Sigma$")

Out[18]: <matplotlib.text.Text at 0x7fd0c6a41588>

5. Dimensionality Reduction: 2D to 1D


I discovered why Prof. Ng recommended Octave/MATLAB over Python: linear algebra expressions
are not as clean in Python. Below, the np.array objects are converted to matrix objects; a
plain-array alternative is sketched after the reconstruction.

In [19]: # change to matrix


Amat = np.asmatrix(A)
Xmat = np.asmatrix(X)

In [20]: # Choose eigenvector with highest eigenvalue as first principal component


pc1 = Xmat[:,1]

In [21]: Acomp = Amat @ pc1 # 6x2 @ 2x1 yields 6x1


print("Compressed version of A:")
print(Acomp)

Compressed version of A:
[[ -7.47835704]
[ 7.21091862]
[-10.54893951]
[ 0.26743842]
[ 3.07058247]
[ 7.47835704]]

In [22]: Arec = Acomp @ pc1.T # 6x1 @ 1x2, this breaks with np.array
print("Reconstruction from 1D compression of A:")
print(Arec)

Reconstruction from 1D compression of A:


[[ 4.1925824 6.1925824 ]
[-4.04264872 -5.97112541]
[ 5.9140394 8.73523112]
[-0.14993368 -0.22145699]
[-1.72145699 -2.54264872]
[-4.1925824 -6.1925824 ]]
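
For reference, the same compression and reconstruction can be done with plain np.array objects by
slicing with a list, X[:, [1]], so the column keeps its 2-D shape (a sketch; note that np.matrix is no
longer recommended in newer NumPy releases):

In [ ]: # Sketch: 2D -> 1D compression and reconstruction with plain np.array
pc1_arr = X[:, [1]]                # first principal component as a (2, 1) array
Acomp_arr = A @ pc1_arr            # (6, 2) @ (2, 1) -> (6, 1) compressed data
Arec_arr = Acomp_arr @ pc1_arr.T   # (6, 1) @ (1, 2) -> (6, 2) reconstruction
print(Arec_arr)                    # matches Arec above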


In [24]: # plots
plt.scatter([Amat[:,0]], [Amat[:,1]]) # A in blue



plt.plot(Arec[:,0],Arec[:,1],'r', marker='o') # Arec in RED

# axes
plt.plot([-6,8],[0,0],'grey') # x-axis
plt.plot([0,0],[-8,10],'grey') # y-axis
plt.axis([-6, 8, -8, 10])
plt.axes().set_aspect('equal')

# labels
plt.xlabel("$a_0$")
plt.ylabel("$a_1$")
plt.title("Reconstructing the 1D compression of $A$")

Out[24]: <matplotlib.text.Text at 0x7fd0c67c7518>

In [25]: print(np.linalg.matrix_rank(Amat)) # originally a Rank 2 matrix


print(np.linalg.matrix_rank(Arec)) # reconstructed matrix is Rank 1

2
1

By adding back the rank-1 matrix associated with the second principal component, you recover the original data:

In [26]: # Add the Rank 1 matrix for the other vector to recover A completely
Amat @ Xmat[:,1] @ Xmat[:,1].T + Amat @ Xmat[:,0] @ Xmat[:,0].T

Out[26]: matrix([[ 3., 7.],


[-4., -6.],
[ 7., 8.],
[ 1., -1.],
[-4., -1.],
[-3., -7.]])


In [27]: # Why does this work? Well, recall


# X @ X.T is the identity matrix because the columns of X are orthonormal
A @ Xmat @ Xmat.T

Out[27]: matrix([[ 3., 7.],


[-4., -6.],
[ 7., 8.],
[ 1., -1.],
[-4., -1.],
[-3., -7.]])

In [28]: # plots
plt.scatter(A[:,0], A[:,1]) # A in blue
plt.plot(Arec[:,0],Arec[:,1],'r', marker='o') # Arec in RED

# across observations
for i in range(m):
e = np.vstack((A[i],Arec[i]))
plt.plot(e[:,0],e[:,1],'b') # BLUE

# axes
plt.plot([-6,8],[0,0],'grey') # x-axis
plt.plot([0,0],[-8,10],'grey') # y-axis
plt.axis([-6, 8, -8, 10])
plt.axes().set_aspect('equal')

# labels
plt.xlabel("$a_0$")
plt.ylabel("$a_1$")
plt.title("Back to $A$")

Out[28]: <matplotlib.text.Text at 0x7fd0c675cdd8>

Wicked animated GIF which illustrates PCA:
http://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues

Magically, eigen-decomposition (or PCA) finds the line where:

1. the spread of values along the black line is maximal, and
2. the projection error (the sum of the red lines) is minimal (checked numerically below).
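
A minimal sketch of that check, comparing the total squared projection error of the first principal
component against an arbitrary direction such as the x-axis (the helper proj_error is illustrative):

In [ ]: # Sketch: projection error onto PC1 vs. an arbitrary unit direction
def proj_error(direction):
    d = direction.reshape(2, 1) / np.linalg.norm(direction)   # unit column vector
    recon = A @ d @ d.T                                        # project onto d, then reconstruct
    return np.sum(np.square(A - recon))                        # total squared residual

print("Error using PC1:   ", proj_error(X[:, 1]))
print("Error using x-axis:", proj_error(np.array([1.0, 0.0])))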

6. Variance Retained

In [29]: # Average squared projection error using PC1


unexp_err = np.mean(np.sum(np.square(Amat - Arec),axis=1))
total_err = np.mean(np.sum(np.square(Amat),axis=1))
ret_err = 1 - (unexp_err / total_err) # percent of variance retained
print(ret_err)

0.948763733928

In [30]: # Using eigenvalues


l[1]/np.sum(l) # recall, use the 2nd eigenvalue

Out[30]: 0.94876373392787527

7. Summary of Eigen-decomposition Approach

1. Normalize the columns of $A$ so that each feature has zero mean.
2. Compute the sample covariance matrix $\Sigma = A^T A / (m-1)$.
3. Perform eigen-decomposition of $\Sigma$ using np.linalg.eig(Sigma).
4. Compress by ordering the $k$ eigenvectors according to the largest eigenvalues and compute $A X_k$.
5. Reconstruct from the compressed version by computing $A X_k X_k^T$.

These five steps can be wrapped into a single helper function, as sketched below.
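
A minimal sketch of that helper (the name pca_eig and its arguments are illustrative, not from the
notebook):

In [ ]: # Sketch: the five steps above as one helper function
def pca_eig(data, k):
    centered = data - np.mean(data, axis=0)          # 1. zero-mean columns
    m_obs = centered.shape[0]
    S = (centered.T @ centered) / (m_obs - 1)        # 2. sample covariance matrix
    evals, evecs = np.linalg.eig(S)                  # 3. eigen-decomposition
    order = np.argsort(evals)[::-1]                  # largest eigenvalues first
    Xk = evecs[:, order[:k]]                         # 4. keep the top-k eigenvectors
    compressed = centered @ Xk                       #    A @ X_k
    reconstructed = compressed @ Xk.T                # 5. A @ X_k @ X_k^T (still zero-mean)
    return compressed, reconstructed + np.mean(data, axis=0)

comp, rec = pca_eig(A, 1)   # e.g., compress the 6x2 example dataset down to 1-D
print(comp)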
Implementation with scikit-learn
In [41]: from sklearn import datasets

Let's try going from 4D to 2D using the classical iris dataset.


In [42]: iris = datasets.load_iris() # Bunch object


print(iris.DESCR)

Iris Plants Database


====================

Notes
-----
Data Set Characteristics:
:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- class:
- Iris-Setosa
- Iris-Versicolour
- Iris-Virginica
:Summary Statistics:

============== ==== ==== ======= ===== ====================


Min Max Mean SD Class Correlation
============== ==== ==== ======= ===== ====================
sepal length: 4.3 7.9 5.84 0.83 0.7826
sepal width: 2.0 4.4 3.05 0.43 -0.4194
petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)
petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)
============== ==== ==== ======= ===== ====================

:Missing Attribute Values: None


:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988

This is a copy of UCI ML iris datasets.


http://archive.ics.uci.edu/ml/datasets/Iris (http://archive.ics.uci.edu/ml/data
sets/Iris)

The famous Iris database, first used by Sir R.A Fisher

This is perhaps the best known database to be found in the


pattern recognition literature. Fisher's paper is a classic in the field and
is referenced frequently to this day. (See Duda & Hart, for example.) The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant. One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

References
----------
- Fisher,R.A. "The use of multiple measurements in taxonomic problems"
Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
Mathematical Statistics" (John Wiley, NY, 1950).
- Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
(Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.

- Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System


Structure and Classification Rule for Recognition in Partially Exposed
Environments". IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. PAMI-2, No. 1, 67-71.
- Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions
on Information Theory, May 1972, 431-433.
- See also: 1988 MLC Proceedings, 54-64. Cheeseman et al"s AUTOCLASS II
conceptual clustering system finds 3 classes in the data.
- Many, many more ...

In [43]: A0 = iris.data # np.array

print("Dimensions:")
print(A0.shape)

print("---")
print("First 5 samples:")
print(A0[:5,:])

print("---")
print("Feature names:")
print(iris.feature_names)

Dimensions:
(150, 4)
---
First 5 samples:
[[ 5.1 3.5 1.4 0.2]
[ 4.9 3. 1.4 0.2]
[ 4.7 3.2 1.3 0.2]
[ 4.6 3.1 1.5 0.2]
[ 5. 3.6 1.4 0.2]]
---
Feature names:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (c
m)']


In [44]: # Eigen-decomposition: 5-step process

# 1. Normalize columns of $A$ so that each feature has zero mean


mu = np.mean(A0,axis=0)
A = A0 - mu
print("Does A have zero mean across rows?")
print(np.mean(A,axis=0))

# 2. Compute sample covariance matrix $\Sigma = {A^TA}/{(m-1)}$


m,n = A.shape
Sigma = (A.T @ A)/(m-1)
print("---")
print("Sigma:")
print(Sigma)

# 3. Perform eigen-decomposition of $\Sigma$ using `np.linalg.eig(Sigma)`


l,X = np.linalg.eig(Sigma)
print("---")
print("Evalues:")
print(l)
print("---")
print("Evectors:")
print(X)

# 4. Compress by ordering $k$ evectors according to largest evalues and compute $A X_k$


print("---")
print("Compressed - 4D to 2D:")
Acomp = A @ X[:,:2] # first 2 evectors
print(Acomp[:5,:]) # first 5 observations

# 5. Reconstruct from compressed version by computing $A X_k X_k^T$


print("---")
print("Reconstructed version - 2D to 4D:")
Arec = A @ X[:,:2] @ X[:,:2].T # first 2 evectors
print(Arec[:5,:]+mu) # first 5 obs, adding mu to compare to original

Does A have zero mean across rows?


[ -1.12502600e-15 -6.75015599e-16 -3.23889064e-15 -6.06921920e-16]
---
Sigma:
[[ 0.68569351 -0.03926846 1.27368233 0.5169038 ]
[-0.03926846 0.18800403 -0.32171275 -0.11798121]
[ 1.27368233 -0.32171275 3.11317942 1.29638747]
[ 0.5169038 -0.11798121 1.29638747 0.58241432]]
---
Evalues:
[ 4.22484077 0.24224357 0.07852391 0.02368303]
---
Evectors:
[[ 0.36158968 -0.65653988 -0.58099728 0.31725455]
[-0.08226889 -0.72971237 0.59641809 -0.32409435]
[ 0.85657211 0.1757674 0.07252408 -0.47971899]
[ 0.35884393 0.07470647 0.54906091 0.75112056]]
---
Compressed - 4D to 2D:
[[-2.68420713 -0.32660731]
[-2.71539062 0.16955685]

[-2.88981954 0.13734561]
[-2.7464372 0.31112432]
[-2.72859298 -0.33392456]]
---
Reconstructed version - 2D to 4D:
[[ 5.08718247 3.51315614 1.4020428 0.21105556]
[ 4.75015528 3.15366444 1.46254138 0.23693223]
[ 4.70823155 3.19151946 1.30746874 0.17193308]
[ 4.64598447 3.05291508 1.46083069 0.23636736]
[ 5.07593707 3.5221472 1.36273698 0.19458132]]

In [45]: # Using sklearn.decomposition.PCA
from sklearn.decomposition import PCA

pca = PCA(n_components=2) # two components


pca.fit(A0) # run PCA, putting in raw version for fun

print("Principal components:")
print(pca.components_)

print("---")
print("Compressed - 4D to 2D:")
print(pca.transform(A0)[:5,:]) # first 5 obs

print("---")
print("Reconstructed - 2D to 4D:")
print(pca.inverse_transform(pca.transform(A0))[:5,:]) # first 5 obs

Principal components:
[[ 0.36158968 -0.08226889 0.85657211 0.35884393]
[ 0.65653988 0.72971237 -0.1757674 -0.07470647]]
---
Compressed - 4D to 2D:
[[-2.68420713 0.32660731]
[-2.71539062 -0.16955685]
[-2.88981954 -0.13734561]
[-2.7464372 -0.31112432]
[-2.72859298 0.33392456]]
---
Reconstructed - 2D to 4D:
[[ 5.08718247 3.51315614 1.4020428 0.21105556]
[ 4.75015528 3.15366444 1.46254138 0.23693223]
[ 4.70823155 3.19151946 1.30746874 0.17193308]
[ 4.64598447 3.05291508 1.46083069 0.23636736]
[ 5.07593707 3.5221472 1.36273698 0.19458132]]
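
As a quick visual check of the scikit-learn result, here is a sketch that simply colors the 2-D
projection by iris.target:

In [ ]: # Sketch: scatter the 2-D iris projection, colored by species
proj = pca.transform(A0)
plt.scatter(proj[:, 0], proj[:, 1], c=iris.target)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Iris projected onto the first two principal components");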

Another Example


In [31]: from __future__ import print_function, division

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# use seaborn plotting style defaults


import seaborn as sns; sns.set()

In [32]: np.random.seed(1)
X = np.dot(np.random.random(size=(2, 2)), np.random.normal(size=(2, 200))).T
plt.plot(X[:, 0], X[:, 1], 'o')
plt.axis('equal');

We can see that there is a definite trend in the data. What PCA seeks to do is to find the Principal
Axes in the data, and explain how important those axes are in describing the data distribution:

In [33]: from sklearn.decomposition import PCA


pca = PCA(n_components=2)
pca.fit(X)
print(pca.explained_variance_)
print(pca.components_)

[ 0.75871884 0.01838551]
[[-0.94446029 -0.32862557]
[-0.32862557 0.94446029]]

To see what these numbers mean, let's view them as vectors plotted on top of the data:


In [34]: plt.plot(X[:, 0], X[:, 1], 'o', alpha=0.5)


for length, vector in zip(pca.explained_variance_, pca.components_):
v = vector * 3 * np.sqrt(length)
plt.plot([0, v[0]], [0, v[1]], '-k', lw=3)
plt.axis('equal');

Notice that one vector is longer than the other. In a sense, this tells us that that direction in the data
is somehow more "important" than the other direction. The explained variance quantifies this
measure of "importance" in direction.

Another way to think of it is that the second principal component could be completely ignored
without much loss of information! Let's see what our data look like if we only keep 95% of the
variance:

In [35]: clf = PCA(0.95) # keep 95% of variance


X_trans = clf.fit_transform(X)
print(X.shape)
print(X_trans.shape)

(200, 2)
(200, 1)

By specifying that we want to throw away 5% of the variance, the data is now compressed by a
factor of 50%! Let's see what the data look like after this compression:


In [36]: X_new = clf.inverse_transform(X_trans)


plt.plot(X[:, 0], X[:, 1], 'o', alpha=0.2)
plt.plot(X_new[:, 0], X_new[:, 1], 'ob', alpha=0.8)
plt.axis('equal');

The light points are the original data, while the dark points are the projected version. We see that
after truncating 5% of the variance of this dataset and then reprojecting it, the "most important"
features of the data are maintained, and we've compressed the data by 50%!

This is the sense in which "dimensionality reduction" works: if you can approximate a data set in a
lower dimension, you can often have an easier time visualizing it or fitting complicated models to the
data.

In [ ]:
