Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
What is PCA
Principal Component Analysis (PCA) is a dimensionality-reduction technique that is often used to
transform a high-dimensional dataset into a smaller-dimensional subspace prior to running a
machine learning algorithm on the data.
Reducing the dimensionality of the dataset reduces the size of the space on which k-
nearest-neighbors (kNN) must calculate distance, which improve the performance of kNN.
Reducing the dimensionality of the dataset reduces the number of degrees of freedom of
the hypothesis, which reduces the risk of overfitting.
Most algorithms will run significantly faster if they have fewer dimensions they need to look
at.
Reducing the dimensionality via PCA can simplify the dataset, facilitating description,
visualization, and insight.
Math of PCA
In [1]: import numpy as np
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 1/21
12/22/2017 Principle Component Analysis
print("Array:")
print(A) # our array
print("---")
print("Dimensions:")
print(A.shape) # shape
print("---")
print("Mean across Rows:")
print(np.mean(A,axis=0))
Array:
[[ 3 7]
[-4 -6]
[ 7 8]
[ 1 -1]
[-4 -1]
[-3 -7]]
---
Dimensions:
(6, 2)
---
Mean across Rows:
[ 0. 0.]
In [3]: # Note: you can convert this easily into a DataFrame ...
import pandas as pd
df = pd.DataFrame(A, columns = ['a0', 'a1'])
print(df)
a0 a1
0 3 7
1 -4 -6
2 7 8
3 1 -1
4 -4 -1
5 -3 -7
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 2/21
12/22/2017 Principle Component Analysis
Covariance
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 3/21
12/22/2017 Principle Component Analysis
In [6]: # plots
plt.scatter(A[:,0],A[:,1])
# annotations
for i in range(m):
plt.annotate('('+str(A[i,0])+','+str(A[i,1])+')',(A[i,0]+0.2,A[i,1]+0.2))
# axes
plt.plot([-6,8],[0,0],'grey') # x-axis
plt.plot([0,0],[-8,10],'grey') # y-axis
plt.axis([-6, 8, -8, 10])
plt.axes().set_aspect('equal')
# labels
plt.xlabel("$a_0$")
plt.ylabel("$a_1$")
plt.title("Dataset $A$")
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 4/21
12/22/2017 Principle Component Analysis
Σ = (mA−A1)
T
4. Eigen-decomposition of Σ
According to Wikipedia article on PCA
(https://en.m.wikipedia.org/wiki/Principal_component_analysis), "PCA can be done by eigenvalue
decomposition of a data covariance (or correlation) matrix or singular value decomposition of a data
matrix." I choose the first approach.
Σ
* is a real, symmetric matrix; thus, it has 1) real eigenvalues and 2) orthogonal eigenvectors.
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 5/21
12/22/2017 Principle Component Analysis
In [12]: l, X = np.linalg.eig(Sigma)
print("Eigenvalues:")
print(l)
print("---")
print("Eigenvectors:")
print(X)
Eigenvalues:
[ 3.07417596 56.92582404]
---
Eigenvectors:
[[-0.82806723 -0.56062881]
[ 0.56062881 -0.82806723]]
Recall from your Linear Algebra class that the following should hold:
Σx0 = λ0 x0
Σx1 = λ1 x1
In [13]: # let's check the first Eigenvalue, Eigenvector combination
print("Sigma times eigenvector:")
print(Sigma @ X[:,0]) # 2x2 times 2x1
print("Eigenvalue times eigenvector:")
print(l[0] * X[:,0]) # scalar times 2x1, ANNOYING - MUST USE * vs. @
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 6/21
12/22/2017 Principle Component Analysis
Out[16]: 0.0
1.0
1.0
In [18]: # plots
plt.scatter(A[:,0],A[:,1])
scale = 3 # increase this scaling factor to highlight these vectors
plt.plot([0,X[0,1]*scale],[0,X[1,1]*scale],'r') # First principal component
plt.plot([0,X[0,0]*scale],[0,X[1,0]*scale],'g') # Second principal component
# annotations
for i in range(m):
plt.annotate('('+str(A[i,0])+','+str(A[i,1])+')',(A[i,0]+0.2,A[i,1]+0.2))
# axes
plt.plot([-6,8],[0,0],'grey') # x-axis
plt.plot([0,0],[-8,10],'grey') # y-axis
plt.axis([-6, 8, -8, 10])
plt.axes().set_aspect('equal')
# labels
plt.xlabel("$a_0$")
plt.ylabel("$a_1$")
plt.title("Eigenvectors of $\Sigma$")
5. Dimensionality Reduction: 2D to 1D
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 7/21
12/22/2017 Principle Component Analysis
Compressed version of A:
[[ -7.47835704]
[ 7.21091862]
[-10.54893951]
[ 0.26743842]
[ 3.07058247]
[ 7.47835704]]
In [22]: Arec = Acomp @ pc1.T # 6x1 @ 1x2, this breaks with np.array
print("Reconstruction from 1D compression of A:")
print(Arec)
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 8/21
12/22/2017 Principle Component Analysis
In [24]: # plots
plt.scatter([Amat[:,0]], [Amat[:,1]]# Dimensionality Reduction: Principal Componen
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
np.random.seed(1)
X = np.dot(np.random.random(size=(2, 2)), np.random.normal(size=(2, 200))).T
plt.plot(X[:, 0], X[:, 1], 'o')
plt.axis('equal');
We can see that there is a definite trend in the data. What PCA seeks to do is to
To see what these numbers mean, let's view them as vectors plotted on top of the d
Notice that one vector is longer than the other. In a sense, this tells us that th
The explained variance quantifies this measure of "importance" in direction.
Another way to think of it is that the second principal component could be **comp
By specifying that we want to throw away 5% of the variance, the data is now comp
X_new = clf.inverse_transform(X_trans)
plt.plot(X[:, 0], X[:, 1], 'o', alpha=0.2)
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 9/21
12/22/2017 Principle Component Analysis
The light points are the original data, while the dark points are the projected ve
This is the sense in which "dimensionality reduction" works: if you can approximat
The dimensionality reduction might seem a bit abstract in two dimensions, but the
This gives us an idea of the relationship between the digits. Essentially, we have
PCA is a very useful dimensionality reduction algorithm, because it has a very int
The input data is represented as a vector: in the case of the digits, our data is
$$
x = [x_1, x_2, x_3 \cdots]
$$
$$
image(x) = x_1 \cdot{\rm (pixel~1)} + x_2 \cdot{\rm (pixel~2)} + x_3 \cdot{\rm (pi
$$
sns.set_style('white')
plot_image_components(digits.data[0])
But the pixel-wise representation is not the only choice. We can also use other *b
$$
image(x) = {\rm mean} + x_1 \cdot{\rm (basis~1)} + x_2 \cdot{\rm (basis~2)} + x_3
$$
What PCA does is to choose optimal **basis functions** so that only a few are need
The low-dimensional representation of our data is the coefficients of this series
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 10/21
12/22/2017 Principle Component Analysis
Here we see that with only six PCA components, we recover a reasonable approximat
Thus we see that PCA can be viewed from two angles. It can be viewed as **dimensio
But how much information have we thrown away? We can figure this out by looking a
sns.set()
pca = PCA().fit(X)
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance');
Here we see that our two-dimensional projection loses a lot of information (as mea
As we mentioned, PCA can be used for is a sort of data compression. Using a small
Here's what a single digit looks like as you change the number of components:
for i, ax in enumerate(axes.flat):
pca = PCA(i + 1).fit(X)
im = pca.inverse_transform(pca.transform(X[20:21]))
def plot_digits(n_components):
fig = plt.figure(figsize=(8, 8))
plt.subplot(1, 1, 1, frameon=False, xticks=[], yticks=[])
nside = 10
pca = PCA(n_components).fit(X)
Xproj = pca.inverse_transform(pca.transform(X[:nside ** 2]))
Xproj = np.reshape(Xproj, (nside, nside, 8, 8))
total_var = pca.explained_variance_ratio_.sum()
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 11/21
12/22/2017 Principle Component Analysis
# axes
plt.plot([-6,8],[0,0],'grey') # x-axis
plt.plot([0,0],[-8,10],'grey') # y-axis
plt.axis([-6, 8, -8, 10])
plt.axes().set_aspect('equal')
# labels
plt.xlabel("$a_0$")
plt.ylabel("$a_1$")
plt.title("Reconstructing the 1D compression of $A$")
2
1
By tacking on the Rank-1 matrix related to the 2nd eigenvector you get back to the original data:
In [26]: # Add the Rank 1 matrix for the other vector to recover A completely
Amat @ Xmat[:,1] @ Xmat[:,1].T + Amat @ Xmat[:,0] @ Xmat[:,0].T
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 12/21
12/22/2017 Principle Component Analysis
In [28]: # plots
plt.scatter(A[:,0], A[:,1]) # A in blue
plt.plot(Arec[:,0],Arec[:,1],'r', marker='o') # Arec in RED
# across observations
for i in range(m):
e = np.vstack((A[i],Arec[i]))
plt.plot(e[:,0],e[:,1],'b') # BLUE
# axes
plt.plot([-6,8],[0,0],'grey') # x-axis
plt.plot([0,0],[-8,10],'grey') # y-axis
plt.axis([-6, 8, -8, 10])
plt.axes().set_aspect('equal')
# labels
plt.xlabel("$a_0$")
plt.ylabel("$a_1$")
plt.title("Back to $A$")
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 13/21
12/22/2017 Principle Component Analysis
6. Variance Retained
0.948763733928
Out[30]: 0.94876373392787527
1. A
Normalize columns of so that each feature has zero mean
2. Compute sample covariance matrix Σ = AT A/(m − 1)
3. Σ
Perform eigen-decomposition of using np.linalg.eig(Sigma)
4. k
Compress by ordering evectors according to largest evalues and compute AXk
5. Reconstruct from compressed version by computing AXk XkT
Implementation with scikit-learn
In [41]: from sklearn import datasets
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 14/21
12/22/2017 Principle Component Analysis
Notes
-----
Data Set Characteristics:
:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- class:
- Iris-Setosa
- Iris-Versicolour
- Iris-Virginica
:Summary Statistics:
References
----------
- Fisher,R.A. "The use of multiple measurements in taxonomic problems"
Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
Mathematical Statistics" (John Wiley, NY, 1950).
- Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
(Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 15/21
12/22/2017 Principle Component Analysis
print("Dimensions:")
print(A0.shape)
print("---")
print("First 5 samples:")
print(A0[:5,:])
print("---")
print("Feature names:")
print(iris.feature_names)
Dimensions:
(150, 4)
---
First 5 samples:
[[ 5.1 3.5 1.4 0.2]
[ 4.9 3. 1.4 0.2]
[ 4.7 3.2 1.3 0.2]
[ 4.6 3.1 1.5 0.2]
[ 5. 3.6 1.4 0.2]]
---
Feature names:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (c
m)']
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 16/21
12/22/2017 Principle Component Analysis
[-2.88981954 0.13734561]
[-2.7464372 0.31112432]
[-2.72859298 -0.33392456]]
---
Reconstructed version - 2D to 4D:
[[ 5.08718247 3.51315614 1.4020428 0.21105556]
[ 4.75015528 3.15366444 1.46254138 0.23693223]
[ 4.70823155 3.19151946 1.30746874 0.17193308]
[ 4.64598447 3.05291508 1.46083069 0.23636736]
[ 5.07593707 3.5221472 1.36273698 0.19458132]]
print("Principal components:")
print(pca.components_)
print("---")
print("Compressed - 4D to 2D:")
print(pca.transform(A0)[:5,:]) # first 5 obs
print("---")
print("Reconstructed - 2D to 4D:")
print(pca.inverse_transform(pca.transform(A0))[:5,:]) # first 5 obs
Principal components:
[[ 0.36158968 -0.08226889 0.85657211 0.35884393]
[ 0.65653988 0.72971237 -0.1757674 -0.07470647]]
---
Compressed - 4D to 2D:
[[-2.68420713 0.32660731]
[-2.71539062 -0.16955685]
[-2.88981954 -0.13734561]
[-2.7464372 -0.31112432]
[-2.72859298 0.33392456]]
---
Reconstructed - 2D to 4D:
[[ 5.08718247 3.51315614 1.4020428 0.21105556]
[ 4.75015528 3.15366444 1.46254138 0.23693223]
[ 4.70823155 3.19151946 1.30746874 0.17193308]
[ 4.64598447 3.05291508 1.46083069 0.23636736]
[ 5.07593707 3.5221472 1.36273698 0.19458132]]
Another Example
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 18/21
12/22/2017 Principle Component Analysis
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
In [32]: np.random.seed(1)
X = np.dot(np.random.random(size=(2, 2)), np.random.normal(size=(2, 200))).T
plt.plot(X[:, 0], X[:, 1], 'o')
plt.axis('equal');
We can see that there is a definite trend in the data. What PCA seeks to do is to find the Principal
Axes in the data, and explain how important those axes are in describing the data distribution:
[ 0.75871884 0.01838551]
[[-0.94446029 -0.32862557]
[-0.32862557 0.94446029]]
To see what these numbers mean, let's view them as vectors plotted on top of the data:
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 19/21
12/22/2017 Principle Component Analysis
Notice that one vector is longer than the other. In a sense, this tells us that that direction in the data
is somehow more "important" than the other direction. The explained variance quantifies this
measure of "importance" in direction.
Another way to think of it is that the second principal component could be completely ignored
without much loss of information! Let's see what our data look like if we only keep 95% of the
variance:
(200, 2)
(200, 1)
By specifying that we want to throw away 5% of the variance, the data is now compressed by a
factor of 50%! Let's see what the data look like after this compression:
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 20/21
12/22/2017 Principle Component Analysis
The light points are the original data, while the dark points are the projected version. We see that
after truncating 5% of the variance of this dataset and then reprojecting it, the "most important"
features of the data are maintained, and we've compressed the data by 50%!
This is the sense in which "dimensionality reduction" works: if you can approximate a data set in a
lower dimension, you can often have an easier time visualizing it or fitting complicated models to the
data.
In [ ]:
http://localhost:8888/notebooks/session%2023_24/ML10/Principle%20Component%20Analysis.ipynb 21/21