http://davetang.org/muse/2012/02/01/step-by-step-principal-components-analysis-using-r/
I've always wondered about the calculations behind a principal components analysis (PCA). An extremely useful
tutorial explains the key concepts and runs through the analysis. Here I use R to calculate each step of a PCA in
the hope of better understanding the analysis.
2, 1, 1.5, 1.1)
1.6, 1.1, 1.6, 0.9)
0.49
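The opening of the listing did not survive in this copy; only the tails of the two data vectors and one centred value remain. As a minimal sketch of what the setup might look like, the block below assumes the ten-point example dataset from Lindsay Smith's PCA tutorial. That choice is an assumption, although it is consistent with the surviving fragments and with the covariance values printed further down. The names x_less_mean and y_less_mean are taken from the later code.

```r
# ASSUMPTION: data taken from Lindsay Smith's PCA tutorial; the original
# vectors are truncated in this copy of the post.
x <- c(2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1)
y <- c(2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9)

# Centre each variable by subtracting its mean
x_less_mean <- x - mean(x)
y_less_mean <- y - mean(y)

# Sample variances and covariance (denominator n - 1)
var_x  <- cov(x, x)
cov_xy <- cov(x, y)
var_y  <- cov(y, y)
print(c(var_x, cov_xy, var_y))
```

Centring matters because the covariance, and therefore the principal axes, are defined relative to the mean of the data.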
23-04-2012 16:21
Step by step principal components analysis using R | Musings from a PhD candidate
#[1] 0.7165556
covar_matrix <- matrix(c(cov(x,x),cov(x,y),cov(y,x),cov(y,y)),nrow=2,ncol=2,byrow=TRUE)
covar_matrix
#          x         y
#x 0.6165556 0.6154444
#y 0.6154444 0.7165556
eigen(covar_matrix)
#$values
#[1] 1.2840277 0.0490834
#
#$vectors
#          [,1]       [,2]
#[1,] 0.6778734 -0.7351787
#[2,] 0.7351787 0.6778734
e <- eigen(covar_matrix)
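A quick way to see what eigen() returned is to check the defining property of an eigenvector and to look at how the variance is split between the two components. The sketch below again assumes the Lindsay Smith tutorial data (the original vectors are truncated in this copy) and builds the covariance matrix with cov() on a two-column matrix, which is a shortcut rather than the element-by-element construction used above.

```r
# ASSUMPTION: data from Lindsay Smith's PCA tutorial (truncated in this copy)
x <- c(2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1)
y <- c(2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9)

covar_matrix <- cov(cbind(x, y))
e <- eigen(covar_matrix)

# An eigenvector v with eigenvalue lambda satisfies covar_matrix %*% v = lambda * v
v1 <- e$vectors[, 1]
residual <- covar_matrix %*% v1 - e$values[1] * v1
print(max(abs(residual)))   # should be numerically zero

# The eigenvalues add up to the total variance (trace of the covariance matrix)
total_var <- sum(diag(covar_matrix))
print(c(sum(e$values), total_var))

# Share of the total variance carried by each component
prop_var <- e$values / sum(e$values)
print(prop_var)
```

With these data the first component carries roughly 96% of the total variance, which is why the PC2 values below are so much smaller than the PC1 values.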
#to get the first principal component
#multiply the mean-centred x, y values by the first eigenvector
for (i in 1:length(x_less_mean)){
  pc1 <- (x_less_mean[i] * e$vectors[1,1]) + (y_less_mean[i] * e$vectors[2,1])
  print(pc1)
}
#[1] 0.8279702
#[1] -1.77758
#[1] 0.9921975
#[1] 0.2742104
#[1] 1.675801
#[1] 0.9129491
#[1] -0.09910944
#[1] -1.144572
#[1] -0.4380461
#[1] -1.223821
#to get the second principal component, repeat with the second eigenvector
for (i in 1:length(x_less_mean)){
  pc2 <- (x_less_mean[i] * e$vectors[1,2]) + (y_less_mean[i] * e$vectors[2,2])
  print(pc2)
}
#[1] -0.1751153
#[1] 0.1428572
#[1] 0.384375
#[1] 0.1304172
#[1] -0.2094985
#[1] 0.1752824
#[1] -0.3498247
#[1] 0.04641726
#[1] 0.01776463
#[1] -0.1626753
#store both principal components in one matrix, one row per observation
z <- array(0, dim=c(10,2))
for (i in 1:length(x_less_mean)){
  pc1 <- (x_less_mean[i] * e$vectors[1,1]) + (y_less_mean[i] * e$vectors[2,1])
  pc2 <- (x_less_mean[i] * e$vectors[1,2]) + (y_less_mean[i] * e$vectors[2,2])
  z[i,1] <- pc1
  z[i,2] <- pc2
}
z
#             [,1]        [,2]
# [1,] 0.82797019 -0.17511531
# [2,] -1.77758033 0.14285723
# [3,] 0.99219749 0.38437499
# [4,] 0.27421042 0.13041721
# [5,] 1.67580142 -0.20949846
# [6,] 0.91294910 0.17528244
# [7,] -0.09910944 -0.34982470
# [8,] -1.14457216 0.04641726
# [9,] -0.43804614 0.01776463
#[10,] -1.22382056 -0.16267529
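The loop above is a matrix product written out element by element: each row of the centred data is multiplied by the matrix of eigenvectors. A vectorised sketch of the same projection follows. It assumes the Lindsay Smith tutorial data (the original vectors are truncated in this copy), and the name centred is introduced here purely for illustration.

```r
# ASSUMPTION: data from Lindsay Smith's PCA tutorial (truncated in this copy)
x <- c(2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1)
y <- c(2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9)

# 10 x 2 matrix of mean-centred observations
centred <- cbind(x - mean(x), y - mean(y))

# Columns of e$vectors are the eigenvectors of the covariance matrix
e <- eigen(cov(cbind(x, y)))

# Project every observation onto both eigenvectors at once
z <- centred %*% e$vectors
print(z)

# The variance of each column of scores equals the matching eigenvalue
score_var <- apply(z, 2, var)
print(score_var)
```

The signs of the columns of z depend on the signs eigen() happens to return for the eigenvectors, so only the magnitudes should be compared with the table above.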
#now do this using the inbuilt prcomp() function
data <- data.frame(x,y)
data.pca <- prcomp(data)
data.pca
#Standard deviations:
#[1] 1.1331495 0.2215477
#
#Rotation:
#         PC1        PC2
#x -0.6778734 0.7351787
#y -0.7351787 -0.6778734
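The standard deviations reported by prcomp() are the square roots of the eigenvalues found earlier (1.1331495 squared is about 1.2840277). The sketch below checks this numerically, again assuming the Lindsay Smith tutorial data since the original vectors are truncated in this copy.

```r
# ASSUMPTION: data from Lindsay Smith's PCA tutorial (truncated in this copy)
x <- c(2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1)
y <- c(2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9)

data.pca <- prcomp(data.frame(x, y))
e <- eigen(cov(cbind(x, y)))

# Squared standard deviations from prcomp() versus eigenvalues from eigen()
print(data.pca$sdev^2)
print(e$values)
same_variances <- isTRUE(all.equal(data.pca$sdev^2, e$values))
print(same_variances)
```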
data.pca$x
#              PC1         PC2
# [1,] -0.82797019 0.17511531
# [2,] 1.77758033 -0.14285723
# [3,] -0.99219749 -0.38437499
# [4,] -0.27421042 -0.13041721
# [5,] -1.67580142 0.20949846
# [6,] -0.91294910 -0.17528244
# [7,] 0.09910944 0.34982470
# [8,] 1.14457216 -0.04641726
# [9,] 0.43804614 -0.01776463
#[10,] 1.22382056 0.16267529
#plot the manual and prcomp() scores side by side
op <- par(mfrow=c(1,2))
plot(z)
plot(data.pca$x)
par(op) #restore the previous plotting layout
I'm still trying to figure out why the signs are inverted between the principal components calculated manually and
those from prcomp(). I guess the relationships don't change, but I was curious as to why this is the case. In the
tutorial listed at the start of this post, however, the first principal components are identical while the second
principal components are inverted.
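One likely explanation is that an eigenvector is only defined up to its sign: if v is a unit eigenvector of the covariance matrix, so is -v, and both describe the same axis. eigen() works on the covariance matrix while prcomp() uses a singular value decomposition of the centred data, so the two routines are free to return opposite signs for any component. The sketch below, which assumes the Lindsay Smith tutorial data (the original vectors are truncated in this copy), flips each eigenvector to point the same way as the corresponding prcomp() loading and then compares the two; signs and aligned are illustrative names.

```r
# ASSUMPTION: data from Lindsay Smith's PCA tutorial (truncated in this copy)
x <- c(2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1)
y <- c(2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9)

pca <- prcomp(data.frame(x, y))
e <- eigen(cov(cbind(x, y)))

# Dot product of each eigenvector with the matching prcomp() loading:
# +1 means same direction, -1 means opposite direction
signs <- sign(colSums(e$vectors * pca$rotation))
print(signs)

# Flip the eigenvectors that point the other way
aligned <- sweep(e$vectors, 2, signs, `*`)

# After alignment the two sets of loadings agree
loadings_match <- isTRUE(all.equal(aligned, pca$rotation, check.attributes = FALSE))
print(loadings_match)

# The scores agree as well once the same sign flips are applied
centred <- cbind(x - mean(x), y - mean(y))
scores_match <- isTRUE(all.equal(centred %*% aligned, pca$x, check.attributes = FALSE))
print(scores_match)
```

Because the sign is arbitrary, distances between points and the variance along each component are unaffected; the two plots are mirror images of each other.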