
Step by step principal components analysis using R | | Musings from a PhD candidateMusings from a...

http://davetang.org/muse/2012/02/01/step-by-step-principal-components-analysis-using-r/

Musings from a PhD candidate


"The best way to have a good idea is to have lots of ideas." (Linus Pauling)

Step by step principal components analysis using R


Posted on February 1, 2012 by Davo

I've always wondered about the calculations behind a principal components analysis (PCA). An extremely useful
tutorial explains the key concepts and runs through the analysis. Here I use R to calculate each step of a PCA in
the hope of better understanding the analysis.
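For reference, the steps computed in the code can be summarised in matrix notation. This is my own summary rather than notation from the post: X is the 10 × 2 matrix whose columns are x and y, n = 10, and v_k and λ_k are the k-th eigenvector and eigenvalue of the covariance matrix.

```latex
\begin{aligned}
\tilde{X} &= X - \mathbf{1}_n \bar{\mathbf{x}}^{\top}
  && \text{(subtract each column's mean)}\\
C &= \frac{1}{n-1}\,\tilde{X}^{\top}\tilde{X}
  && \text{(sample covariance matrix, as returned by } \texttt{cov()}\text{)}\\
C\,\mathbf{v}_k &= \lambda_k\,\mathbf{v}_k
  && \text{(eigen decomposition, as returned by } \texttt{eigen()}\text{)}\\
\mathrm{PC}_k &= \tilde{X}\,\mathbf{v}_k
  && \text{(scores: centred data projected onto } \mathbf{v}_k\text{)}
\end{aligned}
```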
```r
x <- c(2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1)
y <- c(2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9)
mean(x)
#[1] 1.81
mean(y)
#[1] 1.91
# subtract the mean from each variable
x_less_mean <- x - mean(x)
x_less_mean
# [1]  0.69 -1.31  0.39  0.09  1.29  0.49  0.19 -0.81 -0.31 -0.71
y_less_mean <- y - mean(y)
y_less_mean
# [1]  0.49 -1.21  0.99  0.29  1.09  0.79 -0.31 -0.81 -0.31 -1.01
cov(x,y)
#[1] 0.6154444
cov(x,x)
#[1] 0.6165556
cov(y,y)
#[1] 0.7165556
# 2 x 2 covariance matrix, with rows and columns labelled x and y
covar_matrix <- matrix(c(cov(x,x),cov(x,y),cov(y,x),cov(y,y)),nrow=2,ncol=2,byrow=TRUE,
                       dimnames=list(c("x","y"),c("x","y")))
covar_matrix
#          x         y
#x 0.6165556 0.6154444
#y 0.6154444 0.7165556
```
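Building the covariance matrix entry by entry works, but it can also be obtained in one step. The following is a hypothetical cross-check of my own, not part of the original post: it assumes only base R and computes the same matrix both with matrix algebra (the centred data matrix times its transpose, divided by n - 1) and with `cov()` applied to a two-column matrix.

```r
# Hypothetical cross-check, not from the original post.
x <- c(2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1)
y <- c(2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9)

# Mean-centred data matrix: one row per observation, one column per variable
centred <- cbind(x = x - mean(x), y = y - mean(y))
n <- nrow(centred)

# Sample covariance t(X) %*% X / (n - 1); cov() uses the same n - 1 denominator
covar_by_hand <- crossprod(centred) / (n - 1)

# cov() on a two-column matrix returns the whole 2 x 2 matrix in one call
covar_builtin <- cov(cbind(x, y))

print(covar_by_hand)
print(all.equal(covar_by_hand, covar_builtin))
```

The matrix form generalises directly to more than two variables, where filling the matrix one `cov()` call at a time becomes tedious.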
```r
eigen(covar_matrix)
#$values
#[1] 1.2840277 0.0490834
#
#$vectors
#          [,1]       [,2]
#[1,] 0.6778734 -0.7351787
#[2,] 0.7351787  0.6778734
```
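To see what `eigen()` has returned, a short sanity check can help. This sketch is my own addition and assumes only base R: it confirms that each column v of `e$vectors` satisfies covar_matrix %*% v = λv, that the eigenvectors are orthonormal, and that the eigenvalues add up to the total variance (the trace of the covariance matrix, 0.6165556 + 0.7165556).

```r
# Sanity checks on the eigen decomposition (not from the original post).
x <- c(2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1)
y <- c(2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9)
covar_matrix <- cov(cbind(x, y))
e <- eigen(covar_matrix)

# Each column v of e$vectors should satisfy covar_matrix %*% v == lambda * v
for (k in seq_along(e$values)) {
  v <- e$vectors[, k]
  residual <- as.vector(covar_matrix %*% v) - e$values[k] * v
  cat("component", k, ": max |Cv - lambda v| =", max(abs(residual)), "\n")
}

# Eigenvectors of a symmetric matrix are orthonormal, so t(V) %*% V is the identity
print(round(crossprod(e$vectors), 10))

# The eigenvalues sum to the total variance (trace of the covariance matrix)
print(c(sum_of_eigenvalues = sum(e$values),
        total_variance = sum(diag(covar_matrix))))
```

The last check is why eigenvalues are often reported as a proportion of their sum: each one is the share of the total variance captured by its component.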
```r
e <- eigen(covar_matrix)
# to get the first principal component, multiply the mean-centered
# x, y values by the first eigenvector (one dot product per observation)
for (i in seq_along(x_less_mean)){
  pc1 <- (x_less_mean[i] * e$vectors[1,1]) + (y_less_mean[i] * e$vectors[2,1])
  print(pc1)
}
#[1] 0.8279702
#[1] -1.77758
#[1] 0.9921975
#[1] 0.2742104
#[1] 1.675801
#[1] 0.9129491
#[1] -0.09910944
#[1] -1.144572
#[1] -0.4380461
#[1] -1.223821
# the second principal component uses the second eigenvector
for (i in seq_along(x_less_mean)){
  pc2 <- (x_less_mean[i] * e$vectors[1,2]) + (y_less_mean[i] * e$vectors[2,2])
  print(pc2)
}
#[1] -0.1751153
#[1] 0.1428572
#[1] 0.384375
#[1] 0.1304172
#[1] -0.2094985
#[1] 0.1752824
#[1] -0.3498247
#[1] 0.04641726
#[1] 0.01776463
#[1] -0.1626753
# store both principal components: one row per observation, one column per PC
z <- array(0, dim=c(length(x_less_mean),2))
for (i in seq_along(x_less_mean)){
  pc1 <- (x_less_mean[i] * e$vectors[1,1]) + (y_less_mean[i] * e$vectors[2,1])
  pc2 <- (x_less_mean[i] * e$vectors[1,2]) + (y_less_mean[i] * e$vectors[2,2])
  z[i,1] <- pc1
  z[i,2] <- pc2
}
z
#             [,1]        [,2]
# [1,]  0.82797019 -0.17511531
# [2,] -1.77758033  0.14285723
# [3,]  0.99219749  0.38437499
# [4,]  0.27421042  0.13041721
# [5,]  1.67580142 -0.20949846
# [6,]  0.91294910  0.17528244
# [7,] -0.09910944 -0.34982470
# [8,] -1.14457216  0.04641726
# [9,] -0.43804614  0.01776463
#[10,] -1.22382056 -0.16267529
```
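The loops compute a dot product between each centred (x, y) pair and an eigenvector. A minimal vectorised sketch, my own rather than the author's and assuming only base R, does the same for all observations and both components with a single matrix product, then checks the result against an element-wise loop and confirms that each component's variance equals its eigenvalue.

```r
# Vectorised projection onto the eigenvectors (not from the original post).
x <- c(2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1)
y <- c(2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9)

# Mean-centred data as a 10 x 2 matrix (columns: centred x, centred y)
centred <- cbind(x - mean(x), y - mean(y))
e <- eigen(cov(cbind(x, y)))   # same matrix as covar_matrix above

# One matrix product projects every observation onto every eigenvector:
# column k of `scores` holds principal component k
scores <- centred %*% e$vectors
print(scores)

# The element-wise loop, written generically for comparison
z <- matrix(0, nrow = nrow(centred), ncol = ncol(e$vectors))
for (i in seq_len(nrow(centred))) {
  for (k in seq_len(ncol(e$vectors))) {
    z[i, k] <- sum(centred[i, ] * e$vectors[, k])
  }
}
print(all.equal(scores, z))

# The variance of each principal component equals its eigenvalue
print(apply(scores, 2, var))
print(e$values)
```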
```r
# now do this using the inbuilt prcomp() function
data <- data.frame(x,y)
data.pca <- prcomp(data)
data.pca
#Standard deviations:
#[1] 1.1331495 0.2215477
#
#Rotation:
#         PC1        PC2
#x -0.6778734  0.7351787
#y -0.7351787 -0.6778734
data.pca$x
#              PC1         PC2
# [1,] -0.82797019  0.17511531
# [2,]  1.77758033 -0.14285723
# [3,] -0.99219749 -0.38437499
# [4,] -0.27421042 -0.13041721
# [5,] -1.67580142  0.20949846
# [6,] -0.91294910 -0.17528244
# [7,]  0.09910944  0.34982470
# [8,]  1.14457216 -0.04641726
# [9,]  0.43804614 -0.01776463
#[10,]  1.22382056  0.16267529
```
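`prcomp()` reports standard deviations rather than variances, so squaring them should give back the eigenvalues (for example, 1.1331495² ≈ 1.2840277). The sketch below is an assumption-labelled check of my own, using only base R: it compares `sdev^2` with the eigenvalues, compares the rotation with the eigenvectors while ignoring sign, and confirms that the scores in `data.pca$x` are the centred data multiplied by the rotation matrix.

```r
# Relating prcomp() output to the manual eigen() route (not from the original post).
x <- c(2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1)
y <- c(2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9)
data <- data.frame(x, y)
data.pca <- prcomp(data)
e <- eigen(cov(data))

# sdev^2 are the component variances, i.e. the eigenvalues
print(rbind(prcomp_sdev_sq = data.pca$sdev^2, eigenvalues = e$values))

# Loadings agree column by column once the sign is ignored
print(all.equal(abs(unname(data.pca$rotation)), abs(e$vectors)))

# Scores are the mean-centred data projected onto the rotation matrix
centred <- scale(data, center = TRUE, scale = FALSE)
print(all.equal(unname(data.pca$x), unname(centred %*% data.pca$rotation)))
```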
```r
# plot the manual scores (z) and the prcomp() scores side by side
op <- par(mfrow=c(1,2))
plot(z)
plot(data.pca$x)
par(op) # restore the previous graphics settings
```

I'm still trying to figure out why the signs are inverted between the principal components calculated manually and
those returned by prcomp(). I guess the relationships don't change, but I was curious why this is the case. In the
tutorial listed at the start of this post, however, the first principal components are identical, while the second
principal components are inverted.
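One general fact that accounts for the sign difference: if v is an eigenvector of the covariance matrix, then so is -v, and the R documentation for `prcomp()` states that it works from a singular value decomposition of the centred data matrix rather than calling `eigen()` on the covariance matrix, so the two routes are free to return opposite signs for any component. The sketch below checks that a negated eigenvector still satisfies the eigen equation, then flips the manual scores to match `prcomp()` using `align_signs()`, a hypothetical helper of my own that is not part of base R or the original post.

```r
# Eigenvectors are only defined up to sign (sketch, not from the original post).
x <- c(2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1)
y <- c(2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9)
covar_matrix <- cov(cbind(x, y))
e <- eigen(covar_matrix)
v <- e$vectors[, 1]

# Both v and -v satisfy covar_matrix %*% v == lambda * v
print(all.equal(as.vector(covar_matrix %*% v), e$values[1] * v))
print(all.equal(as.vector(covar_matrix %*% (-v)), e$values[1] * (-v)))

# Hypothetical helper: flip each column of `scores` so that it points the same
# way as the matching column of `reference` (sign of the column dot product)
align_signs <- function(scores, reference) {
  s <- sign(colSums(scores * reference))  # +1 keeps a column, -1 flips it
  sweep(scores, 2, s, `*`)
}

centred <- cbind(x - mean(x), y - mean(y))
z <- centred %*% e$vectors               # manual scores
data.pca <- prcomp(data.frame(x, y))     # built-in scores
z_aligned <- align_signs(z, data.pca$x)
print(all.equal(unname(z_aligned), unname(data.pca$x)))
```

Because only the sign of each column changes, distances between points and the variance along each component are unaffected, which is why the two plots show mirror images of the same configuration.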
This entry was posted in R, Statistics and tagged pca, R, statistics.


23-04-2012 16:21
