Factor Analysis

Akanksha S. Kashikar∗

February 5, 2017

Introduction

Factor analysis is another important technique for analysing multivariate data-sets. It is mainly used
when we suspect that the p observed variables are controlled by k non-observable variables, known as
latent variables or common factors (k < p). For example, intelligence has many components, such as linguistic
ability, numerical ability and spatial reasoning ability. These abilities cannot be measured directly. Suppose we
give a battery of tests to an individual, and some of these tests relate to linguistic ability (comprehension, for
example). The scores on these language-based tests are expected to be correlated with each other. Hence, we can
say that they are controlled by a common factor, which we may name ‘linguistic ability’.
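
In symbols, this idea is usually written as the orthogonal factor model. The notation below (x for the p observed variables, f for the k common factors, L for the p × k matrix of loadings, ε for the specific parts, Ψ for the diagonal matrix of specific variances) is an assumption made for this sketch, since the handout does not fix one:

```latex
x - \mu = L f + \varepsilon, \qquad
\mathrm{E}(f) = 0,\quad \mathrm{Cov}(f) = I_k,\quad
\mathrm{E}(\varepsilon) = 0,\quad \mathrm{Cov}(\varepsilon) = \Psi = \mathrm{diag}(\psi_1,\dots,\psi_p),\quad
\mathrm{Cov}(f,\varepsilon) = 0
\quad\Longrightarrow\quad
\Sigma = \mathrm{Cov}(x) = L L^{\top} + \Psi .
```

Under this model, the variance of each variable splits into a part explained by the common factors (the corresponding diagonal entry of L Lᵀ) and a specific part ψ.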

There are various methods of carrying out factor analysis. The two most important are the maximum likelihood
method and the principal component method. The maximum likelihood method is appropriate only when the
data-set comes from a multivariate normal population. The mathematical details can be found in
Härdle and Simar (2012, chap. 11) or Johnson and Wichern (2013, chap. 9).
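
As a rough illustration of the principal component method, the sketch below takes the loadings to be the first k eigenvectors of a correlation matrix, each scaled by the square root of its eigenvalue. The 4 × 4 matrix R, the choice k = 2 and all object names are hypothetical and chosen only for illustration; they are not the decathlon data used later.

```r
# Hypothetical 4 x 4 correlation matrix (illustration only)
R <- matrix(c(1.0, 0.6, 0.5, 0.2,
              0.6, 1.0, 0.4, 0.1,
              0.5, 0.4, 1.0, 0.3,
              0.2, 0.1, 0.3, 1.0), nrow = 4, byrow = TRUE)
k <- 2                                    # number of factors to retain (assumed)
e <- eigen(R, symmetric = TRUE)           # spectral decomposition of R

# Loadings: first k eigenvectors scaled by the square roots of their eigenvalues
L_pc <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), nrow = k)

communality <- rowSums(L_pc^2)            # variance explained by the common factors
uniqueness  <- 1 - communality            # specific variances
resid <- R - (L_pc %*% t(L_pc) + diag(uniqueness))  # what the k-factor fit misses

round(L_pc, 3)
round(uniqueness, 3)
round(resid, 3)                           # diagonal is zero by construction
```

Unlike maximum likelihood, this approach needs no distributional assumption; the off-diagonal entries of resid show how well the k factors reproduce the correlations.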

Details with an Illustration

As an illustration, let us carry out a factor analysis of the Olympic decathlon data. For carrying out factor
analysis, it is enough to have the correlation matrix. The file ‘Olympic-corr.txt’ contains the correlation matrix
for these data. Suppose we have data on p variables (columns) and we want to extract k common factors. The
degrees of freedom for a k-factor model fitted to p-variate data are given by

$$\mathrm{DF} = \frac{(p - k)^2 - (p + k)}{2}.$$

Therefore, we first have to make sure that $(p - k)^2 > (p + k)$. Towards that, let us check the dimension of
our data-set.

∗ Department of Statistics, Savitribai Phule Pune University


akanksha.kashikar@gmail.com

setwd("F:/Multivariate - Jan-May 2017/Reading Material/Factor Analysis")
O2 <- read.delim("Olympic-corr.txt",header=FALSE)
class(O2)

## [1] "data.frame"

O2=as.matrix(O2) #To convert it to matrix


NO=c("X100m.run","long.jump","shotput", "high.jump","X400m.run","X110.meter.hurdles",
"discus.throw","Pole.vault","Javelin","X1500m.run")
rownames(O2)=colnames(O2)=NO
dim(O2)

## [1] 10 10
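
With p = 10, the degrees-of-freedom formula above can be tabulated for candidate values of k. This is a small sketch; the names factor_df and max_k are introduced here for illustration only.

```r
# Degrees of freedom of a k-factor model for p observed variables
factor_df <- function(p, k) ((p - k)^2 - (p + k)) / 2

p <- 10
k <- 1:7
data.frame(k = k, DF = factor_df(p, k))

# Largest k that still leaves positive degrees of freedom
max_k <- max(k[factor_df(p, k) > 0])
max_k
```

For k = 4 the function returns 11, which agrees with the degrees of freedom reported by factanal in the output further on.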

We have data on 10 variables. Hence, to get positive degrees of freedom, the value of k can be at most five. Let
us first fit a factor model with four factors using the maximum likelihood method.

F=factanal(covmat=O2,factors=4, rotation="none")

The algorithm is iterative and hence it is important to check whether it has converged. We also need
to assess the importance of each factor, to examine whether the number of factors can be reduced. The output of
the factanal command gives information on all these aspects. The components of the output can be listed and
viewed as follows.

names(F)

## [1] "converged"    "loadings"     "uniquenesses" "correlation"
## [5] "criteria"     "factors"      "dof"          "method"
## [9] "n.obs"        "call"

F$converged ##TRUE indicates that the algorithm has converged

## [1] TRUE

F ##print the summary of the fitted model

##
## Call:
## factanal(factors = 4, covmat = O2, rotation = "none")
##
## Uniquenesses:
##          X100m.run          long.jump            shotput
##              0.010              0.388              0.089
##          high.jump          X400m.run X110.meter.hurdles
##              0.327              0.196              0.734
##       discus.throw         Pole.vault            Javelin
##              0.304              0.420              0.725
##         X1500m.run
##              0.600
##
## Loadings:
##                    Factor1 Factor2 Factor3 Factor4
## X100m.run            0.993
## long.jump            0.665   0.252   0.239   0.220
## shotput              0.530   0.777  -0.141
## high.jump            0.363   0.428   0.421   0.425
## X400m.run            0.571           0.620  -0.304
## X110.meter.hurdles   0.343   0.190           0.323
## discus.throw         0.401   0.718  -0.102
## Pole.vault           0.439   0.407   0.390   0.263
## Javelin              0.218   0.461
## X1500m.run                           0.609  -0.145
##
##                Factor1 Factor2 Factor3 Factor4
## SS loadings      2.686   1.794   1.187   0.539
## Proportion Var   0.269   0.179   0.119   0.054
## Cumulative Var   0.269   0.448   0.567   0.621
##
## The degrees of freedom for the model is 11 and the fit was 0.0578

Uniquenesses represents the specific variance of each variable, i.e., the variation in that variable which
cannot be explained by the common factors. The command factanal uses the correlation matrix to carry out
factor analysis. Hence, Uniquenesses will range from 0 to 1. A low value of Uniquenesses indicates that
the common factors explain most of the variation in that variable; if this holds for most variables, the model is
good. In the current output, X100m.run and shotput have very small values of Uniquenesses. Hence, we can say
that the 4-factor model explains most of the variability in these variables. On the other hand, for variables like
Javelin, X110.meter.hurdles and X1500m.run, the values of Uniquenesses are higher than 0.5. This shows that
much of the variation in these variables cannot be explained by the four common factors. Further, looking
at Proportion Var in the loadings part of the output, one can see that the contribution of the fourth factor is
only around 5%. In an unrotated factor model, the factors are arranged in decreasing order of importance, so a
fifth factor would be expected to contribute even less. Hence, by excluding the fifth factor, we are not losing
much. If the entire data-set (rather than only the correlation matrix) is available to us, we can examine whether
the contribution from the remaining factors is statistically insignificant by referring to the p-value reported by
the factanal command. Another approach to deciding the number of factors is to first fit a model with the
maximum possible value of k and then plot the values of Proportion Var. We will get a plot
similar to the scree plot in Principal Component Analysis. The location of the elbow can give us the number of
important factors.
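
To show where the p-value appears, here is a minimal sketch on simulated data. The sample size, the seed, the two-factor loading matrix L_true and all object names are hypothetical. The second call assumes that supplying the number of observations through the n.obs argument, together with a correlation matrix, is enough for factanal to compute the test.

```r
set.seed(1)                                # arbitrary seed, for reproducibility
n <- 1000                                  # hypothetical sample size
# Hypothetical loadings: variables 1-3 load on factor 1, variables 4-6 on factor 2
L_true <- cbind(c(0.8, 0.7, 0.6, 0, 0, 0),
                c(0, 0, 0, 0.8, 0.7, 0.6))
f   <- matrix(rnorm(n * 2), n, 2)          # common factors
eps <- matrix(rnorm(n * 6), n, 6) %*%      # specific parts, scaled so that
       diag(sqrt(1 - rowSums(L_true^2)))   # each variable has unit variance
X   <- f %*% t(L_true) + eps               # simulated observations
colnames(X) <- paste0("V", 1:6)

# Fit from the raw data: the test of "2 factors are sufficient" is reported
fit <- factanal(X, factors = 2, rotation = "none")
fit$dof         # ((6 - 2)^2 - (6 + 2)) / 2
fit$STATISTIC   # test statistic
fit$PVAL        # p-value

# Fit from a correlation matrix, supplying the number of observations
fit2 <- factanal(covmat = cor(X), factors = 2, n.obs = n, rotation = "none")
fit2$PVAL
```

A large p-value means that the data give no evidence against the hypothesis that the chosen number of factors is sufficient.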

The loadings represent the extent to which a factor affects any given variable. Also, the signs of the loadings
can indicate groups present in the variables. Hence, loadings can help us give names to the common underlying
factors.

load<-F$loadings[,1:4] #estimated factor loadings


round(load,3)

##                    Factor1 Factor2 Factor3 Factor4
## X100m.run            0.993  -0.069  -0.021   0.002
## long.jump            0.665   0.252   0.239   0.220
## shotput              0.530   0.777  -0.141  -0.080
## high.jump            0.363   0.428   0.421   0.425
## X400m.run            0.571   0.019   0.620  -0.304
## X110.meter.hurdles   0.343   0.190   0.089   0.323
## discus.throw         0.401   0.718  -0.102  -0.095
## Pole.vault           0.439   0.407   0.390   0.263
## Javelin              0.218   0.461   0.084  -0.085
## X1500m.run          -0.016   0.091   0.609  -0.145
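
The Uniquenesses, SS loadings and Proportion Var lines of the factanal output can be recomputed from these loadings. The sketch below hard-codes the rounded loadings printed above, so the results agree with the reported values only up to rounding; the object names (L_hat, uniq and so on) are introduced here for illustration.

```r
# Rounded unrotated loadings, copied from the output above
vars <- c("X100m.run", "long.jump", "shotput", "high.jump", "X400m.run",
          "X110.meter.hurdles", "discus.throw", "Pole.vault", "Javelin",
          "X1500m.run")
L_hat <- matrix(c( 0.993, -0.069, -0.021,  0.002,
                   0.665,  0.252,  0.239,  0.220,
                   0.530,  0.777, -0.141, -0.080,
                   0.363,  0.428,  0.421,  0.425,
                   0.571,  0.019,  0.620, -0.304,
                   0.343,  0.190,  0.089,  0.323,
                   0.401,  0.718, -0.102, -0.095,
                   0.439,  0.407,  0.390,  0.263,
                   0.218,  0.461,  0.084, -0.085,
                  -0.016,  0.091,  0.609, -0.145),
                ncol = 4, byrow = TRUE,
                dimnames = list(vars, paste0("Factor", 1:4)))

communality <- rowSums(L_hat^2)          # variance explained by the 4 factors
uniq        <- 1 - communality           # approximately the Uniquenesses
ss_loadings <- colSums(L_hat^2)          # "SS loadings"
prop_var    <- ss_loadings / nrow(L_hat) # "Proportion Var"

round(uniq, 3)
round(rbind(ss_loadings, prop_var, cum_var = cumsum(prop_var)), 3)
```

Calling plot(prop_var, type = "b") on the result would give a scree-like display of the kind described earlier.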

The first factor is not easily interpretable. shotput and discus.throw have high positive loadings on the second
factor. Hence, it can be termed as Arm Strength. To visualize such groups in the variables, let us plot the
loadings of the four factors, two factors at a time.

plot(load[,1],load[,2],type="n",xlab="Factor 1",ylab="Factor 2",
main="Unrotated Factors 1 & 2",xlim=c(min(load[,1])-0.1,max(load[,1])+0.1),
ylim=c(min(load[,2])-0.1,max(load[,2])+0.1),
font.main=1,cex.lab=1.1,cex.axis=1.1,cex.main=1.4)
text(load[,1],load[,2],colnames(O2),cex=1)
abline(h=0,v=0)

[Figure: "Unrotated Factors 1 & 2". Loadings on Factor 1 (x-axis) plotted against loadings on Factor 2 (y-axis), each point labelled with its variable name.]

plot(load[,3],load[,4],type="n",xlab="Factor 3",ylab="Factor 4",
main="Unrotated Factors 3 & 4",xlim=c(min(load[,3])-0.1,max(load[,3])+0.1),
ylim=c(min(load[,4])-0.1,max(load[,4])+0.1),
font.main=1,cex.lab=1.1,cex.axis=1.1,cex.main=1.4)
text(load[,3],load[,4],colnames(O2),cex=1)
abline(h=0,v=0)

[Figure: "Unrotated Factors 3 & 4". Loadings on Factor 3 (x-axis) plotted against loadings on Factor 4 (y-axis), each point labelled with its variable name.]

To get more interpretable factors, the factor loadings are often subjected to rotation using an orthogonal
matrix. One of the common methods of rotation is varimax. Let us try this rotation on our loadings.

ROTF=varimax(load)
ROTF

## $loadings
##
## Loadings:
##                    Factor1 Factor2 Factor3 Factor4
## X100m.run            0.928   0.205           0.296
## long.jump            0.451   0.280   0.155   0.554
## shotput              0.228   0.883           0.278
## high.jump                    0.254   0.242   0.739
## X400m.run            0.519   0.142   0.701   0.151
## X110.meter.hurdles   0.173   0.136           0.465
## discus.throw         0.133   0.794           0.220
## Pole.vault           0.168   0.314   0.279   0.612
## Javelin                      0.477   0.139   0.160
## X1500m.run                           0.619   0.111
##
##                Factor1 Factor2 Factor3 Factor4
## SS loadings      1.471   1.958   1.057   1.719
## Proportion Var   0.147   0.196   0.106   0.172
## Cumulative Var   0.147   0.343   0.449   0.621
##
## $rotmat
##             [,1]       [,2]        [,3]      [,4]
## [1,]  0.90835160  0.2640645  0.01883451 0.3237477
## [2,] -0.36054195  0.8863810  0.05304426 0.2855250
## [3,] -0.08393702 -0.2001050  0.91294419 0.3456089
## [4,] -0.19458019 -0.3233523 -0.40418372 0.8331971

The blank spaces in the above output represent loadings that are small in absolute value (below 0.1), which
are not printed. After rotation, the importance of the individual factors changes, although the total proportion
of variance explained (0.621) stays the same. From Proportion Var, we can see that in the rotated model, Factor 2
and Factor 4 are slightly more important than Factor 1.

RLD=ROTF$loadings[,1:4]
round(RLD,3)

##                    Factor1 Factor2 Factor3 Factor4
## X100m.run            0.928   0.205  -0.005   0.296
## long.jump            0.451   0.280   0.155   0.554
## shotput              0.228   0.883  -0.045   0.278
## high.jump            0.057   0.254   0.242   0.739
## X400m.run            0.519   0.142   0.701   0.151
## X110.meter.hurdles   0.173   0.136  -0.033   0.465
## discus.throw         0.133   0.794  -0.009   0.220
## Pole.vault           0.168   0.314   0.279   0.612
## Javelin              0.041   0.477   0.139   0.160
## X1500m.run          -0.071   0.001   0.619   0.111
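
The rotated loadings are the unrotated loadings post-multiplied by the orthogonal matrix reported in $rotmat. The sketch below checks this with the rounded values printed in this section, so agreement is only up to rounding; the names L_hat, Tmat and L_rot are introduced here for illustration.

```r
# Rounded unrotated loadings (rows in the same order as in the output above)
L_hat <- matrix(c( 0.993, -0.069, -0.021,  0.002,
                   0.665,  0.252,  0.239,  0.220,
                   0.530,  0.777, -0.141, -0.080,
                   0.363,  0.428,  0.421,  0.425,
                   0.571,  0.019,  0.620, -0.304,
                   0.343,  0.190,  0.089,  0.323,
                   0.401,  0.718, -0.102, -0.095,
                   0.439,  0.407,  0.390,  0.263,
                   0.218,  0.461,  0.084, -0.085,
                  -0.016,  0.091,  0.609, -0.145),
                ncol = 4, byrow = TRUE)
# Rotation matrix copied from $rotmat
Tmat <- matrix(c( 0.90835160,  0.2640645,  0.01883451, 0.3237477,
                 -0.36054195,  0.8863810,  0.05304426, 0.2855250,
                 -0.08393702, -0.2001050,  0.91294419, 0.3456089,
                 -0.19458019, -0.3233523, -0.40418372, 0.8331971),
               ncol = 4, byrow = TRUE)

L_rot <- L_hat %*% Tmat                    # rotated loadings
round(L_rot, 3)

max(abs(crossprod(Tmat) - diag(4)))        # near 0: Tmat is orthogonal
max(abs(rowSums(L_rot^2) - rowSums(L_hat^2)))  # near 0: communalities unchanged
round(colSums(L_rot^2) / nrow(L_rot), 3)   # Proportion Var after rotation
```

Because the rotation is orthogonal, it redistributes the explained variance among the factors without changing the communality (and hence the uniqueness) of any variable.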

From the coefficients, we can see that Factor 2 can still be termed as Arm Strength. Factor 3 has high
positive loadings corresponding to X400m.run and X1500m.run. Hence, this factor can be termed as Running
Endurance. In Factor 1, loadings corresponding to X100m.run, X400m.run (and long.jump) are high. This
factor could be called Running Speed. Lastly, long.jump, high.jump, Pole.vault and X110.meter.hurdles load
heavily on Factor 4. Hence, this factor can be termed as Leg Strength. A similar grouping can be observed
in the loadings plots. It should be clear from this discussion that domain knowledge is crucial for arriving
at a good model in factor analysis.

plot(RLD[,1],RLD[,2],type="n",xlab="Factor 1",ylab="Factor 2",
main="Varimax Rotated Factors 1 & 2",xlim=c(min(RLD[,1])-0.1,max(RLD[,1])+0.1),
ylim=c(min(RLD[,2])-0.1,max(RLD[,2])+0.1),
font.main=1,cex.lab=1.1,cex.axis=1.1,cex.main=1.4)
text(RLD[,1],RLD[,2],colnames(O2),cex=1)
abline(h=0,v=0)

[Figure: "Varimax Rotated Factors 1 & 2". Rotated loadings on Factor 1 (x-axis) plotted against rotated loadings on Factor 2 (y-axis), each point labelled with its variable name.]

plot(RLD[,3],RLD[,4],type="n",xlab="Factor 3",ylab="Factor 4",
main="Varimax Rotated Factors 3 & 4",xlim=c(min(RLD[,3])-0.1,max(RLD[,3])+0.1),
ylim=c(min(RLD[,4])-0.1,max(RLD[,4])+0.1),
font.main=1,cex.lab=1.1,cex.axis=1.1,cex.main=1.4)
text(RLD[,3],RLD[,4],colnames(O2),cex=1)
abline(h=0,v=0)

[Figure: "Varimax Rotated Factors 3 & 4". Rotated loadings on Factor 3 (x-axis) plotted against rotated loadings on Factor 4 (y-axis), each point labelled with its variable name.]

Application

In the current section, we discuss one application of factor analysis. Some of the techniques mentioned here
are not part of our syllabus. However, this discussion will be helpful in understanding the objective of factor
analysis as well as the interpretation of results obtained in factor analysis.

This section is based on a case study related to appreciation of music by different people. It is a summary of
the research paper Mas-Herrero et al. (2013).

Objectives of the Study

It is generally agreed that music is one of the most pleasurable stimuli and that it has an important role in
emotion evocation and mood regulation. This is the case even though music, like other aesthetic stimuli,
is abstract and does not directly imply any obvious natural advantage, as do other biological reinforcers
such as food. It has been empirically demonstrated using behavioral measures that music elicits emotional
responses that are accompanied by physiological changes. In addition, several neuroimaging studies have
shown the activation of emotion and reward-related brain networks during pleasurable music listening. Thus,
the involvement of reward and emotional brain circuits for music could explain the widespread value people
assign to music, and may be crucial for understanding why this human activity persists across cultures and
generations.

However, even considering the strong emotional impact of music in humans, these affective responses are highly
specific to cultural and personal preferences, and large differences are observed across individuals
in how music is experienced. Indeed, little is known about the sources of this interindividual variability in
musical reward experiences, or to what degree the differences in the amount of pleasure experienced in music
listening are related to personality variables, or other temperamental dispositions, or to individual differences
in reward experience in other domains. Thus it is of special interest to understand which sources or latent
variables underlie this ability to experience reward and emotion due to musical processing in humans.

Several factors could contribute to the individual differences experienced in music reward. For example,
there is general agreement that music is capable of inducing a significant emotional impact in humans and
individual differences in this factor might explain to a certain degree the differences observed on the amount
of pleasure experienced in music. However, this effect might also be influenced (although not necessarily)
by the ability to perceive and decode emotions from music fragments. A second important aspect is the
ability of listeners to use music as a mood or hedonic (a psychology term meaning ‘involving pleasurable
feelings’) regulator. Current empirical evidence suggests that this might be an important purpose of music
listening, used in order to change or release emotions, to enjoy, comfort, or even to relieve stress as well
as for relaxation purposes or as a background accompaniment to everyday activities. Moreover, music has
traditionally been effectively used in rituals, and more recently in marketing or film in order to manipulate
and induce hedonic states in humans. Mood improvement has also been observed in stroke patients after
intensive music listening. A third aspect to bear in mind as a source of individual differences is the strong
impact that music has in humans through the capacity to spontaneously and intuitively synchronize our body
movements to a rhythm’s beat, using simple movements (e.g., toe tapping or head nodding) or more complex
ones such as dancing. These activities are likely to be important because the experience of pleasure induced by
practising them involves the complex coordination of cortical and subcortical somatosensory-motor
brain networks. A further factor could be related to the capacity of music to serve as a magnet for human
social activities and for bonding individuals into groups. Indeed, one of the most important adaptive
functions of music that is crucial for its evolution might be the ability to promote social contact, an aspect
that is evident in all cultures. Social contact might be mediated through the inherent pleasure of sharing
music-related activities (concerts, music preferences, cultural events, dancing, etc.). Finally and related to
the last issue, large differences are usually observed in the way listeners extract, pursue, share, and seek
information regarding specific music pieces, composers, performers, or other information related to music.
This interest in “knowing about music” could be reflected in many situations and everyday activities, for
example, attending live concerts, talking about one’s favorite music, seeking formal knowledge about music
(e.g., classes or conferences), trying to learn to play an instrument, or simply increasing the amount of time
devoted to music listening. Listeners might as well experience pleasure when recognizing musical quotations
or allusions to other works. Studies have shown that there is a large variability in how people engage in
music-related activities. Some studies have also shown that shared music preferences create deep social bonds
across individuals, increasing the social attraction between them. In sum, several factors among the ones
previously listed could explain the differences in music pleasure experienced in humans.

The main aim of the investigation is to provide a fine-grained description of the facets or factors
of music experience that could explain the variance observed in how people experience reward associated
with music listening and music-related activities. With that dimensional approach in mind, the authors first
developed a new psychometric measure, the Barcelona Music Reward Questionnaire (BMRQ), which was
administered to three groups of participants. Initially, it was tested on a large sample of Spanish participants
through an internet application with a large number of test questions (Study 1). After the modifications
required on the basis of this pilot study had been made, a new version of the questionnaire was administered to
a second large group of Spanish students (Study 2) in order to avoid an initial sample bias (the participation
of persons highly interested in music activities). Third, with the aim of generalization, the questionnaire
was translated and adapted into English, and it was again administered through an internet
platform to a large, relatively unbiased international population (Study 3). In this third case, the instructions
did not indicate that the questions were related to music (to avoid the sample bias mentioned
above). This English version of the questionnaire was then analyzed using factor analysis.

A second important question is the degree to which reward-seeking tendencies in music are associated with the
capacity to experience reward in other reward-related domains (e.g., physical reward experiences). This is an
interesting question that might shed some light on the debate about the specificity of the brain mechanisms
involved in music processing, and in particular on the involvement of the same reward mechanisms across
different types of domains (other biological and drug reinforcers). With that aim in mind, participants in the
first and third samples were also asked to answer other scales measuring individual
differences in the susceptibility to avoid possible negative events (punishments) or the tendency to seek
positive experiences or rewards (the BIS/BAS scales and the PAS; further details are given in the next section).
This approach also allowed the authors to explore whether any individuals exhibit significant musical anhedonia
(anhedonia is a psychology term for the inability to feel pleasure) in the absence of more general anhedonic traits.

Method

Initially, a pool of 112 items (which served as variables) was created to cover a large range of activities
and situations associated with reward and pleasure experiences related to music, from which a smaller
number of appropriate items could be selected. This first pool of items was created by the authors based on the
theoretical background and on information regarding pleasure and music gathered from two focus groups
(musicians and nonmusicians). The statements related to music experience could initially be grouped
into six broad categories: music seeking activities, mood regulation, emotion evocation, sensory-motor behavior,
social rewarding experiences, and musical memory. In addition, four items were included as a measure
of social desirability. Each item described a situation that participants could experience in their daily life.
Participants were asked to indicate their level of agreement with each sentence on a five-point scale
ranging from “fully disagree” (1) to “fully agree” (5). Hence, the observations were on an ordinal scale from 1
to 5.

Further, to relate the results of the factor analysis to other mental/emotional characteristics, versions
of the BIS/BAS and PAS scales were also included as questionnaires. The BIS/BAS scale evaluates two
general motivational systems underlying behavior and affective reactions, based on Gray’s personality theory:
the behavioral inhibition system (BIS), which regulates aversive situations by moving away from unpleasant
events, and the behavioral activation system (BAS), with three subscales (Reward Responsiveness, Drive,
and Fun Seeking), which regulates appetitive situations by moving toward desired events. The Physical Anhedonia
Scale (PAS) evaluates difficulty in feeling physical and aesthetic pleasure in response to typically pleasurable
physical stimuli (food, beautiful scenes, etc.).

Results

Study 1 served as a pilot study. By analysing the data from Study 1 using various techniques including
factor analysis, the initial questionnaire was updated to include only 20 important items (by removing
ambiguous/less important items, i.e. variables). Studies 2 and 3 collected data only on these 20 items
(p = 20).

In this section, we discuss some of the interesting findings from these three studies.

As mentioned earlier, the instructions of the test in the third study did not indicate that the study was
specifically focused on music, in order to avoid a sampling bias (e.g., that only people interested in music chose
to participate in the internet test). To verify that this had worked, participants were asked at the end of the
questionnaire whether they were aware that the test had anything to do with music in particular: 73.8%
of participants responded that they were not aware of this,
17.9% were aware of it, and 8.3% did not respond to that question.

Since the data are ordinal, the authors used Minimum Rank Factor Analysis based on the polychoric correlation
matrix. The loadings and the factors were further subjected to an oblique semi-specified Procrustean rotation to
get more interpretable results. The final loadings obtained from the factor analysis calculations are presented
below.

The variables have been rearranged to put the related items together. Large loading values are
printed in bold. As per the observed results, the first five factors can be described as follows:

• MS: Musical Seeking - Looking for new music, Informing oneself about new music and Spending money
on music. Hardly listening to music has, as expected, received a large negative loading on this factor.

• EE: Emotional Evocation - All the variables which indicate an emotional connection (item no. 18, 12,
8, 3) have received high positive loadings.

• MR: Mood Regulation - Variables related to mood, such as relaxation, eliminating loneliness, etc. (item
no. 14, 9, 19, 4) have large positive loadings on this factor.

• SM: Sensory-Motor - Dancing, Tapping and Humming have received large positive loadings, whereas Not
dancing has received a large negative loading.

• SR: Social Reward - Activities involving other people (item no. 13, 1, 6, 16) have received large positive
loadings on this factor.

The scores obtained from this factor analysis were compared with the scores on the BIS/BAS and PAS scales.
The following correlation matrix presents these correlations and their significance.

As expected, Drive to engage in pleasurable activities is significantly positively correlated with Emotional
Evocation, and Fun Seeking behaviour is significantly positively correlated with Sensory-Motor (representing
dancing, tapping feet, etc.) and Social Reward. Almost all the factor scores are significantly positively correlated
with Reward seeking behaviour and Moving away from unpleasant situations. Further, all the factor scores are
significantly negatively correlated with PAS scores. This indicates that those who enjoy music (for any reason)
are in general capable of enjoying other physical pleasures such as food, whereas those who cannot enjoy
physical pleasures are generally incapable of enjoying music as well. This observation can also be represented by
the following three-dimensional graph of normalized scores for BMRQ and PAS. The number of participants
in each location of the scatter is represented as an increase in the elevation of the cone (i.e., the z-coordinate,
with lighter colors representing a larger number of participants). Notice that, as expected given the negative
correlation between the BMRQ and the PAS, most of the participants are represented in the upper right
part of the scatterplot, showing low PAS scores in parallel with high BMRQ scores. Importantly for the
present study, the grey circles indicate those participants with low sensitivity to musical reward (below
−1.5 SD from the mean) but normal scores on the anhedonia scale (within ±1.5 SD of the mean). This
population represents roughly 5.5% of the total sample. This is an interesting finding which may help in
further neurological research.

Comparison of factor scores across different groups can often lead to interesting conclusions.
The following figures represent such comparisons.

The Musical Seeking factor seems to have a decreasing trend with respect to the age-groups. From the confidence
intervals of the scores plotted in the figure, it can be seen that the oldest group has a significantly lower Musical
Seeking score than many of the younger age-groups. This seems intuitively reasonable. Further, from the
second graph it can be seen that the trained participants presented significantly higher values in Musical
Seeking (p < 0.001) and Emotion Evocation (p < 0.05). Finally, both groups presented similar values in the
Sensory-Motor and Mood Regulation factors.

Lastly, as an example of how factor scores can be visualized, we present a graphical representation of the factor
scores of three participants in the present study. The score for each factor is represented as a solid line
departing from the center and creating the impression of a star (longer line ≡ higher factor score). The dotted
line is obtained by joining the mean scores of each factor for the general population, while the surrounding
grey area represents one standard deviation above and below the mean value for each particular factor score.
Figure A represents a prototypical highly music-hedonic (highly interested in music as almost all the scores
are more than mean + SD) participant, while in Figure B, a music-anhedonic (not at all interested in music
as all the scores are below mean − SD) participant is depicted. The participant represented in Figure C
shows normal values in Emotion Evocation, Mood Regulation, and Social Reward and extreme values in
Musical Seeking (high) and the Sensory-Motor (low) factors.

References

Härdle, Wolfgang Karl, and Leopold Simar. 2012. Applied Multivariate Statistical Analysis. Springer-Verlag.

Johnson, Richard A., and Dean W. Wichern. 2013. Applied Multivariate Statistical Analysis. Indian edition
published by PHI Learning Private Ltd, Delhi.

Mas-Herrero, Ernest, Josep Marco-Pallares, Urbano Lorenzo-Seva, Robert J. Zatorre, and Antoni
Rodriguez-Fornells. 2013. “Individual Differences in Music Reward Experiences.” Music Perception 31 (2): 118–38.
