Advice on Exploratory Factor Analysis
Peter Samuels, Birmingham City University
Working Paper, June 2016. DOI: 10.13140/RG.2.1.5013.9766

Overview
Exploratory Factor Analysis (EFA) is a process used to identify and validate scales of items in a questionnaire that has not previously been validated. This process is supported by SPSS. Once a questionnaire has been validated, the appropriate process is Confirmatory Factor Analysis, which is supported by AMOS, a ‘sister’ package to SPSS.
There are two forms of EFA, known as Factor Analysis (FA) and Principal Component Analysis (PCA). The reduced dimensions produced by an FA are known as factors whereas those produced by a PCA are known as components. FA analyses only the common variance shared between items, whereas PCA analyses the total (common and error) variance. FA is therefore preferable to PCA in the early stages of an analysis as it estimates the proportion of each item's variance that is shared with the other items, known as its communality. As dimension reduction techniques seek to identify items with shared variance, it is advisable to remove any item with a communality score less than 0.2 (Child, 2006). Items with low communality scores may indicate additional factors which could be explored in further studies by measuring additional items (Costello and Osborne, 2005).
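
For readers working outside SPSS, the communality check can be sketched in Python using the third-party factor_analyzer package; the DataFrame `items`, the choice of six factors and the random placeholder data are illustrative assumptions, not part of the original text:

import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer  # third-party: pip install factor-analyzer

# Placeholder Likert-style responses; substitute real questionnaire data
rng = np.random.default_rng(0)
items = pd.DataFrame(rng.integers(1, 6, size=(171, 39)).astype(float),
                     columns=[f"q{i + 1}" for i in range(39)])

# Unrotated factor analysis to estimate communalities
fa = FactorAnalyzer(n_factors=6, rotation=None, method="principal")
fa.fit(items)

communalities = pd.Series(fa.get_communalities(), index=items.columns)
print(communalities[communalities < 0.2])  # candidates for removal (Child, 2006)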
Before carrying out an EFA, the bivariate correlation matrix of all items should be examined. High values are an indication of multicollinearity (although they are not a necessary condition). Field (2013: 686) suggests removing one of a pair of items with
bivariate correlation scores greater than 0.8. There is no statistical means for deciding which
item of a pair to remove – this should be based on a qualitative interpretation. An additional
test is to look at the determinant of the matrix, which should be greater than 0.00001. A
lower score might indicate that groups of three or more questions have high
intercorrelations, so the threshold for item removal should be reduced until this condition is
satisfied.
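
A minimal sketch of this correlation screen and determinant check with NumPy and pandas, again assuming a placeholder `items` DataFrame of responses:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
items = pd.DataFrame(rng.integers(1, 6, size=(171, 39)).astype(float),
                     columns=[f"q{i + 1}" for i in range(39)])  # placeholder data

corr = items.corr()  # Pearson bivariate correlations

# Unique pairs with |r| > 0.8 flag possible multicollinearity (Field, 2013)
high_pairs = [(corr.index[i], corr.columns[j], round(corr.iloc[i, j], 3))
              for i in range(len(corr)) for j in range(i + 1, len(corr))
              if abs(corr.iloc[i, j]) > 0.8]
print(high_pairs)

# The determinant of the correlation matrix should exceed 0.00001
print("determinant:", np.linalg.det(corr.values))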
The next step is to decide on an appropriate method. If you are only dealing with your
sample for further analysis (i.e. it is a population in terms of the EFA) use the Principal Axis
Factoring method. Otherwise, if you are trying to develop and instrument to be used with
other data sets in the future, use a sample-based EFA method such as Maximum Likelihood
or Kaiser’s alpha factoring.
Whether to rotate the factors and the type of rotation also needs to be decided. An
orthogonal rotation can improve the solution from the unrotated one but it forces the factors
to be independent of each other. The most popular orthogonal rotation technique is Varimax.
An oblique rotation allows a degree of correlation between the factors in order to improve the
intercorrelation between the items within the factors. Although Reise et al. (2000) give several reasons why it should be considered, an oblique rotation is more difficult to interpret and is best attempted only if the orthogonal solution proves unacceptable. More detail is provided in the
section at the end of the worked example.
The purpose of an EFA is to describe a multidimensional data set using fewer variables. The
items which make up these factors (or components) should generally be more correlated to
each other than the factors are to each other. Orthogonally rotated factors have zero (in
practice negligible) intercorrelation by definition. However, if EFA is being used to select
variables to create scales then these derived scales will have non-zero intercorrelations.
The adequacy of the sample size also needs to be assessed. SPSS provides the Kaiser-Meyer-Olkin (KMO) test of sampling adequacy, for which the minimum acceptable score is 0.5 (Kaiser, 1974). This statistic can also be calculated for each individual item with the same cut-off
criterion. If the sample size is less than 300 it is also worth looking at the average
communality of the retained items. According to MacCallum et al. (1999), an average value above 0.6 is acceptable for samples of less than 100, while an average value between 0.5 and 0.6 is acceptable for sample sizes between 100 and 200.
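
The KMO checks can be sketched as follows; `calculate_kmo` from factor_analyzer returns both per-item values and the overall statistic (the data setup is again a placeholder):

import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_kmo  # pip install factor-analyzer

rng = np.random.default_rng(0)
items = pd.DataFrame(rng.integers(1, 6, size=(171, 39)).astype(float),
                     columns=[f"q{i + 1}" for i in range(39)])  # placeholder data

kmo_per_item, kmo_overall = calculate_kmo(items)
print("overall KMO:", kmo_overall)  # minimum acceptable value is 0.5 (Kaiser, 1974)

per_item = pd.Series(kmo_per_item, index=items.columns)
print(per_item[per_item < 0.5])  # items failing the same cut-off criterion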
Factor loadings are also important. Tabachnick and Fidell (2014) recommend ignoring factor
loadings with an absolute value less than 0.32 (representing 10% of the shared variance).
Following the advice of Field (2013: 692) we recommend suppressing factor loadings less
than 0.3. Retained factors should have at least three items with a loading greater than 0.4.
These items should also not cross load highly on other factors. If the above rules are used for suppression and retention, a consistent cross-loading cut-off is that no secondary loading should exceed 75% of the item's highest factor loading. Any item which loads on more than two factors would require a lower cut-off value.
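
A sketch of these screening rules applied to a rotated loadings matrix; the numbers in `loadings` are invented purely to illustrate the checks:

import pandas as pd

# Invented rotated factor loadings: rows are items, columns are factors
loadings = pd.DataFrame(
    [[0.62, 0.18], [0.55, 0.49], [0.10, 0.71], [0.45, 0.08], [0.05, 0.52]],
    index=["q1", "q2", "q3", "q4", "q5"], columns=["F1", "F2"])

# Suppress loadings below 0.3, following Field (2013)
print(loadings.where(loadings.abs() >= 0.3))

# Each retained factor should have at least three loadings above 0.4
strong = (loadings.abs() > 0.4).sum(axis=0)
print("factors failing the three-item rule:", list(strong[strong < 3].index))

# Cross-loading rule: flag items whose secondary loading reaches 75%
# of their highest loading
abs_l = loadings.abs()
cross = abs_l.apply(lambda row: (row >= 0.75 * row.max()).sum() > 1, axis=1)
print("cross-loading items:", list(loadings.index[cross]))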
There is also a relationship between sample size and acceptable factor loadings. According to Stevens (2012), factor loadings are significant at the 0.01 level when they are larger than 0.512 for a sample size of 100, larger than 0.364 for a sample of 200, and larger than 0.298 for a sample of 300. According to Guadagnoli and Velicer (1988) a factor with four loadings greater than
0.6 is stable for sample sizes greater than 50 and a factor with 10 loadings greater than 0.4
is stable for a sample size greater than 150.
The number of factors to be retained needs to be decided. There are different criteria for
making this decision. It is probably sensible to use the SPSS default rule to start with. This
cuts off factor eigenvalues less than 1. The item loadings on each factor should then be
examined. Any item which does not load above 0.3 on any factor should be removed and the analysis re-run. Items whose highest loading is below 0.4 should then be removed one at a time, starting with the item with the lowest maximum loading and re-running the analysis each time. Cross-factor loadings should then be
considered using the cut-off rules described above. The number of factors can then be
adjusted. All retained factors should have at least three items with a loading greater than 0.4.
The proportion of the total variance explained by the retained factors should also be noted.
As a general rule this should be at least 50% (Streiner, 1994).
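
The retention procedure can be sketched as a loop; this simplification merges the 0.3 and 0.4 removal rules into a single "drop the weakest item" step, and all names and data are placeholders:

import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer  # pip install factor-analyzer

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(1, 6, size=(171, 39)).astype(float),
                    columns=[f"q{i + 1}" for i in range(39)])  # placeholder data

while data.shape[1] >= 3:
    # SPSS default rule: retain factors with eigenvalues greater than 1
    n_factors = int((np.linalg.eigvalsh(data.corr().values) > 1).sum())
    fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax", method="principal")
    fa.fit(data)

    highest = pd.DataFrame(fa.loadings_, index=data.columns).abs().max(axis=1)
    if highest.min() >= 0.4:
        break  # every remaining item loads acceptably on some factor
    data = data.drop(columns=[highest.idxmin()])  # drop the weakest item, re-run

# Retained factors should explain at least 50% of the variance (Streiner, 1994)
print(n_factors, "factors; cumulative variance:", fa.get_factor_variance()[2][-1])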
If the goal of the analysis is to create scales of unique items then the meaning of the group
of unique items which load on each factor should be interpreted to give each factor a name.
A PCA without rotation with a single component should then be run on each group of items
in turn. The size of the first eigenvalue and the items loading upon it should be noted. Items
with a loading less than 0.4 on this first component should be removed. Once each group
has stabilised, the regression factor scores on the component should be saved. These provide a more accurate measure of the scale score than simply adding the item scores together, as the component loadings may vary considerably.
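
A sketch of this step for one hypothetical group of items, using a plain eigen-decomposition of the correlation matrix; the regression-method score formula shown is a standard one and is an assumption, not taken from the original text:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
group = pd.DataFrame(rng.normal(size=(171, 4)),
                     columns=["q3", "q7", "q12", "q20"])  # hypothetical scale items

R = group.corr().values
eigvals, eigvecs = np.linalg.eigh(R)            # eigenvalues in ascending order
first_val = eigvals[-1]
loadings = eigvecs[:, -1] * np.sqrt(first_val)  # loadings on the first component

print("first eigenvalue:", first_val)
print(pd.Series(loadings, index=group.columns))  # drop items loading below 0.4

# Regression-method component scores from standardised items
Z = (group - group.mean()) / group.std(ddof=1)
scores = Z.values @ np.linalg.inv(R) @ loadings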
A reliability analysis can then be undertaken for each scale. For further information see our
sheet on Advice on Reliability Analysis with Small Samples.
Finally, as a validation check, compare the average inter-item correlation for each scale with
the inter-scale correlation. The former should be higher than the latter.
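
Both the reliability step and this validation check can be sketched directly; `cronbach_alpha` implements the standard formula, and the scale DataFrames are placeholders:

import numpy as np
import pandas as pd

def cronbach_alpha(scale: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = scale.shape[1]
    return k / (k - 1) * (1 - scale.var(ddof=1).sum() / scale.sum(axis=1).var(ddof=1))

def mean_offdiag(corr: pd.DataFrame) -> float:
    """Average off-diagonal entry of a correlation matrix."""
    a = corr.values
    return a[np.triu_indices_from(a, k=1)].mean()

rng = np.random.default_rng(0)
scales = {name: pd.DataFrame(rng.normal(size=(171, 4))) for name in ["s1", "s2", "s3"]}

for name, scale in scales.items():
    print(name, "alpha:", round(cronbach_alpha(scale), 3))

# Validation: average inter-item correlation within each scale should
# exceed the correlation between the scale scores
within = np.mean([mean_offdiag(s.corr()) for s in scales.values()])
totals = pd.DataFrame({name: s.mean(axis=1) for name, s in scales.items()})
between = mean_offdiag(totals.corr())
print("within:", round(within, 3), "between:", round(between, 3))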

Worked example
171 business men and women responded to a questionnaire on entrepreneurship which was constructed from 8 groups of questions derived from existing questionnaires, comprising a total of 39 questions. Each question used a five-point Likert response scale.
As the data from the questionnaire was to be used in a further analysis it was decided to
carry out an Exploratory Factor Analysis using the Principal Axis Factoring technique and a
Varimax rotation.

A Pearson bivariate correlation of all the items was carried out in SPSS. A highlighting style condition was set for any correlations with an absolute value greater than 0.8 (in the SPSS table style dialog this condition appears automatically when the Value is changed to Correlation).

This returned a table of correlations which included 9 unique pairs of correlations with an
absolute value greater than 0.8, with the lowest absolute value being 0.922. As this was
markedly higher than the threshold it was decided to remove one item from each of these
pairs based on a qualitative analysis of the items, leaving 30 items.
An EFA was then run on the remaining 30 items using a Principal Axis Factoring technique
with a Varimax rotation, providing the KMO statistics and determinant of the correlation
matrix, retaining all factors with eigenvalues greater than 1, sorting the factor coefficients by size, and suppressing all factor coefficients less than 0.3.

The communalities of the initial solution were observed. Two items initially had communalities less than 0.2. These were removed in turn, starting with the smallest, and the communalities were re-observed. Eventually three items were removed, leaving 27 items with communalities greater than 0.2.

This led to a solution comprising 6 factors, each having loadings of at least 0.4 on at least 3 items. However, several items in the rotated factor matrix cross loaded on more than one factor. These were removed in turn, starting with the item that loaded on the most factors with the lowest maximum loading. In addition, items whose highest factor loading was much less than 0.4 were also removed. This eventually yielded a stable solution of 19 variables on 6 factors, with each factor still having at least 3 items with a loading greater than 0.4.

The KMO statistic for this solution was 0.805 and the correlation matrix determinant was greater than 0.0001. The 6 extracted factors accounted for 69.1% of the total variance in the data.

Even though there were still some cross loadings on the items, in view of the fact that the questionnaire was constructed from 8 groups of questions, it was decided to retain these 19 variables and run a PCA on each of the groups of items indicated by the 6 factors.

The groups of items retained for the PCA were indicated in the rotated factor matrix figure.


A PCA extracting a single component was then set up for each group of items, with the Regression factor scores saved.

The first eigenvalues from the PCA of each of the 6 groups are shown below.

Group              1       2       3       4       5       6
First eigenvalue   2.961   2.266   1.937   2.003   1.942   1.728

All six groups were considered acceptable as each had at least 3 component loadings greater than 0.7. The Cronbach's alpha scores for the 6 groups were as follows:

Group              1       2       3       4       5       6
Cronbach's alpha   0.791   0.744   0.725   0.745   0.728   0.630

Based on the sample size, the number of items within each scale, the overlap between some groups, and these Cronbach's alpha values, it was decided to retain groups 1 to 5 for further analysis. A validation check was made of these groups' intercorrelations.
The average intercorrelation between the scales was 0.45, which was unacceptably high (it was higher than the average intercorrelation within each group, which was 0.43). It was therefore decided to return to the EFA and save the factor score values from the previous converged solution.

As an orthogonal rotation had been used, these factors would have a negligible intercorrelation. Both types of combined variable were used in an analysis of the data. A qualitative interpretation was made of the scales.
An alternative is to attempt an oblique factor rotation, as discussed below.

Oblique factor rotation


If the orthogonal factor rotation does not lead to an acceptable solution an oblique rotation
can be considered. There are two methods available in SPSS: direct oblimin and promax.
Either method is acceptable; however, it is advised that the default values of the coefficients Delta (for direct oblimin) and Kappa (for promax) are not changed.
A Principal Axis FA with a direct oblimin oblique rotation with Delta = 0 was carried out using the same 30 items as the original FA above. Due to problems of slow convergence, the number of iterations was increased to 100. Again, any items with a communality below 0.2 were removed in turn, starting with the item with the lowest value. Eventually three items were removed.

An oblique rotation creates two additional factor matrices, called the pattern matrix and the structure matrix. It is the pattern matrix which needs to be analysed in the same way as the single matrix for orthogonal rotations.
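
A sketch of the oblique run with factor_analyzer; it is assumed here that its rotation="oblimin" corresponds to SPSS's direct oblimin with the default Delta, that loadings_ holds the pattern matrix after an oblique rotation, and that phi_ holds the factor correlation matrix:

import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer  # pip install factor-analyzer

rng = np.random.default_rng(0)
items30 = pd.DataFrame(rng.integers(1, 6, size=(171, 30)).astype(float),
                       columns=[f"q{i + 1}" for i in range(30)])  # placeholder data

fa = FactorAnalyzer(n_factors=6, rotation="oblimin", method="principal")
fa.fit(items30)

pattern = pd.DataFrame(fa.loadings_, index=items30.columns)  # pattern matrix
print(pattern.where(pattern.abs() >= 0.3))                   # analyse as before

# Average absolute inter-factor correlation, from the phi matrix
phi = fa.phi_
print("average factor intercorrelation:",
      np.abs(phi[np.triu_indices_from(phi, k=1)]).mean())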



Items were removed in the same manner as before and the analysis was re-run. The number of factors extracted was reduced from 6 to 5 because too few items loaded above 0.4 on the sixth factor. Further variables were removed according to these conditions and also on the grounds of low communality. Eventually the fifth factor was also removed, leaving four factors and 16 items, accounting for 60.9% of the total variance and providing unique pattern matrix loadings. These factors were also interpreted according to their item loadings.

The regression scores were saved and the inter-factor correlations were calculated. The average value was now 0.28. This was considered acceptable, and an improvement on the orthogonal rotation, as it better reflected the actual interaction between the factors.
References
Child, D. (2006) The Essentials of Factor Analysis. 3rd edn. New York: Continuum.
Costello, A. B. and Osborne, J. W. (2005) Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10(7), pp. 1-9.
Field, A. (2013) Discovering Statistics using SPSS. 4th edn. London: SAGE.
Guadagnoli, E. and Velicer, W. F. (1988) Relation of sample size to the stability of
component patterns. Psychological Bulletin, 103(2), pp. 265-275.
Kaiser, H. F. (1974) An index of factorial simplicity. Psychometrika, 39(1), pp. 31-36.
MacCallum, R. C., Widaman, K. F., Zhang, S. and Hong, S. (1999) Sample size in factor
analysis. Psychological Methods, 4(1), pp. 84-99.
Reise, S. P., Waller, N. G. and Comrey, A. L. (2000) Factor analysis and scale revision.
Psychological Assessment, 12(3), pp. 287-297.
Stevens, J. P. (2012) Applied Multivariate Statistics for the Social Sciences. 5th edn. London:
Routledge.
Streiner, D. L. (1994) Figuring out factors: the use and misuse of factor analysis. Canadian Journal
of Psychiatry, 39(3), pp. 135-140.
Tabachnick, B. G. and Fidell, L. S. (2014) Using Multivariate Statistics. 6th edn. Harlow:
Pearson.
