Exploratory Factor Analysis
• This is what people generally mean when they say "factor analysis"
• This family of techniques uses an estimate of common variance among the original variables to generate the factor solution
• It is based on the fundamental assumption that some underlying factors, which are smaller in number than the observed variables, are responsible for the covariation among them
Questions to answer
• How many different factors are needed to explain the pattern of relationships among these variables?
• What is the nature of those factors?
• How well do the hypothesized factors explain the observed data?
• How much purely random or unique variance does each observed variable include?
• As with previous techniques we have discussed, the goal is dimension reduction, i.e. to describe a number of variables in a simpler form
Issues
• Factor solutions in the exploratory endeavor will be different depending on the data, algorithm, and other researcher choices
• The goal with exploratory factor analysis is to discover structure, not determine it
• There are also differences in reporting, such that one may see quite different results from sample to sample, study to study
Factor Analysis
• There are four basic steps:
  • data collection and generation of the correlation matrix
  • extraction of an initial factor solution
  • rotation and interpretation (also validation)
  • construction of scales or factor scores to use in further analyses
• A good factor:
  • Makes sense
  • Will be easy to interpret
  • Possesses simple structure
  • Items have low cross-loadings
Factor Analysis
• Factor analysis can be seen as a family of techniques, of which both PCA and EFA are members
• Factor analysis is a statistical approach that can be used to analyze interrelationships among a large number of variables and to explain these variables in terms of their common underlying dimensions (factors)
• It involves finding a way of condensing the information contained in a number of original variables into a smaller set of dimensions (factors) with a minimum loss of information
Principal Components Analysis
• Principal components analysis (PCA) is a statistical technique applied to a single set of variables to discover which variables in the set form coherent subsets that are independent of one another
• Provides a unique solution, so that the original data, the covariance or correlation matrix, can be reconstructed from the results
• Looks at the total variance among the variables, so the solution generated will include as many factors/components as there are variables, although it is unlikely that they will all meet the criteria for retention
• Variables that are correlated with one another but largely independent of other subsets of variables are combined into factors
• Factors are generated which are thought to be representative of the underlying processes that have created the correlations among variables
• The underlying notion of PCA is that the observed variables can be transformed into linear combinations of an underlying set of hypothesized or unobserved components (factors)
• PCA is typically exploratory in nature
Common factor model
• PCA and common factor analysis may utilize a similar method and are conducted with similar goals in mind
• The difference between PCA and common FA involves the underlying model
  • The common factor model for factor analysis
• PCA assumes that all variance is common, which is akin to assuming the variables are perfectly reliable
  • All unique factors, i.e. sources of variability not attributable to a factor, are set equal to zero
• The common factor model, on the other hand, holds that the observed variance in each measure is attributable to a relatively small set of common factors (latent characteristics common to two or more variables), and a single specific factor unrelated to any other underlying factor in the model
Comparison of underlying models
• PCA
  • Extraction is the process of forming PCs as linear combinations of the measured variables, as we have done with our other techniques:

    C1 = b11X1 + b12X2 + … + b1pXp
    C2 = b21X1 + b22X2 + … + b2pXp
    …

• Common factor analysis
  • Each measure X has two contributing sources of variation: the common factor F and the specific or unique factor e:

    X1 = λ1F + e1
    X2 = λ2F + e2
    …
    Xp = λpF + ep
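The one-factor model above is easy to simulate. A minimal sketch (loadings and sample size are made up for illustration): generating data as Xi = λiF + ei implies the correlation between any two measures is λiλj, which we can check empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
lam = np.array([0.9, 0.8, 0.7])  # hypothetical loadings

F = rng.standard_normal(n)                             # common factor
E = rng.standard_normal((n, 3)) * np.sqrt(1 - lam**2)  # unique factors
X = F[:, None] * lam + E                               # X_i = lambda_i*F + e_i

R = np.corrcoef(X, rowvar=False)
print(R.round(2))  # off-diagonals approximately lam_i * lam_j
```

Under this model the correlation between the first two measures should come out near .9 × .8 = .72, a useful sanity check on the "common variance" intuition.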
Example
• Consider the following example
• Variables
  • Paragraph comprehension
  • Sentence completion
  • Word meaning
  • Addition
  • Counting dots
• Each person's score is a reflection of the weighted combination of the common factor (latent variable) and measurement error (uniqueness, unreliability)
• In these equations, λ represents the extent to which each measure reflects the underlying common factor
Factor Analysis
• When standardized, the variance can be decomposed as follows:

  var(Xi) = var(λiF + ei) = λi² + var(ei) = 1

• λi, in unsquared form, is now interpretable as a correlation coefficient, and its square as the proportion of the variation in X accounted for by the common factor, i.e. the communality
• The remainder is that which is accounted for by the specific factor or uniqueness (individual differences, measurement error, some other known factor, e.g. intelligence)

  var(ei) = ψi
  1 − ψi = λi² = communality
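A quick worked example of the decomposition, assuming a hypothetical loading of .80:

```latex
\lambda_i = .80 \;\Rightarrow\; h_i^2 = \lambda_i^2 = .64,
\qquad \psi_i = \operatorname{var}(e_i) = 1 - .64 = .36
```

That is, 64% of the variance in Xi is shared with the common factor (the communality) and the remaining 36% is unique.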
Measurement error
• The principal axis extraction method of EFA can be distinguished from PCA in terms of simply having communalities on the diagonal of the correlation matrix instead of 1s
• What does this mean?
• Having 1s assumes each item/variable is perfectly reliable, i.e. there is no measurement error in its ability to distinguish among cases/individuals

  L = V′RV
  L = eigenvalue matrix
  V = eigenvector matrix
  R = correlation matrix

  R = AA′
  A = loading matrix
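These identities are easy to verify numerically. A small sketch with a made-up correlation matrix (the values are illustrative only): the eigenvectors diagonalize R, and when all components are retained the loading matrix reproduces R exactly.

```python
import numpy as np

# Illustrative correlation matrix (hypothetical values)
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

vals, V = np.linalg.eigh(R)     # eigenvalues and eigenvectors of R
L = V.T @ R @ V                 # L = V'RV: diagonal matrix of eigenvalues
A = V * np.sqrt(vals)           # loadings: A = V * sqrt(L)
print(np.allclose(R, A @ A.T))  # R = AA' when all components are kept
```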
Measurement error
• The fact that our estimates in psych are not perfectly reliable suggests that we use methods that take this into account
• This is a reason to use EFA over PCA in cases involving measurement scales or items, as the communalities can be seen as the lower-bound (i.e. conservative) estimate of a variable's reliability
• Note, however, that low communalities are not interpreted as evidence of poor fit so much as evidence that the variables analyzed have little in common with one another, and thus are not reliable measures of a proposed factor solution
Two factor solution
• From our example before, we could have selected a two factor solution
  • Quantitative vs. Verbal reasoning
• Now we have three sources of variation observed in test scores
  • Two common factors and one unique factor
• As before, each λ reflects the extent to which each common factor contributes to the variance of each test score
• The communality for each variable is now λi1² + λi2²
Analysis: the Correlation Matrices
• Observed Correlation Matrix
  • Note that the manner in which missing values are dealt with will determine the observed correlation matrix
  • In typical FA settings one may have quite a few, and so casewise deletion would not be appropriate
  • Best would be to produce a correlation matrix resulting from some missing values analysis (e.g. the EM algorithm)
• Reproduced Correlation Matrix
  • That which is produced by the factor solution
  • Recall that in PCA it is identical to the observed
  • It is the product of the matrix of loadings and the transpose of the pattern matrix (partial loadings)

    R̂ = AA′

• Residual Correlation Matrix
  • The difference between the two

    Rres = R − R̂
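Continuing the notation above, a sketch of computing the reproduced and residual matrices when only some components are retained (the correlation matrix here is invented for illustration):

```python
import numpy as np

R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])

vals, V = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]        # largest eigenvalues first
vals, V = vals[order], V[:, order]

k = 1                                 # number of retained components
A = V[:, :k] * np.sqrt(vals[:k])      # loadings for the retained components
R_repro = A @ A.T                     # reproduced correlation matrix
R_res = R - R_repro                   # residual correlation matrix
print(R_res.round(3))
```

Small off-diagonal residuals indicate the retained components account for most of the observed correlations.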
Analysis: is the data worth reducing?
• The Kaiser-Meyer-Olkin Measure of Sampling Adequacy
  • A statistic that indicates the proportion of variance in your variables that might be caused by common underlying factors
  • An index for comparing the magnitudes of the observed correlation coefficients to the magnitudes of the partial correlation coefficients
  • If two variables share a common factor with other variables, their partial correlation will be small once the factor is taken into account
  • High values (close to 1.0) generally indicate that a factor analysis may be useful with your data
  • If the value is less than 0.50, the results of the factor analysis probably won't be very useful
• Bartlett's test of sphericity
  • Tests the hypothesis that your correlation matrix is an identity matrix (1s on the diagonal, 0s off the diagonal), which would indicate that your variables are unrelated and therefore unsuitable for structure detection
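Both diagnostics can be computed directly from the correlation matrix. A sketch (the function names are mine; the Bartlett statistic would be referred to a chi-square distribution with p(p−1)/2 degrees of freedom, reported here without a p-value to keep the example dependency-free):

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure: observed vs. partial correlations."""
    S = np.linalg.inv(R)
    d = np.sqrt(np.diag(S))
    Q = -S / np.outer(d, d)                   # anti-image (partial) correlations
    off = ~np.eye(len(R), dtype=bool)
    r2, q2 = (R[off] ** 2).sum(), (Q[off] ** 2).sum()
    return r2 / (r2 + q2)

def bartlett_sphericity(R, n):
    """Bartlett's test statistic for H0: R is an identity matrix."""
    p = len(R)
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    return stat, p * (p - 1) // 2             # statistic and degrees of freedom

# Hypothetical correlation matrix from n = 200 cases
R = np.array([[1.0, 0.7, 0.6],
              [0.7, 1.0, 0.5],
              [0.6, 0.5, 1.0]])
print(kmo(R))                   # closer to 1 -> worth factoring
print(bartlett_sphericity(R, n=200))
```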
Analysis: Extraction Methods
• Principal (Axis) Factors
  • Estimates of communalities (SMCs) are in the diagonal; used as starting values for the communality estimation (iterative)
  • Removes unique and error variance
  • Solution depends on the quality of the initial communality estimates
• Maximum Likelihood
  • Computationally intensive method for estimating loadings that maximize the likelihood of sampling the observed correlation matrix from a population
• Unweighted least squares
  • Minimizes off-diagonal residuals between the reproduced and original R matrix
• Generalized (weighted) least squares
  • Also minimizes the off-diagonal residuals
  • Variables with larger communalities are given more weight in the analysis
• Alpha factoring
  • Maximizes the reliability of the factors
• Image factoring
  • Minimizes unique factors consisting of essentially one measured variable
Analysis: Rotation Methods
• After extraction, initial interpretation may be difficult
• Rotation is used to improve interpretability and utility
• Refer back to the PCA mechanics handout for the geometric interpretation of rotation
• By placing a variable in the n-dimensional space specified by the factors involved, factor loadings are the cosine of the angle formed by a vector from the origin to that coordinate and the factor axis
• Note how PCA would be distinguished from multiple regression
  • PCA minimizes the squared distances to the axis, with each point mapping onto the axis forming a right angle (as opposed to dropping straight down in MR)
  • MR is inclined to account for variance in the DV, where PCA will tilt more to whichever variable exhibits the most variance
Analysis: Rotation Methods
• So factors are the axes
• Orthogonal: factors are at right angles
• Oblique rotation allows for other angles
  • Often achieves simpler structure, though at the cost that you must also consider the factor intercorrelations when interpreting results
• Repositioning the axes changes the loadings on the factor but keeps the relative positioning of the points the same
• The length of the line from the origin to the variable's coordinates is equal to the communality for that variable
Example: Rotation
Note that the variances of the two factors for the original and rotated solutions sum to the same amount
Analysis: Rotation Methods
• Orthogonal rotation keeps factors uncorrelated while increasing the meaning of the factors
• Varimax (most popular)
  • Cleans up the factors
  • Makes large loadings larger and small loadings smaller
• Quartimax
  • Cleans up the variables
  • Each variable loads mainly on one factor
  • Varimax works on the columns of the loading matrix; Quartimax works on the rows
  • Not used as often; simplifying variables is not usually a goal
• Equamax
  • Hybrid of the two that tries to simultaneously simplify factors and variables
  • Not that popular either
Analysis: Rotation Methods
• Oblique Rotation Techniques
• Direct Oblimin
  • Begins with an unrotated solution
  • Has a parameter that allows the user to define the amount of correlation acceptable; values near −4 lead to nearly orthogonal factors, 0 leads to mild correlations (also direct quartimin), and values close to 1 to highly correlated factors
• Promax
  • Solution is orthogonally rotated initially (varimax)
  • This is followed by oblique rotation
  • Orthogonal loadings are raised to powers in order to drive down small to moderate loadings
Analysis: Orthogonal vs. Oblique output
• Orthogonal Rotation
  • Factor matrix
    • Correlation between observed variable and factor for the unrotated solution
  • Pattern vs. structure matrix
    • Structure matrix
      • Loadings, i.e. structure coefficients
      • Correlation between observed variable and factor
    • Pattern matrix
      • Standardized, partialled coefficients (weights, loadings)
    • The structure and pattern matrices are the same in orthogonal solutions and so will not be distinguished
  • Factor Score Coefficient matrix
    • Coefficients used to calculate factor scores (like regression coefficients) from the original variables (standardized)
Analysis: Orthogonal vs. Oblique output
• Oblique Rotation
  • Factor matrix
    • Correlation between observed variable and factor for the unrotated solution
  • Structure Matrix
    • Simple correlation between factors and variables
    • Factor loading matrix
  • Pattern Matrix
    • Unique relationship between each factor and variable that takes into account the correlation between the factors
    • The standardized regression coefficient from the common factor model
    • The more factors, the lower the pattern coefficients as a rule, since there will be more common contributions to variance explained
  • For oblique rotation, the researcher looks at both the structure and pattern coefficients when attributing a label to a factor
  • Factor Score Coefficient matrix
    • Again used to derive factor scores from the original variables
  • Factor Correlation Matrix
    • Correlation between the factors
Analysis: Factor scores
• Factor scores can be derived in a variety of ways, some of which are presented here
  • Regression
    • Regression factor scores have a mean of 0 and a variance equal to the squared multiple correlation between the estimated factor scores and the true factor values. They can be correlated even when factors are assumed to be orthogonal. The sum of squared residuals between true and estimated factors over individuals is minimized.
  • Least squares
    • Minimizes squared residuals between estimated scores and true factor scores
    • Same approach as above but uses the reproduced R matrix instead of the original
  • Bartlett
    • Minimizes the effect of unique factors (consisting of single variables)
  • Anderson-Rubin
    • Same as Bartlett's but produces orthogonal factor scores
• Once obtained, one can use factor scores in other analyses
  • Recall PC regression
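The regression method has a simple closed form: with standardized data Z, correlation matrix R, and loading matrix A, the factor score coefficients are B = R⁻¹A and the scores are F̂ = ZB. A sketch (the simulated loadings are arbitrary, chosen only to show that the scores track the true factor):

```python
import numpy as np

def regression_scores(X, A):
    """Regression-method factor scores from raw data X and loading matrix A."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize the variables
    R = np.corrcoef(Z, rowvar=False)
    B = np.linalg.solve(R, A)                 # factor score coefficients R^{-1} A
    return Z @ B

# Simulate a one-factor model and score it
rng = np.random.default_rng(1)
lam = np.array([0.9, 0.8, 0.7])
F = rng.standard_normal(5_000)
X = F[:, None] * lam + rng.standard_normal((5_000, 3)) * np.sqrt(1 - lam**2)

scores = regression_scores(X, lam[:, None])
print(np.corrcoef(scores[:, 0], F)[0, 1].round(3))  # high, but not 1
```

The estimated scores correlate highly with the true factor, but not perfectly, which is the indeterminacy the slide's variance statement describes.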
Other stuff: An iterative process
• To get an initial estimate of the communalities, we can simply start with the squared multiple correlation coefficient (R²) for each item regressed on the other remaining items
  • There are other approaches, but with this one we can see that if a measure was completely unreliable its R² value would be zero
• We run the EFA and come to a solution; however, now we can estimate the communalities as the sum of the squared loadings for an item across the factors
• We now use these new estimates as communalities and rerun
• This is done until successive iterations are essentially identical
  • Convergence is achieved
• If convergence is not obtained, another method of factor extraction, e.g. PCA, must be utilized, or if possible, sample size increased
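The loop just described is a few lines of linear algebra. A sketch of iterated principal-axis factoring (function and variable names are mine): SMCs seed the communalities, which are then re-estimated from the loadings until they stabilize.

```python
import numpy as np

def principal_axis(R, k, max_iter=500, tol=1e-6):
    """Iterative principal-axis factoring of correlation matrix R, k factors."""
    # Initial communalities: squared multiple correlation of each variable
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        Rh = R.copy()
        np.fill_diagonal(Rh, h2)                # communalities on the diagonal
        vals, vecs = np.linalg.eigh(Rh)
        top = np.argsort(vals)[::-1][:k]
        A = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))
        h2_new = (A**2).sum(axis=1)             # row sums of squared loadings
        if np.abs(h2_new - h2).max() < tol:     # successive estimates identical
            return A, h2_new
        h2 = h2_new
    return A, h2                                # no convergence within max_iter

# Population correlation matrix from a known one-factor model
lam = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
A, h2 = principal_axis(R, k=1)
print(np.abs(A[:, 0]).round(3))  # recovers the generating loadings
```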
Measurement error vs. sampling error
• Measurement error is the variance not attributable to the factor an observed variable purportedly represents (1 − reliability)
• Sampling error is the variability in estimates seen as we move from one sample to the next
  • Just like every person is different, every sample taken would be too
• Note that with successive iterations, we increase the likelihood that we are capitalizing on the unique sampling error associated with a given dataset, thus making our results less generalizable
• As one might expect, with larger samples (and fewer factors to consider) we have less to worry about regarding sampling error, and so might allow for more iterations
• Unfortunately there are no hard and fast rules regarding the limitation of iterations; however, you should be aware of the tradeoff
Other stuff: More on sample size
• How big?
  • Assume you'll need lots
• From some simulation studies:
  1. 4 or more variables per factor with large structure coefficients (e.g. greater than .6) may work with even small samples
  2. 10 variables or more per factor with loadings around .4 requires N > 150
  3. sample size > 300
• The larger the communalities (i.e. the more reliable), the better off you are
Other stuff: How many factors?
• Refer back to the PCA notes; there are many ways to determine this
• But recall that we are doing exploratory factor analysis
• As such, just as we suggested with cluster analysis, go with the solution that makes the most sense
• Also, how you interpret them is, as it was with previous analyses, entirely subjective
Other stuff: Exploratory vs. Confirmatory
• Exploratory FA
  • Summarizing data by grouping correlated variables
  • Investigating sets of measured variables related to theoretical constructs
  • Usually done near the onset of research
  • The type of FA and PCA we are talking about
• Confirmatory FA
  • More advanced technique
  • When factor structure is known or at least theorized
  • Testing generalization of factor structure to new data, etc.
  • This is tested through SEM methods
Concrete example
• Beer data we did with PCA
• See appendix for code if you want to follow along
• What influences a consumer's choice behavior when shopping for beer?
• 200 consumers are asked to rate on a scale of 0-100 how important they consider each of seven qualities when deciding whether or not to buy the six pack:
  • COST of the six pack
  • SIZE of the bottle (volume)
  • Percentage of ALCOHOL in the beer
  • REPUTATion of the brand
  • COLOR of the beer
  • AROMA of the beer
  • TASTE of the beer
• First perform a PCA with varimax rotation
• In descriptives, check correlation coefficients and the KMO and Bartlett's test of sphericity
• Make sure to select that the number of factors to be extracted equals the number of variables (7)
• For easier reading you may want to suppress loadings less than .3 in the options dialog box
First we'll run a PCA and compare the results
• As always, first get to know your correlation matrix
• You should be aware of the simple relationships quite well, and one can already guess the factor structure that may be found
• The first and last 3 correlate well within their group; reputation correlates moderately negatively with all the others
• Our tests here indicate we'll be okay for further analysis
Correlation Matrix

            cost    size  alcohol reputat  color   aroma   taste
cost       1.000    .832    .767   -.406    .018   -.046   -.064
size        .832   1.000    .904   -.392    .179    .098    .026
alcohol     .767    .904   1.000   -.463    .072    .044    .012
reputat    -.406   -.392   -.463   1.000   -.372   -.443   -.443
color       .018    .179    .072   -.372   1.000    .909    .903
aroma      -.046    .098    .044   -.443    .909   1.000    .870
taste      -.064    .026    .012   -.443    .903    .870   1.000

KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .665
Bartlett's Test of Sphericity: Approx. Chi-Square = 1637.869, df = 21, Sig. = .000
PCA
• Recall that with PCA we extract all the variance
• At this point it looks like we'll stick with two components/factors, which account for almost 85% of the variance
Communalities

           Initial  Extraction
cost        1.000      1.000
size        1.000      1.000
alcohol     1.000      1.000
reputat     1.000      1.000
color       1.000      1.000
aroma       1.000      1.000
taste       1.000      1.000
PCA
• We've got some strong loadings here, but it's not easily interpretable
• Perhaps a rotation is in order
• We'll do varimax and see what we come up with
Component Matrix

              1       2       3       4       5       6       7
cost        .550    .734    .064    .384    .011    .075   -.022
size        .667    .675    .235   -.085   -.078   -.137    .105
alcohol     .632    .699    .066   -.283    .101    .106   -.072
reputat    -.735   -.071    .670    .009    .037    .060    .005
color       .760   -.576    .233    .042   -.021   -.142   -.115
aroma       .736   -.614    .080   -.037   -.218    .158    .032
taste       .710   -.646    .032    .038    .262    .027    .077
PCA
• Just going by eigenvalues > 1, it looks like now there may be a third factor worth considering
• Here, loadings < .3 have been suppressed
• Ah, much nicer, and perhaps we'll go with a 3 factor interpretation
• One factor relates to practical concerns (how cheaply can I get drunk?)
• Another to aesthetic concerns (is it a good beer?)
• One factor is simply reputation (will I look cool drinking it?)
Rotated Component Matrix (loadings < .3 suppressed)

              1       2       3
cost        .809
size        .972
alcohol     .952
reputat                    -.912
color               .977
aroma               .937
taste               .939

(One further loading, .556, appears in the original output; its position could not be recovered.)
Exploratory Factor Analysis
• Now we'll try the EFA
  • Principal Axis factoring
  • Varimax rotation
• We'll now be taking into account measurement error, so the communalities will be different
• If we just take the eigenvalues-greater-than-1 approach, we have 2 factors accounting for 80% of the total variance

Communalities

           Initial  Extraction
cost        .738       .745
size        .912       .914
alcohol     .866       .866
reputat     .499       .385
color       .922       .892
aroma       .857       .896
taste       .881       .902

Initial Eigenvalues

Factor   Total   % of Variance   Cumulative %
1        3.313      47.327          47.327
2        2.616      37.369          84.696
3         .575       8.209          92.905
4         .240       3.427          96.332
5         .134       1.921          98.252
6         .085       1.221          99.473
7         .037        .527         100.000
EFA
• Here are our initial structure coefficients before rotation
• Similar to before, not so easily interpreted
• How about the rotated solution?

Factor Matrix

              1       2
cost        .494    .708
size        .644    .706
alcohol     .595    .715
reputat    -.614   -.088
color       .785   -.526
aroma       .759   -.565
taste       .735   -.601
EFA
• Much better once again
• But note that now we have the reputation variable loading on both factors, and negatively
• As this might be difficult to incorporate into our interpretation, we may just stick to those that are loading highly
• However, this is a good example of how you may end up with different results depending on whether you do PCA or EFA

Rotated Factor Matrix (loadings < .3 suppressed)

              1       2
cost                 .862
size                 .953
alcohol              .930
reputat    -.431    -.447
color       .942
aroma       .946
taste       .950
EFA vs. PCA
• Again, the reason to use other methods of EFA rather than PCA is to take into account measurement error
• In psych, this would be the route one would typically want to take
  • Because of the lack of measurement error, the physical sciences typically do PCA
• However, in many cases the interpretation will not change for the most part, more so as more variables are involved
  • The communalities make up less of the total values in the correlation matrix as we add variables
    • Ex.: 5 variables → 10 correlations, 5 communalities; 10 variables → 45 correlations, 10 communalities
• Some of the other EFA methods will not be viable with some datasets
• Gist: in many situations you'll be fine with either, but perhaps you should have an initial preference for methods other than PCA