
A Seminar Presentation on
Factor Analysis (Special Concentration on Exploratory & Confirmatory Factor Analysis)
Presented By:
Group: F
Shah Zaman Rubayet 121 0847 660

Yousa Zariat (L) 1531126060

Adhnin Akter Summi 1530891060

Fatimatul Tabassum 1530891060


Factor Analysis - Definition
A process in which the values of observed data are expressed as functions of a number of possible causes in order to find which are the most important.
Factor Analysis
> Factor analysis is a way to take a mass of data and shrink it to a smaller data set that is more manageable and more understandable.
> It’s a way to find hidden patterns, show how those
patterns overlap and show what characteristics are
seen in multiple patterns.
> It can be a very useful tool for complex sets of data
involving psychological studies, socioeconomic
status and other involved concepts.
Factor & Factor Loadings
> A “factor” is a set of observed variables that have similar response patterns; they are associated with a hidden variable (called a confounding variable) that isn’t directly measured. Factors are listed according to factor loadings, i.e., how much variation in the data they can explain.
> A confounder is a variable that influences both the dependent variable and the independent variable, causing a spurious association.
> Not all factors are created equal; some factors have more
weight than others.
Example: Factors
Imagine your HR team conducts a survey on employees’ job satisfaction and the results show the following factor loadings:
Variable Factor 1 Factor 2 Factor 3
Question 1 0.885 0.121 -0.033
Question 2 0.829 0.078 0.157
Question 3 0.777 0.190 0.540

For each question, the factor with the highest loading (Factor 1 in every row above) affects it the most. Factor loadings are similar to correlation coefficients in that they can vary from -1 to 1. The closer a loading is to -1 or 1, the more that factor affects the variable. A factor loading of zero would indicate no effect.
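The loading table above can be read programmatically. A minimal sketch (the numbers are the hypothetical survey loadings from the table, not real data):

```python
import numpy as np

# Hypothetical loading matrix from the survey example above
# (rows = Questions 1-3, columns = Factors 1-3)
loadings = np.array([
    [0.885, 0.121, -0.033],   # Question 1
    [0.829, 0.078,  0.157],   # Question 2
    [0.777, 0.190,  0.540],   # Question 3
])

# For each question, the factor with the largest absolute loading
# is the one that affects it most
dominant = np.argmax(np.abs(loadings), axis=1)
print(dominant + 1)   # every question loads mainly on Factor 1
```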
Salient Features of Factor Analysis
> It is a process of providing an operational definition for a latent construct (through a regression equation).
> It takes many variables and represents them with a few “factors” or “components”.
> Correlated variables are combined together and separated from other variables with low or no correlation.
> The “factors”, “components”, “variates”, etc. need not be correlated with one another.
Salient Features of Factor Analysis
> Patterns of correlations can be identified and used as indicative of underlying theory (FA)
> Factor analysis can determine whether different measures or variables are tapping aspects of a common dimension
> It removes redundancy or duplication from a set of correlated variables
> Factors are formed that are relatively independent of one another
Use of Factor Analysis
> Data reduction
> Scale development (a measurement instrument is constructed,
validated, and standardized)
> The evaluation of the psychometric quality of a measure, and
> The assessment of the dimensionality of a set of variables
> Determining a small number of factors based on a particular number of inter-related quantitative variables
> While variables such as speed, height, and weight can be measured directly, variables such as egoism, creativity, happiness, religiosity, and comfort are not single measurable entities; factor analysis helps to measure such latent constructs.
Literature Review of Factor
Analysis
 Historically, factor analysis was used primarily in psychology and education; however, its use within the health sciences has become much more common during the past two decades.
 Kline (1994) noted that with the emergence of powerful computers and the statistical packages which go with them, factor analysis and other multivariate methods have become accessible to individuals who have never been trained to understand them.
 Exploratory factor analysis (EFA) serves many important uses in
human resource development (HRD) research. One of the important
uses of FA among researchers consists of reducing relatively large sets
of variables into more manageable ones, developing and refining an
innovative instrument’s scales.
Requirements for Factor Analysis
 To perform a factor analysis, there has to be univariate and
multivariate normality within the data (Child, 2006). It is also
important that there is an absence of univariate and multivariate
outliers (Field, 2009). Also, a determining factor is based on the
assumption that there is a linear relationship between the factors
and the variables when computing the correlations (Gorsuch, 1983).
For something to be labeled as a factor it should have at least 3
variables, although this depends on the design of the study
(Tabachnick & Fidell, 2007).
 As a general guide, rotated factors that have 2 or fewer variables should be interpreted with caution. A factor with 2 variables is only considered reliable when the variables are highly correlated with each other (r > .70) but fairly uncorrelated with other variables.
Requirements for Factor Analysis
 The recommended sample size is at least 300
participants, and the variables that are subjected to
factor analysis each should have at least 5 to 10
observations (Comrey & Lee, 1992). We normally
say that the ratio of respondents to variables should
be at least 10:1 and that the factors are considered
to be stable and to cross-validate with a ratio of
30:1. A larger sample size will diminish the error in
your data and so EFA generally works better with
larger sample sizes.
Factor analysis decision process
• Objectives of factor analysis
• Designing a factor analysis
• Assumptions in factor analysis
• Deriving factors and assessing overall fit
• Interpreting the factors
• Validation of factor analysis
• Additional uses of the factor analysis results
Graphical Presentation of a Factor Plot
Overview of the steps in a factor analysis
Significance of Factor Analysis
• Representing relationships among sets of variables parsimoniously
yet keeping factors meaningful is a major goal of factor analysis.
• A good factor solution of any measurement is both simple and
interpretable.
• When factors of a problem can be interpreted, new insights are
possible.
• Factor analysis is an important tool that can be used in the
development, refinement, and evaluation of tests, scales, and
measures that can be used in education and clinical contexts by
paramedics.
• It provides construct validity evidence for self-report scales.
Applications Of Factor Analysis
Three common applications of factor analysis are examined below:
a) Defining indicators of constructs
b) Defining dimensions for an existing measure
c) Selecting items or scales to be included in a measure
Applications Of Factor Analysis
• Defining indicators of constructs:
1) To represent each construct of interest, 4 or more measures should be chosen
2) The choice of measures should be guided as much as possible by theory, previous research, and logic.
• Defining dimensions for an existing measure:
1) The variables to be analyzed should be chosen by the initial researcher, not by the person conducting the analysis.
2) Factor analysis is always done on a predetermined set
of items/scales.
Applications Of Factor Analysis
• Selecting items or scales to be included in a measure:
1) Factor analysis may be conducted to determine what items or
scales should be included and excluded from a measure.
2) Results of the analysis should not be used alone in making
decisions of inclusions or exclusions. Decisions should be taken in
conjunction with the theory and what is known about the
construct(s) that the items or scales assess
3) Results of factor analysis may not always be satisfactory:
o The items or scales may be poor indicators of the construct or
constructs.
o There may be too few items or scales to represent each underlying
dimension
Assumptions Underlying Factor
Analysis
• The measured variables are linearly related to the factors + errors.
• This assumption is likely to be violated if items have limited response scales (e.g., a two-point response scale such as True/False or Right/Wrong items).
• The data should have a bi-variate normal distribution for each pair
of variables.
• Observations must be independent.
• The factor analysis model assumes that variables are generated by common factors and unique factors. All unique factors are assumed to be uncorrelated with each other and with the common factors.
Factor Analysis Model
Each variable can be expressed as a linear combination of factors.
The factors are some common factors plus a unique factor. The
factor model is represented as:
Xi = Ai1F1 + Ai2F2 + Ai3F3 + . . . + AimFm + ViUi
where
Xi = ith standardized variable
Aij = standardized multiple regression coefficient of variable i on common factor j
Fj = common factor j
Vi = standardized regression coefficient of variable i on unique factor i
Ui = the unique factor for variable i
m = number of common factors
Factor Analysis Model
The first set of weights (factor score coefficients) are chosen so that the first
factor explains the largest portion of the total variance.
Then a second set of weights can be chosen. So, the second factor explains
most of the residual variance and subject to being uncorrelated with the first
factor.
The common factors themselves can be expressed as linear combinations of the observed variables:

Fi = Wi1X1 + Wi2X2 + Wi3X3 + . . . + WikXk

where,
Fi = estimate of ith factor
Wi = weight or factor score coefficient
k = number of variables
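The two equations above can be sketched numerically. Below is a small simulation, assuming an orthogonal two-factor model with a made-up loading matrix A; the factor score weights W are obtained by the regression method (W = R⁻¹A), which is one common choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 1000, 4, 2                # observations, variables, common factors

# Made-up loading matrix A (k variables x m factors) and unique weights V,
# scaled so that each standardized variable has unit variance
A = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])
V = np.sqrt(1 - (A**2).sum(axis=1))

F = rng.standard_normal((n, m))      # common factors
U = rng.standard_normal((n, k))      # unique factors
X = F @ A.T + U * V                  # Xi = Ai1 F1 + ... + Aim Fm + Vi Ui

# Regression-method factor score weights: W = R^-1 A, so F_hat = X W
R = np.corrcoef(X, rowvar=False)
W = np.linalg.solve(R, A)
F_hat = X @ W
print(np.corrcoef(F[:, 0], F_hat[:, 0])[0, 1] > 0.7)   # scores track the factor
```

The estimated factor scores are only approximations of the true factors; their correlation with the simulated factors is high but not 1.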
Statistics Associated with Factor
Analysis
• Bartlett's test of sphericity: Conducted to test the hypothesis that the variables are uncorrelated in the population (i.e., the population correlation matrix is an identity matrix).
• Correlation matrix: A lower triangular matrix displaying the simple correlations between all possible pairs of variables involved in the analysis.
• Communality: Amount of variance a variable shares with all the other
variables. This is the proportion of variance explained by the common
factors.
• Eigenvalue: Represents the total variance explained by each factor.
• Factor loadings: Correlations between the variables and the factors.
Statistics Associated with Factor
Analysis
• Factor scores: Composite scores estimated for each respondent on the derived factors.
• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy: Used to examine the appropriateness of factor analysis. High values (between 0.5 and 1.0) indicate appropriateness; values below 0.5 imply that factor analysis may not be appropriate.
• Percentage of variance: The percentage of the total
variance attributed to each factor.
• Scree plot: A scree plot is a plot of the Eigenvalues
against the number of factors in order of extraction.
Example of Factor Analysis
a) ABC Bank is a very large local private bank
b) The HR team surveyed 100 contractual employees to assess their job satisfaction level
c) It obtained data on 7 different variables from
the employees.
d) Before doing further analysis, the HR Team
ran a Factor Analysis to see if the data could be
reduced.
Example of Factor Analysis
In a bank, HR Team wanted to know the job satisfaction of the contractual
employees

The team gathered data on 7 variables and they are given below:
1. Respect from Co workers
2. Working conditions
3. Relationship with supervisors
4. Flexibility
5. Work Pressure & Stress Level
6. Financial Rewards
7. Opportunity for Advancement
Each variable was measured on a 10 cm graphic rating scale.
Types of Factor Analysis
There are two types of factor analysis:
1. Exploratory factor analysis (EFA)
2. Confirmatory factor analysis (CFA)
Exploratory Factor Analysis
(EFA)
Exploratory Factor Analysis
Exploratory: when the dimensions/factors are theoretically unknown.

Exploratory Factor Analysis (EFA) is a statistical approach to determining the correlation among the variables in a dataset. This type of analysis provides a factor structure (a grouping of variables based on strong correlations). It is used to reduce data to a smaller set of summary variables and to explore the underlying theoretical structure of the phenomena. It is also used to identify the structure of the relationship between the variables and the respondents.
Example
• A retail firm identified 80 characteristics of retail stores and their services that consumers mentioned as affecting their patronage choice among stores.
• The retailer wants to find the broader dimensions on which to conduct a survey.
• Factor analysis will be used here.
Steps in EFA
• Selecting variables/items
• Preparing/checking correlation matrix
• Extracting factors
• Determining the number of factors
• Rotating factors
• Interpreting results
• Verifying structure by establishing construct validity
Objective
• Data summarization: definition of structure
• Data reduction: the purpose is to retain the nature and character of the original variables but reduce their number to simplify the subsequent multivariate analysis.
Exploratory factor analysis
Exploratory factor analysis can be performed by using the
following two methods:
1. R-type factor analysis: When factors are calculated from the correlation matrix, it is called R-type factor analysis.
2. Q-type factor analysis: When factors are calculated from the individual respondents, it is said to be Q-type factor analysis.
• Both types of factor analysis use a correlation matrix as input data.
• With R-type we use the traditional correlation matrix.
• In Q-type factor analysis there would be a factor matrix that identifies similar individuals.
Difference between Q analysis and cluster
analysis
• Q-type factor analysis is based on the inter-correlations between respondents, while cluster analysis forms groupings based on a distance-based similarity measure between respondents' scores on the variables being analyzed.
• Variable selection and measurement issues:
• Variables should be metric.
• If some variables are non-metric, use dummy variables to represent the categories of the non-metric variables.
• If all variables are non-metric, use Boolean factor analysis.
Assumptions
• Basic assumption: some underlying structure does exist in the set of selected variables (ensure that observed patterns are conceptually valid).
• The sample is homogeneous with respect to the underlying factor structure.
• Departures from normality and linearity matter to the extent that they diminish the observed correlations.
• Some degree of multicollinearity is desirable.
Continue…
• The researcher must ensure that the data matrix has sufficient correlations to justify the application of factor analysis (no equal or uniformly low correlations).
• Correlation among variables can be analyzed via partial correlation (the correlation which remains when the effect of other variables is taken into account). High partial correlations mean factor analysis is inappropriate. The rule of thumb is to consider a correlation above 0.7 as high.
Continue…

• Another method of determining the appropriateness of factor analysis is the Bartlett test of sphericity, which provides the statistical significance that the correlation matrix has significant correlations among at least some of the variables.
• The Bartlett test should be significant, i.e., p less than 0.05. This means that the variables are correlated highly enough to provide a reasonable basis for factor analysis, and indicates that the correlation matrix is significantly different from an identity matrix, in which the correlations between variables are all zero.
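Bartlett's test can be computed directly from the correlation matrix. A sketch using the standard chi-square approximation (the toy data here are simulated, not from the seminar):

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Bartlett's test that the population correlation matrix is an identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # Standard chi-square approximation based on the determinant of R
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)

# Simulated correlated data: the test should be significant (p < .05)
rng = np.random.default_rng(1)
f = rng.standard_normal((300, 1))
X = f @ np.full((1, 4), 0.7) + rng.standard_normal((300, 4))
chi2, p = bartlett_sphericity(X)
print(p < 0.05)   # True -> factor analysis is reasonable
```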
Deriving factors
• There are two methods for deriving factors:
• Principal component analysis: This method is used when we need to derive the minimum number of factors that explain the maximum portion of the variance in the original variables.
• Common factor analysis: This method is used when the researcher does not know the nature of the factors to be extracted and wants to analyze only the common variance, excluding unique and error variance.
Bartlett's test of sphericity & KMO statistics
KMO and Bartlett's Test (example output):
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .930
Bartlett's Test of Sphericity: Approx. Chi-Square = 19334.492, df = 253, Sig. = .000

• KMO measures the ratio of the squared correlation between variables to the squared partial correlation between variables. KMO measures > .9 are superb!
• KMO measures for individual variables are produced on the diagonal of the anti-image correlation matrix; they give a hint at which variables should be excluded from the factor analysis.
• Bartlett's test tests whether the R-matrix is an identity matrix (a matrix with only 1's on the diagonal and 0's off-diagonal). However, we want to have correlated variables, so the off-diagonal elements should NOT be 0. Thus, the test should be significant, i.e., the R-matrix should NOT be an identity matrix.
Continue…
• Another measure is the measure of sampling adequacy (MSA). This index ranges from 0 to 1, reaching 1 when each variable is perfectly predicted by the other variables. It must exceed 0.5 for both the overall test and each individual variable.
Initial considerations: sample size
The reliability of factor analysis relies on the sample size. As a 'rule of thumb', there should be 10-15 subjects per variable.
The stability of a factor solution depends on:
1. Absolute sample size
2. Magnitude of factor loadings (>.6)
3. Communalities (>.6; the higher the better)
The KMO* measure is the ratio of the squared correlation between variables to the squared partial correlation between variables. It ranges from 0 to 1; values between .7 and .8 are good and suggest a factor analysis.
*KMO: Kaiser-Meyer-Olkin measure of sampling adequacy
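The overall KMO measure can be computed from the correlation matrix and the partial correlations obtained from its inverse. A sketch on simulated data (the loadings are made up):

```python
import numpy as np

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (a sketch)."""
    R = np.corrcoef(X, rowvar=False)
    S = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(S), np.diag(S)))
    Q = -S / d                       # anti-image (partial) correlations
    np.fill_diagonal(R, 0)           # keep only off-diagonal terms
    np.fill_diagonal(Q, 0)
    return (R**2).sum() / ((R**2).sum() + (Q**2).sum())

# Simulated data with one strong common factor (made-up loadings)
rng = np.random.default_rng(2)
f = rng.standard_normal((500, 1))
X = f @ np.full((1, 6), 0.8) + rng.standard_normal((500, 6)) * 0.5
measure = kmo(X)
print(measure > 0.5)   # True: well above the adequacy threshold
```

Because the variables share a strong common factor, the partial correlations are small relative to the simple correlations, which is exactly what a high KMO indicates.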
Continue…
Selection of factors to be extracted
• Theory is the first criterion to determine the number of factors to be extracted; from theory, we know whether the number of factors extracted makes sense.
• Most researchers use the Eigenvalue criterion for the number of factors to be extracted.
• The percentage of variance explained method is also used for exploratory factor analysis.
• We can use the scree test criterion for the selection of factors. In this method, Eigenvalues are plotted on a graph and factors are selected.
Data screening
• The variables in the questionnaire should inter-correlate if they measure the same thing. Questions that tap the same sub-variable, e.g., worry, intrusive thoughts, or physiological arousal, should be highly correlated.
• If there are questions that are not inter-correlated with others, they should not be entered into the factor analysis.
• If questions correlate too highly, extreme multi-collinearity or even singularity (perfectly correlated variables) results.
• Too low and too high intercorrelations should be avoided.
• Finally, variables should be roughly normally distributed.
Eigen values (EVs)
• Each factor has an Eigen value (EV) which indicates the
amount of overall variance that each factor is able to account
for.
• EVs for successive factors have lower values.
• Rule of thumb: Eigen values over 1 are ‘stable’ (Kaiser's
criterion).
• EVs can also be expressed as %s.
• The total of all EVs is the number of variables. (Each variable
contributes a variance of one.)
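These properties are easy to verify numerically. A sketch with simulated data driven by two underlying factors (the loadings are made up):

```python
import numpy as np

# Simulated data driven by two underlying factors (made-up loadings)
rng = np.random.default_rng(3)
f = rng.standard_normal((400, 2))
A = np.array([[0.8, 0.0], [0.7, 0.1], [0.0, 0.8], [0.1, 0.7], [0.3, 0.3]])
X = f @ A.T + rng.standard_normal((400, 5)) * 0.5

R = np.corrcoef(X, rowvar=False)
evs = np.sort(np.linalg.eigvalsh(R))[::-1]   # Eigenvalues, largest first
print(evs.sum())        # EVs always sum to the number of variables (5 here)
print((evs > 1).sum())  # Kaiser's criterion: how many factors to retain
```

The sum equals the trace of the correlation matrix, i.e., the number of variables, since each standardized variable contributes a variance of one.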
Scree plot
• A line graph of EVs.
• Depicts amount of variance explained by each factor.
• Cut-off: Look for where additional factors fail to add
appreciably to the cumulative explained variance.
• 1st factor explains the most variance.
• Last factor explains the least amount of variance.
Scree plot
• After 2 or after 4 factors, the curve inflects.
• Since we have a huge sample, Eigenvalues > 1 can still be interpreted well, so retaining 4 factors is justified.
• However, it is also possible to retain just 2.
Communalities
• Each variable has a communality: the proportion of its variance explained by the extracted factors. It ranges between 0 and 1.
• High communalities (> .5): Extracted factors explain
most of the variance in the variable
• Low communalities (< .5): A variable has considerable variance unexplained by the extracted factors. Consider extracting more factors to explain more variance, or removing this item from the EFA.
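Given a loading matrix, each variable's communality is simply the sum of its squared loadings across the extracted factors. A sketch with made-up loadings:

```python
import numpy as np

# Hypothetical rotated loading matrix (4 variables x 2 factors)
loadings = np.array([
    [0.80, 0.10],
    [0.75, 0.05],
    [0.10, 0.82],
    [0.20, 0.70],
])

# Communality of each variable = sum of its squared loadings
communalities = (loadings ** 2).sum(axis=1)
print(np.round(communalities, 3))

# Flag variables poorly explained by the extracted factors (< .5)
low = communalities < 0.5
print(low.any())   # False: all communalities here are adequate
```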
Factor loadings
Factor loadings indicate the relative importance of each item to
each factor.
• A factor matrix shows variables in rows and factors in columns.
• Factors are weighted combinations of variables, summarized in the factor loading matrix.
Factor rotation
• Factors are rotated for better interpretation since unrotated
factors are ambiguous. The goal of rotation is to attain an
optimal simple structure which attempts to have each
variable load on as few factors as possible, but maximizes
the number of high loadings on each variable (Rummel,
1970).
• Until the factor loadings are rotated, they are difficult to interpret: one seldom sees a simple unrotated factor structure, and many variables will load on 2 or more factors.
• Rotation of the factor loading matrix helps to find a more interpretable factor structure.
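Varimax is the most common orthogonal rotation. A minimal sketch of one widely used iterative SVD formulation (the input loadings are made up; an orthogonal rotation leaves each variable's communality unchanged):

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=200, tol=1e-8):
    """Iterative SVD formulation of varimax rotation (a common sketch)."""
    p, k = L.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag((Lr**2).sum(axis=0))))
        R = u @ vt                      # best orthogonal rotation so far
        if s.sum() - var < tol:         # stop when the criterion stabilizes
            break
        var = s.sum()
    return L @ R

# Made-up unrotated loadings: every variable loads on both factors
L = np.array([[0.6, 0.6], [0.7, 0.5], [0.6, -0.6], [0.5, -0.7]])
rotated = varimax(L)
print(np.round(rotated, 2))   # each variable now loads mainly on one factor
```

After rotation, the loading pattern approaches simple structure: the first two variables load on one factor, the last two on the other, while the row sums of squared loadings (communalities) are unchanged.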
Confirmatory Factor Analysis (CFA)
Confirmatory Factor Analysis
• Confirmatory factor analysis (CFA) is a more complex approach that tests the hypothesis that the items are associated with specific factors.
• CFA uses structural equation modeling to test a measurement model whereby loading on the factors allows for evaluation of relationships between observed variables and unobserved variables.
• Hypothesized models are tested against actual data, and the analysis
would demonstrate loadings of observed variables on the latent
variables (factors), as well as the correlation between the latent
variables.
Confirmatory Factor Analysis
• The researcher uses knowledge of theory, empirical research, or both to postulate the relationship pattern a priori, and then tests the hypothesis statistically.
The use of CFA could be impacted by
• The research hypothesis being tested
• The requirement of sufficient sample size (e.g., n > 200 cases per parameter estimate)
• Measurement instruments
• Multivariate normality
• Parameter identification
• Outliers
• Missing data
• Interpretation of model fit indices (Schumacker & Lomax, 1996).
Standard CFA model
Standard CFA model:
• The standard CFA model consists of a simple structure: each measure or indicator loads on one and only one factor, which implies no double loadings. There are no correlated errors, and the latent variables are allowed to correlate.
• A simple-structure CFA model is identified if there are at least two indicators per latent variable and the errors of those two or more indicators are uncorrelated with each other and with at least one other indicator on the other latent variables.
Confirmatory Factor Analysis
A suggested approach to CFA proceeds through the following process:
• Review the relevant theory and research literature to support model
specification
• Specify a model (e.g., diagram, equations)
• Determine model identification (e.g., whether unique values can be found for parameter estimation; the number of degrees of freedom (df) for model testing is positive)
• Collect data and conduct preliminary descriptive statistical analysis
(e.g., scaling, missing data, co-linearity issues, outlier detection)
• Estimate parameters in the model
• Assess model fit
• Present and interpret the results.
Testing in CFA and Structural Equation Modeling
• Structural equation modeling software is typically used for performing confirmatory factor analysis. LISREL, EQS, AMOS, Mplus, and the lavaan package in R are popular software programs.
• CFA is also frequently used as a first step to assess the proposed
measurement model in a structural equation model.
• Many of the rules of interpretation regarding assessment of model fit
and model modification in structural equation modeling apply equally to
CFA.
• CFA is distinguished from structural equation modeling by the fact that
in CFA, there are no directed arrows between latent factors. In other
words, while in CFA factors are not presumed to directly cause one
another, SEM often does specify particular factors and variables to be
causal in nature.
• In the context of SEM, the CFA is often called 'the measurement model', while the relations between the latent variables (with directed arrows) are called 'the structural model'. In principle, the more complicated model should fit the data for the test to be valid.
Evaluation of model fit
• Most statistical methods only require one statistical test to determine the
significance of the analyses. However, in CFA, several statistical tests are
used to determine how well the model fits to the data. Note that a good fit
between the model and the data does not mean that the model is “correct”,
or even that it explains a large proportion of the covariance. A “good model
fit” only indicates that the model is plausible. When reporting the results of
a confirmatory factor analysis, one is urged to report:
– a) The proposed models
– b) Any modifications made
– c) Which measures identify each latent variable
– d) Correlations between latent variables
– e) Any other pertinent information, such as whether constraints are used.
Continue…
• Model fit indices determine how well the a priori model fits, or reproduces, the data. Good model fit indices include, but are not limited to, the chi-squared test, RMSEA, GFI, AGFI, RMR, and SRMR.
Chi-squared test:
• The chi-squared test indicates the difference between observed and
expected covariance matrices.
• Values closer to zero indicate a better fit; smaller difference
between expected and observed covariance matrices.
• Chi-squared statistics can also be used to directly compare the fit of
nested models to the data.
• One difficulty with the chi-squared test of model fit, however, is
that researchers may fail to reject an inappropriate model in small
sample sizes and reject an appropriate model in large sample sizes.
As a result, other measures of fit have been developed.
Continue…
• Root mean square error of approximation (RMSEA):
• The root mean square error of approximation (RMSEA) avoids issues of
sample size by analyzing the discrepancy between the hypothesized
model, with optimally chosen parameter estimates, and the population
covariance matrix.
• The RMSEA ranges from 0 to 1, with smaller values indicating better model fit.
• A value of .06 or less is indicative of acceptable model fit.
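Given a fitted model's chi-square, RMSEA can be computed by hand. A sketch using one common formula (the chi-square, df, and n values below are made up for illustration):

```python
import math

def rmsea(chi2, df, n):
    """RMSEA from a model's chi-square, degrees of freedom, and sample size.
    (One common formula; some software uses n rather than n - 1.)"""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Hypothetical CFA result (made-up numbers): chi2 = 85.3, df = 40, n = 300
print(round(rmsea(85.3, 40, 300), 3))   # ~0.062 -> close to the .06 cutoff
```

Note that when the chi-square is no larger than its degrees of freedom, the RMSEA is defined as 0 (perfect approximate fit).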
Continue…
Goodness of fit index (GFI) and adjusted goodness of fit index (AGFI) :
• The goodness of fit index (GFI) is a measure of fit between the
hypothesized model and the observed covariance matrix.
• The adjusted goodness of fit index (AGFI) corrects the GFI, which is
affected by the number of indicators of each latent variable.
• The GFI and AGFI range between 0 and 1, with a value of over .9 generally
indicating acceptable model fit.
Continue…
Comparative fit index (CFI):
• The comparative fit index (CFI) analyzes the model fit by examining the
discrepancy between the data and the hypothesized model, while
adjusting for the issues of sample size inherent in the chi-squared test of
model fit, and the normed fit index.
• CFI values range from 0 to 1, with larger values indicating better fit.
Previously, a CFI value of .90 or larger was considered to indicate
acceptable model fit.
• However, recent studies have indicated that a value greater than .90 is needed to ensure that misspecified models are not deemed acceptable (Hu & Bentler, 1999).
• Thus, a CFI value of .95 or higher is presently accepted as an indicator of
good fit (Hu & Bentler, 1999).
EFA VS. CFA
• Both exploratory factor analysis (EFA) and confirmatory factor analysis
(CFA) are employed to understand shared variance of measured variables
that is believed to be attributable to a factor or latent construct.
• Both techniques are based on linear statistical models and statistical tests
associated with both methods are valid if certain assumptions are met.
• Both techniques assume a normal distribution and incorporate measured
variables and latent constructs.
• Despite this similarity, however, EFA and CFA are conceptually and
statistically distinct analyses.
EFA VS. CFA
• The goal of EFA is to identify factors based on data and to maximize the
amount of variance explained. The researcher is not required to have any
specific hypotheses about how many factors will emerge, and what items or
variables these factors will comprise. If these hypotheses exist, they are not
incorporated into and do not affect the results of the statistical analyses.
• By contrast, CFA evaluates a priori hypotheses and is largely driven by
theory. CFA analyses require the researcher to hypothesize, in advance, the
number of factors, whether or not these factors are correlated, and which
items/measures load onto and reflect which factors.
• As such, in contrast to exploratory factor analysis, where all loadings are
free to vary, CFA allows for the explicit constraint of certain loadings to be
zero.
EFA VS. CFA
• EFA is sometimes reported in research when CFA would be a better
statistical approach. It has been argued that CFA can be restrictive and
inappropriate when used in an exploratory fashion.
• However, the idea that CFA is solely a “confirmatory” analysis may
sometimes be misleading, as modification indices used in CFA are somewhat
exploratory in nature. Modification indices show the improvement in model
fit if a particular coefficient were to become unconstrained.
• EFA and CFA do not have to be mutually exclusive analyses; EFA has been
argued to be a reasonable follow up to a poor-fitting CFA model.
In A Nutshell…..
• In conclusion, EFA explores or reduces a large number of variables into a few factors by showing which variable loads on which latent factor. To decide how many latent factors to retain, researchers rely on rules of thumb such as an Eigenvalue greater than 1, scree plots, factor loadings greater than 0.55, and cross-loadings less than 0.45.
• CFA, by contrast, is a model that tests to what extent the hypothesized relations are valid. To be more confident and certain about the hypothesized structural model, one needs to look at composite reliability and average variance extracted, and also to perform advanced tests of validity such as convergent and discriminant validity tests. None of these are conducted in EFA, which is just a preliminary analysis. One can also exclude factor loadings that do not meet the above-mentioned criteria.
THANK YOU
