
Factor Analysis


What is factor analysis?

• Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization.

• Variables are not classified as either dependent or independent. Instead, the whole set of interdependent relationships among variables is examined in order to define a set of common dimensions called factors.

Purpose of Factor Analysis

• To identify underlying dimensions, called factors, that explain the correlations among a set of variables.
  -- For example, a set of lifestyle statements may be used to measure the psychographic profile of consumers.

• To identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent analysis, such as regression or discriminant analysis.
  -- For example, psychographic factors may be used as independent variables to explain the difference between loyal and non-loyal customers.

Assumptions

• The models are usually based on linear relationships.

• The models assume that the data collected are interval scaled.

• Multicollinearity in the data is desirable, because the objective is to identify interrelated sets of variables.

• The data should be amenable to factor analysis. A variable should not be correlated only with itself, with no correlation with any other variable; in that case the correlation matrix is an identity matrix, and factor analysis cannot be done on such data.

An Example
A study was conducted to determine customers' perceptions of an airline and its attributes. A set of 10 statements was constructed, and respondents were asked to rate each statement on a 7-point scale (1 = completely agree, 7 = completely disagree).
The statements were as follows:
1. The airline is always on time
2. The seats are very comfortable
3. I love the food they provide
4. Their air hostesses are very courteous
5. My boss/friend flies with the same airline
6. The airline has younger aircraft
7. I get the advantage of a frequent flyer program
8. It suits my schedule
9. My mom feels safe when I fly with this airline
10. Flying by this airline complements my lifestyle and

Example (contd.)
• Do the ten statements indicate 10 different factors that influence a customer to fly with this airline?
OR
• Are there correlations among these statements, so that we can identify only a few factors and associate each statement with one of these factors?

Factor Analysis – basic ideas

Each statement in the example is considered a variable. Hence, for each respondent there is a score on each variable.
Ex:
               V1  V2  V3  V4  V5  V6  V7  V8  V9  V10
Respondent 1    2   2   4   3   5   3   5   7   6   2

We can attach suitable weights to the variable scores and calculate a weighted sum of these scores.
Ex: weight for V1 = 0.3, weight for V2 = 0.1, etc.
This weighted sum is called a factor score. For respondent 1:

Factor score (Resp 1) = $W_1 \times 2 + W_2 \times 2 + W_3 \times 4 + W_4 \times 3 + \cdots$

where $W_i$ is the weight for variable $V_i$. A factor score can be calculated in the same way for each respondent. If there were 20 respondents, we would get a table containing 20 factor scores.
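A minimal Python/NumPy sketch of this arithmetic, using respondent 1's scores from the example row. Only the weights 0.3 (V1) and 0.1 (V2) come from the text; the other eight weights, and the helper names `factor_score` and `factor_scores`, are hypothetical and chosen purely for illustration.

```python
import numpy as np

# Scores of respondent 1 on V1..V10, taken from the example row.
resp1 = np.array([2, 2, 4, 3, 5, 3, 5, 7, 6, 2], dtype=float)

# Weights: 0.3 for V1 and 0.1 for V2 as in the text; the remaining
# eight weights are hypothetical values for illustration only.
weights = np.array([0.3, 0.1, 0.2, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05])

def factor_score(scores: np.ndarray, w: np.ndarray) -> float:
    """Weighted sum W1*V1 + W2*V2 + ... + W10*V10 for one respondent."""
    return float(np.dot(w, scores))

def factor_scores(data: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One factor score per respondent (one respondent per row of data)."""
    return data @ w

score_resp1 = factor_score(resp1, weights)
print(f"Factor score (respondent 1) = {score_resp1:.2f}")  # 3.30
```

With 20 respondents stacked as a 20 x 10 array, `factor_scores` returns the table of 20 factor scores in a single matrix-vector product.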
Factor Analysis – basic ideas (contd.)

• The weights assigned to the variables are not chosen arbitrarily. They are chosen so that the variance of the resulting factor scores is as large as possible (with the weights scaled so that their squares sum to 1; otherwise the variance could be made arbitrarily large).

• Once the first set of weights is obtained, a new set of weights is found so that the new set of factor scores has the maximum variance, subject to the condition that these factor scores are uncorrelated with the first set of factor scores.

• This process is repeated until all the variance is explained by the factors.

• The first set of factor scores is then correlated with the data for each of the variables V1 to V10. These correlations are called factor loadings. Thus a factor loading is the correlation between the factor scores and a variable.
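The steps above can be sketched in NumPy as principal component analysis via an eigen-decomposition of the correlation matrix. The data are randomly generated (hypothetical: 200 respondents, 6 variables driven by 2 latent dimensions plus noise). Under the unit-length constraint on the weights, the variance-maximizing weights are the eigenvectors of the correlation matrix, and the variance of each set of factor scores equals the corresponding eigenvalue.

```python
import numpy as np

# Hypothetical data: 200 respondents, 6 variables built from 2 latent
# dimensions plus noise (generated only for illustration).
rng = np.random.default_rng(seed=0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[0.9, 0.8, 0.7, 0.0, 0.1, 0.0],
                   [0.0, 0.1, 0.0, 0.9, 0.8, 0.7]])
X = latent @ mixing + 0.5 * rng.normal(size=(200, 6))

# Standardize each variable (mean 0, variance 1).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Unit-length weight vectors that maximize the variance of the weighted
# sums are the eigenvectors of the correlation matrix; each eigenvalue is
# the variance of the corresponding factor scores.
R = np.corrcoef(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)   # returned in ascending order
order = np.argsort(eigenvalues)[::-1]           # largest variance first
eigenvalues = eigenvalues[order]
weights = eigenvectors[:, order]                # column k = weights of factor k

# Factor scores: one row per respondent, one column per factor.
scores = Z @ weights

# Factor loadings: correlation between each variable and each factor's scores.
loadings = np.array([[np.corrcoef(X[:, j], scores[:, k])[0, 1]
                      for k in range(6)] for j in range(6)])

print("Variance of each set of factor scores:", np.round(scores.var(axis=0), 3))
print("Loadings on factor 1:", np.round(loadings[:, 0], 3))
```

Because successive eigenvectors are orthogonal, the columns of `scores` come out uncorrelated, which is the "uncorrelated with the first set" condition, and each loading equals the eigenvector entry times the square root of its eigenvalue.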
Factor Analysis – basic ideas (contd.)

An example will clarify what we have discussed so far.

An Excel data sheet can now be examined to understand what we have just discussed.

The factors extracted in this way are obtained using a technique called principal component analysis.
Determining the number of factors
• It is possible to extract as many factors as there are variables, but doing so would defeat the very purpose of factor analysis; hence a smaller number of factors needs to be found. The question is: how many?

Several procedures are available:

-- Determination based on eigenvalues.
An eigenvalue represents the amount of variance associated with a factor. Generally, only factors with an eigenvalue greater than 1.0 are included. Because each standardized variable has a variance of 1.0, a factor with an eigenvalue below 1.0 explains less variance than a single original variable.
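A minimal sketch of the eigenvalue criterion, assuming the input is a correlation matrix. The 4 x 4 matrix below is hypothetical (V1 and V2 correlate 0.6, V3 and V4 correlate 0.5), and `kaiser_count` is an illustrative name, not a library function.

```python
import numpy as np

def kaiser_count(R: np.ndarray) -> int:
    """Number of factors whose eigenvalue exceeds 1.0 (eigenvalue criterion)."""
    eigenvalues = np.linalg.eigvalsh(R)   # eigenvalues of a symmetric matrix
    return int(np.sum(eigenvalues > 1.0))

# Hypothetical correlation matrix with two pairs of related variables.
R = np.array([[1.0, 0.6, 0.0, 0.0],
              [0.6, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])

print(np.round(np.sort(np.linalg.eigvalsh(R))[::-1], 3))  # [1.6 1.5 0.5 0.4]
print(kaiser_count(R))                                    # 2
```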
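Determining the number of factors (contd.)

-- Determination based on a scree plot.

A scree plot is a plot of the eigenvalues against the number of factors, in order of extraction. Typically, the plot shows a distinct break between the steep slope of the factors with large eigenvalues and a gradual trailing off of the rest of the factors. This trailing off is referred to as the scree, and the point at which the scree begins is used to decide the number of factors.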
Scree Plot
[Figure: scree plot of eigenvalue (vertical axis, 0.0 to 3.0) against component number (horizontal axis, 1 to 6)]
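Without a plotting package, a rough text version of a scree plot can be printed. This is a sketch only: the eigenvalues are hypothetical, `text_scree` is an illustrative name, and the bar length (10 characters per unit of eigenvalue) is an arbitrary choice.

```python
import numpy as np

def text_scree(eigenvalues, width_per_unit: int = 10) -> list[str]:
    """One line per component: component number, eigenvalue, and a bar."""
    lines = []
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        bar = "*" * int(round(ev * width_per_unit))
        lines.append(f"{k:>2} {ev:5.2f} {bar}")
    return lines

# Hypothetical eigenvalues of a 6-variable correlation matrix (they sum to 6).
eigs = np.array([2.6, 2.1, 0.5, 0.4, 0.25, 0.15])
for line in text_scree(eigs):
    print(line)
```

Here the bars drop sharply after the second component and then trail off gradually; that flat tail is the scree.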
Determining the number of factors (contd.)

-- Determination based on percentage of variance.

The number of factors extracted is determined so that the cumulative percentage of variance extracted by the factors reaches a satisfactory level. What counts as satisfactory varies with the situation, but a cumulative percentage above 60% is generally considered satisfactory.
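A minimal sketch of the percentage-of-variance rule. Each eigenvalue divided by the sum of all eigenvalues (the number of variables, for a correlation matrix) gives that factor's share of the variance. The eigenvalues are hypothetical, the 60% default is the level quoted in the text, and `factors_for_variance` is an illustrative name.

```python
import numpy as np

def factors_for_variance(eigenvalues, target_pct: float = 60.0) -> int:
    """Smallest number of factors whose cumulative % of variance >= target_pct."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    pct = 100.0 * ev / ev.sum()          # % of total variance for each factor
    cumulative = np.cumsum(pct)
    reached = np.nonzero(cumulative >= target_pct)[0]
    # If rounding keeps the total just below the target, keep all factors.
    return int(reached[0] + 1) if reached.size else len(ev)

eigs = [1.6, 1.5, 0.5, 0.4]  # hypothetical eigenvalues (4 variables)
print(factors_for_variance(eigs))  # 2  (40% + 37.5% = 77.5%)
```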
How to check suitability for factor analysis
• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy: this index compares the magnitudes of the observed correlation coefficients to the magnitudes of the partial correlation coefficients. Typically, a value greater than 0.5 is considered good enough for conducting factor analysis on the data under consideration.

• Bartlett's test of sphericity: a test used to examine the hypothesis that the variables are uncorrelated in the population (that is, that the population correlation matrix is an identity matrix). If this hypothesis can be rejected, the data are suitable for factor analysis.
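Both checks can be sketched directly from a correlation matrix using the standard formulas: partial correlations come from the inverse of R, and Bartlett's statistic is $-\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R\rvert$ with $p(p-1)/2$ degrees of freedom. The data are hypothetical, and `kmo` and `bartlett_sphericity` are illustrative names rather than any package's API.

```python
import numpy as np

def kmo(R: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)          # partial correlation coefficients
    off = ~np.eye(R.shape[0], dtype=bool)      # off-diagonal mask
    r2 = np.sum(R[off] ** 2)                   # squared observed correlations
    a2 = np.sum(partial[off] ** 2)             # squared partial correlations
    return float(r2 / (r2 + a2))

def bartlett_sphericity(R: np.ndarray, n: int) -> tuple[float, int]:
    """Bartlett's chi-square statistic and degrees of freedom.

    H0: the population correlation matrix is an identity matrix.
    """
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)           # log of the determinant of R
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return float(chi2), df

# Hypothetical data: 100 respondents, 6 correlated variables.
rng = np.random.default_rng(seed=1)
latent = rng.normal(size=(100, 2))
X = latent @ rng.uniform(0.5, 1.0, size=(2, 6)) + 0.5 * rng.normal(size=(100, 6))
R = np.corrcoef(X, rowvar=False)

kmo_value = kmo(R)
chi2_stat, df = bartlett_sphericity(R, n=X.shape[0])
print(f"KMO = {kmo_value:.3f}, Bartlett chi-square = {chi2_stat:.1f}, df = {df}")
```

The Bartlett statistic is compared with a chi-square distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(chi2_stat, df)` gives the p-value.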
Conducting Factor Analysis

Respondent
number      V1    V2    V3    V4    V5    V6
 1         7.00  3.00  6.00  4.00  2.00  4.00
 2         1.00  3.00  2.00  4.00  5.00  4.00
 3         6.00  2.00  7.00  4.00  1.00  3.00
 4         4.00  5.00  4.00  6.00  2.00  5.00
 5         1.00  2.00  2.00  3.00  6.00  2.00
 6         6.00  3.00  6.00  4.00  2.00  4.00
 7         5.00  3.00  6.00  3.00  4.00  3.00
 8         6.00  4.00  7.00  4.00  1.00  4.00
 9         3.00  4.00  2.00  3.00  6.00  3.00
10         2.00  6.00  2.00  6.00  7.00  6.00
11         6.00  4.00  7.00  3.00  2.00  3.00
12         2.00  3.00  1.00  4.00  5.00  4.00
13         7.00  2.00  6.00  4.00  1.00  3.00
14         4.00  6.00  4.00  5.00  3.00  6.00
15         1.00  3.00  2.00  2.00  6.00  4.00
16         6.00  4.00  6.00  3.00  3.00  4.00
17         5.00  3.00  6.00  3.00  3.00  4.00
18         7.00  3.00  7.00  4.00  1.00  4.00
19         2.00  4.00  3.00  3.00  6.00  3.00
20         3.00  5.00  3.00  6.00  4.00  6.00
21         1.00  3.00  2.00  3.00  5.00  3.00
22         5.00  4.00  5.00  4.00  2.00  4.00
23         2.00  2.00  1.00  5.00  4.00  4.00
24         4.00  6.00  4.00  6.00  4.00  7.00
25         6.00  5.00  4.00  2.00  1.00  4.00
26         3.00  5.00  4.00  6.00  4.00  7.00
27         4.00  4.00  7.00  2.00  2.00  5.00
28         3.00  7.00  2.00  6.00  4.00  3.00
29         4.00  6.00  3.00  7.00  2.00  7.00
30         2.00  3.00  2.00  4.00  7.00  2.00

Correlation Matrix

Variables
V1 1.0
V2 -0.5
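The full 6 x 6 correlation matrix and its eigenvalues can be computed from the 30-respondent data table with NumPy. This is a sketch that assumes the table has been transcribed correctly; in that case the first two eigenvalues should come out close to the 2.731 and 2.218 reported in the principal components results, and the eigenvalues should sum to 6, the number of variables.

```python
import numpy as np

# Ratings of 30 respondents on V1..V6, transcribed from the data table.
data = np.array([
    [7, 3, 6, 4, 2, 4], [1, 3, 2, 4, 5, 4], [6, 2, 7, 4, 1, 3],
    [4, 5, 4, 6, 2, 5], [1, 2, 2, 3, 6, 2], [6, 3, 6, 4, 2, 4],
    [5, 3, 6, 3, 4, 3], [6, 4, 7, 4, 1, 4], [3, 4, 2, 3, 6, 3],
    [2, 6, 2, 6, 7, 6], [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4],
    [7, 2, 6, 4, 1, 3], [4, 6, 4, 5, 3, 6], [1, 3, 2, 2, 6, 4],
    [6, 4, 6, 3, 3, 4], [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4],
    [2, 4, 3, 3, 6, 3], [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3],
    [5, 4, 5, 4, 2, 4], [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7],
    [6, 5, 4, 2, 1, 4], [3, 5, 4, 6, 4, 7], [4, 4, 7, 2, 2, 5],
    [3, 7, 2, 6, 4, 3], [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
], dtype=float)

R = np.corrcoef(data, rowvar=False)                  # 6 x 6 correlation matrix
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first
pct = 100 * eigenvalues / eigenvalues.sum()          # % of variance per factor

np.set_printoptions(precision=3, suppress=True)
print(R)
print("Eigenvalues:", eigenvalues)
print("Cumulative % of variance:", np.cumsum(pct))
```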
Results of Principal Components Analysis

Communalities
Variables   Initial
V1          1.0
V2          1.0
V3          1.0

Initial Eigenvalues
Results of Principal Components Analysis (contd.)

Extraction Sums of Squared Loadings
Factor   Eigenvalue
1        2.731
2        2.218

Factor Matrix
Conducting Factor Analysis
Rotate Factors
• Although the initial or unrotated factor matrix indicates the relationship between the factors and individual variables, it seldom results in factors that can be interpreted, because the factors are correlated with many variables. Therefore, through rotation, the factor matrix is transformed into a simpler one that is easier to interpret.
• In rotating the factors, we would like each factor to have nonzero, or significant, loadings or coefficients for only some of the variables. Likewise, we would like each variable to have nonzero or significant loadings with only a few factors, if possible with only one.
• The rotation is called orthogonal rotation if the axes are maintained at right angles.
Conducting Factor Analysis
Rotate Factors (contd.)
• The most commonly used method for rotation is the varimax procedure. This is an orthogonal method of rotation that minimizes the number of variables with high loadings on a factor, thereby enhancing the interpretability of the factors. Orthogonal rotation results in factors that are uncorrelated.
• The rotation is called oblique rotation when the axes are not maintained at right angles, and the factors are correlated. Sometimes, allowing for correlations among factors can simplify the factor pattern matrix. Oblique rotation should be used when factors in the population are likely to be strongly correlated.
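A sketch of varimax rotation applied to the two principal components of the 30-respondent data. It uses a commonly published SVD-based iteration for the varimax criterion with Kaiser (row) normalization, which matches the "Varimax with Kaiser Normalization" label SPSS output usually shows. SPSS's own algorithm and convergence settings may differ, so the factor order, signs, and last decimals may not match the slide exactly. The function name `varimax` and its parameters are illustrative.

```python
import numpy as np

def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-8):
    """Orthogonal varimax rotation with Kaiser normalization.

    Returns the rotated loadings and the orthogonal rotation matrix T.
    """
    h = np.sqrt(np.sum(loadings ** 2, axis=1))     # sqrt of communalities
    A = loadings / h[:, None]                      # Kaiser row normalization
    p, k = A.shape
    T = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        L = A @ T
        # Gradient of the varimax criterion; its SVD gives the next rotation.
        G = A.T @ (L ** 3 - L @ np.diag(np.sum(L ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        T = u @ vt
        new_objective = np.sum(s)
        if new_objective < objective * (1 + tol):  # no meaningful improvement
            break
        objective = new_objective
    return (A @ T) * h[:, None], T                 # undo the normalization

# Ratings of 30 respondents on V1..V6 (from the data table).
data = np.array([
    [7, 3, 6, 4, 2, 4], [1, 3, 2, 4, 5, 4], [6, 2, 7, 4, 1, 3],
    [4, 5, 4, 6, 2, 5], [1, 2, 2, 3, 6, 2], [6, 3, 6, 4, 2, 4],
    [5, 3, 6, 3, 4, 3], [6, 4, 7, 4, 1, 4], [3, 4, 2, 3, 6, 3],
    [2, 6, 2, 6, 7, 6], [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4],
    [7, 2, 6, 4, 1, 3], [4, 6, 4, 5, 3, 6], [1, 3, 2, 2, 6, 4],
    [6, 4, 6, 3, 3, 4], [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4],
    [2, 4, 3, 3, 6, 3], [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3],
    [5, 4, 5, 4, 2, 4], [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7],
    [6, 5, 4, 2, 1, 4], [3, 5, 4, 6, 4, 7], [4, 4, 7, 2, 2, 5],
    [3, 7, 2, 6, 4, 3], [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
], dtype=float)

# Unrotated loadings of the two largest principal components.
R = np.corrcoef(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1][:2]          # keep 2 factors
unrotated = eigenvectors[:, order] * np.sqrt(eigenvalues[order])

rotated, T = varimax(unrotated)
np.set_printoptions(precision=3, suppress=True)
print(rotated)
```

Because `T` is orthogonal, each variable's communality (row sum of squared loadings) is the same before and after rotation; only the way the variance is shared between the factors changes.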
Results of Principal Components Analysis (contd.)

Rotation Sums of Squared Loadings
Factor   Eigenvalue
1        2.68
2        2.26

Rotated Factor Matrix
Conducting Factor Analysis
Interpret Factors

• A factor can then be interpreted in terms of the variables that load high on it.

• Another useful aid in interpretation is to plot the variables, using the factor loadings as coordinates. Variables at the end of an axis are those that have high loadings on only that factor, and hence describe the factor.
Factor Loading Plot

Rotated Component Matrix
              Factor
Variable      1          2
V1            0.962     -0.0266
V2           -0.0572     0.848
V3            0.934     -0.146
V4           -0.0983     0.854
V5           -0.933     -0.0840
V6            0.08337    0.885

[Figure: Factor Plot in Rotated Space, plotting each variable V1 to V6 by its loadings on Factor 1 and Factor 2 (both axes from -1.0 to 1.0)]
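Reading the rotated component matrix can be partly automated: assign each variable to the factor on which it has the largest absolute loading, and keep only loadings above a cutoff. The loadings below are copied from the rotated component matrix; the 0.5 cutoff and the helper name `variables_by_factor` are hypothetical choices for illustration.

```python
import numpy as np

variables = ["V1", "V2", "V3", "V4", "V5", "V6"]
# Rotated component matrix (rows V1..V6, columns factor 1 and factor 2).
rotated = np.array([
    [ 0.962,   -0.0266],
    [-0.0572,   0.848],
    [ 0.934,   -0.146],
    [-0.0983,   0.854],
    [-0.933,   -0.0840],
    [ 0.08337,  0.885],
])

CUTOFF = 0.5  # hypothetical threshold for a "high" loading

def variables_by_factor(loadings, names, cutoff=CUTOFF):
    """Group variables under the factor on which they load highest (absolute value)."""
    groups = {k + 1: [] for k in range(loadings.shape[1])}
    for name, row in zip(names, loadings):
        k = int(np.argmax(np.abs(row)))
        if abs(row[k]) >= cutoff:
            sign = "+" if row[k] > 0 else "-"
            groups[k + 1].append(f"{name} ({sign}{abs(row[k]):.3f})")
    return groups

for factor, members in variables_by_factor(rotated, variables).items():
    print(f"Factor {factor}: {', '.join(members)}")
# Factor 1: V1 (+0.962), V3 (+0.934), V5 (-0.933)
# Factor 2: V2 (+0.848), V4 (+0.854), V6 (+0.885)
```

The negative sign on V5 means that respondents who score high on factor 1 tend to score high on V1 and V3 but low on V5.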


A few examples

We can now take a few examples with hypothetical data and run factor analysis using the SPSS package.