
Copyright: Rai University

11.556 RESEARCH METHODOLOGY
So far we have looked at techniques of multiple regression, where
we essentially examine the association between a dependent
variable and several independent variables. We now look at a
technique which also measures association but looks at relations
of interdependence, where no one variable is dependent on another.
In this situation all variables are treated as independent variables.
In this and the next lesson we shall be introduced to factor analysis.
Factor analysis is a popular multivariate technique which measures
association between variables. The technique is highly complex
and makes use of sophisticated statistical methods which are
beyond the scope of our course. Therefore our presentation of
this technique will focus on its intuitive rationale and applications.
Since most statistical packages run multivariate techniques
easily, our emphasis will be on exposing the student
to relevant applications, on how to run a factor analysis on the
computer, and on how to interpret the computer output.
By the end of this lesson you should be able to:
Understand the analytical and intuitive concepts of factor analysis.
Determine the types of applications for which we can use factor analysis.
Analyze and interpret computer output generated for a factor analysis.
What is Factor Analysis?
The main objective of factor analysis is to summarize a large
number of observed variables into a smaller number of
factors which represent the basic dimensions underlying the data.
Factor analysis is used to uncover the latent structure (dimensions)
of a set of variables. It reduces attribute space from a larger number
of variables to a smaller number of factors and as such is a non-
dependent procedure (that is, it does not assume a dependent
variable is specified).
We can best explain factor analysis with a non-technical analogy:
A mother sees various bumps and shapes under a blanket at the
bottom of a bed. When one shape moves toward the top of the
bed, all the other bumps and shapes move toward the top also, so
the mother concludes that what is under the blanket is a single
thing, most likely her child. Similarly, factor analysis takes as input
a number of measures and tests which are analogous to the bumps
and shapes. Those that move together are considered a single
thing and are labeled a factor.
That is, in factor analysis the researcher is assuming that there is a
child out there in the form of an underlying factor, and he or
she takes simultaneous movement (correlation) as evidence of its
existence. If correlation is spurious for some reason, this inference
will be mistaken, of course, so it is important when conducting
factor analysis that possible variables which might introduce
spuriousness, such as anteceding causes, be included in the analysis.
Typical Problem Studied Using Factor
Analysis
Factor analysis is used to study a complex product or service to
identify the major characteristics considered important by
consumers.
The two major uses of factor analysis are:
1. To simplify a set of data by reducing a large number of measures
(which may be interrelated in some way and so cause
multicollinearity) for a set of respondents to a smaller, more
manageable set of factors which are not interrelated and still retain most
of the original information.
2. To identify the underlying structure of the data, in which a very
large number of variables may really be measuring a small
number of basic characteristics or constructs of our sample.
For example, a survey may throw up between 15 and 20 attributes which a
consumer considers when buying a product. However, there is
a need to find out which are the key drivers.
Factor analysis identifies latent or underlying factors from an
array of seemingly important variables.
Uses of Factor Analysis
To reduce a large number of variables to a smaller number of
factors for modeling purposes, where the large number of variables
precludes modeling all the measures individually. As such, factor
analysis is integrated in structural equation modeling (SEM),
helping create the latent variables modeled by SEM. However,
factor analysis can be and often is used on a stand-alone basis for
similar purposes.
To select a subset of variables from a larger set, based on which
original variables have the highest correlations with the principal
component factors.
To create a set of factors to be treated as uncorrelated variables,
as one approach to handling multicollinearity in such procedures
as multiple regression.
To validate a scale or index by demonstrating that its constituent
items load on the same factor, and to drop proposed scale
items which cross-load on more than one factor.
To establish that multiple tests measure the same factor, thereby
giving justification for administering fewer tests.
To identify clusters of cases and/or outliers.
To determine network groups by determining which sets of
people cluster together (using Q-mode factor analysis, discussed
below).
Applications
The main applications of factor analysis are in marketing research.
Some of the applications are as follows:
LESSON 33:
FACTOR ANALYSIS
1. Developing perceptual maps:
Factor analysis is often used to determine the dimensions or
criteria by which consumers evaluate brands and how each
brand is seen on each dimension.
2. Determining the underlying dimensions of the data:
A factor analysis of data on TV viewing indicates that there are
seven different types of programmes, independent of
the network offering them, as perceived by the viewers: movies, adult
entertainment, westerns, family entertainment, adventure plots,
unrealistic events, and sin.
3. Identifying market segments and positioning products:
For example, a factor analysis of data on the benefits sought
on the last vacation taken by 1750 respondents revealed six
benefit segments for vacationers:
   1. Visiting friends and relatives, without sightseeing
   2. Visiting friends and relatives, plus sightseeing
   3. Sightseeing
   4. Outdoor vacations
   5. Resort vacationing
   6. Foreign vacationing
4. Condensing or simplifying data:
For example, in a study of consumer involvement
across a number of product categories, 19 items were reduced
to four factors:
   1. Perceived product importance / perceived importance of
   negative consequences of a mispurchase
   2. Subjective probability of a mispurchase
   3. Pleasure of owning / using the product
   4. The value of the product as a cue to the type of person who owns it
Each of these factors was independent and there was no multicollinearity.
5. Testing hypotheses about the structure of a data set:
Confirmatory factor analysis can be used to test whether the
variables in a data set come from a specified number of factors.
Basic Principles of Factor Analysis
Factor analysis is part of the multiple general linear hypothesis
(MGLH) family of procedures and makes many of the same
assumptions as multiple regression:
Linear relationships,
Interval or near-interval data,
Proper specification (relevant variables included,
extraneous ones excluded),
Lack of high multicollinearity, and
Multivariate normality for purposes of
significance testing.
There are several different types of factor analysis, with the most
common being principal components analysis (PCA). However,
principal axis factoring (PAF), also called common factor analysis,
is preferred for purposes of confirmatory factor analysis.
Factor Analysis-The Theory
Factor analysis is a complex statistical technique which works on
the basis of consumer responses to identify similarities or
associations across variables. It analyzes correlations between
variables and reduces their number by grouping them into fewer
factors.
How it Works
Factor analysis applies an advanced form of correlation analysis to
a number of statements or attributes. If several of the
statements are highly correlated, it is thought that these statements
measure some factor common to all of them.
A typical study will throw up many such factors. For each such factor the
researchers have to use their judgment to determine what the
factor represents. Factor analysis can only be applied to
continuous or interval-scaled variables.
Factor Analysis - The Process
We now take the case of a marketing research study, where factor
analysis is most popularly used. We begin by administering a
questionnaire to all consumers. Factor analysis then
identifies two or more questions that result in responses that are
highly correlated. Thus it looks at interdependencies or
interrelationships among the data. The analysis begins by observing
the correlations between the responses to different questions and
determining whether they are significant.
Factor analysis is best illustrated with the help of an example:
Example
A two-wheeler manufacturer is interested in determining which
variables his customers think of as being important when they consider
his product. The respondents were asked to indicate on a 7-point scale
(1: completely agree, 7: completely disagree) their agreement with a set of 10
statements relating to their perceptions of, and some attributes of,
two-wheelers. Factor analysis would then aim to reduce the 10 statements
to a few core factors.
The statements are as follows:
1. I use a two-wheeler because it's affordable.
2. It gives me a sense of freedom to own a two-wheeler.
3. Low maintenance costs make it very economical in the long run.
4. A two-wheeler is essentially a man's vehicle.
5. I feel very powerful when I am on my two-wheeler.
6. Some of my friends who don't have one are jealous of me.
7. I feel good whenever I see ads for my two-wheeler on TV or in
magazines.
8. My vehicle gives me a comfortable ride.
9. I think two-wheelers are a safe way to travel.
10. Three people should be allowed to travel on a two-wheeler.
The answers given by 20 respondents are entered into the computer.
What the factor analysis does statistically is to group together
those variables whose responses are highly correlated. Then, for
each group of statements, we choose an overall factor
which appears to represent what all the statements in the group
mean.
Interpretation of Computer Output
Factor analysis identifies groups of variables or attributes which are strongly
correlated with each other and uncorrelated with the others.
The Process of Identification is Complex and is Broadly as
Follows
Factor analysis selects one factor at a time, choosing the factor which explains
more of the variance in the standardized scores than any other
combination of variables. Each additional factor selected is likely to explain
less of the variance than the first factor. The process continues till additional
factors do not reduce the unexplained variance in standardized
scores.
Before we turn to an actual computer output we need to
understand some terms which appear on the computer output
and represent critical stages in the analysis. Generally the analytical
procedure follows a series of steps to arrive at a solution.
The starting point is generating a correlation matrix of the original
data set, where the responses to each variable or statement are correlated
with the others. We then construct new variables on the basis of
attributes or variables which are highly correlated with each other.
There are many different ways of extracting factors. Principal
components analysis is the most frequently used approach. By
this method a set of variables is transformed into a new set of
factors that are uncorrelated with each other.
These factors are constructed by finding the best linear combination
of variables that accounts for the maximum possible variation in
the data. Each factor is defined as the best linear combination of
variables in terms of explaining the variance not accounted for by
the preceding factors. Additional factors may be selected till all the
variance is accounted for. Usually the factor extraction process is
stopped once the unexplained variance falls below a specified level.
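The extraction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated (200 hypothetical respondents, 5 statements driven by two assumed hidden factors), and the factors are extracted by an eigen-decomposition of the correlation matrix, which is one common way of carrying out principal components analysis. Statistical packages may differ in detail.

```python
import numpy as np

# Simulated data (an assumption for illustration only): statements 1-3 share
# one hidden driver, statements 4-5 share another, plus random noise.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 2))
noise = 0.5 * rng.normal(size=(200, 5))
X = np.column_stack([hidden[:, 0], hidden[:, 0], hidden[:, 0],
                     hidden[:, 1], hidden[:, 1]]) + noise

# Step 1: correlation matrix of the original variables.
R = np.corrcoef(X, rowvar=False)

# Step 2: eigen-decomposition; eigh returns eigenvalues in ascending order,
# so we reorder them from largest to smallest.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 3: loadings (correlations between variables and components) and the
# share of total variance explained by each successive component.
loadings = eigvecs * np.sqrt(eigvals)
explained = eigvals / eigvals.sum()

print(np.round(eigvals, 2))
print(np.round(np.cumsum(explained), 2))
```

Each successive component explains no more variance than the one before it, and the eigenvalues add up to the number of variables, which is why extraction can be stopped once the cumulative share is judged high enough.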
Meaning of Key Terms Used in Factor Analysis
To understand and interpret the computer output of a factor analysis
we need to understand the meaning of certain terms.
1. Variance: A factor analysis is like a regression analysis in that it tries
to best fit factors to a scatter diagram of responses in such a
way that the factors explain the variance associated with the responses
to each statement. We aim to get factors which explain as
much as possible of the variance associated with each statement in the study.
2. Standardized scores of an individual's response: Standardized
scores are used because responses to different questions can
use different scales (e.g. 5-point, 7-point, etc.). To allow for comparability
all responses are standardized. This is done by calculating an
individual's standard score on a statement or attribute:
Standardized score = ([actual response to statement] - [mean
response of all respondents to the statement]) / [standard deviation
of all responses to the statement]
Thus each person's score is actually a measure of how many
standard deviations his or her response lies from the mean response
calculated across all respondents.
3. Correlation coefficients
We calculate the correlation coefficients associated with the
standardized scores of responses to each pair of statements.
We give a simple example below for five statements.
Their correlation matrix is given below. For
simplicity we assume the correlation between two statements
is either one (perfect) or zero.

St   1   2   3   4   5
1    1   1   1   0   0
2        1   1   0   0
3            1   0   0
4                1   1
5                    1

As can be seen, statements 1, 2 and 3 are correlated with each other and
unrelated to statements 4 and 5, which are correlated with each other. This suggests
that the five statements can be grouped into two core factors
which are unrelated to each other.
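As a small illustration of the last two terms, the sketch below standardizes a set of responses and then computes the correlation matrix from the standardized scores. The response values are hypothetical (six respondents, three statements on a 7-point scale) and are not taken from the study; the sample standard deviation (dividing by n - 1) is assumed.

```python
import numpy as np

# Hypothetical answers of 6 respondents to 3 statements (7-point scale).
responses = np.array([
    [1, 2, 7],
    [2, 1, 6],
    [4, 4, 4],
    [5, 5, 3],
    [6, 7, 2],
    [7, 6, 1],
], dtype=float)

# Standardized score = (response - mean response) / standard deviation,
# computed separately for each statement (column).
mean = responses.mean(axis=0)
sd = responses.std(axis=0, ddof=1)
z = (responses - mean) / sd

# The correlation between two statements is the average product of their
# standardized scores (using n - 1 to match the sample standard deviation).
n = responses.shape[0]
corr = (z.T @ z) / (n - 1)

print(np.round(z, 2))
print(np.round(corr, 2))
```

In this made-up data the first two statements move together and the third moves in the opposite direction, so the first two correlate positively with each other and negatively with the third.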
Computer Output
Table 1

Statement/variable    F1     F2     F3     Communalities
1                     .86    .12    .04    .76
2                     .84    .18    .10    .75
3                     .68    .24    .15    .54
4                     .10    .92    .06    .86
5                     .06    .94    .07    .89
Eigen values          1.9    1.85   .83
Computer Output
We now turn to interpreting the computer output. Table 1 above
presents a sample output of a factor analysis. Across the columns
we have the different factors. Down the rows we have
the different variables. We have taken five variables and the data are
reduced to 3 possible factors.
We now explain some of the terms in the output and what they
mean:
1. What is a Factor?
Each factor is a linear combination of the original variables. In turn,
factor analysis aims to express each variable as a linear
combination of a set of factors. Thus if X1, ..., X5 are our original
variables and we have three factors, then factor analysis expresses
each variable as a linear combination of the three factors. The
parameters l11, l12, etc. are the factor loadings and e1 to e5 are the
error terms.
$$X_1 = l_{11} F_1 + l_{12} F_2 + l_{13} F_3 + e_1$$
$$X_2 = l_{21} F_1 + l_{22} F_2 + l_{23} F_3 + e_2$$
The factor model is similar to the regression model: there are a few
independent variables, termed factors, which help explain the
variation in the dependent variable X. The factor loadings are
therefore the correlations between the factors and the variables. The
error term consists of the variation in a variable which is not
explained by the factors.
2. Factor Loadings
These are the correlations between a factor and the standardized response
to a statement or variable. For example, the correlation between F1 and x1 is .86.
Factor 1 is highly correlated with statements 1 and 2 and least with
statements 4 and 5. The loadings are derived using the principle of least
squares. The factor loadings are then placed in a matrix of correlations
between the variables and the factors, as shown in Table 1. A
factor is identified by those items that have a relatively high factor
loading on that factor and a relatively low factor loading on other
factors.
3. Naming Factors
The data show that F1 is a good fit to the data from statements 1, 2 and 3 but a poor
fit for statements 4 and 5. A researcher would therefore look at the basic idea being
measured by statements 1, 2 and 3 and club them together as
representing an overall factor.
4. Communalities
How well do the factors fit the data from all respondents for any given
statement? Communalities measure the proportion of the total
variation in any variable or statement which is explained by all the
factors. The communality can be found by squaring the factor
loadings of a variable across all factors and then summing. For
each statement, the communality indicates the proportion of variance
in responses to that statement which is explained by the three factors.
For example, .89 of the variance in responses to statement 5 but only
.54 of the variance in responses to statement 3 is explained by the
three factors. Communalities provide information on how well
the factors fit the data. Since in this case the three factors account
for most of the variance associated with each of the statements,
we can say the 3 factors fit the data quite well. The communality can also be read
as an inverse measure of the uniqueness of a variable: a low
communality figure indicates that the variable is statistically
independent and cannot be combined with other variables.
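The communality calculation can be checked directly against Table 1. The short sketch below simply squares the loadings of each statement on F1, F2 and F3 and sums them across the factors; the variable names are illustrative.

```python
import numpy as np

# Factor loadings from Table 1 (rows: statements 1-5, columns: F1, F2, F3).
loadings = np.array([
    [0.86, 0.12, 0.04],
    [0.84, 0.18, 0.10],
    [0.68, 0.24, 0.15],
    [0.10, 0.92, 0.06],
    [0.06, 0.94, 0.07],
])

# Communality of a statement = sum of its squared loadings over all factors.
communalities = (loadings ** 2).sum(axis=1)

print(np.round(communalities, 2))  # [0.76 0.75 0.54 0.86 0.89]
```

The rounded values agree with the communalities column of Table 1.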
5. Eigen Values
Eigen values indicate how well any given factor fits the data from all the
respondents on all the statements. There is an Eigen value for
every factor. The higher the Eigen value for a factor, the higher
the amount of variance explained by the factor. The most common
approach to determining the number of factors to retain is to
examine the Eigen values.
The Eigen value of a factor is defined as the sum of the squared factor
loadings for that factor. Thus, for example, for factor 1 the Eigen
value is found by
$$\text{Eigen value for factor 1} = (.86)^2 + (.84)^2 + (.68)^2 + (.10)^2 + (.06)^2 \approx 1.92$$
All computer programmes provide the Eigen value or the
percentage of variance explained. Before extraction it is assumed
each of the original variables has an Eigen value of 1. We would
therefore expect any factor which is a linear combination of some
of the original variables to have an Eigen value greater than one.
Therefore we usually only retain factors which have an Eigen
value>1.
An alternative to the Eigen value is to look at the percentage of
variation in the original variables accounted for by the jth factor.
Since each standardized variable contributes one unit of variance,
Percentage of variation in original variables accounted for by the
jth factor
= (Eigen value j / Number of original variables) x 100
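As a worked check on the Table 1 figures, the sketch below recomputes the Eigen value of factor 1 from its loadings and converts it into a percentage of variance. It assumes that the total variance equals the number of standardized variables (five here), each contributing one unit; the variable names are illustrative.

```python
# F1 loadings for statements 1-5, taken from Table 1.
loadings_f1 = [0.86, 0.84, 0.68, 0.10, 0.06]

# Eigen value = sum of squared loadings on the factor.
eigenvalue_f1 = sum(l ** 2 for l in loadings_f1)

# Each standardized variable has variance 1, so total variance = 5 (assumption).
n_variables = len(loadings_f1)
pct_f1 = 100 * eigenvalue_f1 / n_variables

print(round(eigenvalue_f1, 2))  # 1.92
print(round(pct_f1, 1))         # 38.4
```

The result is close to the 1.9 printed for F1 in Table 1; small differences arise because the loadings in the table are themselves rounded.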
6. Cumulative percentage of variation / proportion of variance
explained
Most outputs give both the proportion of total
variation in the data accounted for by each factor and the cumulative percentage of
variation accounted for by the factors extracted so far. Since factor
analysis is designed to reduce the number of original variables,
a key question is how many factors should be generated. It is
possible to keep generating factors till they equal the number
of original variables, in which case the factors would be useless.
We usually rely on some rules of thumb:
1. All factors included prior to rotation must explain at least as
much variation as an average variable. If there are five variables,
each variable accounts on average for 20% of the variation in the
data. Therefore we should look at the percentage of variation
explained by a factor. In our example we can see that factor 1
explains 55% of the variation in the data and factor 2 35.5% of the
variation. However, factors 3, 4 and 5 between them explain less than 10% of the
variation in the data. Therefore we would probably drop these
factors. In our example we have dropped factors 3, 4 and 5 and
retained factors 1 and 2. The cumulative term in the output
gives the running total of the variance explained by the factors, so the
cumulative figure and the individual percentages carry essentially the same information.
When interpreting output we look at the cumulative percentage of
variation. We see that the cumulative percentage is 80.3 for all 3 factors. Therefore we are
able to condense the information contained in 10 original variables
into 3 factors, losing only about 20% of the original information. Most
computer programmes also give the percentage of variance
explained as well as the Eigen values. Another rule of thumb
requires that we retain enough factors to explain a satisfactory
percentage of total variance (usually over 70%).
After deciding upon the extracted factors in stage 1, the researcher has
to interpret and name the factors. This is done by identifying which factor is associated
with which of the original variables, i.e. by looking at the factor loadings.
If factor 1 has high loadings on variables 1, 2 and 3, then it is assumed to be a
linear combination of these variables and is given a suitable name representing them.
The original factor matrix is used for this purpose.
How Many Factors?
1. A related rule of thumb is to look for a large drop in variance
explained between two factors in the PCA solution. For
example, if the variances explained by five factors (before
rotation) are 40%, 30%, 20%, 6% and 4%, there is a drop in
variance for the fourth factor which might signal a relatively
unimportant factor.
2. Eigenvalue criteria
An Eigen value represents the amount of variance in the original
variables associated with a factor. Only factors with
Eigen values>1 are retained. The sum of the squared factor loadings
of each variable on a factor represents the Eigen value, or the total
variance explained by the factor. A factor with an Eigen value<1
explains less variance than a single variable and should be dropped.
3. Scree plot Criteria
This is a plot of the Eigen values against the number of
factors, in order of extraction. The plot usually has a distinct
break between the steep slope of factors with large Eigen values
and a gradual trailing off associated with the rest of the factors.
This trailing off is referred to as the scree. Experience has shown that the
point at which the scree begins denotes the true number of
factors. This is shown in figure 21.3. We would choose three
factors. However, as the third has a very low Eigen value we
would drop it.
4. Percentage of Variance Criteria
The number of factors extracted is determined so that the
cumulative percentage of variance extracted reaches a satisfactory
level usually at least 70%.
5. Significance Test Criteria
We can also determine the statistical significance of separate
Eigen values and retain only those factors which are statistically
significant. The problem with this criterion is that in large samples
many factors may be statistically significant even though they
account for only a small proportion of the total variation.
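The first, second and fourth criteria above can be applied mechanically, as the sketch below shows for the five-factor illustration used earlier (40%, 30%, 20%, 6% and 4% of variance). It assumes five standardized variables, so that a factor's Eigen value is its percentage share multiplied by 5/100; the 70% threshold is the one quoted in the text, and the variable names are illustrative.

```python
# Percentage of variance explained by each factor, in order of extraction.
pct_variance = [40, 30, 20, 6, 4]
n_variables = len(pct_variance)  # assumption: five standardized variables

# Implied Eigen values: 2.0, 1.5, 1.0, 0.3, 0.2.
eigenvalues = [p * n_variables / 100 for p in pct_variance]

# Eigenvalue criterion: keep factors whose Eigen value is greater than 1.
by_eigenvalue = sum(1 for ev in eigenvalues if ev > 1)

# Percentage of variance criterion: keep factors until at least 70% is reached.
cumulative, by_percentage = 0, 0
for p in pct_variance:
    cumulative += p
    by_percentage += 1
    if cumulative >= 70:
        break

# Large-drop rule: keep the factors that come before the biggest drop.
drops = [pct_variance[i] - pct_variance[i + 1] for i in range(n_variables - 1)]
by_drop = drops.index(max(drops)) + 1

print(by_eigenvalue, by_percentage, by_drop)  # 2 2 3
```

The rules need not agree: here the Eigen value and percentage criteria both suggest two factors, while the large-drop rule suggests three. This is one reason the final choice calls for the analyst's judgement.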
Factor Interpretation
How is a factor interpreted? Interpretations are based on the factor
loadings, which are the correlations between the factors and the original
variables. The factor loadings for our study are shown in table 21.1.
Thus the correlation between factor 1 and x1 is .29. The loadings therefore
indicate the extent to which each original variable is
correlated with each factor. This
correlation is then used as the basis for identifying factors and
labeling them. For example, variables 3, 4 and 5 combine
to define the first factor, possibly a personal factor, because these
variables stress the personal aspects of bank transactions. The
second factor is highly correlated with the first two variables. It
might be termed a small bank factor, because both these variables appear to
be linked to small banks.
Factor Rotation
Factor analysis can generate several solutions for any data set. Each
solution is termed a rotation. Each time the factors are rotated the
factor loadings change, and so does the interpretation of the factors.
Rotation involves moving the components or the axes to improve
the fit to the data. This will not change the total variation explained
by the retained factors but will shift the relative percentage explained
by each factor. Most computer programmes automatically provide the Varimax
scheme. Factor rotation is continued till the factors stabilize and
there is relatively little change.
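The effect of rotation can be illustrated with the Table 1 loadings. The sketch below uses one widely used SVD-based formulation of Varimax; it is an illustrative implementation, not necessarily the routine used by any particular statistical package, and the function name, tolerance and iteration limit are assumptions.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate a loading matrix with an orthogonal (Varimax-style) rotation."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target matrix of the Varimax criterion.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                      # nearest orthogonal matrix
        new_criterion = s.sum()
        # Stop once the criterion has stabilized (little further change).
        if criterion != 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Unrotated loadings from Table 1 (rows: statements, columns: F1, F2, F3).
table1 = np.array([
    [0.86, 0.12, 0.04],
    [0.84, 0.18, 0.10],
    [0.68, 0.24, 0.15],
    [0.10, 0.92, 0.06],
    [0.06, 0.94, 0.07],
])
rotated, rotation = varimax(table1)

print(np.round((table1 ** 2).sum(axis=0), 2))   # variance per factor before
print(np.round((rotated ** 2).sum(axis=0), 2))  # variance per factor after
print(round((table1 ** 2).sum(), 4), round((rotated ** 2).sum(), 4))
```

Because the rotation matrix is orthogonal, the communality of each statement and the total variance explained are unchanged; only the split of that variance across the individual factors can shift.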
There are many such schemes of rotation, such as Varimax,
Promax, etc. In Varimax rotation the factors
Point to Ponder
Factor analysis is used to identify underlying dimensions in the
data by reducing the number of variables.
The input to factor analysis is a set of variables for each object
in the sample.
Output: the most important outputs are the factor loadings,
factor scores, percentages of variance explained and Eigen values.
Factor loadings are used to interpret the factors. Sometimes an
analyst picks one or two variables which load heavily on a factor to
represent the factor as a whole. The percentage of variance
explained and the Eigen values help determine which factors to
retain.
A key assumption of this analysis is that there are factors underlying
the variables and that the variables completely represent the
factors. This means the list of variables should be complete.
Limitations:
Factor analysis tends to be a highly subjective process. The determination of the
number of factors, their interpretation, and their rotation all involve
considerable skill and judgement on the part of the analyst.
Also, factor analysis does not lend itself to statistical testing; therefore
it is difficult to know if the results are merely accidental or
actually reflect something meaningful.
We can now examine how a factor analysis is conducted using an
example.
I want to be known personally at my bank and to be
treated with special courtesy.
If a financial institution treated me in an impersonal or
uncaring way, I would never patronize that organization
again.
We assume that a pilot study was conducted using
15 respondents. Table 2 shows the pilot study data and
the correlations among the variables. A factor analysis
program usually starts by calculating the variable-by-variable
correlation matrix. This In fact, it is quite possible to
input.
Insert table 2