
Factor Analysis

Rajdeep Chakraborti

What is factor analysis?


Factor analysis is a statistical approach that can be used to analyze interrelationships among a large number of variables and to explain these variables in terms of their common underlying dimensions (factors).
The statistical approach involves finding a way of condensing the information contained in a number of original variables into a smaller set of dimensions (factors) with a minimum loss of information.

Types of Factor Analysis


Exploratory Factor Analysis:
- Q factor analysis (relationships between individuals)
- R factor analysis (relationships between variables)

Confirmatory Factor Analysis

Goals of Factor Analysis


(1) to reduce the number of variables, and (2) to detect structure in the relationships between variables, that is, to classify variables.

A factor can be thought of as the manifestation of an abstract underlying dimension.

Example: factors that measure restaurant image


F1: Convenience
- Locality
- Speed of service
- Convenient hours
- Proximity to home / work

F2: Restaurant Atmosphere
- Good atmosphere
- Attractive décor
- Spacious
- Cleanliness of restaurant
- Organization of service
- Neatness / uncluttered

Basic concepts
Suppose we have students' grades in Math, Physics, and English. Let us assume that a student's performance in these courses is a function of their general intelligence (F) and their aptitude for the subject area. Hence, a student's grade in any course is a function of:
(1) their general intelligence, and
(2) their aptitude for that course.

A Brief Aside on Path Analysis

Local independence (i.e., conditional independence): given the factor, the observed variables are independent of one another, $\mathrm{Cov}(X_j, X_k \mid F) = 0$ for $j \neq k$. The Xs are related to each other only through their common relationship with F.

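[Figure: path diagram with arrows from F to X1, X2 and X3 (loadings λ1, λ2, …) and an error term e1, e2, e3 pointing to each indicator.]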
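A small simulation can make local independence concrete. This is a sketch under stated assumptions: standardized scores, a normally distributed F, and hypothetical loadings of 0.8 (Math) and 0.7 (Physics) on general intelligence. The helpers `simulate_scores` and `pearson` are illustrative names, not part of any library.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_scores(n, loadings, seed=1):
    """Simulate X_j = lambda_j * F + e_j for each loading lambda_j.

    F ~ N(0, 1); each e_j ~ N(0, 1 - lambda_j**2) is independent of F and of
    the other errors, so every X_j has unit variance.
    """
    rng = random.Random(seed)
    f = [rng.gauss(0, 1) for _ in range(n)]
    xs = []
    for lam in loadings:
        sd = math.sqrt(1 - lam ** 2)
        xs.append([lam * fi + rng.gauss(0, sd) for fi in f])
    return f, xs

LOADINGS = [0.8, 0.7]  # hypothetical: Math, Physics
f, (math_score, physics_score) = simulate_scores(20000, LOADINGS)

# Unconditionally the two grades are correlated (population value 0.8 * 0.7 = 0.56).
r_marginal = pearson(math_score, physics_score)

# Holding F fixed: remove each grade's dependence on F and correlate what is left.
resid_math = [x - LOADINGS[0] * fi for x, fi in zip(math_score, f)]
resid_physics = [x - LOADINGS[1] * fi for x, fi in zip(physics_score, f)]
r_given_f = pearson(resid_math, resid_physics)

print(f"corr(X1, X2)     = {r_marginal:.3f}")
print(f"corr(X1, X2 | F) = {r_given_f:.3f}")
```

Because F is observable only in a simulation, this direct check is not available with real data; there the latent structure has to be inferred from the correlations among the Xs.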

Orthogonal One-Factor Model


$$X_1 = \lambda_1 F + e_1, \qquad X_2 = \lambda_2 F + e_2, \qquad \ldots, \qquad X_m = \lambda_m F + e_m$$
The coefficients ($\lambda$) are the pattern loadings. Each variable $X_j$ is called an indicator or measure of F. F is responsible for the correlation between the indicators; it is also referred to as the common factor, latent factor, or unobservable construct.

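The total variance of any indicator variable can be decomposed into two components:
(1) The variance it shares with the common factor F (general intelligence), given by the square of the pattern loading, $\lambda_j^2$. This is the communality of the indicator with the common factor.
(2) The variance due to the specific factor $e_j$, which is the difference between the variance of the variable and its communality. This is the unique, specific, or error variance.
For standardized variables, with Var(F) = 1 and $e_j$ uncorrelated with F:
$$\mathrm{Var}(X_j) = \lambda_j^2 + \mathrm{Var}(e_j) = 1$$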
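Under the one-factor model the loadings alone determine each indicator's communality, its unique variance, and the correlation between any two indicators ($\lambda_j \lambda_k$ for $j \neq k$). A minimal sketch, assuming standardized indicators, Var(F) = 1 and uncorrelated errors; the function name and the loadings are hypothetical.

```python
def one_factor_summary(loadings):
    """Variance decomposition implied by X_j = lambda_j * F + e_j.

    Assumes standardized indicators (Var(X_j) = 1), Var(F) = 1, and errors
    uncorrelated with F and with each other.
    """
    communalities = [lam ** 2 for lam in loadings]      # variance shared with F
    uniquenesses = [1 - h2 for h2 in communalities]     # Var(e_j)
    m = len(loadings)
    # Implied correlation of indicators j and k (j != k) is lambda_j * lambda_k.
    implied_corr = [[1.0 if j == k else loadings[j] * loadings[k]
                     for k in range(m)] for j in range(m)]
    return communalities, uniquenesses, implied_corr

# Hypothetical loadings of Math, Physics and English on general intelligence.
h2, u2, corr = one_factor_summary([0.8, 0.7, 0.6])
print([round(v, 2) for v in h2])   # [0.64, 0.49, 0.36]
print([round(v, 2) for v in u2])   # [0.36, 0.51, 0.64]
print(round(corr[0][1], 2))        # 0.56
```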

Key Concepts
F is a latent (i.e., unobserved, underlying) variable.

The Xs are observed (i.e., manifest) variables.

$e_j$ is the measurement error for $X_j$, and $\lambda_j$ is the loading for $X_j$.

Extracting Principal Components (the most common extraction method in factor analysis)


Principal components analysis extends the previous logic of expressing two or more variables with a single factor. With many variables the computations become more involved, but the basic principle of expressing several variables by a single factor (component) remains the same.

Extracting Principal Components


Principal components analysis:
- extracts as many factors (components) as there are variables;
- extracts factors that are uncorrelated with (independent of) each other;
- orders them so that the first factor explains the highest percentage of variance, the second factor the next highest, and so on.
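Extracting principal components from a correlation matrix amounts to finding its eigenvalues (the variance explained by each component) and eigenvectors (the component weights). The sketch below uses the cyclic Jacobi eigenvalue method in plain Python purely for illustration; in practice a library routine such as NumPy's `numpy.linalg.eigh` would be used. The correlation matrix is hypothetical.

```python
import math

def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue, where
    vectors[k] is the unit-length eigenvector for values[k].
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]                    # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]   # accumulated rotations
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that makes a[p][q] zero.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for i in range(n):        # A <- A G   (mix columns p and q)
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p], a[i][q] = c * aip - s * aiq, s * aip + c * aiq
                for j in range(n):        # A <- G^T A (mix rows p and q)
                    apj, aqj = a[p][j], a[q][j]
                    a[p][j], a[q][j] = c * apj - s * aqj, s * apj + c * aqj
                for i in range(n):        # V <- V G   (eigenvectors as columns)
                    vip, viq = v[i][p], v[i][q]
                    v[i][p], v[i][q] = c * vip - s * viq, s * vip + c * viq
    order = sorted(range(n), key=lambda k: a[k][k], reverse=True)
    values = [a[k][k] for k in order]
    vectors = [[v[i][k] for i in range(n)] for k in order]
    return values, vectors

# Hypothetical correlation matrix: three variables, every pairwise r = 0.6.
R = [[1.0, 0.6, 0.6],
     [0.6, 1.0, 0.6],
     [0.6, 0.6, 1.0]]
values, vectors = jacobi_eigen(R)
shares = [val / sum(values) for val in values]
print([round(val, 3) for val in values])     # [2.2, 0.4, 0.4]
print([round(share, 3) for share in shares]) # [0.733, 0.133, 0.133]
```

There are as many components as variables, the eigenvalues sum to the number of variables (3), and the first component, with weights proportional to (1, 1, 1), explains the largest share.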

Suppose two variables have a correlation of 0.00. Then no line (that is, no principal component) will help you predict one from the other.
[Figure: scatter plot of Variable A against Variable B showing no linear pattern.]
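For two standardized variables with correlation r, the correlation matrix has eigenvalues 1 + r and 1 - r, so the zero-correlation case can be checked with simple arithmetic. A sketch; the function name is hypothetical.

```python
def two_variable_components(r):
    """Eigenvalues of the correlation matrix [[1, r], [r, 1]] and the share of
    the total variance (which is 2) explained by the first principal component.

    The eigenvectors are proportional to (1, 1) and (1, -1), with eigenvalues
    1 + r and 1 - r respectively; the larger one is 1 + |r|.
    """
    first, second = 1 + abs(r), 1 - abs(r)
    return first, second, first / 2

for r in (0.0, 0.5, 0.9):
    first, second, share = two_variable_components(r)
    print(f"r = {r:.1f}: eigenvalues {first:.2f} and {second:.2f}, "
          f"first PC explains {share:.0%}")
```

With r = 0 both eigenvalues equal 1 and the first component explains only 50% of the variance, no more than either original variable on its own; as |r| grows, one component captures more of the total.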

What do principal components tell us?


The principal component is fitted so as to minimize the sum of squared residuals, where a residual is the (perpendicular) distance between a point and the principal component line. Residual = error score.
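The claim that the principal component minimizes the sum of squared perpendicular residuals can be checked numerically for two variables. A sketch with made-up data points; `principal_angle` and `sum_sq_residuals` are hypothetical helper names.

```python
import math

def principal_angle(xs, ys):
    """Direction (radians) of the first principal axis of 2-D data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Maximizing projected variance gives tan(2 * angle) = 2 * sxy / (sxx - syy).
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

def sum_sq_residuals(xs, ys, angle):
    """Sum of squared perpendicular distances (residuals) from each point to
    the line through the centroid with direction `angle`."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ux, uy = math.cos(angle), math.sin(angle)   # unit vector along the line
    # |dx * uy - dy * ux| is the perpendicular distance of (dx, dy) from the line.
    return sum(((x - mx) * uy - (y - my) * ux) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up scores on variable A
ys = [2.1, 3.9, 6.2, 7.8, 10.1]    # made-up scores on variable B
best = principal_angle(xs, ys)
best_ss = sum_sq_residuals(xs, ys, best)
grid_ss = [sum_sq_residuals(xs, ys, math.radians(d)) for d in range(180)]
print(f"principal axis at {math.degrees(best):.1f} degrees, SS = {best_ss:.4f}")
print(f"smallest SS over a 1-degree grid = {min(grid_ss):.4f}")
```

A point lying exactly on the principal axis contributes a residual of zero, which is the "predicted perfectly" case.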

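Picture an eigenvector (the principal component line) cutting through some data points. Now let's single out two of them: for the first point, the distance to the line is its residual; the second observation lies right on the line, so it is predicted perfectly!
[Figure: scatter plot with the eigenvector drawn through the points, plotted against the x-axis.]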

This also works in three-dimensional space. Each participant gets three scores, one for each variable (Variable A, Variable B, Variable C), and each observation is located by these three coordinates. The observations may still cluster: some are far from a given line and some are near it. Here, two eigenvectors seem to work.
[Figure: 3-D scatter plots with axes Variable A, Variable B and Variable C, showing the observations and the two eigenvectors.]
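The three-variable picture can be mimicked with simulated data in which Variable C is almost the sum of A and B, so the points lie close to a plane. A sketch under these assumptions: the data are synthetic, and the eigenvalues of the 3 x 3 covariance matrix are computed with the standard closed-form trigonometric method for symmetric 3 x 3 matrices rather than a library call; all function names are hypothetical.

```python
import math
import random

def covariance_matrix(cols):
    """Sample covariance matrix of a list of equal-length columns."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    return [[sum((a - mi) * (b - mj) for a, b in zip(ci, cj)) / (n - 1)
             for cj, mj in zip(cols, means)]
            for ci, mi in zip(cols, means)]

def det3(m):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def sym3_eigenvalues(a):
    """Eigenvalues (descending) of a symmetric 3x3 matrix, closed-form method."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0:                                    # already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2 * p1
    p = math.sqrt(p2 / 6)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    r = max(-1.0, min(1.0, det3(b) / 2))           # clamp against rounding
    phi = math.acos(r) / 3
    eig1 = q + 2 * p * math.cos(phi)
    eig3 = q + 2 * p * math.cos(phi + 2 * math.pi / 3)
    return [eig1, 3 * q - eig1 - eig3, eig3]

rng = random.Random(7)
var_a = [rng.gauss(0, 1) for _ in range(2000)]
var_b = [rng.gauss(0, 1) for _ in range(2000)]
var_c = [x + y + rng.gauss(0, 0.1) for x, y in zip(var_a, var_b)]  # near C = A + B

eigs = sym3_eigenvalues(covariance_matrix([var_a, var_b, var_c]))
shares = [e / sum(eigs) for e in eigs]
print("share of variance per eigenvector:", [round(s, 3) for s in shares])
print("first two together:", round(shares[0] + shares[1], 4))
```

Almost all of the variance lies in the plane spanned by the first two eigenvectors, which is why two eigenvectors are enough here.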

Assumptions in Factor Analysis


Bartlett test of sphericity: a statistical test for the presence of correlations among the variables. It is a test of significance (the null hypothesis is that the correlation matrix is an identity matrix, i.e., the variables are uncorrelated).

Measure of Sampling Adequacy (MSA): the degree of intercorrelation among the variables (acceptable value > 0.6).

Cronbach's alpha: a measure of the reliability of the data that ranges from 0 to 1; the acceptable minimum is 0.6 to 0.7.

KMO test: the Kaiser-Meyer-Olkin measure of sampling adequacy for all the variables taken together (acceptable value > 0.6).
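Two of these checks reduce to short formulas. Bartlett's statistic is $\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right]\ln|R|$ with $p(p-1)/2$ degrees of freedom, where n is the number of respondents, p the number of variables and |R| the determinant of the correlation matrix. Cronbach's alpha is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)$ for k items with variances $\sigma_i^2$ and total-score variance $\sigma_X^2$. The plain-Python sketch below implements both with hypothetical function names and made-up inputs; it does not compute a p-value (a chi-square survival function such as `scipy.stats.chi2.sf` would supply one), and it assumes a non-singular correlation matrix.

```python
import math
import statistics

def determinant(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                       # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(corr, n_obs):
    """Bartlett's chi-square statistic and degrees of freedom for H0: R = I."""
    p = len(corr)
    chi2 = -((n_obs - 1) - (2 * p + 5) / 6) * math.log(determinant(corr))
    return chi2, p * (p - 1) // 2

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item,
    with respondents in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))

R = [[1.0, 0.5], [0.5, 1.0]]                                  # made-up correlations
print(bartlett_sphericity(R, n_obs=100))

items = [[4, 5, 3, 4, 2], [4, 4, 3, 5, 2], [5, 5, 2, 4, 3]]   # made-up item scores
print(round(cronbach_alpha(items), 3))
```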

How many respondents do you need?


Remember the subjects-to-variables (STV) ratio!
This is only a rule of thumb.

You need at least seven respondents for each variable.


But no fewer than 250 participants regardless of STV. And more is always better.
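The two rules of thumb combine into a one-line calculation. A sketch; the function name is hypothetical and the defaults simply encode the 7-per-variable and 250-minimum figures above.

```python
def minimum_sample_size(n_variables, per_variable=7, floor=250):
    """Rule-of-thumb minimum number of respondents: `per_variable` per
    variable, but never fewer than `floor` in total."""
    return max(per_variable * n_variables, floor)

print(minimum_sample_size(20))   # 250  (7 * 20 = 140 is below the floor)
print(minimum_sample_size(50))   # 350  (7 * 50 = 350)
```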

Determining the number of eigenvectors: Two stopping rules


Percentage of variance criterion
Set an a priori rule (e.g., 75% of the variance) and keep extracting components until their eigenvalues together account for at least this share of the total variance. This works best when only a few strong variables define the data; otherwise you get bloated specifics (PCs with only one variable).

A priori criterion
Attempts to replicate the structure found by others. Best when one has a good theoretical idea of what to expect.
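The percentage of variance criterion can be sketched as a cumulative sum over the eigenvalues sorted from largest to smallest. The 75% threshold is the example given above; the eigenvalues (for a hypothetical 6-variable correlation matrix, so they sum to 6) and the function name are made up.

```python
def components_for_variance(eigenvalues, threshold=0.75):
    """Smallest number of components whose eigenvalues account for at
    least `threshold` of the total variance."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    cumulative = 0.0
    for k, value in enumerate(values, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(values)

eigs = [2.6, 1.4, 0.8, 0.5, 0.4, 0.3]   # hypothetical eigenvalues
print(components_for_variance(eigs))    # 3  (cumulative shares 43%, 67%, 80%)
```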

Key Terms in Factor Analysis


Correlation matrix: a table showing the intercorrelations among all variables.
Factor: an underlying dimension that represents several of the original variables.
Factor loading: the correlation between an original variable and the factor. The squared factor loading indicates what percentage of the variance in an original variable is explained by a factor.
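Squaring the loadings turns them into variance shares. A sketch, assuming orthogonal (uncorrelated) factors so that each variable's squared loadings can be added up into its communality; the loadings and the function name are hypothetical.

```python
def variance_explained(loadings):
    """For orthogonal factors: percent of each variable's variance explained by
    each factor (100 * loading**2) and each variable's communality."""
    percent = [[100 * lam ** 2 for lam in row] for row in loadings]
    communality = [sum(lam ** 2 for lam in row) for row in loadings]
    return percent, communality

loadings = [[0.8, 0.1],    # hypothetical variable 1
            [0.7, 0.3],    # hypothetical variable 2
            [0.2, 0.9]]    # hypothetical variable 3
percent, h2 = variance_explained(loadings)
print([round(p) for p in percent[0]])   # [64, 1]
print([round(h, 2) for h in h2])        # [0.65, 0.58, 0.85]
```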

Rotation of Factors
The factor loadings can be plotted in a scatter plot, with each variable represented as a point.
The axes of this plot can be rotated in any direction without changing the relative locations of the points to each other; however, the actual coordinates of the points, that is, the factor loadings, would change. Sometimes such rotations allow a clearer view of the factors.
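The geometric point, that an orthogonal rotation changes the coordinates (loadings) but not the positions of the points relative to one another, can be verified numerically. A sketch for two factors, using the unrotated loadings of x1 to x4 from the intuitive example that follows and an arbitrary 30-degree angle; the helper name is hypothetical.

```python
import math
from itertools import combinations

def rotate_loadings(loadings, angle):
    """Orthogonally rotate two-factor loadings (rows of (f1, f2)) by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

unrotated = [(0.5, 0.5), (0.6, 0.6), (-0.7, 0.7), (-0.5, -0.5)]   # x1..x4
rotated = rotate_loadings(unrotated, math.radians(30))

# Distances between every pair of variables are unchanged by the rotation.
for (i, p), (j, q) in combinations(enumerate(unrotated), 2):
    before = math.dist(p, q)
    after = math.dist(rotated[i], rotated[j])
    print(f"x{i + 1}-x{j + 1}: distance {before:.3f} before, {after:.3f} after")

# Each variable's communality (sum of squared loadings) is also unchanged.
print([round(a * a + b * b, 3) for a, b in rotated])
```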

Rotating Factors (Intuitively)


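[Figure: variables 1-4 plotted as points in the F1-F2 plane, before and after rotating the axes.]

Unrotated loadings:

            x1     x2     x3     x4
Factor 1    0.5    0.6   -0.7   -0.5
Factor 2    0.5    0.6    0.7   -0.5

Rotated loadings:

            x1     x2     x3     x4
Factor 1    0      0     -0.9    0
Factor 2    0.6    0.7    0     -0.9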

Rotational Strategies
The goal of all rotational strategies is to obtain a clear pattern of loadings, that is, factors that are clearly marked by high loadings for some variables and low loadings for others.
Typical rotational strategies are varimax (variance maximizing), quartimax, and equamax.
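To make "variance maximizing" concrete: the raw varimax criterion is the sum, over factors, of the variance of the squared loadings, and varimax looks for the orthogonal rotation that maximizes it. The sketch below handles only two factors and finds the angle by brute-force search over a grid, a simplification of the iterative algorithm used by statistical packages, and it omits Kaiser normalization. It starts from the unrotated loadings in the intuitive example above; the function names are hypothetical.

```python
import math

def rotate(loadings, angle):
    """Orthogonally rotate two-factor loadings (rows of (f1, f2)) by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

def varimax_criterion(loadings):
    """Raw varimax criterion: for each factor, the variance of the squared
    loadings across variables, summed over factors."""
    p = len(loadings)
    total = 0.0
    for j in range(2):
        sq = [row[j] ** 2 for row in loadings]
        mean = sum(sq) / p
        total += sum((x - mean) ** 2 for x in sq) / p
    return total

def varimax_grid(loadings, steps=180):
    """Brute-force two-factor varimax: try `steps` angles in [0, 90) degrees.
    (The criterion repeats every 90 degrees, so this range is enough.)"""
    angles = [k * (math.pi / 2) / steps for k in range(steps)]
    best = max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))
    return best, rotate(loadings, best)

unrotated = [(0.5, 0.5), (0.6, 0.6), (-0.7, 0.7), (-0.5, -0.5)]   # x1..x4
angle, rotated = varimax_grid(unrotated)
print(f"best angle: {math.degrees(angle):.1f} degrees")
for name, (f1, f2) in zip(["x1", "x2", "x3", "x4"], rotated):
    print(f"{name}: {f1:6.3f} {f2:6.3f}")
```

For these loadings the search lands on 45 degrees, where x1, x2 and x4 load only on the first rotated factor and x3 only on the second. The numbers differ from the schematic rotated table above.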

ANALYSIS & INTERPRETATION:

FACTOR ANALYSIS.

SPSS and/or SAS
