Multivariate Analysis
Factor Analysis
- Several variables are studied together in multivariate analysis.
- These variables may or may not be mutually independent of each other.
- Some may be strongly correlated with other variables; multicollinearity may exist among variables.
- Data analysis methods in this situation are called Interdependence Methods.
Research Studies
- Several variables are to be studied.
- The purpose is to establish a cause-and-effect relationship.
- There is one dependent (effect) variable and several independent (cause) variables.
- Data are obtained on them from a sample.
- Data analysis methods in such situations are called Dependence Methods.
Factor Analysis
- Defines the underlying structure among the variables in the analysis.
- Examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions, referred to as factors.
- Examines the entire set of interdependent relationships without making any distinction between dependent and independent variables.
- Reduces the total number of variables in the research study to a smaller number of factors by combining a few correlated variables into each factor.
What is a Factor?
A factor is a linear combination of the observed original variables V1, V2, ..., Vn:

Fi = Wi1·V1 + Wi2·V2 + Wi3·V3 + ... + Win·Vn

where
Fi = the i-th factor (i = 1, 2, ..., m; m < n)
Wij = weight (factor score coefficient) of variable Vj in factor Fi
n = number of original variables.
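As a concrete illustration, the formula above is simply a dot product between a weight vector and the standardized variable scores. A minimal numpy sketch (the weights and scores below are made up for illustration, not taken from any real study):

```python
import numpy as np

# Standardized scores of n = 4 original variables for one respondent (made up)
V = np.array([0.8, -1.2, 0.5, 1.1])

# Illustrative factor score coefficients (weights) Wi1..Win for one factor Fi
W = np.array([0.45, 0.10, 0.38, 0.05])

# Fi = Wi1*V1 + Wi2*V2 + ... + Win*Vn
F_i = W @ V
print(F_i)
```

The same dot product, applied row by row to a data matrix, yields a factor score for every respondent.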
Factor Analysis
- Discovers a smaller set of m uncorrelated factors to represent the original set of n correlated variables significantly (m < n).
- These factors do not have multicollinearity, i.e. they are orthogonal to each other.
- They can then be used in further multivariate analysis (e.g. regression or discriminant analysis).
Example # 1
- Evaluate the credit card usage and behaviour of customers.
- The initial set of variables is large: age, gender, marital status, income, education, employment status, credit history, family background: 8 variables in total.
Example # 1

Reduction of the 8 variables into 3 factors (m = 3):
- Factor 1: heavy weightage for age, gender & marital status, and low weightages for the other variables.
- Factor 2: heavy weightage for income, education & employment status, and low weightages for the others.
- Factor 3: heavy weightage for credit history & family background, and low weightages for the other variables.

These 3 uncorrelated factors can be identified by the common characteristics of their heavily weighted variables and named accordingly:
- Factor 1 (age, gender, marital status): Demographic Status
- Factor 2 (income, education, employment status): Socioeconomic Status
Example # 2
- Evaluate customer motivation for buying a two-wheeler.
- The initial set of variables is large:
1. Affordable
2. Sense of freedom
3. Economical
4. Man's vehicle
5. Feel powerful
6. Friends jealous
7. Feel good to see an ad of this brand
8. Comfortable ride
9. Safe travel
10. Ride for three.
Example # 2

Reduction of the 10 variables to 3 factors, e.g.:
- Pride: (man's vehicle, feel powerful, sense of freedom, friends jealous, feel good to see an ad of this brand)
It converts the correlated variables into the desired number of uncorrelated factors. Tool: Principal Component Method.
Example # 3
- To determine the benefits consumers seek from the purchase of a toothpaste.
- A sample of 30 persons was interviewed.
- Respondents were asked to indicate their degree of agreement with the following statements on a 7-point scale (1 = strongly agree, 7 = strongly disagree).

The software output (e.g. SPSS):
- assists in checking the adequacy of the sample size (KMO test);
- gives the initial eigenvalues, which determine the minimum number of factors that can represent all the variables.
Bartlett's Test
- For factor analysis to be valid, many of the variables must be correlated with each other.
- That means, if each original variable were completely independent of each of the remaining n-1 variables, i.e. if there were zero correlation among all the variables, there would be no need to perform factor analysis.
- H0: the correlation matrix is a unit (identity) matrix.
Unit Matrix
      V1   V2   V3   ...   Vn
V1     1    0    0   ...    0
V2     0    1    0   ...    0
V3     0    0    1   ...    0
...   ...  ...  ...  ...   ...
Vn     0    0    0   ...    1
Bartletts Test
- For factor analysis to be valid, many of the variables must be correlated with each other.
- H0: the correlation matrix is a unit matrix.
- Here, SPSS gives a p-value < 0.05.
- Reject H0 at the 95% level of confidence.
- So the correlation matrix is not a unit matrix.
- Conclusion: factor analysis can be validly performed.
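The test can be reproduced directly from the correlation matrix using the standard Bartlett sphericity chi-square statistic. A sketch with numpy/scipy; the data matrix here is simulated purely for illustration:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Test H0: the correlation matrix of X is a unit (identity) matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # Statistic: -(n - 1 - (2p + 5)/6) * ln|R|, chi-square with p(p-1)/2 df
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    p_value = chi2.sf(stat, df)
    return stat, p_value

# Simulated sample: 30 respondents, 6 variables built from 2 latent dimensions
rng = np.random.default_rng(0)
base = rng.normal(size=(30, 2))
X = np.hstack([base + 0.5 * rng.normal(size=(30, 2)) for _ in range(3)])

stat, p_value = bartlett_sphericity(X)
print(p_value < 0.05)  # reject H0: correlations exist, factor analysis is valid
```

Because the simulated variables share latent dimensions, the correlation matrix is far from the identity and H0 is rejected.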
KMO Test
- The Kaiser-Meyer-Olkin measure of sampling adequacy in this case = 0.660.
- Values of KMO between 0.5 and 1.0 suggest that the sample is adequate for carrying out factor analysis; otherwise, we must draw an additional sample.
- Here, 0.660 > 0.5.
- Conclusion: the sample is adequate.
- Thus, these two tests together confirm the appropriateness of factor analysis.
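The overall KMO statistic compares the ordinary correlations with the partial correlations, which can be obtained from the inverse of the correlation matrix. A sketch, assuming a data matrix X with observations in rows (the simulated data are illustrative):

```python
import numpy as np

def kmo_overall(X):
    """Kaiser-Meyer-Olkin measure of sampling adequacy (overall)."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    # Partial correlations: p_ij = -R_inv_ij / sqrt(R_inv_ii * R_inv_jj)
    d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    P = -R_inv / d
    mask = ~np.eye(R.shape[0], dtype=bool)   # off-diagonal elements only
    r2 = np.sum(R[mask] ** 2)
    p2 = np.sum(P[mask] ** 2)
    return r2 / (r2 + p2)

# Simulated sample: 6 variables driven by 2 latent dimensions plus noise
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
X = np.repeat(latent, 3, axis=1) + 0.6 * rng.normal(size=(200, 6))

kmo = kmo_overall(X)
print(kmo > 0.5)  # adequate sample: proceed with factor analysis
```

When a clear factor structure is present, the partial correlations are small relative to the ordinary ones, so KMO comes out well above 0.5.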
Eigenvalue
- The variance of each standardized variable is 1.
- Total variance in the study = number of variables (here 6).
Fi = Wi1·V1 + Wi2·V2 + Wi3·V3 + ... + Wi6·V6
- The variance explained by a factor is called the eigenvalue of that factor.
- It depends on (a) the weights for the different variables and (b) the correlations between the factor and each variable (called Factor Loadings).
- The higher the eigenvalue of a factor, the bigger the amount of variance explained by that factor.
- The first factor is extracted so that it accounts for the largest share of the total variance; the second factor then accounts for most of the residual variance, subject to being uncorrelated with the first factor.
- The process goes on until the cumulative variance explained crosses a desired level, usually 60%.
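These quantities are easy to compute: the eigenvalues of the correlation matrix are the variances explained by the (unrotated) principal factors. A numpy sketch on simulated data; the 2-latent-factor structure is an assumption for illustration, not the toothpaste study's actual data:

```python
import numpy as np

# Simulated data: 30 respondents, 6 variables driven by 2 latent dimensions
rng = np.random.default_rng(2)
latent = rng.normal(size=(30, 2))
X = np.repeat(latent, 3, axis=1) + 0.7 * rng.normal(size=(30, 6))

# With standardized variables, the total variance equals the trace of R, i.e. 6
R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first

pct = 100 * eigvals / eigvals.sum()   # % of variance explained per factor
cum = np.cumsum(pct)
# Retain factors until the cumulative variance explained crosses ~60%
n_factors = int(np.searchsorted(cum, 60) + 1)
print(n_factors)
```

Note that the eigenvalues always sum to the number of variables, so the percentages sum to 100.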
Factor Rotation
- The initial factor matrix rarely results in factors that can be easily interpreted.
Factor Matrix
Variables   Factor 1   Factor 2
V1            0.928      0.253
V2           -0.301      0.795
V3            0.936      0.131
V4           -0.342      0.789
V5           -0.869     -0.351
V6           -0.177      0.871
- Therefore, through a process of rotation, the initial factor matrix is transformed into a simpler matrix that is easier to interpret.
- This helps identify which factors are strongly associated with which original variables.
Rotation of Factors
In factor rotation, the reference axes of the factors are turned about the origin until some other position is reached. There are two kinds:
1. Orthogonal = the axes are maintained at 90 degrees.
2. Oblique = the axes are not maintained at 90 degrees.
Unrotated factor solutions extract factors in order of how much variance they account for, with each subsequent factor accounting for less variance. The ultimate effect of rotating the factor matrix is therefore to redistribute the variance from earlier factors to later ones, in order to achieve a simpler, theoretically more meaningful factor pattern.
(Figure: loadings of variables V2-V5 plotted on the factor axes, comparing the unrotated Factor I axis with the rotated Factor I axis under orthogonal and oblique rotation.)
Simplification means attempting to drive loadings toward zero, either:
- in rows (variables): making as many values in each row as close to zero as possible, i.e. maximizing each variable's loading on a single factor, OR
- in columns (factors): making as many values in each column as close to zero as possible, i.e. keeping the number of high loadings on each factor as few as possible.
Factor Rotation
In rotating the factors, we would like each factor to have significant loadings (coefficients) for only some of the variables. The process is called orthogonal rotation if the axes are maintained at right angles. Let us see how it is done.
(Figure: step-by-step turning of the Factor 1 and Factor 2 axes, showing the variable loadings before and after rotation.)
- Variation explained by V1 = (-0.2)² + (0.9)² = 0.85
- Variation explained by V2 = (0.7)² + (0.1)² = 0.50
- Note that the variation explained remains unchanged by rotation.
- After rotation, the loadings become either clearly large or clearly small.
- Now we can reach a meaningful conclusion.
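Rotation reallocates variance between factors but leaves the variance explained for each variable (its communality, the row sum of squared loadings) unchanged, because an orthogonal rotation matrix preserves row lengths. A sketch of the standard Kaiser varimax iteration, applied to the unrotated loadings from the factor matrix shown earlier:

```python
import numpy as np

def varimax(L, tol=1e-6, max_iter=100):
    """Kaiser varimax rotation of a loadings matrix L (variables x factors)."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        )
        R = u @ vt                       # orthogonal rotation matrix
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return L @ R

# Unrotated loadings of V1..V6 on Factors 1 and 2 (from the factor matrix)
L = np.array([[ 0.928,  0.253],
              [-0.301,  0.795],
              [ 0.936,  0.131],
              [-0.342,  0.789],
              [-0.869, -0.351],
              [-0.177,  0.871]])

L_rot = varimax(L)
# Communalities (variance explained per variable) are unchanged by rotation
print(np.allclose((L ** 2).sum(axis=1), (L_rot ** 2).sum(axis=1)))  # True
```

The rotated matrix pushes each variable's loading toward one factor, which is exactly the simplification described above.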
Interpretation of Factors
A factor can then be interpreted in terms of the variables that load high on it in the rotated factor matrix.
FACTOR 1 has high coefficients for:
- V1: Buy a toothpaste that prevents cavities
- V3: A toothpaste should strengthen your gums
- V5: Prevention of tooth decay is not an important benefit (note: the coefficient is negative)
FACTOR 1 may be labelled the Health Factor.
Interpretation of Factors

FACTOR 2 has high coefficients on:
- V2: Like a toothpaste that gives shiny teeth
- V4: Prefer a toothpaste that freshens breath
- V6: The most important concern is attractive teeth
FACTOR 2 may be labelled the Aesthetic Factor.

F2 = 0.011·V1 + 0.375·V2 - 0.043·V3 + 0.377·V4 - 0.059·V5 + 0.395·V6

Conclusion

From the data gathered from 30 respondents on 6 basic variables, the most important benefits consumers seek from the purchase of a toothpaste are HEALTH and AESTHETICS:
- Health has 45.5% importance
- Aesthetics has 36.9% importance.
Assumptions
Multicollinearity: assessed using the MSA (measure of sampling adequacy).
- Factor analysis is performed most often only on metric variables, although specialized methods exist for the use of dummy variables; a small number of dummy variables can be included in a set of metric variables that are factor analyzed.
- If a study is being designed to reveal factor structure, strive to have at least five variables for each proposed factor.
- For sample size:
  o the sample must have more observations than variables;
  o the minimum absolute sample size is 50 observations;
  o aim for at least five, and preferably ten, observations per variable.
KMO can be used to identify which variables to drop from the factor analysis because they lack multicollinearity. There is a KMO statistic for each individual variable, as well as an overall KMO statistic; KMO varies from 0 to 1.0. The overall KMO should be > 0.50 to proceed with factor analysis. If it is not, remove the variable with the lowest individual KMO statistic, one at a time, until the overall KMO rises above 0.50 and each individual variable's KMO is above 0.50.
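This drop-the-worst-variable procedure can be sketched in numpy. The per-variable KMO uses the same ordinary/partial correlation comparison as the overall statistic, restricted to one variable's row; the simulated data are illustrative only:

```python
import numpy as np

def kmo_stats(R):
    """Overall and per-variable KMO from a correlation matrix R."""
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    P = -R_inv / d                             # partial correlations
    mask = ~np.eye(R.shape[0], dtype=bool)     # off-diagonal elements only
    r2 = np.where(mask, R, 0) ** 2
    p2 = np.where(mask, P, 0) ** 2
    per_var = r2.sum(axis=1) / (r2.sum(axis=1) + p2.sum(axis=1))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return overall, per_var

def drop_low_kmo(X, threshold=0.5):
    """Drop variables one at a time (lowest individual KMO first)
    until the overall and all individual KMO values exceed the threshold."""
    cols = list(range(X.shape[1]))
    while len(cols) > 2:
        overall, per_var = kmo_stats(np.corrcoef(X[:, cols], rowvar=False))
        if overall > threshold and per_var.min() > threshold:
            break
        cols.pop(int(per_var.argmin()))        # remove the worst variable
    return cols

# Simulated sample with a clear 2-factor structure: nothing should be dropped
rng = np.random.default_rng(3)
latent = rng.normal(size=(300, 2))
X = np.repeat(latent, 3, axis=1) + 0.6 * rng.normal(size=(300, 6))
kept = drop_low_kmo(X)
print(len(kept))
```

With a well-structured correlation matrix, every variable clears the 0.50 threshold and the full set is retained; with weakly correlated variables, the loop strips them out one at a time, exactly as described above.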
- There must be a strong conceptual foundation to support the assumption that a structure exists before the factor analysis is performed.
- A statistically significant Bartlett's test of sphericity (sig. < .05) indicates that sufficient correlations exist among the variables to proceed.
- Measure of Sampling Adequacy (MSA) values must exceed .50 for both the overall test and each individual variable; variables with values < .50 should be omitted from the factor analysis one at a time, the smallest being omitted each time.
Although both the component and common factor analysis models yield similar results in common research settings (30 or more variables, or communalities > 0.60 for most variables):
- the component analysis model is most appropriate when data reduction is paramount;
- the common factor model is best in well-specified theoretical applications.
Consider several alternative solutions (one more and one less factor than the initial solution) to ensure that the best structure is identified.
- An optimal structure exists when all variables have high loadings only on a single factor.
- Variables that cross-load (load highly on two or more factors) are usually deleted unless theoretically justified.
- Variables should generally have communalities > 0.50 to be retained in the analysis.
- Re-specification of a factor analysis can include options such as:
  o deleting a variable(s),
  o changing rotation methods, and/or
  o increasing or decreasing the number of factors.
To be considered significant:
- a smaller loading (e.g. ±0.30) is sufficient with either a larger sample size or a larger number of variables being analyzed;
- a larger loading (e.g. ±0.50 and above) is needed with a smaller sample size.
These guidelines are conservative and should be considered only as starting points for including a variable for further consideration.