
Series F

TQM Training Module on


Factor Analysis
Doc. No. F-10.01.20180415
Revision 01; 15th April 2018
Authors: Pankaj Lochan

For further clarifications, write to pankaj.lochan@jsw.inver


© Total Quality Management, JSW Group
TQM Training Series: 6 series with 66 training modules
This is a training module on Factor Analysis (F-10)

Series-A: Basic Problem Solving Tools
A-01 Flow Charts
A-02 Cause & Effects Diagram
A-03 Stratification
A-04 Scatter Diagram
A-05 Control Charts
A-06 Check Sheets
A-07 Histogram
A-08 Pareto Charts
A-09 Graphs

Series-B: Basic Management Tools
B-01 Brainstorming
B-02 Affinity Diagram
B-03 Arrow Diagram
B-04 Tree Diagram
B-05 PDPC
B-06 Matrix Diagram
B-07 Matrix Data Analysis
B-08 Relation Diagram

Series-C: Quality Management Basics
C-01 Quality Mgmt. Basics
C-02 Basic Statistics
C-03 Statistical Process Control
C-04 KPI Drill Down
C-05 KPI Benchmarking
C-06 Strategic Analysis Tools
C-07 Policy Management
C-08 Policy Diagnosis
C-09 Daily Management
C-10 Daily Mgmt. in Maintenance
C-11 Cross Functional Mgmt.
C-12 Quality Assurance Basics
C-13 MSA
C-14 PFD, FMEA, Control Plan
C-15 Cost of Poor Quality (COPQ)
C-16 Improvement Fundamentals
C-17 4i Methodology
C-18 5S
C-19 Quality Circles
C-20 QC Story Approach
C-21 Kaizen, OPL, Poka Yoke

Series-D: Productivity & Efficiency Tools
D-01 Value Stream Mapping (VSM)
D-02 Time & Motion Study
D-03 SMED
D-04 Wrench Time Analysis
D-05 Queuing Theory
D-06 Inventory Management
D-07 Linear Programming Problem
D-08 Game Theory
D-09 OEE
D-10 PERT & CPM

Series-E: Decision-making Tools
E-01 Quality Function Deployment
E-02 Fault Tree Analysis
E-03 AHP & Paired Analysis
E-04 Pugh Matrix
E-05 Time Series Analysis

Series-F: Advanced Statistical Tools
F-01 Sampling & Distribution
F-02 Hypothesis Testing
F-03 Regression
F-04 Basics of DoE
F-05 Factorial DoE
F-06 Principal Component Analysis
F-07 Cluster Analysis
F-08 Conjoint Analysis
F-09 Discriminant Analysis
F-10 Factor Analysis
F-11 Response Surface Method
F-12 Taguchi DoE
F-13 Weibull Analysis
FACTOR ANALYSIS

• The purpose of Factor Analysis is similar to that of Principal Component Analysis: to summarize the covariance structure of the data in a few dimensions. However, the emphasis in Factor Analysis is on identifying underlying “factors” that might explain the dimensions associated with large data variability. The factors are not actually observed but are thought to be underlying driving forces.
• Factor Analysis is used to study the patterns of relationship among many observed (dependent) variables, with the goal of discovering something about the nature of the independent variables that affect them, even though those independent variables were not measured directly.
• For example, the marks obtained by students are likely to depend on intelligence, study time, etc.; however, data on these are not collected. Thus the answers obtained by Factor Analysis are necessarily more hypothetical and tentative than when the independent variables are observed directly.
• The inferred independent variables are called Factors.
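The idea of unobserved factors driving observed variables can be illustrated with a small simulation. This is a minimal sketch, not part of the original module: the loadings, the sample size, and all variable names are hypothetical, and it assumes the usual linear factor model in which each observed variable is a weighted sum of the factors plus its own unique noise. Under that assumption the covariance of the observed variables should be close to (loadings × loadings transposed) + unique variances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loadings: rows = 4 observed variables, columns = 2 unobserved factors.
loadings = np.array([
    [0.9, 0.1],
    [0.8, 0.3],
    [0.2, 0.9],
    [0.1, 0.8],
])

# Communality = variance explained by the factors; the rest is unique variance.
communality = (loadings ** 2).sum(axis=1)
unique_var = 1.0 - communality  # chosen so every observed variable has variance 1

n = 20000
factors = rng.standard_normal((n, 2))  # in a real study these are never observed
noise = rng.standard_normal((n, 4)) * np.sqrt(unique_var)
observed = factors @ loadings.T + noise

# Covariance implied by the factor model versus the covariance actually observed.
implied_cov = loadings @ loadings.T + np.diag(unique_var)
sample_cov = np.cov(observed, rowvar=False)

print(np.round(implied_cov, 2))
print(np.round(sample_cov, 2))
```

Factor Analysis works in the opposite direction: it starts from the observed covariance (or correlation) matrix and tries to recover loadings that could have produced it.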
FACTOR ANALYSIS ANSWERS

1. How many different factors are needed to explain the pattern of relationship among these variables?
2. What is the nature of those factors?
3. How well do the hypothesized factors explain the observed data?
4. How much purely random or unique variance does each observed variable include?
EXAMPLE

You record the following characteristics for 14 census tracts: total population (Pop), median years of schooling (School), total employment (Employ), employment in health services (Health), and median home value (Home).
You perform principal components analysis to understand the underlying data structure. You use the correlation matrix to standardize the measurements because they are not measured on the same scale.

Data is in File: EXH_MVAR.MTW



PRINCIPAL COMPONENT ANALYSIS EXTRACTION METHOD

Stat > Multivariate > Factor Analysis

1. You can specify the number of factors to extract. If you don’t, Minitab by default sets the number of factors equal to the number of variables.
2. Select Principal Components as the method of extraction.
3. Click on Graphs.
EXAMPLE

1. Select “Scree Plot”.
2. Click OK.
3. Click Results.
4. Check “Sort Loading”. (Sorting is done by the maximum absolute loading for any factor. Variables that have their highest absolute loading on factor 1 are printed first, in sorted order. Variables with their highest absolute loadings on factor 2 are printed next, in sorted order, and so on.)
5. Click OK till the end.
INTERPRETING OUTPUT

Principal Component Factor Analysis of the Correlation Matrix

Unrotated Factor Loadings and Communalities

Variable  Factor1  Factor2  Factor3  Factor4  Factor5  Communality
Pop         0.972    0.149   -0.006    0.170    0.067        1.000
School      0.545    0.715    0.415   -0.140   -0.001        1.000
Employ      0.989    0.005   -0.089    0.083   -0.085        1.000
Home       -0.303    0.797   -0.523    0.005   -0.002        1.000
Health      0.847   -0.352   -0.344   -0.200    0.022        1.000

Variance   3.0289   1.2911   0.5725   0.0954   0.0121       5.0000
% Var       0.606    0.258    0.114    0.019    0.002        1.000

• Since we did not specify the number of factors to be extracted, Minitab by default extracted 5 factors (because no. of variables = 5).
• With all 5 factors retained, the factors together explain all of the variability in the data (every communality is 1.000).
• Factors 1, 2 and 3 together explain 97.8% of the variability in the data.
• Factors 4 and 5 explain only 2.1% of the variability, which is very little, so we can ignore Factors 4 and 5.
• The next step would be to compare models with 2 and 3 factors, to check whether a two-factor model falls short of explaining the data.
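The Variance, % Var and Communality figures in the output can be recomputed from the loadings themselves. The sketch below is an illustration rather than Minitab's own procedure: it assumes the usual definitions (variance of a factor = sum of squared loadings in its column, communality of a variable = sum of squared loadings in its row) and uses the rounded loadings printed in the table, so small rounding differences are expected. The variable names are illustrative.

```python
import numpy as np

variables = ["Pop", "School", "Employ", "Home", "Health"]

# Unrotated loadings copied from the output table (rows = variables, columns = Factor1..Factor5).
loadings = np.array([
    [ 0.972,  0.149, -0.006,  0.170,  0.067],
    [ 0.545,  0.715,  0.415, -0.140, -0.001],
    [ 0.989,  0.005, -0.089,  0.083, -0.085],
    [-0.303,  0.797, -0.523,  0.005, -0.002],
    [ 0.847, -0.352, -0.344, -0.200,  0.022],
])

squared = loadings ** 2
communality = squared.sum(axis=1)     # per variable: share of its variance explained
variance = squared.sum(axis=0)        # per factor: the "Variance" row
pct_var = variance / len(variables)   # correlation matrix, so total variance = no. of variables
cum_pct = np.cumsum(pct_var)          # cumulative share explained by the first k factors

for name, c in zip(variables, communality):
    print(f"{name:7s} communality = {c:.3f}")
print("Variance:", np.round(variance, 4))
print("% Var   :", np.round(pct_var, 3))
print("Cum. %  :", np.round(cum_pct, 3))
```

The cumulative row is where the 97.8% figure for the first three factors comes from.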
HOW MANY FACTORS TO CONSIDER

Criteria to select the number of factors:

• Scree test: Consider factors until the scree plot becomes flat. In this case, the plot becomes flat after factor 3.
• Latent Root criteria: Select factors with eigenvalue > 1. Using this, only the first two factors should be selected.
• Prior Expertise: Use prior expertise and knowledge to judge the number of factors.
• % Variance Criteria: Consider the factors which explain a large share of the variability in the data.

[Figure: Scree Plot of Pop, ..., Health. x-axis: Factor Number (1 to 5); y-axis: Eigenvalue (0.0 to 3.0)]
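Two of these criteria are simple enough to apply mechanically. The following sketch applies the latent root criterion and a % variance criterion to the eigenvalues reported in the Variance row of the principal components output. The 95% cutoff is a hypothetical choice made only for illustration; the module does not prescribe a particular percentage.

```python
import numpy as np

# Eigenvalues taken from the "Variance" row of the principal components output.
eigenvalues = np.array([3.0289, 1.2911, 0.5725, 0.0954, 0.0121])

# Latent root criterion: keep the factors whose eigenvalue exceeds 1.
n_latent_root = int((eigenvalues > 1.0).sum())

# % variance criterion: keep the fewest factors whose cumulative share
# reaches a chosen target (0.95 here is an assumed, illustrative cutoff).
cum_share = np.cumsum(eigenvalues) / eigenvalues.sum()
target_share = 0.95
n_pct_variance = int(np.argmax(cum_share >= target_share)) + 1

print("Latent root criterion:", n_latent_root, "factors")
print("Cumulative share     :", np.round(cum_share, 3))
print("% variance criterion :", n_pct_variance, "factors")
```

The two criteria need not agree, which is why the choice between a two-factor and a three-factor model is left to a direct comparison.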
MAXIMUM LIKELIHOOD EXTRACTION METHOD

Stat > Multivariate > Factor Analysis

1. Select Variables.
2. Select No. of factors to extract.
3. Select “Maximum likelihood”.
4. Select “Varimax” Rotation.
5. Click on “Graphs”.
EXAMPLE

1. Select “Scree Plot” and “Loading Plot for first 2 factors”.
2. Click OK.
3. Click Results.
4. Check “Sort Loading”. (Sorting is done by the maximum absolute loading for any factor. Variables that have their highest absolute loading on factor 1 are printed first, in sorted order. Variables with their highest absolute loadings on factor 2 are printed next, in sorted order, and so on.)
5. Click OK till the end.



INTERPRETING OUTPUT

Maximum Likelihood Factor Analysis of the Correlation Matrix

* NOTE * Heywood case

The output is three tables of loadings and communalities:
1. Unrotated
2. Rotated
3. Sorted and Rotated

Unrotated Factor Loadings and Communalities

Variable  Factor1  Factor2  Communality
Pop         0.971    0.160        0.968
School      0.494    0.833        0.938
Employ      1.000   -0.000        1.000
Home       -0.249    0.375        0.202
Health      0.848   -0.395        0.875

Variance   2.9678   1.0159       3.9837
% Var       0.594    0.203        0.797

These two factors explain 79.7% of the total data variability.

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable  Factor1  Factor2  Communality
Pop         0.718    0.673        0.968
School     -0.052    0.967        0.938
Employ      0.831    0.556        1.000
Home       -0.415    0.173        0.202
Health      0.924    0.143        0.875

Variance   2.2354   1.7483       3.9837
% Var       0.447    0.350        0.797

After varimax rotation, the % Var is more balanced between the two factors.
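A rotation redistributes variance between the factors without changing the communalities. The sketch below shows this with a common textbook implementation of varimax (iterating on a singular value decomposition), applied to the unrotated loadings from the output. It is an assumption-laden illustration: the function name `varimax` and its parameters are hypothetical, no row normalization is applied, and Minitab's exact normalization and sign conventions may differ, so the rotated values need not match the table digit for digit.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal rotation of a loading matrix (gamma=1 gives varimax).

    Illustrative textbook algorithm; not claimed to be Minitab's procedure.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        gradient = loadings.T @ (rotated ** 3 - (gamma / p) * rotated @ col_ss)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # closest orthogonal matrix to the gradient
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion / criterion < 1.0 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Unrotated maximum likelihood loadings copied from the output (Pop, School, Employ, Home, Health).
unrotated = np.array([
    [ 0.971,  0.160],
    [ 0.494,  0.833],
    [ 1.000, -0.000],
    [-0.249,  0.375],
    [ 0.848, -0.395],
])

rotated, rotation = varimax(unrotated)

print("Rotated loadings:\n", np.round(rotated, 3))
print("Communalities before:", np.round((unrotated ** 2).sum(axis=1), 3))
print("Communalities after :", np.round((rotated ** 2).sum(axis=1), 3))
print("Variance per factor before:", np.round((unrotated ** 2).sum(axis=0), 4))
print("Variance per factor after :", np.round((rotated ** 2).sum(axis=0), 4))
```

Because the rotation matrix is orthogonal, each variable's communality and the total variance explained stay the same; only the split between the factors changes.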
INTERPRETING OUTPUT

Sorted Rotated Factor Loadings and Communalities

Variable  Factor1  Factor2  Communality
Health      0.924    0.143        0.875
Employ      0.831    0.556        1.000
Pop         0.718    0.673        0.968
Home       -0.415    0.173        0.202
School     -0.052    0.967        0.938

Variance   2.2354   1.7483       3.9837
% Var       0.447    0.350        0.797

Factor 1 has large positive loadings on Health (0.924), Employ (0.831), and Pop (0.718), and a -0.415 loading on Home, while the loading on School (-0.052) is small.
Factor 2 has a large positive loading on School of 0.967 and loadings of 0.556 and 0.673, respectively, on Employ and Pop, and small loadings on Health and Home.
Factors 1 and 2 explain 44.7% and 35.0% of the variation in the data, respectively.

Let's give a possible interpretation to the factors.

The first factor positively loads on population size and on two variables, Employ and Health, that generally increase with population size. It negatively loads on home value, but this may be largely influenced by one point. We might consider factor 1 to be a "health care - population size" factor.

The second factor might be considered to be an "education - population size" factor. Both Health and School are correlated with Pop and Employ, but not much with each other.
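The ordering of the sorted table follows directly from the “Sort Loading” rule described in the steps. As a minimal sketch (variable names are illustrative, and this is not Minitab's code), each variable is assigned to the factor on which it has its highest absolute loading, and variables are then listed factor by factor in decreasing order of that loading:

```python
import numpy as np

variables = ["Pop", "School", "Employ", "Home", "Health"]

# Rotated loadings copied from the varimax output (columns = Factor1, Factor2).
rotated = np.array([
    [ 0.718, 0.673],
    [-0.052, 0.967],
    [ 0.831, 0.556],
    [-0.415, 0.173],
    [ 0.924, 0.143],
])

abs_loadings = np.abs(rotated)
dominant_factor = abs_loadings.argmax(axis=1)  # factor with the highest absolute loading
max_abs = abs_loadings.max(axis=1)

# Group by dominant factor, then sort by descending absolute loading within each group.
order = sorted(range(len(variables)), key=lambda i: (dominant_factor[i], -max_abs[i]))
sorted_variables = [variables[i] for i in order]

for i in order:
    print(f"{variables[i]:7s} Factor{dominant_factor[i] + 1}  {rotated[i, dominant_factor[i]]: .3f}")
```

Reading the sorted list top to bottom makes it easier to see which variables belong together on each factor.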
INTERPRETING OUTPUT

[Figure: Loading Plot of Pop, ..., Health. x-axis: First Factor; y-axis: Second Factor; points labelled School, Pop, Employ, Home, Health]
[Figure: Scree Plot of Pop, ..., Health. x-axis: Factor Number; y-axis: Eigenvalue]

Further, from the loading plot it is clearly visible that Factor 1 has high positive loadings on Pop, Health and Employ, and a negative loading on Home.

From the scree plot, you can conclude that the first two factors account for most of the total variability in the data (given by the eigenvalues). The remaining factors account for a very small proportion of the variability (close to zero) and are likely unimportant.
TIPS

Which matrix to use, and when?
Use the correlation matrix if the variables are measured on different scales and you want to standardize them, or if the variances differ widely among variables. You can use either the covariance or the correlation matrix in all other scenarios.

Which extraction method to use?
If the factors and errors obtained after fitting the factor model are not assumed to follow a normal distribution, use the PCA method. If they are assumed to follow a normal distribution, use the Maximum Likelihood extraction method.
However, it is advisable to first conduct Factor Analysis using PCA as the extraction method, as it gives you an idea of how many factors are needed to explain the variation in the data you are studying.

Which rotation method to use?
Equimax (Gamma = No. of factors/2): Rotates the initial loadings so that a variable loads high on one factor and low on the other factors. It is a compromise between Quartimax and Varimax.
Varimax (Gamma = 1): On each factor, some variables have high loadings (close to ±1) and the others have loadings close to 0. This provides a clear positive or negative association.
Quartimax (Gamma = 0): Simplifies the rows of the loadings, so that each variable loads highly on as few factors as possible.

If you use a method with a low value of gamma, the rotation will tend to simplify the rows of the loadings; if you use a method with a high value of gamma, the rotation will tend to simplify the columns of the loadings.

Varimax Rotation is highly recommended.


TIPS

What is the threshold of loading value to be considered as significant?

Loadings represent how much a factor influences a variable. High loadings (positive or negative) indicate that the factor strongly influences the variable. Low loadings indicate that the factor has a weak influence on the variable. The sign gives the direction: with a positive loading the variable increases as the factor increases, and with a negative loading it decreases.
Examine the loading pattern to determine to which factor each variable belongs. In the unrotated loading table, the loadings are difficult to interpret, so examine the rotated loading table to interpret the loading pattern.

Factor loading considered significant, by sample size:

Factor Loading  Sample size
0.30            350
0.35            250
0.40            200
0.45            150
0.50            120
0.55            100
0.60            85
0.65            70
0.70            60
0.75            50
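The loading and sample size table can be used as a simple lookup. The sketch below is one possible reading of it, under the assumption that each row gives the sample size needed for that loading to count as significant; the function name `min_significant_loading` is hypothetical.

```python
# (factor loading, sample size needed) pairs copied from the table above.
LOADING_TABLE = [
    (0.30, 350), (0.35, 250), (0.40, 200), (0.45, 150), (0.50, 120),
    (0.55, 100), (0.60, 85), (0.65, 70), (0.70, 60), (0.75, 50),
]

def min_significant_loading(sample_size):
    """Smallest tabulated loading that counts as significant for this sample size.

    Returns None when the sample is smaller than every tabulated sample size.
    """
    eligible = [loading for loading, needed in LOADING_TABLE if sample_size >= needed]
    return min(eligible) if eligible else None

for n in (350, 100, 90, 14):
    print(n, "->", min_significant_loading(n))
```

Under this reading, the 14 census tracts in the example fall below the smallest tabulated sample size, so the table offers no threshold for them and the loadings there should be interpreted with caution.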

When to use PCA? When to use Factor Analysis?

Though the techniques are similar, they are designed for different purposes.
Principal components analysis is used to reduce data into a smaller number of components.
Factor analysis is used to understand what constructs underlie the data.
The two analyses are often performed on the same data. For example, you can conduct a principal components analysis to determine the number of factors to extract in a factor analytic study.
Please login to:
tqm.jsw.in
to read the training modules
THANK YOU
