
Problem Statement 1

Cereal Data Factor Analysis

As part of a study of consumer consideration of ready-to-eat cereals sponsored by Kellogg Australia, Roberts and Lattin
(1991) surveyed consumers regarding their perceptions of their favourite brands of cereals. Each respondent was
asked to evaluate three preferred brands on each of 25 different attributes. Respondents used a five-point Likert scale
to indicate the extent to which each brand possessed the given attribute. This assignment uses a subset
of the data collected by Roberts and Lattin, reflecting the evaluations of the 12 most frequently cited cereal brands in
the sample. (In the original study, a total of 40 different brands were evaluated by 121 respondents, but the majority
of brands were rated by only a small number of consumers.) The 25 attributes and 12 brands are listed below.

Cereal Brand             Attributes 1-12    Attributes 13-25

All Bran                 Filling            Family
Cerola Muesli            Natural            Calories
Just Right               Fibre              Plain
Kellogg’s corn flakes    Sweet              Crisp
Komplete                 Easy               Regular
Nutrigrain               Salt               Sugar
Purina Muesli            Satisfying         Fruit
Rice Bubbles             Energy             Process
Special K                Fun                Quality
Sustain                  Kids               Treat
Vitabrit                 Soggy              Boring
Weetbix                  Economical         Nutritious
                                            Health

In total 116 respondents provided 235 observations of the 12 selected brands. How do you characterize the
consideration behaviour of the 12 selected brands? Analyse and interpret your results using factor analysis.

Solution

Load the given dataset

We do a basic exploratory analysis to check that the data is correct and as expected.
From the structure and summary statistics of the data we find that there are 235 observations and 26 variables (the
brand name plus the 25 attributes). There are 12 different brands of cereals. All the attributes should have values
between 1 and 5, but we see some values of 6. We replace these values with 5. We also check for missing (NULL) values.
Lastly, upon carefully examining the data we note that there are negative attributes such as Soggy and Boring. We
need to reverse-code these attributes so that a high score is favourable on every attribute.
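The cleaning rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the column name "Cereals" for the brand label is hypothetical, a rating of 6 is assumed to be an entry error that is capped at 5, and reverse-coding is taken to mean mapping 1 to 5, 2 to 4 and so on.

```python
# Hypothetical column names; the real dataset's headers may differ.
BRAND_COLUMN = "Cereals"
NEGATIVE_ATTRIBUTES = {"Soggy", "Boring"}
SCALE_MIN, SCALE_MAX = 1, 5


def clean_rating(attribute, value):
    """Cap an out-of-range rating, then reverse-code negative attributes."""
    if value is None:
        raise ValueError(f"missing rating for {attribute}")
    # Assumption: values outside 1-5 are entry errors and are pulled back
    # to the nearest valid scale point (so 6 becomes 5).
    capped = min(max(value, SCALE_MIN), SCALE_MAX)
    if attribute in NEGATIVE_ATTRIBUTES:
        # On a 1-5 scale this maps 1<->5 and 2<->4, leaving 3 unchanged.
        return SCALE_MIN + SCALE_MAX - capped
    return capped


def clean_row(row):
    """Clean every attribute in one observation, leaving the brand label alone."""
    return {
        key: value if key == BRAND_COLUMN else clean_rating(key, value)
        for key, value in row.items()
    }


example = {"Cereals": "Weetbix", "Filling": 6, "Soggy": 2, "Boring": 5}
cleaned = clean_row(example)
print(cleaned)
```

After this step every attribute lies between 1 and 5 and points in the same (favourable) direction.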

To find out whether the variables are correlated, we run Bartlett’s test of sphericity. Its null hypothesis is that
the correlation matrix is an identity matrix, i.e. that the variables are uncorrelated. The test gives a p-value.
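For reference, the test statistic can be computed directly from the correlation matrix. The sketch below uses NumPy on synthetic stand-in data (not the cereal ratings) and returns the chi-square statistic and its degrees of freedom; the p-value is the upper-tail probability of a chi-square distribution with those degrees of freedom, which a statistics library would supply.

```python
import numpy as np


def bartlett_sphericity(data):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet is more stable than log(det(...)) when the determinant is tiny.
    _, log_det = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6) * log_det
    dof = p * (p - 1) // 2
    return statistic, dof


# Synthetic stand-in data: 235 rows, 4 columns driven by one shared signal.
rng = np.random.default_rng(0)
shared = rng.normal(size=(235, 1))
ratings = shared + 0.5 * rng.normal(size=(235, 4))

stat, dof = bartlett_sphericity(ratings)
print(f"chi-square = {stat:.1f}, df = {dof}")
```

With the 25 attributes of this assignment the degrees of freedom would be 25 * 24 / 2 = 300.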

We see that the p-value is < 0.05, so the null hypothesis is rejected: there is enough evidence that the variables are
correlated, and we can do dimensionality reduction. We remove the first column before running the test because it is
not a measured attribute, it is just the brand of cereal.

Next we do a KMO (Kaiser-Meyer-Olkin) test. This test measures sampling adequacy, i.e. whether the correlations among
the variables are strong enough, relative to their partial correlations, for factors to be extracted.

We see that the overall MSA is > 0.7, which means that the sample is adequate and factors can be extracted.
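As a rough illustration of what the overall MSA measures, the sketch below computes it from a correlation matrix: squared correlations are compared with squared partial correlations, which are obtained from the inverse of the correlation matrix. The 3 x 3 matrix is a made-up example, not the cereal data.

```python
import numpy as np


def kmo_overall(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.asarray(corr, dtype=float)
    inv = np.linalg.inv(corr)
    # Partial correlation between i and j, controlling for all other variables.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + q2)


# Made-up example: three variables that all correlate 0.5 with each other.
toy_corr = np.array([
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
])
msa = kmo_overall(toy_corr)
print(f"overall MSA = {msa:.3f}")
```

The value approaches 1 when partial correlations are small compared with the raw correlations, which is the situation in which shared factors are plausible.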

We make a scree plot to get the possible number of factors.
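A scree plot is simply the eigenvalues of the correlation matrix in decreasing order. The following sketch prints them as a text bar chart; the 4 x 4 matrix is a made-up example with two pairs of correlated variables, so two eigenvalues stand out before the curve flattens.

```python
import numpy as np


def scree_values(corr):
    """Eigenvalues of a correlation matrix, largest first."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    return eigenvalues[::-1]


# Made-up example: variables 1-2 correlate 0.6, variables 3-4 correlate 0.6.
block_corr = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])
scree = scree_values(block_corr)
for rank, value in enumerate(scree, start=1):
    # One '#' per 0.1 of eigenvalue gives a rough text version of the plot.
    print(f"{rank:>2}  {value:5.2f}  " + "#" * int(round(value * 10)))
```

The number of factors to try is read off where the bars stop dropping sharply.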

Examining the scree plot, we decide to check 4- to 6-factor solutions. For each number of factors we fit an FA model
and obtain the factor loading matrix, which in turn gives the communalities and eigenvalues. The method we use is
principal axis factoring with no rotation.
The factor diagram shows which variable loads on which factor. If a variable does not load on any factor, it can be
removed and the factor analysis repeated.
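Principal axis factoring without rotation can be sketched as an iteration on the communalities. This is a minimal illustration, not necessarily how the software used for this assignment implements it: the diagonal of the correlation matrix is replaced by communality estimates (starting from squared multiple correlations), the top eigenvectors give the loadings, and the communalities are updated until they stop changing. The example matrix is built from known one-factor loadings so the result can be checked.

```python
import numpy as np


def principal_axis(corr, n_factors, max_iter=500, tol=1e-8):
    """Unrotated principal axis factoring; returns (loadings, communalities)."""
    corr = np.asarray(corr, dtype=float)
    # Initial communalities: squared multiple correlations.
    communalities = 1 - 1 / np.diag(np.linalg.inv(corr))
    for _ in range(max_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, communalities)
        eigenvalues, eigenvectors = np.linalg.eigh(reduced)
        top = np.argsort(eigenvalues)[::-1][:n_factors]
        # Clip tiny negative eigenvalues before taking the square root.
        scale = np.sqrt(np.clip(eigenvalues[top], 0, None))
        loadings = eigenvectors[:, top] * scale
        updated = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(updated - communalities)) < tol
        communalities = updated
        if converged:
            break
    return loadings, communalities


# Made-up one-factor example with known loadings.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
one_factor_corr = np.outer(true_loadings, true_loadings)
np.fill_diagonal(one_factor_corr, 1.0)

est_loadings, est_communalities = principal_axis(one_factor_corr, n_factors=1)
# The sign of an eigenvector is arbitrary, so compare absolute values.
print(np.round(np.abs(est_loadings[:, 0]), 3))
```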

From the communality matrix, we infer that attribute 6 (Easy) has extremely low communality and can be removed
for the next iteration. The cumulative eigenvalues indicate how much of the variability in the attribute ratings is
explained by the 4 factors. The percentage of variance explained in this case is the sum of the proportions for the
4 factors, which is around 58%. Ideally, we aim for close to 65%.
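The link between loadings, communalities and percentage of variance explained is simple arithmetic. In the sketch below the loading matrix is entirely hypothetical: a communality is the row sum of squared loadings, the eigenvalue (sum of squared loadings) of a factor is the column sum, and the percentage of variance explained is the total divided by the number of variables.

```python
import numpy as np

# Hypothetical unrotated loadings for 4 variables on 2 factors.
toy_loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.6],
    [0.1, 0.1],
])

# Communality: share of each variable's variance explained by the factors.
toy_communalities = np.sum(toy_loadings ** 2, axis=1)
# Sum of squared loadings per factor (the factor's eigenvalue).
ss_loadings = np.sum(toy_loadings ** 2, axis=0)
# Percentage of total variance explained by all retained factors.
pct_variance = 100 * ss_loadings.sum() / toy_loadings.shape[0]

weakest = int(np.argmin(toy_communalities))
print("communalities:", np.round(toy_communalities, 2))
print(f"% of variance explained: {pct_variance:.2f}")
print(f"lowest communality: variable index {weakest}")
```

In this toy case the last variable has almost no communality, so it would be the candidate for removal in the next iteration, in the same way Easy is here.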
We now repeat the whole process (Bartlett’s test, KMO test, FA, communality matrix and percentage of variance
explained), this time without attribute 6 (Easy).
In this iteration the percentage of variance explained comes out to be 59.6%, which is acceptable but not good enough.
We keep iterating in this way, and we repeat the exercise with 5 and 6 factors as well. The results of all these
iterations are summarised in the following table.

No of factors   Bartlett's p-value   KMO    Communality-based removal         % of VE

4               0                    0.85   -                                 58
4               0                    0.86   Remove Easy                       59.6
4               0                    0.86   Remove Soggy                      59.3
4               0                    0.86   Remove Easy and Soggy             61
4               0                    0.86   Remove Easy and Process           61.1
4               0                    0.86   Remove Easy, Soggy and Process    63
5               0                    0.85   -                                 58
5               0                    0.86   Remove Easy                       59.6
5               0                    0.86   Remove Easy and Process           61.1
6               0                    0.86   Remove Easy and Process           61.1

From the above table we see that the 5- and 6-factor solutions reach % of VE values similar to the 4-factor solution,
but we cannot adopt either of them because, upon checking the factor loading matrices (see below), no variables load
on the additional factors.
We decide to go with the 4-factor solution that gives the maximum % of VE, i.e. 63%. We also have a guideline not to
remove more than 10% of the variables. Removing 3 of the 25 variables (12%) is marginally over that limit, but we
accept it here for the sake of accuracy. Below are all the details for the selected iteration.

Conclusion

Of all the calculated solutions, the recommended one is the 4-factor solution that explains 63% of the variance after
removing the variables Soggy, Easy and Process. This is the final solution we go ahead with. The interpretation of
its factors is given below.

Factor 1 (Health) -> Health, Nutritious, Fibre, Quality, Natural, Filling, Satisfying, Energy, Regular, Boring

Factor 2 (Taste) -> Sweet, Sugar, Calories, Treat, Salt, Fun, Crisp

Factor 3 (Family) -> Kids, Family, Economical, Fruit

Factor 4 (Plain) -> Plain
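A grouping like the one above is typically read off the loading matrix by assigning each variable to the factor on which it has the largest absolute loading. The sketch below shows that rule; the loadings are made up for illustration (they are not the results of this analysis), and the 0.3 cut-off for "no loading" is an assumed rule of thumb.

```python
import numpy as np


def assign_to_factors(names, loadings, threshold=0.3):
    """Group variables by their strongest factor; weak ones are left unassigned."""
    loadings = np.asarray(loadings, dtype=float)
    groups = {f"Factor {j + 1}": [] for j in range(loadings.shape[1])}
    unassigned = []
    for name, row in zip(names, loadings):
        best = int(np.argmax(np.abs(row)))
        if abs(row[best]) >= threshold:
            groups[f"Factor {best + 1}"].append(name)
        else:
            # No meaningful loading anywhere: a candidate for removal.
            unassigned.append(name)
    return groups, unassigned


# Made-up loadings for five attributes on two factors.
attribute_names = ["Health", "Fibre", "Sweet", "Sugar", "Easy"]
made_up_loadings = [
    [0.82, -0.10],
    [0.75, -0.20],
    [-0.15, 0.78],
    [-0.25, 0.81],
    [0.12, 0.08],
]
groups, unassigned = assign_to_factors(attribute_names, made_up_loadings)
print(groups)
print("unassigned:", unassigned)
```

Naming each factor (Health, Taste, Family, Plain) remains a judgment call based on which attributes end up grouped together.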
