Correspondence & Cluster Analysis
Table of Contents
Welcome
Correspondence Analysis
Cluster Analysis
Crosstab
Selection of Lifestyle Statements
Run Cluster Analysis
Interpreting the results
Importing the clusters back into Choices3
Welcome
Thank you for licensing this product from KMR-SPC Software.
The aim of this guide is to help you to run a Correspondence Analysis and, if
appropriate, a Cluster Analysis from the results of the Correspondence Map.
Please call the helpdesk with any queries on +44 (0)20 7831 5455 or email on
helpdesk@kmrspc.com asking for the Choices3 team.
CORRESPONDENCE ANALYSIS
The following is a basic map plotting attitudinal statements against the shoe
shops that people have stated they use.
• Enter the target market (usually your brands) in the columns, checking that sample
sizes are greater than 200
• Enter either ‘any agree’ or ‘definitely agree’ lifestyle statements as rows
• Enter ‘all users’ of the market as your filter (if you intend to run Cluster Analysis, the
sample must be greater than 2000)
• Edit your headings so they are concise (in the ‘Edit Table’ area)
• [If you want to ‘overlay’ information, enter this into columns (e.g. demographics/media)]
• Save and then run the crosstab
• Select correspondence analysis using the icon or by going to the “Analysis”
options
The Correspondence map will be generated within the Choices Viewer along with the
related statistics.
At this stage, before editing the map, you will want to select the statements that best
describe your map and eliminate the rest. There are two methods available:
• In the Choices Viewer, select the Statistics view and expand General Statistics
• Click on "Rows"
• Click on the "Dist" column (this sorts the rows by ‘Chi-distance’)
• Right-click on the rows and choose "Select top n…" and then choose the number of
statements you wish to include in the map (usually about 15-30)
• Right-click and choose "Invert selection"
• Right-click and select "Change status" …and then "to passive"
• Select the map from the analysis tree
• Right click on the map and choose “Select” and “All passive rows”
• Right click again on the map and select "Hide"
• Edit the map by moving the labels and changing text where necessary
• To rename the map, from the toolbar select “Edit” and “Title”
• To insert labels for the x and y axis, from the toolbar select “Insert” and “New label”
• If you are going on to do a cluster analysis - print the statements used in the map:
Ensure you are in the “Statistics" view and then choose "File" and "Print"
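The selection steps above can be sketched outside the program as follows. This is only an illustration of the logic (sort by Chi-distance, keep the top n, make the rest passive); the statement labels and distance values are hypothetical and the function name is not part of Choices.

```python
# Hypothetical row statistics: statement label -> Chi-distance ("Dist" column).
row_dist = {
    "I like to stand out in a crowd": 0.42,
    "I buy shoes for comfort, not style": 0.31,
    "I look for the lowest prices": 0.08,
    "I enjoy shopping for clothes": 0.19,
    "I rarely notice advertising": 0.03,
}


def split_active_passive(distances, top_n):
    """Keep the top_n rows by Chi-distance as active; the rest become passive."""
    ranked = sorted(distances, key=distances.get, reverse=True)
    return ranked[:top_n], ranked[top_n:]


active, passive = split_active_passive(row_dist, top_n=3)
print("Active rows:", active)
print("Passive rows (to be hidden):", passive)
```

In practice n would usually be in the region of 15-30 statements, as suggested above.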
Clean-up method
The clean-up method simply requires the user to specify the number of rows to select in
order to tidy up the map.
• If you are going on to do a cluster analysis, print the statements used in the map:
ensure you are in the Rows view of General Statistics and then choose “File” and
“Print”.
(i) Firstly, you should ensure the variance of the map is sufficiently high: the
combined variance for axes 1 and 2 needs to be over 60%.
(ii) The relationship between two brands is assessed by measuring the angle between the
lines drawn from the two brands to the centre (origin) of the map. An angle close to 0°
indicates a strong positive relationship between the brands, and an angle close to 180°
a strong negative one. Right angles between brands, or thereabouts (i.e. angles of 90°
or 270°), indicate little or no relationship.
The Correspondence Analysis program will search for correlation within the
data and will produce a map based on the two 'themes' which were strongest
within the data. The most important theme will form the basis of the x-axis and
the second most important, the y-axis.
For instance, in the example above (on p 4), one end of the x-axis
might reflect ‘Real Men’ who believe real ale is the only beer worth drinking
and that skincare products are for women and the other end ‘Image
Conscious’ people who are more concerned with fast cars and designer
clothes.
In this case, as with all correspondence maps, the vertical or ‘y’-axis is
less important than the horizontal (here it has a relatively low ‘contribution level’,
discussed later in the manual). However, you might differentiate between
those whose attitudes lean towards being financially aware versus those who
are more traditional.
Now we will run through a number of key questions you might ask about the
Correspondence Analysis program. Remember, if the data doesn't seem to
present any distinct patterns, you may need to study the combination of
variables that you are using and/or re-run your analysis.
The variance explained figure is a measure of how well the map is explaining
the variables in it. Ideally on a survey such as TGI at least 60% of the
variation within the market should be explained by the first 2 axes. However, in
reality this may not happen, especially if very few of the brands or variables
overlap (e.g. the statements “My diet is mainly vegetarian” and “I am a
vegetarian”). If you are in the map view itself, this information is given in the
bottom left-hand corner of the map.
If the figure is low (we would recommend for a correspondence map that the
minimum acceptable level is 60%) it indicates that these axes do not give a
sufficient explanation of the data. Thus the calculations are probably not
significant enough to create a whole map and the map will not sufficiently
explain the differences between the brands. Note that statistically any set of
data will contain some variance but not all are sufficiently strong. Also, users
of some products might be very similar attitudinally and might be better
differentiated against other variables such as demographics.
Each axis should reflect a dimension within the data, which can be summed up
or described by the user using appropriately descriptive labels. Examples of
dimensions might be introverted / extroverted or traditional / innovative.
The correspondence map can plot any 2 dimensions and will plot the two
strongest ones. However, you should also look at the other axes to see how
other polarities express themselves within the data. This is explained on
page 17.
The brands around the centre of the map will be those that are 'average', or
not as strongly differentiated as the brands around the outside of the map.
Brands near the edge of the map are those which have more extreme variation
or differences from other brands and attitudes. In practice these might be the
smaller brands which may attract a more specialist or distinctive consumer.
There are two main measurements that you can make with a ruler and/or a
protractor shown below. You should remember that the x-axis would have
been stretched or shrunk to fit on your screen so it will not be shown true to
scale.
To find out the correlation between two brands, simply draw a line from each
one to the origin, and measure the angle between them. An angle of 0°
represents 100% correlation, 180° shows 100% negative correlation, and 90°
(or 270°) shows no correlation. Brands B and C are diametrically opposite; i.e.
there is a strong negative correlation. It is important to know that these brands
are opposites in the market. This is as opposed to A and B, which are
positioned in a similar area of the map and consequently have similar market
positions.
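As an alternative to a ruler and protractor, the angle can be worked out from the axis 1 and axis 2 co-ordinates, which also avoids the on-screen stretching of the x-axis. The following is a minimal sketch; the brand co-ordinates are hypothetical and the function name is illustrative only.

```python
import math


def angle_between(p, q):
    """Angle in degrees (0 to 180) between the lines from the origin to p and q."""
    dot = p[0] * q[0] + p[1] * q[1]
    norm = math.hypot(p[0], p[1]) * math.hypot(q[0], q[1])
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


# Hypothetical (axis 1, axis 2) co-ordinates for three brands.
brands = {"A": (0.30, 0.20), "B": (0.45, 0.25), "C": (-0.40, -0.22)}

for first, second in [("A", "B"), ("B", "C")]:
    angle = angle_between(brands[first], brands[second])
    print(f"{first} vs {second}: {angle:.1f} degrees")
```

With these figures A and B are separated by only a few degrees (similar positions), while B and C are close to 180° apart (opposites). Because the function returns the smaller angle between the two lines, 270° is reported as 90°.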
[Figure labels: “Relationship becoming more strongly positive” and “Relationship becoming more strongly negative”]
You can see how different statements relate to a brand. Draw a line from the
brand through the origin, and then draw perpendicular lines (i.e. at 90°) from
each statement to that line.
The relationship between Brand A and the lifestyle statements X, Y and Z is
shown by the point where each statement’s perpendicular line meets the brand’s
origin line. Positive relationships lie on the same side of the origin as the
brand. Negative relationships lie on the opposite side of the origin from the brand.
In the example shown above, consumers of Brand A agree strongly with
Statement Z, and they disagree more strongly with Statement X than with
Statement Y. The closer the brand is to the origin along the
statement line, the weaker the relationship. The further out towards the edges
of the map the brand is, the stronger the relationship.
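The perpendicular-line construction described above is a projection of each statement onto the line through the origin and the brand. A minimal sketch, assuming hypothetical map co-ordinates for Brand A and statements X, Y and Z (the function name is illustrative only):

```python
import math


def projection_on_brand(brand, statement):
    """Signed distance from the origin to the point where the perpendicular
    from the statement meets the line through the origin and the brand.
    Positive: same side of the origin as the brand. Negative: opposite side."""
    length = math.hypot(brand[0], brand[1])
    return (brand[0] * statement[0] + brand[1] * statement[1]) / length


# Hypothetical co-ordinates, chosen to mirror the example in the text.
brand_a = (0.5, 0.0)
statements = {"X": (-0.6, 0.3), "Y": (-0.2, -0.4), "Z": (0.7, 0.2)}

scores = {name: projection_on_brand(brand_a, point) for name, point in statements.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"Statement {name}: {score:+.2f}")
```

Here Z projects onto the brand's side of the origin (agreement), while X falls further on the opposite side than Y (stronger disagreement).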
Example:
Look at the top 12 statements in the list (in red, the closest to Ravel) and try to
find a common theme. In this example, these statements could be part of the
“Image Conscious” theme.
The statistics view contains information that will allow you to describe your
correspondence map in more detail. An example of the Column statistics view
is given below along with explanations of each of its components and how they
might be used.
Please note that clicking on a column heading (e.g. ‘Mass’, ‘Inertia’)
sorts by that statistic in descending order.
These numbers represent the original numeric order of the variables that were
assigned immediately after the creation of the correspondence map.
Subsequently you may use this row / column to re-order your variables to their
original order should you so wish.
Key
The key represents a code reference for your variable. Note: A default code is
given to each variable if no code can be found.
Mass
The Mass figure represents the percent of data in the crosstab that is in that
row or column.
This is most useful if your map is based upon ‘projected’ figures (i.e. the 000s
figure in your crosstab), rather than the ‘Vertical Percent’ since then the mass
would represent the size of the brand.
NB Choices will automatically use Vertical Percent as your map basis. This
means that your brands are measured in terms of the percentage of those
using it. Please contact the KMR-SPC Helpdesk if you would like advice on
using different statistics as your map basis.
‘Distance’ refers to the ‘Chi² Distance’ on the map. This figure measures how
far each variable lies from the centre of the map, or the ‘origin’.
It is the squared distance of the row/column point from the origin of
the map, and equals the inertia of the row/column divided by its mass.
Chi² Distances are statistical values used to make the correspondence map.
The higher the value, the more discriminating the attribute. They are most
useful for assessing the discriminating power of your attributes in a
conventional correspondence map.
Chi² Distances
The chi² distance measures how well theoretical data 'fits' observed data. It is
calculated by working out an 'expected' value for each cell and comparing this
with the actual observed data. The 'expected' value is that which would occur if
there were no relationship between the row and column.
Brands with large differences between observed and expected values will have
a high distinctiveness, while those with an average performance will have low
distinctiveness. In the map, distinctiveness corresponds to the distance from
the origin, but measured over all the dimensions not just the two shown on the
map. Often a small brand has the most distinctive image.
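The 'expected' values described above can be illustrated with a small worked example. This is a sketch using the standard chi-squared calculation on a hypothetical 2 x 2 crosstab; it is not taken from the Choices program itself.

```python
# Hypothetical crosstab: rows are statements, columns are brands.
observed = [
    [30, 10],
    [20, 40],
]


def expected_counts(table):
    """Expected cell values if rows and columns were unrelated:
    row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def chi_square(table):
    """Sum over all cells of (observed - expected) squared / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


print(expected_counts(observed))
print(round(chi_square(observed), 4))
```

The larger the gap between observed and expected values in a row or column, the more distinctive that row or column is.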
Inertia
This figure shows how strongly each variable contributes to determining the
overall shape of the map, and is a breakdown of the ‘variance explained’
figure. You will find it most useful for discriminating between brands (usually
your columns).
This figure is calculated by multiplying the mass by the distance. The sum of the
eigenvalues across all dimensions is the total inertia.
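The relationship between mass, distance and inertia can be checked by hand on a small table. The sketch below uses a hypothetical crosstab and the standard correspondence analysis definitions (mass as a proportion of the table total, squared chi² distance from the average profile); the exact scaling used in the Choices Viewer is an assumption.

```python
# Hypothetical crosstab: rows are statements, columns are brands.
table = {
    "Statement 1": [30, 10],
    "Statement 2": [20, 40],
}

grand_total = sum(sum(counts) for counts in table.values())
col_mass = [sum(col) / grand_total for col in zip(*table.values())]

row_stats = {}
for label, counts in table.items():
    row_total = sum(counts)
    mass = row_total / grand_total
    profile = [count / row_total for count in counts]
    # Squared chi-squared distance of the row profile from the average profile.
    distance = sum((p - c) ** 2 / c for p, c in zip(profile, col_mass))
    row_stats[label] = {"mass": mass, "distance": distance, "inertia": mass * distance}

total_inertia = sum(stats["inertia"] for stats in row_stats.values())
for label, stats in row_stats.items():
    print(label, {name: round(value, 4) for name, value in stats.items()})
print("Total inertia:", round(total_inertia, 4))
```

For this table the total inertia equals the chi-squared statistic divided by the table total, which is the quantity the axes of the map attempt to explain.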
Axis Statistics
Co-ordinates
The Co-ordinates view shows the position of each row/column point on each
axis. The overall distance of each point from the origin has already been fixed
above. The position on each axis will depend on how much of the inertia of
that row/column is explained by that dimension. On each axis the sum of the
squared co-ordinates (i.e. squared distances), each multiplied by the mass of
its point, gives the inertia of that dimension.
Axes 1 and 2 represent the actual co-ordinates used to construct the default
map. Negative numbers mean that the point is on the opposite side of the
origin to positive numbers on the same axis.
By looking at the table above you can see that on the x-axis (axis 1) the points
on the right of the map (positive values) are: Ravel, Dolcis, Next, House of
Fraser, Bally, Barratts, Debenhams, Saxone and John Lewis. Moreover,
Ravel has the biggest value of these; i.e. in this case it would be furthest to the
right. (Please note however, that it is possible to ‘flip’ your axes on the map;
consequently the above would relate to the left of the map and not the right.)
You can also sort each column by clicking on the tab label at the top of each
column. This may reveal other axes (other than axis 1 and 2), which could be
better at explaining some of your key variables.
For both of these views each row of data represents one variable. Similarly,
each of the rows sums to 100%, reflecting the importance of that axis in
explaining the variable.
These views reveal that there is more than one axis that you can use for your
analysis. Although the initial correspondence map is based upon axis 1 and 2
(the best axes to explain your variables overall), you may choose other axes
which are stronger in explaining variables which you deem as key to the
analysis.
Absolute Contributions – add to 100% down all rows or columns for a single
axis (i.e. vertical percents). Shows the percent of all inertia on that axis which
is due to that row or column.
Relative Contributions – add to 100% across all axes for a single row or
column (i.e. horizontal percents). Shows the percent of all inertia in that row or
column which is explained by that axis.
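The two types of contribution can be illustrated from masses and co-ordinates. This is a minimal sketch with hypothetical figures for three brands and only two axes; the formulas are the standard correspondence analysis ones and are assumed, not confirmed, to match the Choices Viewer output.

```python
# Hypothetical masses and (axis 1, axis 2) co-ordinates for three brands.
masses = {"Brand A": 0.2, "Brand B": 0.3, "Brand C": 0.5}
coords = {
    "Brand A": [0.60, 0.10],
    "Brand B": [0.20, -0.30],
    "Brand C": [-0.36, 0.14],
}


def absolute_contributions(axis):
    """Percent of the inertia on one axis due to each point (sums to 100 down the axis)."""
    inertia = {name: masses[name] * coords[name][axis] ** 2 for name in masses}
    axis_total = sum(inertia.values())
    return {name: 100 * value / axis_total for name, value in inertia.items()}


def relative_contributions(name):
    """Percent of one point's inertia explained by each axis (sums to 100 across axes)."""
    squared = [value ** 2 for value in coords[name]]
    return [100 * value / sum(squared) for value in squared]


print({name: round(v, 1) for name, v in absolute_contributions(0).items()})
print([round(v, 1) for v in relative_contributions("Brand B")])
```

In this example Brand B is explained mainly by axis 2, so a map that includes axis 2 would describe it better than one based on axis 1 alone.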
Eigenvalues View
The Eigenvalue for each dimension gives the amount of variation explained by
that dimension. These values are used to calculate the correspondence map.
The sum of the active Eigenvalues is the total of all of the Chi² deviations for
every cell in the table. The larger the number the more a table will deviate from
expected values.
The dimensions of the Correspondence Map are trying to explain this sum and
the output shows various statistics for the dimensions that usually explain most
of the variation in the data.
%
The % column gives the percentage of variation explained by that
dimension.
%+
The %+ column gives the percentage of variation explained by all
dimensions up to and including the current one.
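The % and %+ columns follow directly from the eigenvalues. A short sketch with hypothetical eigenvalues, including the 60% check on the first two axes recommended earlier in this guide:

```python
from itertools import accumulate

# Hypothetical eigenvalues, one per dimension, largest first.
eigenvalues = [0.045, 0.025, 0.015, 0.010, 0.005]

total = sum(eigenvalues)
percent = [100 * value / total for value in eigenvalues]   # the "%" column
cumulative = list(accumulate(percent))                     # the "%+" column

for axis, (pct, cum) in enumerate(zip(percent, cumulative), start=1):
    print(f"Axis {axis}: {pct:5.1f}%  cumulative {cum:5.1f}%")

# Variance explained by the default map (axes 1 and 2).
map_variance = cumulative[1]
print("Map meets the 60% guideline:", map_variance >= 60)
```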
Pie Chart
As you become more ambitious you may want to use one of the alternative
axes: perhaps an axis other than 1 or 2 shows better discrimination for
the brands you are looking at. To do this, in the map view go to the View
menu and select Add Map. Alternatively you can use the add map icon
on the toolbar.
Enter a title and select the axes you wish to show on the new map. The new
map will be displayed in the Choices Viewer.
Active, Passive
Passive (Vs Active) - Points/variables start off as ‘active’ (i.e. they contribute
to the map calculations), but when made passive they no longer contribute to
the shape of the map. Assuming they have not been hidden (see below),
these passive variables are plotted in green so you can see where they would
lie on the map.
Passive points on a map have no mass, so do not affect the shape of the map.
They are excluded from the table used to calculate the map, and then their
positions are superimposed on the map afterwards. The position of a passive
row is fixed by its pattern of answers across the active columns. So a passive
row goes close to the columns it is strongest on, as with an active row. In the
same way, a passive column is fixed by answers across active rows. So
changing points to passive means the map is redrawn excluding those points,
and then passive points are positioned afterwards. Overlaying demographics
and media means adding these as passive points. So the map is unchanged –
it is still drawn based on active rows and columns only.
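How a passive row is placed from its pattern of answers can be sketched as below. This uses the usual textbook treatment of supplementary points (a profile-weighted average of the active columns' co-ordinates, rescaled by the square root of each axis eigenvalue). Whether Choices uses exactly this calculation is an assumption, and all figures and names are hypothetical.

```python
import math

# Hypothetical co-ordinates of the active columns on axes 1 and 2,
# and hypothetical eigenvalues for those two axes.
column_coords = {
    "Brand A": (0.40, 0.10),
    "Brand B": (-0.10, 0.30),
    "Brand C": (-0.30, -0.20),
}
eigenvalues = (0.09, 0.04)


def passive_row_position(counts):
    """Place a passive row from its answers across the active columns.
    Only the row's profile (its share in each column) matters, not its size."""
    total = sum(counts.values())
    position = []
    for axis, eigenvalue in enumerate(eigenvalues):
        weighted = sum(
            counts[name] / total * column_coords[name][axis] for name in counts
        )
        position.append(weighted / math.sqrt(eigenvalue))
    return tuple(position)


# A hypothetical demographic group overlaid as a passive row.
print(passive_row_position({"Brand A": 50, "Brand B": 30, "Brand C": 20}))
```

Because only the profile is used, doubling every count leaves the position unchanged, which is consistent with passive points having no mass.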
NB: By using this feature the whole map is re-drawn meaning that any editing
will be lost. This is because the variables upon which the map is based are re-
calculated.
You may choose to make points passive for the following reasons:
1) Low sample sizes: If you have included brands with low sample sizes (less
than 200) in your map, they should be made passive since they can be
statistically unreliable.
Hiding Points
Hide (Vs Unhide) - Removed from view on the map; passive points will be
removed completely, and active points will still contribute to the shape of the
map but won't be shown. (You can hide items by using the mouse and right
clicking and selecting Hide. To unhide items use the View menu and select
Hidden Objects then simply select those items you wish to view).
You might want to make points hidden for the following reasons:
1) Too much data on the map: If you have too many brands or statements and
you just want to focus on a few variables, you can 'hide' certain variables so
they are not plotted on the map. NB: They will still have an influence on the
shape of the map.
To format and improve the appearance of the points on your map, first select
them by clicking on them and then use the right mouse button and select
properties. Alternatively select the points you wish to format and use the Point
Properties icon on the toolbar.
You will then be provided with the ‘Point Properties’ box shown below.
Depending upon what you have selected (see ‘select’ option for doing multiple
and/or specific selections) the editing menu will give you various options under
the tab names. Please note however, that the best way to learn the editing
options and appreciate how they can improve the appearance of your map is
to go in and have a go!
Font Options
Provides typical Windows style editing options.
Symbol Options
Here you have a number of options to change map symbol sizes and shape.
For example, you may wish to distinguish Row from Column points through a
different shaped symbol. It can also be used to undo the effects of the 3-D
statistics option (see p.23).
Label Options
Here you can choose from a variety of options with which to highlight your
labels, such as using sunken, raised, shortened labels or by changing their
colour.
Printing
Maps can be printed to your default printer. The following is the group of
options that you access in Print in the usual Windows manner.
Print Setup
Here you can change the default printer and/or the paper size and orientation.
Print Preview
Use this option to preview your output (this is not accessible in the
Eigenvalues Pie Chart view).
Overlaying Data
Once you have generated your map you can overlay any other survey data
onto the map to see where it would be placed. Because a crosstabulation must
be re-run to overlay data, any previous editing that you have done will be lost.
Common examples of the sort of information you might want to overlay are:
1) Media consumption
2) Frequency information
3) Cluster groups onto the original correspondence map
4) Non-users of the brand(s) you are interested in
5) Demographic groups such as age or social grade
♦ Work from the original spec file that you used to generate the
correspondence map. Add the extra information (e.g. TV programmes) as
columns.
♦ Re-run the correspondence map from the crosstab.
♦ Make the points you are overlaying passive first, so they don't influence
the shape of the map.
♦ Remove the less discriminating lifestyle statements.
♦ Tidy up the map.
The points that have been overlaid will now be superimposed onto your map.
These are coloured green by default.
NB: If you know beforehand that you wish to overlay other information, you
can include these from the outset. In such cases it is important to remember
to make these variables passive.
To use the 3-D display of variables, select the Analysis Wizard icon
from the toolbar or select Analysis Wizard from the Analysis menu. Select the
option “Vary symbol size by statistic”. The statistics are split between General
and Detailed statistics (the screen you will see is shown below), choose the
option which best suits your requirements, and follow the appropriate
instructions.
3D Correspondence Mapping
3D mapping allows you to see variables plotted on the main 3 axes of the
Correspondence map in 3D.
Once in the 3D mapping view, use the following icons on the toolbar to format the
3D view:
This “Toggle Fog” icon allows you to change the clarity of the 3D view.
The “View Labels” icon allows you to see the labels for all of the variables
plotted.
The “Small Symbols” icon changes the size of the symbol denoting the
position of the variable on the map.
The “Large Symbols” icon changes the size of the symbol denoting the
position of the variable on the map.
The “Wire Frame” icon changes the texture of the 3 axes to a wire frame
look.
The “3D Glasses” icon allows you to see the map in a fully 3-dimensional
view. Use actual 3D glasses for the full effect.
The “Move In” icon allows you to enlarge the 3D view and zoom in.
The “Move Out” icon allows you to reduce the 3D view and zoom out.
To manually rotate your 3D map, click on your left mouse button, hold down and
move in the required direction.
To choose any of these settings, select the icons you require to enable them.
All of these options can also be enabled from the “Options” and “View” menus on
the toolbar.
Menu Commands
The file menu contains basic Windows commands for opening, closing, saving
and printing maps.
Page Setup and Print Preview allow you to change the orientation and
margins, and see how the final print will look.
Using Export, you can export the map as an enhanced metafile, which can then
be inserted into Word and PowerPoint documents etc. The map statistics can
be copied and pasted into Excel if required.
You are also able to open up the last seven files that you worked on.
Of particular note in the edit menu is the facility to give the map a title using the
‘Title…’ option (see below).
Here, as previously mentioned, you can also change the status of variables to
and from active or passive status (i.e. changing whether particular variables
contribute or do not contribute to the map construction – initially most, if not all,
of your variables will be active).
Working much like the standard Windows View menu, these options not only
include details of exactly what your hidden points are (and allow you to unhide
them), but also give you the option to flip the axis around on the map display.
You can use the select menu to select/highlight points by your specifications.
For example you can select all passive / active / row / column variables etc.
Similarly the selection wizard gives the option to do more complex selections
dependent upon the statistical values of points. You are also asked how many
of the top scoring variables you wish to select.
Clean-up map and auto clean-up map can also be accessed through the
select menu. Clean-up map will prompt you to “Select top Chi Distance
values for rows”. Enter the number of rows required for the map. It is possible
to set this number as the default using the tick box in this dialogue box. The
map will now show just the top number of rows selected. Alternatively auto
clean-up will automatically tidy up the map using the default number of rows
set in the clean-up map option.
CLUSTER ANALYSIS
Cluster Analysis can be applied to any set of comparable variables and is commonly
used to segment people based on their responses to a series of attitudinal
statements.
Cluster Analysis can be used, for example, to create attitudinal groups of respondents
whereby the respondents within each group have responded similarly to a battery of
attitudinal statements. These groups can then prove to be very powerful
discriminators within a given market.
This Cluster Analysis program provides an easy means of selecting the target
population and input variables. The analysis can be run to any given level and the
results can be viewed interactively on-screen. The program provides links with the
Choices3 analysis package. This allows the definition of the target market from
within Choices and the export of selected solutions back into Choices for further
analysis.
• In Choices3, using the original input file, from the toolbar select "Tools" and "Save
Cluster Filter File". This uses your filter as part of the cluster program and forms the
universe to be segmented.
• You should ensure that the sample size for your filter is greater than 2000.
• You will be asked if you wish to run the cluster analysis. Select "Yes".
The interpretation below consists of three stages; the first two establish if there is a minimum and a
maximum number of cluster groups that you should use, based upon some basic statistics. The last
stage is more creative and involves the user selecting the best solution (e.g. Solution 6 – which will
have 6 groups in it) for describing your market:
(i) Go to ‘Cluster Report’ and establish if there is a minimum number of groups that you can use –
when using TGI data a Variance Explained of >12 should be used.
(ii) (Also in Cluster Report) check the maximum number of groups you can use by ensuring the
smallest group figure is >200.
(iii) You will now need to decide which Cluster Solution is most appropriate:
To do this, start by looking at all the groups in Cluster Solution 3 and summarise the
characteristics of each group within it in terms of their overall attitudes (give each an
appropriate name to summarise). Next, repeat the process with the next higher solutions
(e.g. 4, then 5, and so on). You should find a point where using further cluster solutions
adds no information or indeed loses some group definition. At this stage you have found
the optimum cluster solution for dividing the market.
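Stages (i) and (ii) are simple threshold checks and can be sketched as follows. The figures standing in for the Cluster Report are hypothetical, as are the field and function names; stage (iii) remains a matter of judgement and is not automated here.

```python
# Hypothetical Cluster Report figures: solution number -> statistics.
solutions = {
    2: {"variance_explained": 9.5, "group_sizes": [1300, 1100]},
    3: {"variance_explained": 13.1, "group_sizes": [900, 800, 700]},
    4: {"variance_explained": 16.0, "group_sizes": [800, 700, 500, 400]},
    5: {"variance_explained": 18.5, "group_sizes": [700, 600, 500, 350, 250]},
    6: {"variance_explained": 21.0, "group_sizes": [600, 550, 500, 400, 200, 150]},
}


def candidate_solutions(report, min_variance=12, min_group_size=200):
    """Solutions with Variance Explained above the minimum and every
    group larger than the minimum sample size."""
    return [
        number
        for number, stats in sorted(report.items())
        if stats["variance_explained"] > min_variance
        and min(stats["group_sizes"]) > min_group_size
    ]


candidates = candidate_solutions(solutions)
print("Solutions worth interpreting:", candidates)
```

With these figures Solution 2 fails the Variance Explained check and Solution 6 contains a group that is too small, leaving Solutions 3 to 5 to be compared by interpretation.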
• In Choices go to the top toolbar and select "Tools" and "Import Cluster Solution" -
your cluster solution will appear at the bottom of your dictionary.
• These can either be used to run further crosstabs or put into the original crosstab
under columns and then run as a correspondence analysis. The solutions should be
made passive so as not to affect the map but to show where they appear in relation to
your market and lifestyle statements.
A correspondence map should be carried out first in order to get the most
discriminating lifestyle statements. After identifying the top 15-20 statements in
order of Chi Distance, print out the list of statements.
To run a cluster analysis from Choices3 you must first create a filter, or base of
respondents which the cluster analysis will use. Typically this filter is the target
market (e.g. Bottled Lager users, Heavy Shampoo users, Everyone who has
bought shoes etc). The filter should contain at least 2000 respondents.
Defining clusters from a filter less than this may result in an unreliable size for
the smaller cluster groups in your favoured solution.
♦ Within Choices3 add your target market to the filter. This would usually be
the same filter that you used for the Correspondence Analysis. You may
find it useful to look up the sample size before running the program.
You do not have to run Cluster Analysis at this point. You may start the
program later to run the analysis with the filter you have saved.
You are faced with the dialogue box below whether you have launched the
Cluster software through Choices3 or from the Shortcut.
♦ Select 'Start a new cluster project' and click OK or, if the program is
already running, select 'File/New Project' from the main menu.
♦ From the New Cluster Project dialogue select a Database to use for the
project. The cluster database must correspond to the survey you are
using in Choices3.
♦ Enter a title for the analysis, and a project name. You may find it helpful to
use the same naming convention that you used for the Correspondence
Analysis. Click OK.
The Cluster Definition window defines the filter and variables to use in the
analysis.
♦ To select a filter, click the 'Change Filter' button. The currently selected
filter is shown next to the Change Filter Button (including the sample size).
♦ Choose a filter and click the OK button.
♦ Select the statements to use. Using the print out of the most discriminating
lifestyle statements, select them from the database listed on the left-hand
side. The variables you have selected are displayed in the right hand list.
To select a variable, highlight it and either double-click or click the
appropriate button.
To highlight more than one variable at a time, click and drag with the mouse to
highlight a range or hold down the Control (CTRL) key and click with the left-
hand mouse button to highlight non-adjacent variables.
It is recommended that you choose a maximum of 25 statements; usually 15-
20 are selected.
To run the analysis, click the run button on the speed bar. This button is only
activated if the currently active window is the Cluster Definition Window.
Run analysis:
Enter the maximum number of cluster groups to create, and click the OK
button to start the analysis process. (If you selected '6' groups, Choices would
create not only the 6-cluster solution, but also the 5, 4, 3 and 2-cluster solutions.)
Where possible, the program will give an indication of progress for each part of
the process. An analysis can be cancelled at the end of each data pass. To
cancel a running analysis click the Cancel button and wait for the system to
respond at the end of the data pass.
Once the Cluster Analysis has finished running there are various statistics that
you can print out or view on screen:
All reports are accessible from the View menu. It is best to work through the
results in the following order:
Summary Statistics
This window shows overall mean (average) and standard deviation (measures
degree of spread in answers) for total sample (target population). This
information can be useful for identifying statements with highly-skewed
distributions. All variables are normalised to a mean of 0 and a standard
deviation of 1 before the cluster analysis runs - this ensures that each variable
is given equal importance in the analysis.
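The normalisation described above can be illustrated for a single statement. This is a minimal sketch with hypothetical agreement scores; the use of the population standard deviation is an assumption, since the guide does not say which form the program uses.

```python
from statistics import mean, pstdev


def standardise(values):
    """Rescale answers to a mean of 0 and a standard deviation of 1."""
    average = mean(values)
    spread = pstdev(values)  # assumption: population standard deviation
    return [(value - average) / spread for value in values]


# Hypothetical answers to one statement on a 1 (disagree) to 5 (agree) scale.
scores = [1, 2, 2, 3, 4, 5, 5, 5]
z_scores = standardise(scores)
print([round(z, 2) for z in z_scores])
```

After this step a statement answered on a wide scale carries no more weight in the clustering than one answered on a narrow scale.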
NB: Both the Cluster Solution Window and the Cluster Group Window will
appear if you click on “View” “Cluster Solution”.
This window shows information for all cluster groups within a given solution.
The report provides a way of examining an overview of a given solution - the
same figure is displayed for each group alongside each other. The main
objective of this report and the Cluster Group Report is to examine the detailed
biases within each cluster group, and build up a summary description or
'picture' for each group.
The cluster groups have been formed by grouping together individuals with
similar responses to the variables. Not everyone in a given group has
responded in exactly the same way, but there will be overall biases displayed
by one group in contrast with another. You should interpret these biases and
try to understand why these individuals have been grouped in this way.
i) Standard Deviations from the mean = (Mean for Group - Mean for Sample) / Standard Deviation for Sample
ii) Absolute Deviations from the mean = Mean for Group - Mean for Sample
In all cases large (positive) numbers show high agreement, low (negative)
numbers mean high disagreement.
It is recommended that you use Standard Deviation from the mean to interpret
the cluster groups. The numbers given by this statistic are the biases for this
cluster group (compared with the overall sample) which are standardised into
units of standard deviations. It is important to use this statistic since the
analysis has used standardised data when forming the groups and is
comparable between variables (the same deviation is just as meaningful on
one variable as it is on another).
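As a worked illustration of the two statistics, the sketch below computes both for one statement. The function name `group_bias` and the scores are hypothetical, and the population standard deviation is assumed.

```python
from statistics import fmean, pstdev


def group_bias(group_scores, sample_scores):
    """Return (standard deviations from the mean, absolute deviation from the mean)."""
    sample_mean = fmean(sample_scores)
    sample_sd = pstdev(sample_scores)  # population standard deviation (an assumption)
    absolute = fmean(group_scores) - sample_mean
    standardised = absolute / sample_sd if sample_sd else 0.0
    return standardised, absolute


# Hypothetical scores for one statement: the whole sample, and one cluster group.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
group = [4, 4, 5, 5]
std_dev_from_mean, abs_dev_from_mean = group_bias(group, sample)
```

Here the group mean is 1.3 points above the sample mean, which is a little over one standard deviation, so this group shows a strong positive bias on the statement.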
The Absolute Deviation shows the average variation from the overall mean in
respondents’ answers.
Colour Coding
The cells in the report are colour-coded to help interpretation. Red numbers
represent a positive deviation whereas blue numbers represent a negative
deviation. In both cases a light colour represents a large deviation, whereas a
darker colour represents a smaller, but still important, deviation.
Light Red: Positive deviation greater than 1 standard deviation from the mean
Choosing the most appropriate cluster solution to use is an art, and the
decision should be based on a number of things.
♦ Look at each of the cluster groups in detail, taking them back into Choices to
work out the demographic characteristics of each group. Now see which
groups make sense and which are interesting. Compare this with other cluster
solutions to see which interesting groups you have gained or lost. Does the
analysis meet the original aims, or should you consider using a different target
or a different set of variables?
♦ In going from say 5 groups to 6 groups, what tends to happen is that you
maintain the 5 groups and gain a new group, i.e. the new solution will not
change dramatically. You might also lose one group and gain 2 new ones. Try
to identify which groups are new and which are lost. Ask yourself if the new
groups are useful, or whether you have lost a very interesting group.
♦ The sample size of the smallest group is important since this will restrict the
detail of further analysis that can be sensibly performed on that group in
Choices. If you were interested in examining brands with small penetrations,
you should use fewer groups but with larger sample sizes in each group.
The number of groups from 2 up to the maximum specified for the run.
The % variance in the total sample explained by splitting the sample into the given
number of groups.
The % of variance explained indicates how much of the original variance in the
data is explained by splitting the respondents into 2,3,4,5 groups etc. The level
of variance achieved will vary depending on the nature of the data being
analysed. For example, a larger sample size and a larger number of variables
will tend to give smaller levels of explanation. Guidelines to work with are that
15% is the minimum acceptable and it is rare to ever get above 30%. The sizes
of the smallest and largest groups are shown as a summary. You don’t want a
very small group, or a very large one.
The more cluster groups you have, the more variance is explained (which is
good). However, the variance explained by additional cluster levels will rise by
a diminishing amount each time. Large numbers of cluster groups can become
unmanageable, and may yield low sample sizes.
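One common way to define the % of variance explained is one minus the ratio of the within-group sum of squares to the total sum of squares. The sketch below uses that definition as an assumption; the guide does not state the exact formula Choices uses, and the names `sum_of_squares` and `percent_variance_explained` are hypothetical.

```python
from statistics import fmean


def sum_of_squares(rows):
    """Total squared distance of every row from the centroid of those rows."""
    centroid = [fmean(column) for column in zip(*rows)]
    return sum((x - c) ** 2 for row in rows for x, c in zip(row, centroid))


def percent_variance_explained(rows, labels):
    """Share of the total sum of squares removed by splitting rows into groups."""
    total = sum_of_squares(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    within = sum(sum_of_squares(members) for members in groups.values())
    return 100.0 * (1.0 - within / total) if total else 0.0


# Four hypothetical respondents on two standardised statements, in two groups.
rows = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
explained = percent_variance_explained(rows, labels)
```

With this definition, adding groups can only reduce the within-group sum of squares, which is why the figure rises with each extra cluster level.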
Once an analysis has been run the next step is usually to take the resultant
groups back into Choices for further interpretation. They can be
crosstabulated against anything else in the survey. In order to do this you need
to tell the system which groups to make available for analysis in Choices. This
is not done automatically since generally only one solution is needed, and
most analyses will generate several possible solutions.
Choices3
Choose the cluster solution(s) you wish to import into the Choices3 dictionary.
NB The files for the cluster solutions are saved with the data, so once imported
everyone who accesses Choices3 on your system is able to view the solutions
you import.
Once imported, the groups in the solution may be selected like any other
variable in the survey. (See previous screen)
Clusters can now be crosstabulated against anything else in the survey. They
can also be set up as targets in media analysis.
For further analysis you may find it helpful to save a definition file with the
group names edited.
It is helpful to see where the groups lie in relationship to each other on the
original Correspondence Map. This may also help in deciding which of the
cluster solutions you are going to use for targeting purposes. You may find
that one group is a more refined version of another group and hence will
respond to the same targeting strategy.
The cluster groups generated from the example of the Shoe Market have been
overlaid onto the original correspondence map.
It is very important at this stage to make your cluster groups passive. This
is done before going through the process of ‘tidying’ up the Correspondence
Map as before.
1) Crosstab
Set a filter of “Bought shoes in last 12 months”. This can be done by coding all
the shoe retailers together.
With the brands of shoe shop in the answer panel, right click and select sample
and weighted from the context menu. Then click on the word sample to sort by
sample size to identify any brands with a low sample size. These should not be
used in the analysis, and can either be deleted or combined with other brands if
appropriate.
Add the shoe shop brands to the columns and add ‘Any Agree’ Lifestyle
statements to the rows. There is a definition file set up which you can use or
alternatively you can select the lifestyle statements from the dictionary yourself.
Save the crosstab as a spec file. You may find it beneficial to use the same file
naming convention for the spec file, correspondence map and cluster project. Run
the crosstab.
Print the lifestyle statements needed for your cluster project from the row statistics
view.
After saving the filter in the Coding Window you may proceed to the Cluster
Analysis module. Alternatively the Cluster Analysis module may be opened by
means of a separate shortcut under Start/Programs/ .
Choose the correct survey database. This is the same as the survey you are
using in the coding window. Give the project a title, e.g. ‘Shoe Market Place’, and a
Name, e.g. ‘Shoes’.
From the list of lifestyle statements on the left-hand side of the screen, select from
your printout the top 15-20 statements that you have previously identified from the
row statistics.
Now check that all the statements chosen (i.e. in the right-hand column) are
relevant to the market chosen. If you think they are not relevant, or are too similar
to another statement, click on the left-arrow button to remove them from the list.
You may decide to include statements that were not so important in terms of Chi
Distance on the Correspondence Map.
Click on the button, and select how many cluster solutions to generate.
Since the software will generate every cluster solution up to and including the one
you select, it is better to choose too many rather than too few. Typically, any
number between 4 and 8 might be chosen, but this will vary depending on the market
you are looking at.
Once the cluster analysis has run, there are some statistics you should look at
within the cluster software which will help you to understand the cluster groups.
These are explained within the 'Interpreting the results' section on p36. However,
the "View Cluster Solution" option will be of particular interest.
By looking at the top 5 or so statements in each group of the solution, you can
begin to get a feel for how the cluster groups vary attitudinally from each other.
A negative sign means that respondents in the group tend to disagree with the
statement.
You may find it helpful to print the list of lifestyle statements for each of the cluster
groups for each solution generated. The lifestyle statements are ranked in
descending order for every group. It can be helpful to give the cluster groups
names. We are looking to identify the cluster groups as people we may see in the
street.
Within the Cluster software examine each of the groups in each cluster solution.
One of the most powerful ways of interpreting the cluster groups is to import them
back into Choices for further analysis.
First, import your chosen cluster group(s) into Choices by selecting Tools/Import
cluster solution on the Choices3 Coding Window menu bar.
What you choose to look at will vary by the market you are looking at, but typically
you might want to know the following:
i) Demographic Profile
CLUSTER 1          CLUSTER 2            CLUSTER 3          CLUSTER 4
Social Grade DE    Age 15-34            Social Grade DE    Social Grade AB
Age 65+ or <24     Social Grade ABC1    Age 55+            Age 35-64
Not working        Working F/T          Not working        Work P/Time
iv) TV Programmes
CLUSTER 1           CLUSTER 2             CLUSTER 3         CLUSTER 4
Family Affairs      Streetmate            Family Guy        Gardener’s World
Doctors             The Priory            Stargate          Horizon
Wheel of Fortune    CD UK                 Driven            Timewatch
That’s Esther       She’s Gotta Have It   Top Gear          Dispatches
City Hospital       Dawson’s Creek        Robot Wars        Panorama
Trisha              Hollyoaks             F1 Grand Prix     Watchdog
The Statistics Explained
Cluster Analysis
If the population distribution in a sample is not homogeneous, respondents often
clump together in clusters. Gaps may also indicate that there is a mixture of
several displaced distributions.
Cluster Methodology
The clustering process begins with comparing the distance of each
observation from the mean vectors ('Centroids') of each of the proposed
clusters in the sample of n observations. The observation is assigned to the
cluster with the nearest mean vector. The distances are recomputed and
reassignments are made as necessary. The process continues until all
observations are in clusters with minimum distances to their mean vectors.
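The assign-and-recompute loop described above can be sketched as follows. This is a generic K-Means illustration under stated assumptions, not the code used by the cluster program: the function names, the starting centroids and the `max_passes` limit are all hypothetical.

```python
from statistics import fmean


def squared_distance(a, b):
    """Squared Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, start_centroids, max_passes=100):
    """Reassign rows to their nearest centroid until no row changes cluster."""
    centroids = list(start_centroids)
    labels = [None] * len(rows)
    for _ in range(max_passes):
        new_labels = []
        for row in rows:
            distances = [squared_distance(row, c) for c in centroids]
            new_labels.append(distances.index(min(distances)))
        if new_labels == labels:
            break  # every observation is already with its nearest centroid
        labels = new_labels
        # Recompute each centroid as the mean vector of its current members.
        for k in range(len(centroids)):
            members = [row for row, label in zip(rows, labels) if label == k]
            if members:
                centroids[k] = tuple(fmean(column) for column in zip(*members))
    return labels, centroids


# Four hypothetical respondents on two standardised statements.
rows = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels, centres = k_means(rows, [(0.0, 0.0), (6.0, 5.0)])
```

Each trip round the outer loop corresponds to one data pass; the loop stops as soon as a pass moves no observations.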
K-Means Algorithm
The cluster analysis program uses a K-Means partitioning algorithm. A partitioning
algorithm moves from a smaller number of groups to a larger number of groups, as
opposed to a joining algorithm, which does the reverse.
The program works up from an initial starting point of one group (the target
population), then builds a 2-cluster solution, a 3-cluster solution and so on up to a
maximum number of clusters specified by the user. This is known as a 'Multi-K-Means'
analysis, where 'K' refers to the number of clusters chosen by the user. The starting
partition at each instance is derived by splitting an existing group into 2. The program
selects and splits the group with the greatest variance when moving from level to level.
Squared distances are taken for every individual. The individual with the highest score
is taken, and then the person who is the most dissimilar.
i) Standardise Variables
Variables are standardised (normalised) to a mean of 0 and a variance of 1, which
ensures that each variable is given an equal weighting in the analysis.
If the number of groups is still smaller than the number required, the group with the
largest variance is split, and phases (ii) and (iii) are repeated.
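The rule of splitting the group with the largest variance can be sketched as below. The variance measure used here (mean squared distance from the group centroid) and the names `within_group_variance` and `group_to_split` are assumptions; how the program then chooses the two starting points inside the selected group is not covered by this sketch.

```python
from statistics import fmean


def within_group_variance(rows):
    """Mean squared distance of a group's members from the group centroid."""
    centroid = [fmean(column) for column in zip(*rows)]
    total = sum((x - c) ** 2 for row in rows for x, c in zip(row, centroid))
    return total / len(rows)


def group_to_split(groups):
    """Return the name of the group with the largest variance."""
    return max(groups, key=lambda name: within_group_variance(groups[name]))


# Two hypothetical groups: A is tight, B is widely spread.
groups = {
    "A": [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)],
    "B": [(-2.0, 0.0), (2.0, 0.0), (0.0, 3.0)],
}
chosen = group_to_split(groups)
```

In this example group B would be split next, taking the solution from 2 groups to 3.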
Variance
The variance is the average of the squared deviations from the mean.
The smaller the variance in the population, the more accurate will be a sample
taken from that population.
Standard Deviation
The standard deviation is the square root of the variance. It measures
dispersion in the same units as the original data.
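A short worked example of both definitions, using hypothetical answers and dividing by the number of answers (the population form, which is an assumption about the convention used):

```python
import math
from statistics import fmean

# Hypothetical answers from eight respondents to one statement.
answers = [2, 4, 4, 4, 5, 5, 7, 9]

mean = fmean(answers)
# Variance: the average of the squared deviations from the mean.
variance = sum((a - mean) ** 2 for a in answers) / len(answers)
# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)
```

For these answers the mean is 5, the squared deviations sum to 32, the variance is 32 / 8 = 4 and the standard deviation is 2.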
Further reading:
Hartigan, J.A. (1975) Clustering Algorithms (Wiley, New York)
Everitt, B. (1974) Cluster Analysis (Heinemann, London)
MacQueen, J. (1967) "Some methods for classification and analysis of
multivariate observations", Proceedings of the Fifth Berkeley Symposium on
Mathematical Statistics and Probability, University of California Press,
Berkeley
FILE
New Project     Creates a new cluster project and makes it the active window. You will be
                prompted to save the document when you close the application.
Open Project    Displays the Open File dialog box, so you can select a file to load into a new
                document window. You can also create a new document by naming a file that
                does not currently exist.
Close Project   Takes you back to the initial screen where you can start a new cluster project or
                work on an existing cluster project.
Save Project    Saves the document in the active window. If the document is unnamed, the
                Save As dialog box is displayed so you can name the file, and choose where it
                is to be saved.
Save As         Allows you to save a document under a new name, or in a new location on disk.
                The command displays the Save File As dialog box. You can enter the new file
                name, including the drive and the directory. All windows containing this file are
                updated with the new name. If you choose an existing file name, you are asked
                if you want to overwrite the existing file.
Print           This prints the contents of the active window. Use File/Print Preview to see how
                the document will be laid out on printer pages. Use File/Print Setup to select a
                printer, and to set printer options.
Print Preview   This opens a special window that shows how the active document will appear
                when printed. The preview window shows one or two pages of the active
                document as they would be laid out on printed pages. Controls on the window
                allow you to page through the pages of the document.
Print Setup     This displays the Printer Setup dialog box, which allows you to select and
                configure the printer to be used.
Import filter   Here you can upload a filter that was saved in Choices 1.
Exit            This takes you out of the Cluster program. Make sure that you have saved your
                file first.
EDIT
Copy            Enables the user to copy any highlighted data and paste it into documents such as
                Word, Excel etc.
VIEW
Project             Displays the active project window.
Summary statistics  For each variable (statement) shows mean and standard deviation.
Main Report         For each solution, shows the amount of variance and the size of the smallest
                    and largest groups in each solution. Allows you to view any individual cluster
                    solution.
Cluster Report      For each solution, shows the amount of variance and the size of the smallest
                    and largest groups in each solution.
Cluster Solution    Displays, for a given solution, each group; shows the standard deviation from
                    the mean, absolute deviation from the mean and absolute mean for each
                    statement.
Cluster Group       Enables the user to view the different groups within a cluster solution.
Cluster Log         Provides a record of how each cluster group was arrived at, with the number of
                    passes and points moved.
ANALYSIS
Create membership data Choices 1   Creates a file that you can load into Choices 1.
Create membership data Choices 2   Creates a file that you can load into Choices 2.
OPTIONS
System Options  This shows the directories used by Choices, and may be useful if you want to
                know where certain files are stored.
Project options Lets you change the title of the project and the number of clusters.
WINDOW
Cascade         Displays all the available windows in overlapped form, so that the title bar of each
                is visible.
Arrange Icons   Arranges all iconized windows into rows along the bottom of the application's main
                window.
Close All       Lets you close all windows, report windows, cluster group windows or cluster
                solution windows.
HELP
Help            Provides help on running a cluster analysis.