
Guide to using Correspondence & Cluster Analysis
Correspondence & Cluster Analysis

Table of Contents

Welcome
  Aims of this Guide
  Queries and Support

Correspondence Analysis
  What is Correspondence Analysis?
  Correspondence Analysis step by step
    Setting up the crosstab in Choices3
    Editing the Correspondence Map
    Interpreting the map
    The Statistics View
    Headings in the General Statistics Views for rows and columns
    Axis Statistics
    Eigenvalues Table View
    Looking at the map in different ways
    Formatting the map display
    Printing
    Overlaying Data
    Incorporating 3-D Statistics into your map
    3D Correspondence Mapping

Cluster Analysis
  What is Cluster Analysis?
  Cluster Analysis step by step
    How to set up a Cluster Analysis
    Interpreting the results
    Summary Statistics
    Cluster Report Window
    Cluster Solution Window
    Cluster Groups Window
    How many cluster groups should I choose?
    Taking your clusters back into Choices3
    Overlaying your cluster solution onto the original map

An example TGI Cluster Analysis: The Shoe Market
  Crosstab
  Selection of Lifestyle Statements
  Run Cluster Analysis
  Interpreting the results
  Importing the clusters back into Choices3

The Statistics Explained


Welcome
Thank you for licensing this product from KMR-SPC Software.

Aims of this Guide

The aim of this guide is to help you run a Correspondence Analysis and, where appropriate, a Cluster Analysis based on the results of the Correspondence Map.

Correspondence Analysis is an integrated part of the Choices3 software, and its results are shown in the Choices Viewer. Cluster Analysis is a module that functions as part of the Choices3 software.

Advancing technology and the ongoing development by the team at KMR Software have made these advanced statistical techniques available on your PC desktop, and the processing takes remarkably little time. However, the essence of these techniques is the time and thought you put into getting results out of the analyses that back up the strategy or story you want to present. Please set aside enough time to make considered decisions about the results of the analysis; the software is powerful enough that you will not spend long periods waiting for the results.

Recent enhancements to Correspondence Analysis include a new “clean-up” tool that allows quick visual interpretation of the map. Colour-coded reports provide another unique perspective on relationships between variables on the map.

Training in these techniques is available from KMR Software as part of your licence agreement.

Queries and Support

Please call the helpdesk with any queries on +44 (0)20 7831 5455, or email helpdesk@kmrspc.com, asking for the Choices3 team.


CORRESPONDENCE ANALYSIS


What is Correspondence Analysis?

Correspondence Analysis is a technique, used in market segmentation, that graphically represents the relationship between brands or products and other variables such as attitudes, media titles, etc.

It is also used as a preliminary step to Cluster Analysis, to determine the most discriminating Lifestyle statements for the chosen market.

Correspondence Analysis runs from a crosstab. Usually, the brands or products are the columns and the attitudinal statements (or other variables) are the rows.

An Example of a Basic Correspondence Map

Types of Major Shoe Retailers Vs Attitudes

The following is a basic map plotting attitudinal statements against the shoe
shops that people have stated they use.


Correspondence Analysis Step by Step

Setting up the crosstab in Choices3

• Enter the target market (usually your brands) in the columns, checking that sample sizes are greater than 200
• Enter either the ‘any agree’ or the ‘definitely agree’ lifestyle statements as rows
• Enter ‘all users’ of the market as your filter (if you intend to run Cluster Analysis, the sample must be greater than 2000)
• Edit your headings so they are concise (in the ‘Edit Table’ area)
• If you want to ‘overlay’ information, enter it into the columns (e.g. demographics/media)
• Save and then run the crosstab
• Select Correspondence Analysis using the icon or from the “Analysis” options
The Correspondence map will be generated within the Choices Viewer along with the related statistics.
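The sample-size rules in the steps above can be expressed as a simple check. This is a minimal sketch in Python, not part of Choices3: the function `check_bases`, the dictionary of column bases and the example figures are all hypothetical; only the two thresholds (column bases above 200, and a total sample above 2000 if you plan to run Cluster Analysis) come from this guide.

```python
# Minimal sketch (not part of Choices3): check crosstab bases against the
# thresholds given in this guide. Names and figures are hypothetical.

MIN_COLUMN_BASE = 200      # each column (brand) base should exceed this
MIN_CLUSTER_SAMPLE = 2000  # total sample should exceed this for Cluster Analysis


def check_bases(column_bases, total_sample, plan_cluster=False):
    """Return warning messages for any base at or below the guide's thresholds."""
    warnings = []
    for name, base in column_bases.items():
        if base <= MIN_COLUMN_BASE:
            warnings.append(f"{name}: base {base} is not above {MIN_COLUMN_BASE}")
    if plan_cluster and total_sample <= MIN_CLUSTER_SAMPLE:
        warnings.append(
            f"total sample {total_sample} is not above {MIN_CLUSTER_SAMPLE}"
        )
    return warnings


# Hypothetical column bases for three brands
bases = {"Brand A": 850, "Brand B": 430, "Brand C": 150}
for message in check_bases(bases, total_sample=1800, plan_cluster=True):
    print(message)
```

In this invented example Brand C would be a candidate for being made passive, as described under 'Active, Passive'.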

Editing the Correspondence Map

At this stage, before editing the map, you will want to select the statements that best
describe your map and eliminate the rest. There are two methods available:

Manual clean-up method

• In the Choices Viewer, select the Statistics view and expand General Statistics
• Click on "Rows"
• Click on the "Dist" column (this sorts the rows by ‘Chi-distance’)
• Right-click on the rows and choose "Select top n…" and then choose the number of
statements you wish to include in the map (usually about 15-30)
• Right-click and choose "Invert selection"
• Right-click and select "Change status" …and then "to passive"
• Select the map from the analysis tree
• Right-click on the map and choose “Select” and “All passive rows”
• Right-click again on the map and select “Hide”
• Edit the map by moving the labels and changing text where necessary
• To rename the map, from the toolbar select “Edit” and “Title”
• To insert labels for the x and y axes, from the toolbar select “Insert” and “New label”
• If you are going on to do a cluster analysis, print the statements used in the map: ensure you are in the “Statistics” view and then choose “File” and “Print”
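The sorting, “Select top n…”, “Invert selection” and “Change status” steps above amount to ranking the statements by Chi-distance and splitting the list in two. A minimal Python sketch, assuming the 'Dist' values have been copied out of the Rows view into a dictionary; `clean_up` and the statement labels are hypothetical:

```python
def clean_up(chi_distances, n):
    """Split statements into (active, passive) by Chi-distance.

    chi_distances: dict mapping statement label -> 'Dist' value (hypothetical input).
    The n statements with the highest distance stay active; the rest become passive.
    Both lists are returned in descending order of distance.
    """
    ranked = sorted(chi_distances, key=chi_distances.get, reverse=True)
    return ranked[:n], ranked[n:]


# Hypothetical 'Dist' values copied from the Rows view
dists = {
    "Statement 1": 0.12,
    "Statement 2": 0.45,
    "Statement 3": 0.03,
    "Statement 4": 0.30,
}
active, passive = clean_up(dists, n=2)
print("Active:", active)
print("Passive (then hide):", passive)
```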

Clean-up method

The clean-up method simply requires you to specify the number of rows to select in order to tidy up the map.

• From the “Select” menu, choose “Clean-up Map”
• When prompted to “Select top Chi Distance values for rows”, enter the number of rows required for the map. You can set this number as the default using the tick box in this dialogue box.
• The map will now show just the selected top rows.
• Alternatively, auto clean-up will tidy up the map automatically, using the default number of rows set in the clean-up map option. From the “Select” menu, choose “Auto clean-up map” or use the icon.
• If you are going on to do a cluster analysis, print the statements used in the map: ensure you are in the Rows view of General Statistics and then choose “File” and “Print”.


Interpreting the Correspondence Map

(i) Firstly, you should ensure the variance explained by the map is sufficiently high: the combined variance for axes 1 and 2 needs to be over 60%.

(ii) Assessing the relationship between two brands is done by measuring the angle between the lines drawn from the two brands to the centre (origin) of the map. An angle closer to 0° or 180° means a stronger positive or negative relationship, respectively, between the brands. Right angles between brands, or thereabouts (i.e. angles of about 90° or 270°), indicate little or no relationship.

(iii) Assessing the relationship of brands to a statement is done by drawing a line from the statement, through the origin, to the other side of the map. The distance of the brands along this line determines the strength of the relationship. Again, the relationship is positive if the brand is on the same side of the origin as the statement and negative if it is on the opposite side. The closer the brand is to the origin along the statement line, the weaker the relationship; the further out towards the edges of the map, the stronger the relationship.

Interpreting the Map

The Correspondence Analysis program will search for correlation within the data and will produce a map based on the two 'themes' that are strongest within the data. The most important theme forms the basis of the x-axis and the second most important the y-axis.

For instance, in the basic map example shown earlier, one end of the x-axis might reflect ‘Real Men’, who believe real ale is the only beer worth drinking and that skincare products are for women, and the other end ‘Image Conscious’ people, who are more concerned with fast cars and designer clothes.

In this case the vertical or ‘y’-axis is much less important than the horizontal (it has a relatively low ‘contribution level’, discussed later in this guide); on any correspondence map the y-axis explains no more of the variation than the x-axis. However, you might use it to differentiate between those whose attitudes lean towards being financially aware and those who are more traditional.

Now we will run through a number of key questions you might ask about the
Correspondence Analysis Programme. Remember, if the data doesn't seem to
present any distinct patterns, you may need to study the combination of
variables that you are using and/or re-run your analysis.

The importance of the ‘Variance Explained’ figure


The variance explained figure is a measure of how well the map explains the variables in it. Ideally, on a survey such as TGI, at least 60% of the variation within the market should be explained by the first 2 axes. In reality this may not happen, especially if very few of the brands or variables overlap (overlapping variables are ones such as the statements “My diet is mainly vegetarian” and “I am a vegetarian”). If you are in the map view itself, this figure is given in the bottom left-hand corner of the map.

If the figure is low (we recommend a minimum acceptable level of 60% for a correspondence map), these axes do not give a sufficient explanation of the data. The first two dimensions then leave too much of the variation unexplained to support a meaningful map, and the map will not adequately explain the differences between the brands. Note that any set of data will contain some variation, but it is not always strong enough to map. Also, users of some products might be very similar attitudinally and might be better differentiated by other variables, such as demographics.
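Because each eigenvalue measures the variation explained by its axis (see 'Eigenvalues View'), the 'Variance Explained' figure for the first two axes is their share of the eigenvalue total. A minimal Python sketch of the 60% check; the eigenvalues are invented for illustration and `variance_explained` is a hypothetical helper, not a Choices3 function:

```python
def variance_explained(eigenvalues, axes=(1, 2)):
    """Share of total variation explained by the given axes (numbered from 1)."""
    total = sum(eigenvalues)
    return sum(eigenvalues[a - 1] for a in axes) / total


# Hypothetical eigenvalues, one per dimension, largest first
eigs = [0.040, 0.014, 0.008, 0.005, 0.003]
share = variance_explained(eigs)
print(f"Axes 1 and 2 explain {share:.1%}")  # 77.1%
print("Acceptable" if share >= 0.60 else "Below the recommended 60%")
```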

What is being expressed along each axis?

Each axis should reflect a dimension within the data, which can be summed up
or described by the user using appropriately descriptive labels. Examples of
dimensions might be introverted / extroverted or traditional / innovative.

The correspondence map can plot any 2 dimensions; by default it plots the two strongest. However, you should also look at the other axes to see how other polarities express themselves within the data. This is explained under ‘Selecting Axes’ in ‘Looking at the Map in Different Ways’.

Which brands are the most important?

The brands around the centre of the map will be those that are 'average', or
not as strongly differentiated as the brands around the outside of the map.
Brands near the edge of the map are those which have more extreme variation
or differences from other brands and attitudes. In practice these might be the
smaller brands which may attract a more specialist or distinctive consumer.

How do I measure relationships between variables on the map?

There are two main measurements you can make with a ruler and/or a protractor, described below. Remember that the x-axis will have been stretched or shrunk to fit your screen, so the map will not be shown true to scale.

1) Making comparisons between brands:

To find out the correlation between two brands, simply draw a line from each
one to the origin, and measure the angle between them. An angle of 0º
represents 100% correlation, 180º shows 100% negative correlation, and 90º
(or 270º) shows no correlation. Brands B and C are diametrically opposite, i.e. there is a strong negative correlation; it is important to know that these brands are opposites in the market. By contrast, A and B are positioned in a similar area of the map and consequently have similar market positions.
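If you have the axis 1 and 2 co-ordinates from the Statistics view, you can calculate the angle directly instead of measuring it on screen, which avoids the scaling distortion mentioned above. A minimal Python sketch; the co-ordinates and the helper `angle_between` are hypothetical. It returns angles between 0º and 180º, so an on-screen reading of 270º corresponds to 90º here, and the cosine of the angle runs from +1 (0º) through 0 (90º) to −1 (180º), matching the scale described above.

```python
import math


def angle_between(p, q):
    """Angle in degrees (0-180) between the lines from the origin to p and q."""
    dot = p[0] * q[0] + p[1] * q[1]
    cos_theta = dot / (math.hypot(*p) * math.hypot(*q))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


# Hypothetical axis 1 and axis 2 co-ordinates from the Statistics view
brand_a = (0.4, 0.3)
brand_b = (0.8, 0.6)
brand_c = (-0.8, -0.6)
brand_d = (-0.3, 0.4)

for name, point in [("B", brand_b), ("C", brand_c), ("D", brand_d)]:
    angle = angle_between(brand_a, point)
    cosine = math.cos(math.radians(angle))
    print(f"A vs {name}: {angle:.0f} degrees (cosine {cosine:+.2f})")
```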

Column vs Column Analysis View

Alternatively, the Column vs Column Analysis can be used to compare the relationships between brands. Each brand is taken in turn (shown at the top of the table; in the example below the brand is Clarks) and the analysis is presented in the form of a colour-coded table. The brands shown in red have a close correlation with Clarks, whereas those shown in white have no correlation. Those shown in blue have a strong negative correlation with Clarks; for example, Dolcis and Clarks are opposites in the shoe market.

[Figure: Column vs Column Analysis table for Clarks, colour-coded red and blue]

Use the next target buttons to scroll through the different brands and view the relevant analysis.

2) Making comparisons between statements and a brand

[Figure: Brand A's line through the origin, labelled “Relationship becoming more strongly positive” in one direction and “Relationship becoming more strongly negative” in the other]

You can see how different statements relate to a brand. Draw a line from the brand through the origin, and then draw a perpendicular line (i.e. at 90º) from each statement to that line.

The relationship between Brand A and the lifestyle statements X, Y and Z is shown by the point where each statement's perpendicular meets Brand A's line through the origin. Positive relationships lie on the same side of the origin as the brand; negative relationships lie on the other side of the origin from the brand. In the example shown above, consumers of Brand A strongly agree with statement Z, and they disagree more strongly with Statement X than with Statement Y. The closer a statement's meeting point is to the origin, the weaker the relationship; the further out towards the edges of the map, the stronger the relationship.
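The perpendicular construction above is an orthogonal projection, so the meeting point can also be calculated from the co-ordinates. A minimal Python sketch: the co-ordinates for Brand A and statements X, Y and Z are invented to match the pattern described above, and `projection_on_brand_line` is a hypothetical helper. A positive result means the statement lies on the brand's side of the origin; a larger magnitude means a stronger relationship.

```python
import math


def projection_on_brand_line(brand, statement):
    """Signed distance from the origin to the foot of the perpendicular
    dropped from `statement` onto the line through the origin and `brand`.

    Positive: same side of the origin as the brand (positive relationship).
    Negative: opposite side (negative relationship).
    """
    dot = brand[0] * statement[0] + brand[1] * statement[1]
    return dot / math.hypot(*brand)


# Hypothetical co-ordinates (axis 1, axis 2)
brand_a = (0.6, 0.8)
statements = {"X": (-0.5, -0.2), "Y": (-0.1, 0.05), "Z": (0.5, 0.6)}

for label, point in statements.items():
    print(label, round(projection_on_brand_line(brand_a, point), 2))
```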

Column vs Row Analysis

Alternatively, the Column vs Row Analysis can be used to compare the relationships between brands and lifestyle statements. Each brand is taken in turn (shown at the top of the table; in the example below the brand is Ravel) and again the analysis is presented in the form of a colour-coded table. Ravel shoppers strongly agree with the lifestyle statements shown in red, whereas they strongly disagree with those shown in blue.

Example:
Look at the top 12 statements in the list (in red, the closest to Ravel) and try to
find a common theme. In this example, these statements could be part of the
“Image Conscious” theme.

Use the next target buttons to scroll through the different brands and view the relevant analysis.

[Figure: Column vs Row Analysis table for Ravel, colour-coded red and blue]


The Statistics View

In this example we will use a very straightforward map, showing a selection of shoe shops people might use against attitudinal statements, to explain the various statistics. As before, the map itself will look something like this:

The statistics view contains information that will allow you to describe your correspondence map in more detail. An example of the Column statistics view is given below, along with explanations of each of its components and how they might be used.

Please note that clicking on a column heading (e.g. ‘Mass’, ‘Inertia’) sorts the view by that statistic in descending order.


The statistics of a Correspondence Map are based on the Chi-squared statistic, which measures deviations from expected values. The inertia is the chi-squared statistic divided by the grand total of all cell entries in the table. This total inertia is what the correspondence map will explain, and the total of the eigenvalues across all dimensions equals the total inertia. A process similar to factor analysis re-allocates this inertia between a series of dimensions, which become the axes of the map.
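The relationship between chi-squared, total inertia and the eigenvalues can be checked numerically. A minimal sketch in Python with NumPy, on an invented 3 × 3 table of counts; it assumes the standard correspondence analysis decomposition (a singular value decomposition of the standardised residuals), which may differ in detail from the Choices3 implementation. The squared singular values are the eigenvalues, and they sum to the total inertia.

```python
import numpy as np


def chi_squared_and_inertia(table):
    """Pearson chi-squared for a table of counts, and total inertia (chi2 / n)."""
    N = np.asarray(table, dtype=float)
    n = N.sum()
    expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / n
    chi2 = ((N - expected) ** 2 / expected).sum()
    return chi2, chi2 / n


def ca_eigenvalues(table):
    """Eigenvalues = squared singular values of the standardised residuals."""
    P = np.asarray(table, dtype=float)
    P = P / P.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    singular_values = np.linalg.svd(S, compute_uv=False)  # descending order
    return singular_values ** 2


# Hypothetical counts: rows are statements, columns are brands
table = [[120, 40, 30],
         [50, 90, 60],
         [20, 35, 110]]

chi2, inertia = chi_squared_and_inertia(table)
eigs = ca_eigenvalues(table)
print(f"chi-squared = {chi2:.2f}, total inertia = {inertia:.4f}")
print("eigenvalues:", np.round(eigs, 4), "sum =", round(eigs.sum(), 4))
```

The last eigenvalue comes out as (numerically) zero: a table with 3 rows and 3 columns has at most 2 non-trivial dimensions.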

Headings in the General Statistics Views for Rows and Columns:

These numbers represent the original numeric order assigned to the variables immediately after the correspondence map was created. You can later use this column to restore your variables to their original order, should you wish.

Key

The key represents a code reference for your variable. Note: A default code is
given to each variable if no code can be found.

Mass

The Mass figure represents the percentage of the data in the crosstab that is in that row or column.

This is most useful if your map is based on ‘projected’ figures (i.e. the 000s figures in your crosstab) rather than the ‘Vertical Percent’, since the mass then represents the size of the brand.

NB Choices will automatically use Vertical Percent as your map basis. This means that your brands are measured in terms of the percentage of those using them. Please contact the KMR-SPC Helpdesk if you would like advice on using different statistics as your map basis.

Distance (‘Dist’) / Chi² Distance

‘Distance’ refers to the ‘Chi² Distance’ on the map. This figure is important for measuring the distance of variables from the centre of the map, or the ‘origin’. The Distance is the squared distance of the row/column point from the origin of the map, calculated as the inertia of the row/column divided by its mass.

Chi² Distances are statistical values used to make the correspondence map. The higher the value, the more discriminating the attribute. They are most useful for assessing the discriminating power of your attributes in a conventional correspondence map.

Chi² Distances

The chi² distance measures how well theoretical data 'fits' observed data. It is calculated by computing an 'expected' value for each cell and comparing this with the actual observed data. The 'expected' value is the one that would occur if there were no relationship between the row and column.

Brands with large differences between observed and expected values will have high distinctiveness, while those with an average performance will have low distinctiveness. In the map, distinctiveness corresponds to the distance from the origin, measured over all the dimensions, not just the two shown on the map. Often a small brand has the most distinctive image.
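A minimal Python sketch of these calculations on an invented 2 × 2 table (both helpers are hypothetical). The expected value for a cell is its row total times its column total divided by the grand total. The squared Chi² distance of a row compares its profile (the row divided by its own total) with the average profile, weighting each column by the inverse of its mass; under the standard definition this equals the row's inertia divided by its mass, as stated under Distance above.

```python
def expected_values(table):
    """Expected cell values if rows and columns were unrelated."""
    grand = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return [[rt * ct / grand for ct in col_tot] for rt in row_tot]


def row_chi_distances(table):
    """Squared Chi² distance of each row profile from the average profile."""
    grand = sum(map(sum, table))
    col_mass = [sum(col) / grand for col in zip(*table)]
    distances = []
    for row in table:
        profile = [x / sum(row) for x in row]   # row as proportions of its total
        distances.append(
            sum((p - c) ** 2 / c for p, c in zip(profile, col_mass))
        )
    return distances


table = [[30, 10],   # hypothetical counts
         [20, 40]]
print("Expected:", expected_values(table))
print("Squared Chi² distances:", [round(d, 3) for d in row_chi_distances(table)])
```

Both rows differ from their expected values by 10 in each cell, but the first row has the smaller total, so its profile departs further from the average and it is the more distinctive of the two.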

Inertia

This figure shows how strongly each variable contributes to determining the overall shape of the map, and is a breakdown of the ‘variance explained’ figure. You will find it most useful for discriminating between brands (usually your columns).

This figure is calculated by multiplying the mass by the Distance (the squared Chi² distance). The total of the eigenvalues across all dimensions is the total inertia.
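Multiplying each row's mass by its squared Chi² distance gives its inertia, and adding these up gives the total inertia, i.e. chi-squared divided by the grand total. A minimal Python sketch on an invented 2 × 2 table that checks this; both helpers are hypothetical.

```python
def row_inertias(table):
    """Inertia of each row = row mass x squared Chi² distance."""
    grand = sum(map(sum, table))
    col_mass = [sum(col) / grand for col in zip(*table)]
    inertias = []
    for row in table:
        mass = sum(row) / grand
        profile = [x / sum(row) for x in row]
        dist = sum((p - c) ** 2 / c for p, c in zip(profile, col_mass))
        inertias.append(mass * dist)
    return inertias


def chi_squared(table):
    """Pearson chi-squared: sum of (observed - expected)^2 / expected."""
    grand = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return sum(
        (table[i][j] - row_tot[i] * col_tot[j] / grand) ** 2
        / (row_tot[i] * col_tot[j] / grand)
        for i in range(len(table))
        for j in range(len(col_tot))
    )


table = [[30, 10],   # hypothetical counts
         [20, 40]]
inertias = row_inertias(table)
grand = sum(map(sum, table))
print("Row inertias:", [round(x, 4) for x in inertias])
print("Sum of row inertias:", round(sum(inertias), 4))
print("Chi-squared / grand total:", round(chi_squared(table) / grand, 4))
```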

Axis Statistics

Co-ordinates

The Co-ordinates view shows the position of each row/column point on each axis. The overall distance of each point from the origin is already fixed by its Chi² Distance (see above). The position on each axis depends on how much of the inertia of that row/column is explained by that dimension. On each axis, the sum over all points of the mass times the squared co-ordinate gives the inertia of that dimension.

Axes 1 and 2 represent the actual co-ordinates used to construct the default map. Negative numbers mean that the point is on the opposite side of the origin to points with positive numbers on the same axis.
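Under the standard correspondence analysis formulation (which Choices3 may implement differently in detail), row co-ordinates come from a singular value decomposition of the standardised residuals. A minimal NumPy sketch on an invented table; `row_coordinates` is hypothetical. It checks the two properties described above: mass times squared co-ordinate, summed down each axis, gives that axis's eigenvalue; and squared co-ordinates summed over all axes give each point's squared Chi² distance. The sign of each axis from the decomposition is arbitrary, which is why axes can be flipped without changing what the map says.

```python
import numpy as np


def row_coordinates(table):
    """Row principal co-ordinates, row masses and eigenvalues (standard CA)."""
    P = np.asarray(table, dtype=float)
    P = P / P.sum()
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]   # one row per point, one column per axis
    return F, r, sv ** 2


table = [[120, 40, 30],   # hypothetical counts
         [50, 90, 60],
         [20, 35, 110]]
F, r, eigs = row_coordinates(table)
print("Axis 1 and 2 co-ordinates:\n", np.round(F[:, :2], 3))

# Mass x squared co-ordinate, summed down each axis, equals that axis's eigenvalue
print(np.allclose((r[:, None] * F ** 2).sum(axis=0), eigs))

# Squared co-ordinates summed over all axes equal each row's squared Chi² distance
N = np.asarray(table, dtype=float)
profiles = N / N.sum(axis=1, keepdims=True)
c = N.sum(axis=0) / N.sum()
chi_dist = ((profiles - c) ** 2 / c).sum(axis=1)
print(np.allclose((F ** 2).sum(axis=1), chi_dist))
```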


By looking at the table above you can see that on the x-axis (axis 1) the points
on the right of the map (positive values) are: Ravel, Dolcis, Next, House of
Fraser, Bally, Barratts, Debenhams, Saxone and John Lewis. Moreover,
Ravel has the biggest value of these; i.e. in this case it would be furthest to the
right. (Please note however, that it is possible to ‘flip’ your axes on the map;
consequently the above would relate to the left of the map and not the right.)

You can also sort each column by clicking on the tab label at the top of each
column. This may reveal other axes (other than axis 1 and 2), which could be
better at explaining some of your key variables.

Absolute & Relative Contributions

In both of these views each row of data represents one variable and each column represents an axis. The two views add to 100% in different directions, as described below.

These views reveal that there is more than one axis that you can use for your
analysis. Although the initial correspondence map is based upon axis 1 and 2
(the best axes to explain your variables overall), you may choose other axes
which are stronger in explaining variables which you deem as key to the
analysis.

Absolute Contributions – add to 100% down all rows or columns for a single
axis (i.e. vertical percents). Shows the percent of all inertia on that axis which
is due to that row or column.


Relative Contributions – add to 100% across all axes for a single row or
column (i.e. horizontal percents). Shows the percent of all inertia in that row or
column which is explained by that axis.

It is better to look at the Absolute Contribution view to assess your attributes and at the Relative Contribution view to check your brands.
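Both views can be derived from the masses and co-ordinates. A minimal Python sketch, assuming co-ordinates are supplied for every axis (only two here, with invented values); `contributions` is hypothetical. Absolute contributions are mass times squared co-ordinate divided by the axis inertia, so each axis adds to 100% down the points; relative contributions are squared co-ordinates divided by the point's squared distance, so each point adds to 100% across the axes.

```python
def contributions(masses, coords):
    """Absolute and relative contributions for a set of points.

    masses: one mass per point; coords: one list of co-ordinates per point,
    covering every axis (hypothetical input copied from the Statistics view).
    """
    n_axes = len(coords[0])
    # Inertia of each axis: sum of mass x squared co-ordinate
    axis_inertia = [
        sum(m * pt[k] ** 2 for m, pt in zip(masses, coords)) for k in range(n_axes)
    ]
    absolute = [
        [m * pt[k] ** 2 / axis_inertia[k] for k in range(n_axes)]
        for m, pt in zip(masses, coords)
    ]
    relative = []
    for pt in coords:
        squared_distance = sum(x ** 2 for x in pt)
        relative.append([x ** 2 / squared_distance for x in pt])
    return absolute, relative


masses = [0.2, 0.3, 0.5]                            # hypothetical
coords = [[0.5, 0.1], [-0.2, 0.3], [-0.08, -0.22]]  # hypothetical, two axes
absolute, relative = contributions(masses, coords)
for name, a, rel in zip(["P1", "P2", "P3"], absolute, relative):
    print(name, "absolute %:", [round(100 * x, 1) for x in a],
          "relative %:", [round(100 * x, 1) for x in rel])
```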

Eigenvalues View
The Eigenvalue for each dimension gives the amount of variation explained by
that dimension. These values are used to calculate the correspondence map.

Eigenvalues table view

The sum of the active Eigenvalues is the total inertia, i.e. the total of the Chi² deviations for every cell in the table divided by the grand total. The larger the number, the more the table deviates from expected values.

The dimensions of the Correspondence Map try to explain this sum, and the output shows various statistics for the dimensions that usually explain most of the variation in the data.

Dividing the Eigenvalue of each dimension by the sum of the Eigenvalues gives the % of variation explained by that dimension. The dimensions are always listed in descending order of importance.

%
The % column gives the percentage of variation explained by each dimension.

%+
The %+ column gives the percentage of variation explained by all dimensions up to and including the current one.
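The % and %+ columns are straightforward to reproduce. A minimal Python sketch with invented eigenvalues (`eigen_table` is hypothetical); it sorts the eigenvalues into descending order, as in the table, then computes each share and the running total.

```python
from itertools import accumulate


def eigen_table(eigenvalues):
    """Rows of (eigenvalue, %, %+) in descending order of eigenvalue."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    pct = [100 * v / total for v in values]
    cumulative = list(accumulate(pct))
    return list(zip(values, pct, cumulative))


rows = eigen_table([0.014, 0.040, 0.005, 0.008, 0.003])  # hypothetical
for axis, (value, pct, cum) in enumerate(rows, start=1):
    print(f"Axis {axis}: eigenvalue {value:.3f}   % {pct:5.1f}   %+ {cum:5.1f}")
```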

Pie Chart

The pie chart gives a graphical representation of the percentage of variation explained by each dimension.


Looking at the Map in Different Ways


Selecting Axes

As mentioned previously, correspondence analysis finds many axes, of which only axes 1 and 2, which reflect the greatest variation in your market, are used in your initial map. Axes 1 and 2 become the X-axis and the Y-axis respectively.

As you become more ambitious you may want to use one of the alternative axes; perhaps one of the axes other than 1 or 2 shows better discrimination for the brands you are looking at. To do this, in the map view go to the View menu and select Add Map. Alternatively, you can use the Add Map icon on the toolbar.

Enter a title and select the axes you wish to show on the new map. The new
map will be displayed in the Choices Viewer.

Active, Passive

Points on the map can be made:

Passive (Vs Active) - Points/variables start off as ‘active’ (i.e. they contribute
to the map calculations), but when made passive they no longer contribute to
the shape of the map. Assuming they have not been hidden (see below),
these passive variables are plotted in green so you can see where they would
lie on the map.

Passive points on a map have no mass, so they do not affect the shape of the map. They are excluded from the table used to calculate the map, and their positions are then superimposed on the map afterwards. The position of a passive row is fixed by its pattern of answers across the active columns, so a passive row goes close to the columns it is strongest on, as an active row does. In the same way, a passive column is fixed by its answers across the active rows. Changing points to passive therefore means the map is redrawn excluding those points, and the passive points are positioned afterwards. Overlaying demographics and media means adding these as passive points, so the map is unchanged: it is still drawn based on the active rows and columns only.
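A minimal NumPy sketch of how a passive row can be positioned, assuming the standard 'supplementary point' method of correspondence analysis (Choices3's exact implementation is not documented here): the passive row's profile across the active columns is multiplied by the active columns' standard co-ordinates. The table, the passive counts and both helpers are invented for illustration. Because the map is computed from the active table alone, adding the passive row moves no active point, and projecting an active row in the same way reproduces its own position.

```python
import numpy as np


def ca_fit(table):
    """Fit CA to the active table; return row principal and column standard co-ordinates."""
    P = np.asarray(table, dtype=float)
    P = P / P.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]   # active row principal co-ordinates
    G = Vt.T / np.sqrt(c)[:, None]       # active column standard co-ordinates
    return F, G


def project_passive_row(counts, G, n_axes=2):
    """Position a passive row from its profile across the active columns."""
    profile = np.asarray(counts, dtype=float)
    profile = profile / profile.sum()
    return profile @ G[:, :n_axes]


active = [[120, 40, 30],   # hypothetical active rows x active columns
          [50, 90, 60],
          [20, 35, 110]]
F, G = ca_fit(active)

passive_row = [25, 30, 80]   # hypothetical overlay, e.g. a media title
print("Passive row position:", np.round(project_passive_row(passive_row, G), 3))
print("Row 3 position:      ", np.round(F[2, :2], 3))
```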

NB: By using this feature the whole map is re-drawn meaning that any editing
will be lost. This is because the variables upon which the map is based are re-
calculated.

You may choose to make points passive for the following reasons:

1) Low sample sizes: If you have included brands with low sample sizes (less than 200) in your map, they should be made passive since they can be statistically unreliable.

2) Low chi-distance: Although you can put as many lifestyle statements as you want on the map, the map can become very cluttered. Consequently we recommend that you use around 15 to 20 statements.

3) Additional variables: Correspondence maps show the relationship between two sets of variables. Nevertheless, it can be interesting to overlay other, unrelated variables onto the map (e.g. media overlays). These new variables should be made passive, or they may influence and alter the shape of the map.

Hiding Points

Hide (Vs Unhide) - Removed from view on the map; passive points will be
removed completely, and active points will still contribute to the shape of the
map but won't be shown. (You can hide items by using the mouse and right
clicking and selecting Hide. To unhide items use the View menu and select
Hidden Objects then simply select those items you wish to view).

You might want to make points hidden for the following reasons:

1) Too much data on the map: If you have too many brands or statements and you just want to focus on a few variables, you can 'hide' certain variables so they are not plotted on the map. NB: They will still have an influence on the shape of the map.

2) Passive Brands: If variables have been made passive because their chi-distance was too low to be of interest, they can be hidden to stop them cluttering up the map.


Formatting the Map Display

To format and improve the appearance of the points on your map, first select
them by clicking on them and then use the right mouse button and select
properties. Alternatively select the points you wish to format and use the Point
Properties icon on the toolbar.

You will then be provided with the ‘Point Properties’ box shown below.
Depending upon what you have selected (see ‘select’ option for doing multiple
and/or specific selections) the editing menu will give you various options under
the tab names. Please note however, that the best way to learn the editing
options and appreciate how they can improve the appearance of your map is
to go in and have a go!

Font Options
Provides typical Windows style editing options.

Symbol Options
Here you have a number of options to change map symbol sizes and shapes. For example, you may wish to distinguish Row from Column points with a differently shaped symbol. It can also be used to undo the effects of the 3-D statistics option (see ‘Incorporating 3-D Statistics into your Maps’).

Label Options
Here you can choose from a variety of options with which to highlight your
labels, such as using sunken, raised, shortened labels or by changing their
colour.

Label Text Options
Using this section you can edit the text of your selected point. You may find these options particularly useful because they allow you to change the text of your selection to upper or lower case. Note: You will only get this option when one point is selected.


Printing

Maps can be printed to your default printer. The following is the group of
options that you access in Print in the usual Windows manner.

Print Setup
Here you can change the default printer and/or the paper size and orientation.

Print Preview
Use this option to preview your output (this is not accessible in the
Eigenvalues Pie Chart view).

Adding a Map Title

Before you print, make sure you have checked or edited the map title: click on the map and select Title from the Edit menu.

Any of the views can be printed in the normal Windows manner. We recommend that you use the Rows General Statistics report to take the most discriminating lifestyle statements into Cluster Analysis.


Overlaying Data

Once you have generated your map you can overlay any other survey data
onto the map to see where it would be placed. Because a crosstabulation must
be re-run to overlay data, any previous editing that you have done will be lost.
Common examples of the sort of information you might want to overlay are:

1) Media consumption
2) Frequency information
3) Cluster groups onto the original correspondence map
4) Non-users of the brand(s) you are interested in
5) Demographic groups such as age or social grade

To overlay data you should follow these steps:

♦ Work from the original spec file that you used to generate the
correspondence map. Add the extra information (e.g. TV programmes) as
columns.
♦ Re-run the correspondence map from the crosstab.
♦ Make the points you are overlaying passive first, so they don't influence
the shape of the map.
♦ Remove the less discriminating lifestyle statements.
♦ Tidy up the map.

The points that have been overlaid will now be superimposed onto your map.
These are coloured green by default.

NB: If you know beforehand that you wish to overlay other information, you
can include these from the outset. In such cases it is important to remember
to make these variables passive.


Incorporating 3-D Statistics into your Maps

One of the newer features of Correspondence Analysis is the ability to represent key statistics on the map itself. This 3-D effect is achieved by varying the size of points, so that larger points reflect larger values. Consequently you can incorporate information about the relative importance of variables into the map.

To use the 3-D display of variables, select the Analysis Wizard icon from the toolbar or select Analysis Wizard from the Analysis menu. Select the option “Vary symbol size by statistic”. The statistics are split between General and Detailed statistics (the screen you will see is shown below); choose the option that best suits your requirements and follow the appropriate instructions.


3D Correspondence Mapping
3D mapping allows you to see variables plotted on the main 3 axes of the
Correspondence map in 3D.

From the correspondence map in the Choices Viewer, click on the 3D icon on the top right of the toolbar:

Once in the 3D mapping view, use the following icons on the toolbar to format the
3D view:

This “Toggle Fog” icon allows you to change the clarity of the 3D view.

The “View Labels” icon allows you to see the labels for all of the variables
plotted.

The “Small Symbols” icon reduces the size of the symbol denoting the position of each variable on the map.

The “Large Symbols” icon increases the size of the symbol denoting the position of each variable on the map.

The “Wire Frame” icon changes the texture of the 3 axes to a wire frame
look.

The “3D Glasses” icon allows you to see the map in a fully 3-dimensional
view. Use actual 3D glasses for the full effect.

The “Play” icon allows the 3D view to be automatically rotated.

The “Stop” icon ends the automatic rotation of the 3D view.

The “Move In” icon allows you to enlarge the 3D view and zoom in.

The “Move Out” icon allows you to reduce the 3D view and zoom out.

To rotate your 3D map manually, hold down the left mouse button and drag in the
required direction.

To enable any of these settings, click the corresponding icon.

All of these options can also be enabled from the “Options” and “View” menus on
the menu bar.


Menu Commands

The FILE menu

The file menu contains basic Windows commands for opening, closing, saving
and printing maps.

Page Setup and Print Preview allow you to change the orientation and
margins, and see how the final print will look.

Using Export, you can export the map as an enhanced metafile, which can then
be inserted into Word and PowerPoint documents. The map statistics can be
copied and pasted into Excel if required.

You can also reopen any of the last seven files that you worked on.

The EDIT menu

Of particular note in the edit menu is the facility to give the map a title using the
‘Title…’ option (see below).


Here, as mentioned previously, you can also switch variables between active
and passive status, i.e. change whether particular variables contribute to the
construction of the map. Initially most, if not all, of your variables will be
active.

The VIEW menu

Working much like the standard Windows View menu, these options show
exactly which points are hidden (and let you unhide them), and also let you
flip the axes on the map display.

The SELECT menu

You can use the SELECT menu to select (highlight) points according to your own
criteria, for example all passive, active, row or column variables.

The selection wizard allows more complex selections based on the statistical
values of points; you are also asked how many of the top-scoring variables you
wish to select.


Clean-up map and auto clean-up map can also be accessed through the
SELECT menu. Clean-up map prompts you to “Select top Chi Distance values for
rows”. Enter the number of rows required for the map; you can make this
number the default using the tick box in the same dialogue box. The map will
then show only that number of top rows. Alternatively, auto clean-up tidies the
map automatically, using the default number of rows set in the clean-up map
option.
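As a rough illustration of what “top Chi Distance values for rows” selects, the sketch below computes each row's chi-square distance from the average row profile and keeps the n largest. It assumes that the Chi Distance used by Choices3 is the standard correspondence-analysis chi-square distance of a row profile from the centroid; the function names `chi_distances` and `top_rows` are hypothetical and not part of Choices3.

```python
from math import sqrt


def chi_distances(table):
    """Chi-square distance of each row profile from the average row profile.

    table: list of rows of non-negative counts (rows = statements,
    columns = brands). Every row total and column total must be positive.
    (Assumed definition of "Chi Distance"; hypothetical helper.)
    """
    grand_total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    # Column masses form the average row profile (the centroid).
    col_mass = [sum(row[j] for row in table) / grand_total for j in range(n_cols)]
    distances = []
    for row in table:
        row_total = sum(row)
        profile = [x / row_total for x in row]
        d2 = sum((p - c) ** 2 / c for p, c in zip(profile, col_mass))
        distances.append(sqrt(d2))
    return distances


def top_rows(labels, table, n):
    """Return the labels of the n rows furthest from the centroid."""
    ranked = sorted(zip(labels, chi_distances(table)), key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked[:n]]


counts = [[10, 10], [30, 10], [10, 30]]  # invented counts
print(chi_distances(counts))
print(top_rows(["A", "B", "C"], counts, 2))
```

Row A has exactly the average profile, so its distance is zero and it is the first row a clean-up would drop.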


CLUSTER
ANALYSIS


What is Cluster Analysis?

Cluster Analysis is a powerful segmentation tool allowing users to segment a given
population into discrete groups of similar individuals.

Cluster Analysis can be applied to any set of comparable variables and is commonly
used to segment people based on their responses to a series of attitudinal
statements.

Cluster Analysis can be used, for example, to create attitudinal groups of respondents,
whereby the respondents within each group have responded similarly to a battery of
attitudinal statements. These groups can prove to be very powerful discriminators
within a given market.

This Cluster Analysis program provides an easy means of selecting the target
population and input variables. The analysis can be run up to any chosen number of
cluster groups, and the results can be viewed interactively on screen. The program
links with the Choices3 analysis package, so that the target market can be defined
within Choices and selected solutions can be exported back into Choices for further
analysis.


Cluster Analysis Step by Step


This follows on from the Correspondence Analysis Step by Step on page 5.

Preparing to run a Cluster Analysis

• In Choices3, using the original input file, select "Tools" and then "Save Cluster Filter
File" from the menu. This saves your filter for use by the cluster program, where it
forms the universe to be segmented.
• Make sure that the sample size for your filter is greater than 2000.
• You will be asked if you wish to run the cluster analysis. Select "Yes".

Running the Cluster Analysis

• Select "Start a new cluster project".
• Select the database to be used (i.e. the survey you used for the Correspondence
Map).
• Choose a filename (max 6 letters) and a title for your work.
• Select ‘Change Filter’ and then choose the base/filter you were using in Choices3.
• Select lifestyle statements by clicking on the ones you wish to use (as listed on your
correspondence printout).
• Now select the "Run Cluster" icon and choose a solution number (i.e. the maximum
number of cluster groups you think you might want, e.g. 9).
• When the analysis has completed, select the cluster report and then go to the section
below on interpreting the cluster analysis.

Interpreting Cluster Analysis

The interpretation below consists of three stages. The first two use some basic statistics to
establish the minimum and maximum number of cluster groups you should use. The last stage is
more creative: you select the solution (e.g. Solution 6, which has 6 groups) that best describes
your market.

(i) In the ‘Cluster Report’, establish the minimum number of groups you can use. When using
TGI data, use only solutions with a Variance Explained greater than 12%.

(ii) Also in the Cluster Report, establish the maximum number of groups you can use by checking
that the smallest group in the solution is greater than 200.

(iii) Now decide which Cluster Solution is most appropriate.
Start by looking at all the groups in Cluster Solution 3 and summarise the overall attitudes of
each group (give each an appropriate name to summarise it). Then repeat the process with the
next solutions up, e.g. 4, then 5, and so on. You should reach a point where further cluster
solutions add no information, or even lose some group definition. At this stage you have found
the optimum cluster solution for dividing the market.
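Stages (i) and (ii) amount to a simple filter on the Cluster Report. The sketch below applies both thresholds to a list of (number of groups, Variance Explained %, smallest group size) rows. The function name and the report figures are hypothetical, invented for illustration rather than taken from real TGI output. The solutions it returns are the candidates to compare in stage (iii).

```python
def admissible_solutions(report, min_variance_pct=12.0, min_smallest_group=200):
    """Return the numbers of groups that pass stages (i) and (ii).

    report: iterable of (n_groups, variance_explained_pct, smallest_group_size)
    rows, as read off the Cluster Report (hypothetical structure).
    """
    return [
        n_groups
        for n_groups, variance_pct, smallest in report
        if variance_pct > min_variance_pct and smallest > min_smallest_group
    ]


# Invented figures for illustration only.
example_report = [
    (2, 8.1, 1450),
    (3, 11.7, 820),
    (4, 13.9, 510),
    (5, 15.6, 340),
    (6, 17.0, 215),
    (7, 18.2, 160),
]
print(admissible_solutions(example_report))  # [4, 5, 6]
```

Both thresholds are strict ("greater than"), following the wording of stages (i) and (ii).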

Opening the clusters back in Choices3

• In Choices3, select "Tools" and then "Import Cluster Solution" from the top menu.
Your cluster solution will appear at the bottom of your dictionary.
• The imported groups can either be used to run further crosstabs, or be added as
columns to the original crosstab and run as a correspondence analysis. Make them
passive, so that they do not affect the map but show where they lie in relation to
your market and lifestyle statements.


How to set up a Cluster Analysis


This description gives more detail than the step-by-step guide but follows the
same process.

Use Correspondence to find Discriminating Lifestyle Statements

A correspondence map should be carried out first in order to identify the most
discriminating lifestyle statements. After identifying the top 15-20 statements in
order of Chi Distance, print out the list of statements.

We encourage clients to produce a correspondence map first, because it shows
clearly which statements discriminate most strongly in that market. Usually you
make all except the top n statements (ranked by distance) passive and hidden, to
make the map clearer; the remaining n active statements will be good candidates
for cluster analysis. Alternatively, you could leave the top 40 statements on the
map and take only the top 20 (the number we suggest) for cluster analysis.
Leaving out good statements will weaken the power of the cluster analysis. The
cluster program will let you pick any statements you want, so you do not have to
do a correspondence analysis first, but in nearly all situations it is best to do the
map first.

Save a filter in the Choices3 coding window

To run a cluster analysis from Choices3 you must first create a filter, or base of
respondents, which the cluster analysis will use. Typically this filter is the target
market (e.g. Bottled Lager users, Heavy Shampoo users, everyone who has
bought shoes). The filter should contain at least 2000 respondents. Defining
clusters from a smaller filter may leave the smaller cluster groups in your
favoured solution with unreliably small sample sizes.

♦ Within Choices3 add your target market to the filter. This would usually be
the same filter that you used for the Correspondence Analysis. You may
find it useful to look up the sample size before running the program.

♦ Select ‘Tools/Save Cluster Filter File’.

♦ Answer yes when prompted to save the filter file.

♦ Answer yes when asked if you want to run Cluster Analysis.

You do not have to run Cluster Analysis at this point. You may start the
program later to run the analysis with the filter you have saved.


Start a new project

The dialogue box below appears whether you have launched the Cluster
software through Choices3 or from the shortcut.

♦ Either select 'Start a new cluster project' and click OK, or, if the program is
already running, select 'File/New Project' from the main menu.

♦ From the New Cluster Project dialogue, select a database to use for the
project. The cluster database must correspond to the survey you are
using in Choices3.

♦ Enter a title for the analysis, and a project name. You may find it helpful to
use the same naming convention that you used for the Correspondence
Analysis. Click OK.


Selecting Filter and Variables

The Cluster Definition window defines the filter and variables to use in the
analysis.

Choose the Filter

♦ To select a filter, click the 'Change Filter' button. The currently selected
filter is shown next to the Change Filter Button (including the sample size).
♦ Choose a filter and click the OK button.

Choosing the variables

Add statements to analysis:

Remove statements from analysis:


♦ Select the statements to use. Using the printout of the most discriminating
lifestyle statements, select them from the database list on the left-hand
side. The variables you have selected are displayed in the right-hand list.
To select a variable, highlight it and either double-click it or click the
appropriate button.

♦ To remove a variable, highlight it in the Selected list and click the
appropriate button.

♦ Variables are always listed in database order. A variable is removed from
one list when it is added to the other, so it can never appear more than
once.

To highlight more than one variable at a time, click and drag with the mouse to
highlight a range, or hold down the Control (CTRL) key and click with the left
mouse button to highlight non-adjacent variables.
We recommend choosing a maximum of 25 statements; usually 15-20 are
selected.

Run the Analysis

To run the analysis, click the run button on the speed bar. This button is only
activated if the currently active window is the Cluster Definition Window.

Run analysis:

Enter the maximum number of cluster groups to create, and click the OK
button to start the analysis. (If you select 6 groups, the program creates not
only the 6-cluster solution but also the 5-, 4-, 3- and 2-cluster solutions.)


Analyses on a sample of up to 10,000 typically take no longer than 10 minutes
on a contemporary PC. However, it is not possible to predict exactly how long
an analysis will take, as it depends on several factors:

i) The size of the filter (base)
ii) The number of cluster solutions chosen
iii) The speed of your computer
iv) The actual market you are looking at

Where possible, the program will give an indication of progress for each part of
the process. An analysis can be cancelled at the end of each data pass. To
cancel a running analysis click the Cancel button and wait for the system to
respond at the end of the data pass.


Interpreting the Results

Once the Cluster Analysis has finished running, there are various statistics that
you can print out or view on screen:

♦ Summary statistics window
♦ Cluster report window
♦ Cluster solution window
♦ Cluster groups window

All reports are accessible from the View menu. It is best to work through the
results in the following order:

1) View summary statistics
2) Look at an overview of all the cluster solutions produced
3) Examine an overview of a particular cluster solution
4) Examine a particular cluster solution one group at a time

Summary Statistics

Select 'View / Summary Statistics' from the menu.

This window shows the overall mean (average) and standard deviation (a
measure of the spread of answers) of each variable for the total sample (target
population). This information can be useful for identifying statements with
highly skewed distributions. All variables are normalised to a mean of 0 and a
standard deviation of 1 before the cluster analysis runs; this ensures that each
variable is given equal importance in the analysis.
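The normalisation described above is an ordinary z-score transformation. The sketch below shows the summary figures and the standardised values that feed the clustering. It assumes population standard deviations (dividing by n), which the guide does not specify, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def summary_statistics(responses):
    """responses: dict mapping statement -> list of scores (1-5) for the target population.

    Returns statement -> (mean, population standard deviation).
    """
    return {statement: (mean(scores), pstdev(scores)) for statement, scores in responses.items()}


def standardise(scores):
    """Rescale one variable to mean 0 and standard deviation 1 (z-scores).

    Assumes the scores are not all identical (otherwise the SD is zero).
    """
    m, sd = mean(scores), pstdev(scores)
    return [(x - m) / sd for x in scores]


responses = {"Spend a lot on clothes": [1, 2, 3, 4, 5]}  # invented scores
print(summary_statistics(responses))
print(standardise(responses["Spend a lot on clothes"]))
```

Because every variable ends up with the same spread, no single statement dominates the distance calculations simply because its answers are more varied.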

Cluster Report Window


This window shows the Variance Explained %, the proportion of the total variation
explained by each cluster solution. Cluster analysis tries to find groups with low
variation within groups and large differences between groups. The figure shown
is the percentage of variation that lies between groups rather than within them,
so the higher this figure, the better. The sizes of the smallest and largest
groups are also shown for each solution; the sizes of all groups are shown in the
Cluster Solution Window.
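The Variance Explained % can be read as the between-group share of the total sum of squares. The sketch below computes it from standardised respondent data and a group label per respondent. It assumes squared Euclidean distance and is an illustration, not the program's own code; `variance_explained` is a hypothetical name.

```python
def variance_explained(data, labels):
    """Percentage of the total sum of squares that lies between groups.

    data: list of respondent vectors (already standardised).
    labels: the group each respondent belongs to.
    """
    n_vars = len(data[0])

    def centre(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_vars)]

    def sum_sq(rows, c):
        return sum((r[j] - c[j]) ** 2 for r in rows for j in range(n_vars))

    total_ss = sum_sq(data, centre(data))
    groups = {}
    for row, label in zip(data, labels):
        groups.setdefault(label, []).append(row)
    # Within-group error: squared distances to each group's own centre.
    within_ss = sum(sum_sq(rows, centre(rows)) for rows in groups.values())
    return 100.0 * (total_ss - within_ss) / total_ss


print(variance_explained([[0.0], [2.0], [10.0], [12.0]], [0, 0, 1, 1]))  # about 96.2
```

Adding groups can only lower the within-group error of a well-fitted solution, which is why this figure rises as more groups are requested.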


Cluster Solution Window

NB: Both the Cluster Solution Window and the Cluster Groups Window appear
when you select View/Cluster Solution.

This window shows information for all cluster groups within a given solution,
giving an overview of that solution: the same statistic is displayed for every
group, side by side. The main objective of this report and the Cluster Groups
report is to examine the detailed biases within each cluster group and build up
a summary description, or 'picture', of each group.

Cluster Groups Window

The cluster groups have been formed by grouping together individuals with
similar responses to the variables. Not everyone in a given group has
responded in exactly the same way, but there will be overall biases displayed
by one group in contrast with another. You should interpret these biases and
try to understand why these individuals have been grouped in this way.

The report displays the following figures:

i) Standard Deviations from the mean: (Mean for Group - Mean for Sample) /
Standard Deviation for Sample
ii) Absolute Deviations from the mean: Mean for Group - Mean for Sample
iii) Absolute Mean: Mean for this group on this variable
iv) Variance breakdown: a breakdown of the remaining variance by group and
by variable, and a breakdown of the variance explained by variable

In all cases, large (positive) numbers show high agreement and low (negative)
numbers show high disagreement.

We recommend using Standard Deviations from the mean to interpret the
cluster groups. These figures are the biases of the cluster group (compared
with the overall sample), standardised into units of standard deviations. It is
important to use this statistic because the analysis used standardised data
when forming the groups, and because it is comparable between variables:
the same deviation is just as meaningful on one variable as on another.
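Figures i) to iii) are simple to reproduce for one statement and one group. The sketch below follows the formulas listed above. It assumes the sample standard deviation is the population form (dividing by n), and `group_profile` is a hypothetical name.

```python
from statistics import mean, pstdev


def group_profile(sample_scores, group_scores):
    """Figures i) to iii) for one statement and one cluster group.

    sample_scores: scores (1-5) for everyone in the target population.
    group_scores: scores for the members of one cluster group.
    """
    sample_mean = mean(sample_scores)
    sample_sd = pstdev(sample_scores)
    group_mean = mean(group_scores)
    return {
        "sd_from_mean": (group_mean - sample_mean) / sample_sd,  # figure i)
        "absolute_deviation": group_mean - sample_mean,  # figure ii)
        "absolute_mean": group_mean,  # figure iii)
    }


# Invented scores: the group strongly agrees, the wider sample is split.
print(group_profile([1, 1, 5, 5], [5, 5]))
```

Here the group sits one standard deviation above the sample mean, even though its absolute deviation is 2 scale points.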


Scores run from "Definitely Disagree", scored 1, to "Definitely Agree", scored 5,
so a neutral average score is 3. Comparing a mean with 3 gives you an indication
of which statements have a more positive (or negative) response.

The Absolute Deviation shows how far the group's average answer lies from the
overall mean, in the original scoring units rather than in standard deviations.

Colour Coding
The cells in the report are colour-coded to help interpretation. Red numbers
represent a positive deviation, whereas blue numbers represent a negative
deviation. In both cases a light colour represents a large deviation, whereas a
darker colour represents a smaller, but still important, deviation.

Light Red: positive deviation greater than 1 standard deviation from the mean
Dark Red: positive deviation between 0.6 and 1 standard deviations from the mean
Dark Blue: negative deviation between 0.6 and 1 standard deviations from the mean
Light Blue: negative deviation greater than 1 standard deviation from the mean
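The colour bands can be expressed as a small lookup on the Standard Deviation from the mean. How the program treats values falling exactly on 0.6 or 1 is not stated, so the boundary handling below is an assumption. Returning None for uncoloured cells is also an assumption, and `colour_code` is a hypothetical name.

```python
def colour_code(sd_from_mean):
    """Map a standardised deviation to the report's colour band.

    Boundary handling (exactly 0.6 or 1) is an assumption.
    """
    if sd_from_mean > 1:
        return "light red"
    if sd_from_mean >= 0.6:
        return "dark red"
    if sd_from_mean < -1:
        return "light blue"
    if sd_from_mean <= -0.6:
        return "dark blue"
    return None  # smaller deviations are not highlighted


for value in (1.3, 0.8, 0.2, -0.7, -1.5):
    print(value, colour_code(value))
```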


How many Cluster Groups should I choose?

Choosing the most appropriate cluster solution to use is an art, and the
decision should be based on a number of things.

♦ Look at each of the cluster groups in detail, taking them back into Choices to
work out the demographic characteristics of each group. Now see which
groups make sense and which are interesting. Compare this with other cluster
solutions to see which interesting groups you have gained or lost. Does the
analysis meet the original aims, or should you consider using a different target
or a different set of variables?

♦ In going from, say, 5 groups to 6 groups, what tends to happen is that you
keep the 5 groups and gain a new one, i.e. the new solution will not change
dramatically. You might also lose one group and gain two new ones. Try to
identify which groups are new and which are lost, and ask yourself whether the
new groups are useful, or whether you have lost a very interesting group.

♦ The sample size of the smallest group is important since this will restrict the
detail of further analysis that can be sensibly performed on that group in
Choices. If you were interested in examining brands with small penetrations,
you should use fewer groups but with larger sample sizes in each group.

Select 'View/Cluster report' to see information generated about the analysis,
i.e.

* The variance explained as a percentage
* The sample size of the smallest group in the solution
* The sample size of the largest group in the solution

For each number of groups, from 2 up to the maximum specified for the run, the
report shows the % variance in the total sample explained by splitting the
sample into that number of groups.

The % of variance explained indicates how much of the original variance in the
data is explained by splitting the respondents into 2, 3, 4, 5 groups etc. The
level of variance explained will vary depending on the nature of the data being
analysed; for example, a larger sample size and a larger number of variables
will tend to give lower levels of explanation. As a guideline, 15% is the minimum
acceptable, and it is rare to get above 30%. The sizes of the smallest and largest
groups are shown as a summary: you don't want a very small group, or a very
large one.

Other considerations when selecting a cluster solution to examine further include
practical ones, such as what the cluster groups will be used for. For example, if
the analysis is to be presented to a large audience, how many groups can the
audience cope with? 12 is probably too many.

The more cluster groups you have, the more variance is explained (which is
good). However, the variance explained by additional cluster levels will rise by
a diminishing amount each time. Large numbers of cluster groups can become
unmanageable, and may yield low sample sizes.
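The diminishing returns are easy to see by differencing the Variance Explained % of successive solutions. The figures below are invented for illustration, and the function name is hypothetical.

```python
def marginal_gains(variance_by_k):
    """variance_by_k: dict of number of groups -> Variance Explained %.

    Returns the extra percentage points gained by each additional group.
    """
    ks = sorted(variance_by_k)
    return {k: round(variance_by_k[k] - variance_by_k[prev], 2) for prev, k in zip(ks, ks[1:])}


example = {2: 8.0, 3: 12.0, 4: 15.0, 5: 17.0, 6: 18.5}  # invented figures
print(marginal_gains(example))  # {3: 4.0, 4: 3.0, 5: 2.0, 6: 1.5}
```

Once the gain per extra group becomes small, the extra groups are unlikely to justify the smaller sample sizes they bring.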


Taking your Clusters back into Choices3

Once an analysis has been run the next step is usually to take the resultant
groups back into Choices for further interpretation. They can be
crosstabulated against anything else in the survey. In order to do this you need
to tell the system which groups to make available for analysis in Choices. This
is not done automatically since generally only one solution is needed, and
most analyses will generate several possible solutions.

Choices3

From the Choices3 coding window select Tools/Import Cluster Solution.

Choose the cluster solution(s) you wish to import into the Choices3 dictionary.

NB: The files for the cluster solutions are saved with the data, so once you have
imported them, everyone who accesses Choices3 on your system can view the
solutions.

Once imported, the groups in the solution may be selected like any other
variable in the survey. (See previous screen)


Clusters can now be crosstabulated against anything else in the survey. They
can also be set up as targets in media analysis.

What if my cluster isn't listed?


Ensure that you are in the same survey that the cluster was generated in.
Choices displays the name of the survey at the top of the screen.

How do I change the headings of my clusters?


You may wish to give each cluster a name that embodies the characteristics of
its members. Once you have added the groups to your spec you may rename
the cluster groups using the edit table icon on the Choices toolbar.

For further analysis you may find it helpful to save a definition file with the
group names edited.


Overlaying your cluster solution onto the original map

It is helpful to see where the groups lie in relation to each other on the
original Correspondence Map. This may also help in deciding which of the
cluster solutions you are going to use for targeting purposes. You may find
that one group is a more refined version of another and hence will respond to
the same targeting strategy.

The cluster groups generated from the example of the Shoe Market have been
overlaid onto the original correspondence map.

This is done by re-opening the original crosstab of brands vs attitudinal
statements and adding the cluster groups as extra columns.

Run the Correspondence Map as before.

At this stage it is very important to make your cluster groups passive, so
that they do not affect the construction of the map. Do this before tidying up
the Correspondence Map as before.


An example TGI Cluster Analysis: The Shoe Market

1) Crosstab

Set a filter of “Bought shoes in last 12 months”. This can be done by coding all
the shoe retailers together.

With the brands of shoe shop in the answer panel, right click and select sample
and weighted from the context menu. Then click on the word sample to sort by
sample size to identify any brands with a low sample size. These should not be
used in the analysis, and can either be deleted or combined with other brands if
appropriate.

Add the shoe shop brands to the columns and add ‘Any Agree’ Lifestyle
statements to the rows. There is a definition file set up which you can use or
alternatively you can select the lifestyle statements from the dictionary yourself.

Save the crosstab as a spec file. You may find it beneficial to use the same file
naming convention for the spec file, correspondence map and cluster project. Run
the crosstab.

Analyse the correspondence map to see whether it gives a good representation of
the market. When you are satisfied with the map, print it off using "File/Print".

Print the lifestyle statements needed for your cluster project from the row statistics
view.

2) Selection of Lifestyle Statements

Within Choices, set up a filter of "Bought shoes in last 12 months". Select
'Tools/Save Cluster Filter file' (Ctrl + D) to save it as a cluster filter.

After saving the filter in the Coding Window you may proceed to the Cluster
Analysis module. Alternatively the Cluster Analysis module may be opened by
means of a separate shortcut under Start/Programs/ .

Choose the correct survey database; this is the same survey you are using in
the coding window. Give the project a title, e.g. ‘Shoe Market Place’, and a
name, e.g. ‘Shoes’.

From the list of lifestyle statements on the left-hand side of the screen, select from
your printout the top 15-20 statements that you have previously identified from the
row statistics.

Now check that all the statements chosen (i.e. in the right-hand column) are
relevant to the market chosen. If you think they are not relevant, or are too similar
to another statement, click on the left-arrow button to remove them from the list.
You may decide to include statements that were not so important in terms of Chi
Distance on the Correspondence Map.


3) Run Cluster Analysis

Click the Run button and select how many cluster solutions to generate.

Since the software generates every cluster solution up to and including the one
you select, it is better to choose too many rather than too few. Typically any
number between 4 and 8 might be chosen, but this will vary depending on the
market you are looking at.

4) Interpreting the Results

Once the cluster analysis has run, there are some statistics you should look at
within the cluster software which will help you to understand the cluster groups.
These are explained within the 'Interpreting the results' section on p36. However,
the "View Cluster Solution" option will be of particular interest.

By looking at the top 5 or so statements in each group of the solution, you can
begin to get a feel for how the cluster groups differ attitudinally from each other.
Statements marked with a negative sign are ones the group's respondents tend
not to agree with.

You may find it helpful to print the list of lifestyle statements for each of the cluster
groups for each solution generated. The lifestyle statements are ranked in
descending order for every group. It can be helpful to give the cluster groups
names. We are looking to identify the cluster groups as people we may see in the
street.

Within the Cluster software examine each of the groups in each cluster solution.

CLUSTER 1: Friends more important than family; If looking for bargains, look in
local paper first; Interested in financial services advertising; Like to take
holidays in Britain rather than abroad; Prefer herbal medicine products

CLUSTER 2: Spend a lot on clothes; Like to keep up with latest fashion; Wear
designer clothes; Like to stand out in a crowd; Can’t resist expensive
perfume/aftershave

CLUSTER 3: If looking for bargains, look in local paper first; Like to take
holidays in Britain rather than abroad; When household shopping budget for
every penny; Skincare products are for women not men; Only beer worth drinking
is real ale

CLUSTER 4: Prefer herbal medicine products; Prefer alternative medicine (e.g.
acupuncture); Read financial pages of newspaper; Have classic dress style; Only
beer worth drinking is real ale

5) Import the Clusters back into Choices3

One of the most powerful ways of interpreting the cluster groups is to import them
back into Choices for further analysis.
First, import your chosen cluster solution(s) into Choices by selecting Tools/Import
Cluster Solution on the Choices3 Coding Window menu bar.


What you choose to look at will vary by the market you are looking at, but typically
you might want to know the following:

i) Demographic Profile
CLUSTER 1: Social Grade DE, Age 65+ or <24, Not working
CLUSTER 2: Age 15-34, Social Grade ABC1, Working F/T
CLUSTER 3: Social Grade DE, Age 55+, Not working
CLUSTER 4: Social Grade AB, Age 35-64, Work P/Time

ii) Press Profile
CLUSTER 1: Weekly News, Angler’s Mail, Match, Angling Times, Woman’s Own, Inside Soap
CLUSTER 2: More!, Mizz, Now, Hair, Smash Hits, Kerrang
CLUSTER 3: Heritage Today, That’s Life, My Weekly, TV Choice, Take A Break, Chat
CLUSTER 4: Times Educational Supplement, The Independent, Birds Magazine, The Guardian,
Marks & Spencer Magazine, AA Magazine

iii) Brand consumption
CLUSTER 1: Littlewoods, Trueform, Freeman Hardy Willis, K Shoes, Stead & Simpson, Shoe Express
CLUSTER 2: Ravel, Dolcis, Next, Russell & Bromley, Bally, House of Fraser
CLUSTER 3: Trueform, Shoe Express, Timpson/Oliver, Freeman Hardy Willis, Clarks, Stead & Simpson
CLUSTER 4: Marks & Spencer, House of Fraser, Clarks, John Lewis, BHS, K Shoes

iv) TV Programmes
CLUSTER 1: Family Affairs, Doctors, Wheel of Fortune, That’s Esther, City Hospital, Trisha
CLUSTER 2: Streetmate, The Priory, CD UK, She’s Gotta Have It, Dawson’s Creek, Hollyoaks
CLUSTER 3: Family Guy, Stargate, Driven, Top Gear, Robot Wars, F1 Grand Prix
CLUSTER 4: Gardener’s World, Horizon, Timewatch, Dispatches, Panorama, Watchdog

The Statistics Explained

Cluster Analysis
If the population distribution in a sample is not homogeneous, respondents often
clump together in clusters. Gaps between clusters may also indicate that the data
are a mixture of several displaced distributions.

Since clusters are highly dependent on sampling variation, small perturbations in
the data might lead to very different clusters. The choice of the number of clusters
cannot follow from the algorithm but has to be made subjectively. For these
reasons, cluster analysis is not a rigorous and sharp statistical tool, and should be
applied only after careful consideration and scrutiny of all the information available.

Cluster Methodology
The clustering process begins by comparing the distance of each observation, in
the sample of n observations, from the mean vectors ('centroids') of each of the
proposed clusters. Each observation is assigned to the cluster with the nearest
mean vector. The distances are then recomputed and reassignments made as
necessary. The process continues until all observations are in the clusters with
the minimum distances to their mean vectors.

K-Means Algorithm
The cluster analysis program uses a K-Means partitioning algorithm. A partitioning
algorithm moves from a smaller number of groups to a larger number of groups, as
opposed to a joining algorithm, which does the reverse.

The program works up from an initial starting point of one group (the target
population), then builds a 2-cluster solution, a 3-cluster solution and so on up to a
maximum number of clusters specified by the user. This is known as a 'Multi-K-Means'
analysis, where 'K' refers to the number of clusters chosen by the user. The starting
partition at each instance is derived by splitting an existing group into 2. The program
selects and splits the group with the greatest variance when moving from level to level.

Squared distances from the centroid of the group are calculated for every individual.
The individual with the highest score is taken first, and then the individual who is most
dissimilar from them.

i) Standardise Variables
Variables are standardised (normalised) to a mean of 0 and a variance of 1, which
ensures that each variable is given an equal weighting in the analysis.

ii) Split target population into two groups


Two seed points are selected. Point A is chosen as the point furthest away from the
centroid of the group to be split, then Point B is chosen as the point furthest away from
point A. The remaining points are then split between these two seeds.

iii) Refine cluster solution and perform K-Means


For each respondent, and for each cluster other than the respondent's current cluster,
the program calculates the increase in error that transferring the respondent would
cause. If the minimum increase in error is negative, the respondent is transferred to the
cluster giving that minimum. The centres of the losing and gaining clusters are then
readjusted, and the change in error is recorded. Data passes are repeated until no
further respondents can be moved.

iv) Number of groups is smaller than number required



If the number of groups is still smaller than the number required, the group with the
largest variance is split, and phases (ii) and (iii) are repeated.
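The four phases above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the program's own code. Variables are standardised with population standard deviations, and distances are squared Euclidean. The "group with the largest variance" is taken to be the one with the largest within-group sum of squares. The transfer test in phase (iii) uses the standard Hartigan-style change in error for moving one respondent between clusters. All function names are hypothetical.

```python
from statistics import mean, pstdev


def standardise(data):
    """Phase (i): rescale every variable to mean 0 and variance 1."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*data)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in data]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points):
    """Within-group error: sum of squared distances to the group centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def split(points, members):
    """Phase (ii): seed A is furthest from the centroid, seed B furthest from A.
    Returns the members closer to B; they form the new group."""
    group = [points[i] for i in members]
    c = centroid(group)
    a = max(group, key=lambda p: sq_dist(p, c))
    b = max(group, key=lambda p: sq_dist(p, a))
    return [i for i in members if sq_dist(points[i], b) < sq_dist(points[i], a)]


def refine(points, labels, k):
    """Phase (iii): transfer a respondent whenever the move lowers the total error."""
    groups = [[p for p, g in zip(points, labels) if g == j] for j in range(k)]
    sizes = [len(g) for g in groups]
    centres = [centroid(g) for g in groups]
    moved = True
    while moved:  # each iteration is one data pass
        moved = False
        for i, x in enumerate(points):
            home = labels[i]
            if sizes[home] == 1:
                continue  # never empty a group
            # Error saved by removing x from its current group.
            loss = sizes[home] / (sizes[home] - 1) * sq_dist(x, centres[home])
            best, best_delta = None, -1e-12  # small tolerance against rounding noise
            for j in range(k):
                if j != home:
                    # Error added by putting x into group j; delta < 0 is an improvement.
                    delta = sizes[j] / (sizes[j] + 1) * sq_dist(x, centres[j]) - loss
                    if delta < best_delta:
                        best, best_delta = j, delta
            if best is not None:
                n_h, n_b = sizes[home], sizes[best]
                centres[home] = [(c * n_h - v) / (n_h - 1) for c, v in zip(centres[home], x)]
                centres[best] = [(c * n_b + v) / (n_b + 1) for c, v in zip(centres[best], x)]
                sizes[home], sizes[best] = n_h - 1, n_b + 1
                labels[i] = best
                moved = True
    return labels


def multi_k_means(data, max_k):
    """Return {k: group label per respondent} for k = 1 .. max_k."""
    points = standardise(data)
    labels = [0] * len(points)
    solutions = {1: labels[:]}
    for k in range(2, max_k + 1):
        # Phase (iv): split the group with the largest within-group error.
        members = {g: [i for i, lab in enumerate(labels) if lab == g] for g in range(k - 1)}
        target = max(members, key=lambda g: sse([points[i] for i in members[g]]))
        if sse([points[i] for i in members[target]]) == 0:
            break  # nothing left to split
        for i in split(points, members[target]):
            labels[i] = k - 1
        labels = refine(points, labels, k)
        solutions[k] = labels[:]
    return solutions
```

`multi_k_means(data, 6)` returns the 1- to 6-group solutions, mirroring the way the program produces every solution up to the maximum you request. Because each solution starts from the previous one, groups tend to carry over from level to level, as described in “How many Cluster Groups should I choose?”.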

Mean Absolute Deviation


The deviation of each observation from the mean is calculated. The absolute
values of these deviations are then summed, and their mean is calculated.

Variance
The variance is the average of the squared deviations from the mean.
The smaller the variance in the population, the more accurate a sample taken
from that population will be.

Standard Deviation
The standard deviation is the square root of the variance, and is used as a
measure of dispersion.
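The variance and standard deviation, in the population form implied by "average" (dividing by n), can be sketched as follows. Python's `statistics.pvariance` and `statistics.pstdev` compute the same quantities, while `statistics.variance` and `statistics.stdev` divide by n - 1 to give sample estimates. The function names below are hypothetical.

```python
from math import sqrt
from statistics import fmean

def variance(values):
    """Population variance: the average of the squared deviations from the mean."""
    m = fmean(values)
    return fmean([(x - m) ** 2 for x in values])

def standard_deviation(values):
    """Square root of the variance, in the same units as the data."""
    return sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))            # 4.0
print(standard_deviation(data))  # 2.0
```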

Further reading:
Hartigan, J.A. (1975) Clustering Algorithms. Wiley, New York.
Everitt, B. (1974) Cluster Analysis. Heinemann, London.
MacQueen, J. (1967) "Some methods for classification and analysis of
multivariate observations", Proceedings of the Fifth Berkeley Symposium on
Mathematical Statistics and Probability. University of California Press,
Berkeley.

FILE
New Project Creates a new cluster project and makes it the active window. You will be
prompted to save the document when you close the application.

Open Project Displays the Open File dialog box, so you can select a file to load into a new
document window. You can also create a new document by naming a file that
does not currently exist.

Close File Closes the currently active window.

Close Project Takes you back to the initial screen where you can start a new cluster project or
work on an existing cluster project.

Save Project Saves the document in the active window. If the document is unnamed, the
Save As dialog box is displayed so you can name the file, and choose where it
is to be saved.

Save As Allows you to save a document under a new name, or in a new location on disk.
The command displays the Save File As dialog box. You can enter the new file
name, including the drive and the directory. All windows containing this file are
updated with the new name. If you choose an existing file name, you are asked
if you want to overwrite the existing file.

Print This prints the contents of the active window. Use File/Print Preview to see how
the document will be laid out on printer pages. Use File/Print Setup to select a
printer, and to set printer options.

Print Preview This opens a special window that shows how the active document will appear
when printed. The preview window shows one or two pages of the active
document as they would be laid out on printed pages. Controls on the window
allow you to page through the document.

Print Setup This displays the Printer Setup dialog box, which allows you to select and
configure the printer to be used.

Import filter Lets you import a filter that was saved in Choices 1.

Exit This takes you out of the Cluster program. Make sure that you have saved your
file first.

EDIT
Copy Copies any highlighted data so that you can paste it into other applications,
such as Word or Excel.

VIEW
Project Displays the active project window.

Summary statistics Shows the mean and standard deviation of each variable (statement).

Main Report For each solution, shows the amount of variance and the size of the smallest
and largest groups in each solution. Allows you to view any individual cluster
solution.

Cluster Information Displays general information on the cluster analysis.

Cluster Report For each solution, shows the amount of variance and the size of the smallest
and largest groups in each solution.

Cluster Solution Displays each group in a given solution, showing the standard deviation
from the mean, the absolute deviation from the mean and the absolute mean for each
statement.

Cluster Group Lets you view the different groups within a cluster solution.

Cluster Log Provides a record of how each cluster group was arrived at, with the number of
passes and points moved.

ANALYSIS

Run Cluster Displays the cluster runtime options.

Create membership data Choices 1 - creates a file that you can load into Choices 1.

Create membership data Choices 2 - creates a file that you can load into Choices 2.

Add Variable Lets you add a variable to the analysis.

Remove Variable Lets you remove an unwanted variable from the analysis.

OPTIONS
System Options This shows the directories used by Choices, and may be useful if you want to
know where certain files are stored.

Project options Lets you change the title of the project and the number of clusters.

Save Project Lets you save the above options.

WINDOW
Cascade Displays all the available windows in overlapped form, so that the title bar of each
is visible.

Tile Displays all windows on the same screen in a non-overlapping arrangement.

Arrange Icons Arranges all iconized windows into rows along the bottom of the application's main
window.

Close All Lets you close all windows, report windows, cluster group windows or cluster
solution windows.

HELP
Help Provides help on running a cluster analysis.
