Mathematical Department, Azores University, R. da Mãe de Deus, 9501-801 Ponta Delgada, Portugal
Department of Quantitative Methods, Business School ISCTE, Av. das Forças Armadas, 1649-026 Lisboa, Portugal
Abstract
This work is part of a supermarket chain expansion study and is intended to cluster the existing outlets in order to support the
evaluation of outlet performance and new outlet site location. To overcome the curse of dimensionality (a large number of attributes
for a very small number of existing outlets), experts' knowledge is considered in the clustering process. Three alternative approaches
are compared to this end, the experts being required to: (1) a priori: provide values for perceived dissimilarities between pairs of
outlets; (2) a posteriori: evaluate results from alternative regression trees; (3) interactively: help to select base variables and evaluate
results from alternative dendrograms. The latter approach provided the best results according to the marketing experts.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Clustering; External validation; Experts knowledge integration
1. Introduction
As elsewhere in Europe, the retail sector in Portugal is going
through a restructuring phase. Several authors (e.g.
Birkin et al., 2002; Dawson, 2000; Seth and Randall,
1999) identify factors such as increasing consumer
mobility, increasing electronic commerce, changing
household size, concentration of market power, home
market saturation, and changes in planning legislation
to justify the new trends in retailing. In food retailing, in
particular, after an unprecedented period of hypermarket growth since the late 1970s, both in number and
0969-6989/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jretconser.2004.11.005
A.B. Mendes, M.G.M.S. Cardoso / Journal of Retailing and Consumer Services 13 (2006) 231–247
scale, which forces careful decision-making (McGoldrick, 2000; Salvaneschi, 1996). Because smaller outlets
are heterogeneous in aspects such as location, dimension, and
client behaviour, the definition of outlet clusters is
essential in outlet performance and site evaluation.
In Sections 2–4, an empirical
classification of the variables used in measuring supermarket performance and the role of expert knowledge
in clustering and marketing applications are presented.
The data collection phase is described and three
approaches for integrating experts' knowledge in supermarket clustering are explained. In Section 5 these
approaches are compared and a cluster profiling is
presented. The paper finishes with conclusions and a
methodological discussion.
[Figure 1: chart of market share (0%–100%) by year, 1991–2001. Series: Traditional; Supermarkets (two size classes); Hypermarkets; Other free service.]
Fig. 1. Market share evolution for food outlet type in the Portuguese
market. (Source: A.C. Nielsen Portugal).
[Figure 2: diagram with columns Group, Variable Type, Examples and Data Collected. Groups: Location and Outlet Attributes; Influence Area Characterization; Clients Characteristics. Entries shown include outlet size, sales area, retail composition, outlet configuration, geographic variables, site configuration, income distribution, demographic data, market share, buying power, characterisation of the outlet/client relation, average buy, socioeconomic client characterisation, client preferences and client demography. Data sources shown include census data and in-shop surveys.]
Fig. 2. Classification of location assessment and outlet evaluation explanatory variables, and data collected.
3. Data collection
Fig. 3. Shortest path polygons (left) and multiplicative weighted Voronoi diagrams (right) examples. (Point radius proportional to the outlet sales
area and demographic polygons shaded by population density).
4. Methodological approach
Despite the abundance of data described in the previous
section, the number of supermarket outlets was very
small. This fact hindered the choice of variables
for outlet clustering and the respective characterization.
To overcome this difficulty, three different procedures
were considered for integrating experts' knowledge:
1. In the a priori integration approach, the experts were
required to compare pairs of outlets and rate
their dissimilarities on a perceptual scale. The dissimilarity matrix was then used directly in Ward's
hierarchical clustering procedure. Finally, the selection of cluster profiling variables relied on regressions over the MDS dimensions.
2. In the a posteriori approach, integration was
accomplished by evaluating alternative clustering
structures derived from a supervised learning procedure using regression trees. In this approach the choice of base
variables relied on the regression tree procedure, although the experts requested diverse regression tree
parameterizations and target variables.
3. In the last procedure an interactive process was
adopted: the experts' knowledge was considered in
successive stages regarding the choice of clustering
variables and the evaluation of clustering results. This
process is termed interactive integration. Ward's
hierarchical procedure was applied to alternative sets
of base variables in order to provide better
clustering, as judged by the experts.
4.1. A priori experts knowledge integration
In this approach the experts' opinion was integrated
by means of paired comparisons of outlets. The experts were asked to fill in a questionnaire
in which pairs of outlets were compared and rated
on an ordinal dissimilarity scale (from 1,
very similar outlets, to 9, distinct outlets).
The comparison was meant to be generic, although
some aspects, such as location, management performance,
site and client characterization, were emphasized.
The resulting symmetric dissimilarity matrix was
obtained by consensus among the several experts
involved. This procedure is termed a priori because the
experts' opinion concerns only the dissimilarity matrix
used as the clustering base. Clusters were then obtained
using Ward's hierarchical method, resulting in the
dendrogram presented in Fig. 4. It should be noted that
other hierarchical methods, such as centroid and group
average linkage, were tried and similar dendrograms
were obtained.
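The step from a consensus dissimilarity matrix to a Ward dendrogram can be illustrated with a minimal Python sketch. It assumes SciPy is available and treats the ordinal scores as if they were Euclidean distances, which Ward's method formally requires; the five-outlet matrix and the function name `ward_from_dissimilarities` are hypothetical and are not taken from the study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def ward_from_dissimilarities(D, n_clusters):
    """Cluster items from a symmetric dissimilarity matrix with Ward's method."""
    D = np.asarray(D, dtype=float)
    # squareform checks symmetry and a zero diagonal, then returns the
    # condensed (upper-triangle) vector that linkage expects.
    condensed = squareform(D)
    # Assumption: ordinal expert scores are used as if they were distances.
    Z = linkage(condensed, method="ward")
    # Cut the dendrogram so that at most n_clusters groups remain.
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return Z, labels


# Hypothetical consensus scores for five outlets (1 = very similar, 9 = distinct).
D = np.array([
    [0, 1, 2, 8, 9],
    [1, 0, 2, 8, 9],
    [2, 2, 0, 7, 8],
    [8, 8, 7, 0, 1],
    [9, 9, 8, 1, 0],
])
Z, labels = ward_from_dissimilarities(D, n_clusters=2)
print(labels)
```

Replacing `method="ward"` with `"centroid"` or `"average"` gives the alternative linkages mentioned above, so the resulting partitions can be compared.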
From the observation of the chart of fusion index values and
fusion index variations versus the number of clusters,
presented in the same figure, six clusters were adopted.
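A fusion index chart of this kind can be tabulated from a linkage matrix. The sketch below is only illustrative: it assumes SciPy, uses hypothetical two-dimensional points, and suggests the number of clusters just before the largest jump in fusion height. That rule is an assumption made here; the choice in the study was made by inspecting the chart.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def fusion_table(Z):
    """Return rows (k, fusion index, variation).

    k is the number of clusters present just before a merge, the fusion
    index is the height of that merge, and the variation is the increase
    in height relative to the previous merge.
    """
    heights = Z[:, 2]
    n = len(heights) + 1
    rows = []
    for step, h in enumerate(heights):
        prev = heights[step - 1] if step > 0 else 0.0
        rows.append((n - step, float(h), float(h - prev)))
    return rows


def suggest_n_clusters(Z):
    """Pick k just before the merge with the largest jump in fusion index."""
    k, _, _ = max(fusion_table(Z), key=lambda row: row[2])
    return k


# Hypothetical data: three well-separated pairs of points.
X = np.array([[0, 0], [0.2, 0], [10, 0], [10.2, 0], [5, 8.7], [5.2, 8.7]])
Z = linkage(X, method="ward")
for k, height, variation in fusion_table(Z):
    print(k, round(height, 2), round(variation, 2))
print("suggested number of clusters:", suggest_n_clusters(Z))
```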
Fig. 4. Dendrogram of the Ward hierarchical method directly applied to the dissimilarity matrix and fusion indexes chart.
[Figure 6: charts with axes MDS dimension 1 to MDS dimension 4 (scale 0.2 to 0.8). Attributes shown: number of sales operators' working places; in-shop parking places; mystery shopping evaluation of car parking facilities near the outlet; sales turnover per square meter of sales area; mystery shopping evaluation of walking trip outlet visibility; number of resident women more than 65 years old; percentage of resident female children less than 5 years old; percentage of residents working in the primary or secondary sector; influence area by Voronoi diagrams; percentage of households with children or grandchildren less than 7 years old; clients' attributes: percentage of respondents living within a 5-minute walking distance, percentage of passage respondents, percentage of respondents declaring occasional purchases, preferential respondents.]
Fig. 6. MDS dimensions characterization based on standardized absolute value regression coefficients (black marks represent attributes negatively
related with MDS dimension).
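The characterization in Fig. 6 rests on regressions that relate outlet attributes to MDS dimensions. The Python sketch below shows one way such coefficients could be obtained, under two explicit assumptions that the text does not confirm: classical (Torgerson) MDS is used to extract the dimensions, and each attribute is regressed on the standardized dimensions. The point configuration and the attribute are hypothetical.

```python
import numpy as np


def classical_mds(D, n_dims):
    """Classical (Torgerson) MDS: coordinates from a distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]   # keep the largest n_dims
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def standardized_coefficients(coords, attribute):
    """Regress a standardized attribute on standardized MDS dimensions."""
    Xs = (coords - coords.mean(axis=0)) / coords.std(axis=0)
    ys = (attribute - attribute.mean()) / attribute.std()
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta


# Hypothetical configuration of five outlets and the distances between them.
points = np.array([[0, 0], [4, 0], [0, 1], [4, 1], [2, 3]], dtype=float)
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(D, n_dims=2)
attribute = points[:, 0]                         # hypothetical outlet attribute
beta = standardized_coefficients(coords, attribute)
print(np.abs(beta).round(3))                     # absolute values, as plotted in Fig. 6
print(np.sign(beta.round(3)))                    # sign marks negative relations
```

The sign of an MDS dimension is arbitrary, which is one reason to report absolute values and mark negative relations separately.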
[Figure 7: regression tree diagram with ">" and "<" branches. Leaf labels: big outlets; low potential; high potential; neighbourhood outlets; transit outlets.]
Fig. 7. Selected tree obtained for the CART method. (The bar graphs represent the histograms of the target variable in each node. Leave-one-out
estimate of the percentage of explained variance is 86%).
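A leave-one-out estimate of explained variance can be sketched as follows. This is not the CART implementation used in the study: a one-split regression stump written in NumPy stands in for the tree, the estimate is assumed to be 1 - SSE/SST computed on held-out predictions, and the sales-area and turnover figures are hypothetical.

```python
import numpy as np


def fit_stump(X, y):
    """Find the single split (feature, threshold) with the lowest squared error."""
    best = None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for lo, hi in zip(values[:-1], values[1:]):
            t = (lo + hi) / 2.0
            left = X[:, j] <= t
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, t, y[left].mean(), y[~left].mean())
    if best is None:                      # no split possible: predict the mean
        return None, None, y.mean(), y.mean()
    return best[1:]


def predict_stump(model, x):
    j, t, left_mean, right_mean = model
    if j is None:
        return left_mean
    return left_mean if x[j] <= t else right_mean


def loo_explained_variance(X, y):
    """Leave-one-out estimate of explained variance: 1 - SSE / SST."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # train on all outlets except i
        preds[i] = predict_stump(fit_stump(X[keep], y[keep]), X[i])
    sse = ((y - preds) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    return 1.0 - sse / sst


# Hypothetical outlets: sales area (m2) as predictor, turnover as target.
X = [[100], [120], [150], [900], [950], [1000]]
y = [10, 11, 12, 50, 52, 51]
r2 = loo_explained_variance(X, y)
print(round(100 * r2, 1), "% of variance explained (leave-one-out)")
```

With very few outlets, each held-out prediction depends on a tree fitted to an even smaller sample, which is why such estimates should be read with caution.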
[Figure labels: big neighbourhood; intermediate outlets; transit outlets; small neighbourhood.]
Table 1
Summary of the main characteristics for the 3 different methodologies
Knowledge integration approach
A priori (a)
A posteriori (b)
Interactive (c)
Methodology
Target variable
None
None
None
Clients characteristics
Preferential respondents
Occasional purchases
Passage respondents
Clusters labels
Big outlets
Neighbourhood outlets
Low-potential low potential
outlets
High-potential outlets
Small high potential
Small outlets
Big outlets
Low potential
High potential
Big outlets
Intermediate outlets
Big neighbourhood
Neighbourhood outlets
Transit outlets
Small neighbourhood
Transit outlets
Transit outlets
(a) Characterization variables selected by linear regression with the 4 extracted MDS dimensions.
(b) Target and splitting variables.
(c) Base clustering variables.
[Figure 9: three box-plot panels (a priori approach, a posteriori approach, interactive approach) of annual outlet sales turnover for 2000, 2001 and 2002, by cluster. Cluster labels shown: neighbourhood; small outlets; big outlets; transit; low potential; small neighbourhood; transit outlets.]
Fig. 9. Box plot charts for annual sales, individualizing clusters with three or more outlets. (Stars and circles are identified outliers: circles lie 1.5–3
and stars more than 3 interquartile ranges from the box. The same annual turnover scale is used in every chart).
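The outlier convention in the caption of Fig. 9 can be expressed as a short Python sketch. It assumes that distances are measured from the nearest quartile and that quartiles are computed by NumPy's default linear interpolation, which may differ slightly from the hinges used by the plotting software; the turnover values are hypothetical.

```python
import numpy as np


def classify_boxplot_points(values):
    """Label each value as 'regular', 'circle' (1.5-3 IQR) or 'star' (> 3 IQR)."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; outliers are undefined")
    # Distance beyond the nearest box edge, in IQR units (0 inside the box).
    beyond = np.maximum(np.maximum(q1 - v, v - q3), 0.0) / iqr
    labels = np.where(beyond > 3, "star",
                      np.where(beyond > 1.5, "circle", "regular"))
    return labels.tolist()


# Hypothetical annual turnover figures for the outlets of one cluster.
turnover = [10, 11, 12, 13, 14, 15, 16, 24, 60]
print(classify_boxplot_points(turnover))
```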
attributes around mean values are usually not mentioned, but fundamental groups such as performance indicators are always mentioned.
Table 2
Proportion of explained variance
Number of outlets
2000
2001
2002
13
16
19
19
19
19
19
Number of clusters
A posteriori (%)
Interactive (%)
22
31
59
47
74
55
41
78
85
92a
36
60c
81c
34
83
87
86b
89b
63
65
68
(a) Target variable.
(b) Base cluster variable.
(c) Splitting variable.
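The proportion of variance in a variable that is explained by a cluster partition, as reported in Table 2, can be computed as one minus the ratio of within-cluster to total sum of squares. The sketch below assumes this standard definition; the sales figures and cluster labels are hypothetical.

```python
from collections import defaultdict


def explained_variance(values, labels):
    """Proportion of variance explained by a partition: 1 - SS_within / SS_total."""
    n = len(values)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    # Group the values by cluster label.
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    ss_within = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        ss_within += sum((v - mean) ** 2 for v in members)
    return 1.0 - ss_within / ss_total


# Hypothetical annual sales for four outlets assigned to two clusters.
sales = [10, 12, 50, 52]
clusters = ["neighbourhood", "neighbourhood", "big", "big"]
share = explained_variance(sales, clusters)
print(f"{100 * share:.1f}% of variance explained")
```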
Table 3
Summary characterization of outlet clusters
Big neighbourhood | Small neighbourhood | Intermediate outlets | Big outlets | Transit outlets | Big transit outlets
Location and outlet attributes: outlet performance and area
2002 annual sales turnover
k
Sales/area ratio
m
3
Sales variation between 2002 and 2000
Sales area in square meters
k
k
Location and outlet attributes: outlet configuration
Cash machine in the outlet
m
Price image from mystery shop
Attendance sympathy
Outlet tidiness/cleanliness
Location and outlet attributes: geographic variables
No. of outlets in downtown
k
Is the outlet considered anchor?
Outlet walking visibility
3
Parking facilities near outlet
3
Influence area characterization: competition
Also clients of other chains
Sum of competition sales area
Outlet price relative to competition
k
% of monthly expenses in outlet
3
3
m
k
k
k
3
3
m
k
m
m
3
3
3
3
3
3
3
3
m
m
n/d
n/d
3
3
3
3
3
3
n/d
n/d
k
m
m
3
3
m
m
m
k
m
m
m
k
3
3
k
k
m
m
3
3
m
3
m
m
m
k
k
k
k
3
3
k
3
3
3
3
k
m
k
Arrows represent the most distinct values for mean (vertical) and variance (horizontal) in each cluster; n/d stands for not enough data.
the concept of supermarket performance, several clustering base variables should be considered for selection, the
interactive approach being more appropriate for this
purpose.
Regarding the a posteriori approach, the experts were quite
enthusiastic about the use of regression trees, but they
did not judge its results to be the best. In fact, this
approach is very unstable when applied to small data sets,
which calls for extremely careful external validation (Bay
and Pazzani, 2000). However, considering that this
clustering process was widely accepted by users, it
should be researched further, taking into account two
main guidelines:
Acknowledgements
The authors wish to express admiration and gratitude
to the location experts who enthusiastically adopted
this project and without whom this work would have been
impossible. The authors also wish to thank three
anonymous referees for their careful review and helpful
suggestions and insights.
References
Adelman, L., Riedel, S.L., 1997. Handbook for Evaluating Knowledge-Based Systems: Conceptual framework and compendium of
methods. Kluwer Academic Publishers, Dordrecht, Netherlands.
Bay, S.D., Pazzani, M.J., 2000. Discovering and describing category
differences: what makes a discovered difference insightful? In:
Gleitman, L.R., Joshi, A.K. (Eds.), Proceedings of the 22nd
Annual Meeting of the Cognitive Science Society. Institute for
Research in Cognitive Science, USA, pp. 603–609.
Berry, M.J.A., Linoff, G., 1997. Data Mining Techniques: For
Marketing, Sales, and Customer Support. Wiley, USA.
Birkin, M., Clarke, G., Clarke, M., 2002. Retail Geography and
Intelligent Network Planning. Wiley, Chichester, UK.
Turban, E., Aronson, J.E., Liang, T.-P., 2004. Decision Support
Systems and Intelligent Systems. Prentice-Hall, London, UK.
Wang, S., 2001. Cluster analysis using a validated self-organizing
method: cases of problem identification. International Journal of
Intelligent Systems in Accounting, Finance and Management 10,
127–138.