
6. Distance-based Techniques
We conclude this lecture course with a brief review of further topics. We shall examine cluster analysis and multidimensional scaling, which belong to a class of multivariate (MV) techniques based on the concept of distance between members of a sample.
6.1 Distance and Similarity
The minimal properties of a distance measure are captured in the following definition.

Definition
An n \times n distance matrix D satisfies
(i) d_{rs} \ge 0
(ii) d_{rs} = d_{sr}
(iii) d_{rr} = 0

Here d_{rs} denotes the distance between individual r and individual s (rows r, s) in the sample.
Some examples are:

Euclidean distance between x and y:

d(x, y) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_p - y_p)^2}.

City-block distance between x and y:

d(x, y) = w_1 |x_1 - y_1| + \cdots + w_p |x_p - y_p|,

where the w_k are weights (e.g. w_k = 1/p).

Minkowski distance between x and y:

d(x, y) = \left( w_1 |x_1 - y_1|^k + \cdots + w_p |x_p - y_p|^k \right)^{1/k}.

Mahalanobis distance between x and y:

d(x, y) = (x - y)^T S^{-1} (x - y).
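As an illustration, the four distance measures above might be implemented as follows. This is a minimal sketch: the weight vector w and the matrix S_inv (standing in for S^{-1}) in any usage are illustrative values, not quantities from these notes.

```python
import math

def euclidean(x, y):
    # sqrt of sum of squared coordinate differences
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def city_block(x, y, w):
    # weighted sum of absolute coordinate differences
    return sum(wk * abs(xi - yi) for wk, xi, yi in zip(w, x, y))

def minkowski(x, y, w, k):
    # weighted k-th powers of absolute differences, then k-th root
    return sum(wk * abs(xi - yi) ** k
               for wk, xi, yi in zip(w, x, y)) ** (1 / k)

def mahalanobis_sq(x, y, S_inv):
    # (x - y)^T S^{-1} (x - y), exactly as in the formula above
    d = [xi - yi for xi, yi in zip(x, y)]
    p = len(d)
    return sum(d[i] * S_inv[i][j] * d[j]
               for i in range(p) for j in range(p))
```

Note that Minkowski with k = 2 and unit weights reduces to Euclidean distance, and Mahalanobis with S^{-1} = I reduces to squared Euclidean distance.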
Distance is often the converse (flip side) of a similarity in the data. For example, we are more familiar with the extent to which two faces resemble (are similar to) each other.
Definition
The matrix C is a similarity matrix if for all r, s
(i) c_{rs} = c_{sr}
(ii) c_{rs} \le c_{rr}

Example
We can show (Ex.) that the following transformation produces a distance matrix D from a similarity matrix C:

d_{rs} = (c_{rr} - 2 c_{rs} + c_{ss})^{1/2}
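A sketch of this transformation in code; the similarity matrix C in the usage line is illustrative data.

```python
def similarity_to_distance(C):
    # d_rs = (c_rr - 2 c_rs + c_ss)^{1/2}, applied elementwise
    n = len(C)
    return [[(C[r][r] - 2 * C[r][s] + C[s][s]) ** 0.5
             for s in range(n)] for r in range(n)]

# Usage with an illustrative 2 x 2 similarity matrix:
D = similarity_to_distance([[1.0, 0.5], [0.5, 1.0]])
```

The diagonal of the result is zero and the matrix is symmetric, as properties (ii)-(iii) of a distance matrix require.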
6.2 Cluster analysis overview
Clustering differs from the classification methods discussed in the previous chapter. Classification pertains to known groups identified in advance. Clustering of data is a way of partitioning your data into self-defining populations or groups on the basis of similarities or distances.
e.g.
Segmenting a customer base to perform targeted marketing
Clustering patients into subgroups that have similar response patterns, to define treatment categories
In cluster analysis, you don't know a priori who belongs in which group. You often don't even know the number of groups.
However, quoting from the SPSS guide at
http://www.norusis.com/about.php
"Identifying groups of individuals or objects that are similar to each other but different from individuals in other groups can be intellectually satisfying, profitable, or sometimes both."
Examples
Aim to group television shows into homogeneous categories based on viewer characteristics. This can be used for targeted advertising.
You want to cluster skulls excavated from archaeological digs into the civilizations from which they originated. Various measurements of the skulls are available.
6.3 Hierarchical Clustering Methods
The hierarchical clustering method for grouping N objects can be described as follows:
(i) Start with N clusters, each containing a single entity, and an N \times N symmetric matrix of distances D = \{d_{ik}\}.
(ii) Search the distance matrix for the nearest (most similar) pair of clusters. Let the distance between the most similar clusters U and V be d_{UV}.
(iii) Merge clusters U and V. Label the newly formed cluster (UV). Update the entries in the distance matrix by (a) deleting the rows and columns corresponding to clusters U and V, and (b) adding a row and column giving the distances between cluster (UV) and the remaining clusters.
(iv) Repeat steps (ii)-(iii) a total of N - 1 times. Record the identity of clusters that are merged and the levels at which the mergers take place.
The distance between cluster (UV) and any other cluster W in step (iii) can be defined in different ways. One has the following options:
(a) Single linkage or nearest neighbour (NN) method:

d_{(UV)W} = \min(d_{UW}, d_{VW})

(b) Complete linkage or furthest neighbour (FN) method:

d_{(UV)W} = \max(d_{UW}, d_{VW})

(c) Average linkage method:

d_{(UV)W} = \sum_i \sum_k d_{ik} / (N_{(UV)} N_W)

where d_{ik} is the distance between object i in cluster (UV) and object k in cluster W, and N_{(UV)} and N_W are the numbers of items in clusters (UV) and W, respectively.
The NN method is relatively simple but is often criticised because it doesn't take account of cluster structure and can result in a problem called "chaining", whereby clusters end up being long and straggly.
The FN method tends to produce compact clusters of similar size but, as for the nearest neighbour method, does not take account of cluster structure. It is also quite sensitive to outliers.
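A naive sketch of steps (i)-(iv) with single (NN) linkage; swapping min for max in dist would give the complete (FN) linkage variant. Cluster distances are recomputed from scratch for clarity rather than efficiency, and the illustrative distance matrix in the usage is not from the course data.

```python
def single_linkage(D):
    """Agglomerative clustering with single (nearest-neighbour) linkage.

    D: symmetric n x n distance matrix (list of lists).
    Returns the merge history as (cluster_a, cluster_b, level) tuples,
    clusters given as frozensets of original object indices.
    """
    clusters = [frozenset([i]) for i in range(len(D))]

    def dist(u, v):
        # single linkage: minimum over all cross-cluster member pairs
        return min(D[i][j] for i in u for j in v)

    history = []
    while len(clusters) > 1:
        # find the closest pair of current clusters
        a, b = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        level = dist(clusters[a], clusters[b])
        history.append((clusters[a], clusters[b], level))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters)
                    if k not in (a, b)] + [merged]
    return history

# Usage: three objects, two close together and one far away
h = single_linkage([[0, 1, 10], [1, 0, 9], [10, 9, 0]])
```

Here objects 0 and 1 merge first at level 1; the remaining merge occurs at level 9 = min(10, 9), illustrating how single linkage ignores the larger cross-distance.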
6.4 Nonhierarchical (k-means) clustering
6.4.1 Optimization approach
Given a trial set of clusters \Omega_i (i = 1, \ldots, k) we can assume a normal likelihood

l = -2 \log L = \sum_{j=1}^{k} n_j \left[ \log |\Sigma_j| + \operatorname{tr}\left( \Sigma_j^{-1} (S_j + d_j d_j^T) \right) \right]

where d_j = \bar{x}_j - \mu_j. Suppose the covariance matrices are unequal; then \hat{\mu}_j = \bar{x}_j and \hat{\Sigma}_j = S_j, hence

l^* = \sum_{j=1}^{k} n_j \log |S_j| + \text{const}.

When \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k is assumed (c.f. MANOVA) then

S = \frac{1}{n} W, where W = \sum_{j=1}^{k} n_j S_j.

Hence the criterion becomes

\min \log |W|.

This criterion is invariant under linear transformations x' = Tx:

W' = \sum_{j=1}^{k} n_j S'_j = \sum_{j=1}^{k} n_j T S_j T^T = T W T^T

|W'| = |T|^2 |W| \propto |W|.

So the clustering criterion is invariant under linear transformations.
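The determinant identity behind this invariance, |T W T^T| = |T|^2 |W|, can be checked numerically; the 2 x 2 matrices T and W below are illustrative values, not from any data set.

```python
def det2(M):
    # determinant of a 2 x 2 matrix
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matmul2(A, B):
    # product of two 2 x 2 matrices
    return [[sum(A[i][k] * B[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]

def transpose2(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

T = [[2.0, 1.0], [0.0, 3.0]]   # illustrative transformation
W = [[4.0, 1.0], [1.0, 2.0]]   # illustrative within-cluster SSP matrix
W_prime = matmul2(matmul2(T, W), transpose2(T))   # W' = T W T^T
```

Since |T| is a fixed constant for a given transformation, minimizing log |W'| over cluster assignments is equivalent to minimizing log |W|.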
6.4.2 Iterative approach
Nonhierarchical clustering methods are designed to group items into a collection of k clusters. The iterative k-means method assigns each item to the cluster having the nearest centroid. It can be described as follows:
(i) Partition the items into k initial clusters.
(ii) Proceed through the list of items, assigning each item to the cluster whose centroid is nearest. Recalculate the centroid for the cluster receiving the new item and for the cluster losing the item.
(iii) Repeat step (ii) until no more reassignments take place.
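A minimal sketch of this scheme in code. Note one simplification: it recomputes all centroids and then reassigns every item in a full pass (the batch, Lloyd-style variant), rather than updating centroids after each individual transfer as step (ii) describes; the stopping rule in step (iii) is the same.

```python
import random

def kmeans(points, k, seed=0):
    """Batch k-means with Euclidean distances.

    points: list of coordinate tuples.
    Returns (labels, centroids), labels[i] being the cluster of points[i].
    """
    rng = random.Random(seed)
    # initialize centroids as k distinct points drawn from the data
    centroids = rng.sample(points, k)

    def nearest(p):
        # index of the centroid closest to p (squared Euclidean distance)
        return min(range(k),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(p, centroids[i])))

    labels = [nearest(p) for p in points]
    while True:
        # recompute the centroid of each nonempty cluster
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
        # reassign every item to its nearest centroid
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:    # no reassignments: stop
            return labels, centroids
        labels = new_labels

# Usage: two well-separated pairs of points should form two clusters
labels, cents = kmeans([(0.0, 0.0), (0.0, 1.0),
                        (10.0, 10.0), (10.0, 11.0)], k=2)
```
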
Minimum MSE (mean squared error) algorithm
The MSE criterion to be minimized is

J = \sum_{i=1}^{k} J_i, where J_i = \sum_{x \in \Omega_i} ||x - m_i||^2 (Euclidean metric)

and

m_i = \frac{1}{n_i} \sum_{x \in \Omega_i} x.
The algorithm starts with a trial set of clusters and iteratively seeks to improve the MSE. Consider two clusters \Omega_i and \Omega_j, and consider the effect of taking some x_0 in \Omega_i and transferring it to the cluster \Omega_j:

m'_i = \frac{n_i m_i - x_0}{n_i - 1} = m_i - \frac{x_0 - m_i}{n_i - 1}.

Similarly

m'_j = m_j + \frac{x_0 - m_j}{n_j + 1}

x - m'_j = x - m_j - \frac{x_0 - m_j}{n_j + 1}

x_0 - m'_j = \frac{n_j}{n_j + 1} (x_0 - m_j).

Hence

J'_j = \sum_{x \in \Omega_j} \left( x - m_j - \frac{x_0 - m_j}{n_j + 1} \right)^T \left( x - m_j - \frac{x_0 - m_j}{n_j + 1} \right) + \left( \frac{n_j}{n_j + 1} \right)^2 ||x_0 - m_j||^2

= \sum_{x \in \Omega_j} ||x - m_j||^2 - \frac{2}{n_j + 1} (x_0 - m_j)^T \sum_{x \in \Omega_j} (x - m_j) + \frac{n_j}{(n_j + 1)^2} ||x_0 - m_j||^2 + \left( \frac{n_j}{n_j + 1} \right)^2 ||x_0 - m_j||^2.

The middle term vanishes since \sum_{x \in \Omega_j} (x - m_j) = 0, so

J'_j = J_j + \frac{n_j}{n_j + 1} ||x_0 - m_j||^2.

Similarly there will be a decrease in the SS for \Omega_i of

\frac{n_i}{n_i - 1} ||x_0 - m_i||^2.

Therefore it will be advantageous to transfer x_0 to \Omega_j if

\frac{n_i}{n_i - 1} ||x_0 - m_i||^2 > \frac{n_j}{n_j + 1} ||x_0 - m_j||^2.
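The update formula for J_j after a transfer can be verified numerically against a direct recomputation of the sum of squares; the one-dimensional cluster and transferred point below are illustrative values.

```python
def mean(xs):
    return sum(xs) / len(xs)

def ssq(xs, m):
    # sum of squared deviations about m
    return sum((x - m) ** 2 for x in xs)

cluster_j = [1.0, 2.0, 4.0]   # current members of cluster j (illustrative)
x0 = 10.0                     # candidate point transferred into cluster j

n_j = len(cluster_j)
m_j = mean(cluster_j)
J_j = ssq(cluster_j, m_j)

# direct recomputation after the transfer
new_cluster = cluster_j + [x0]
J_j_new = ssq(new_cluster, mean(new_cluster))

# closed-form update: J'_j = J_j + n_j/(n_j+1) * ||x0 - m_j||^2
predicted = J_j + n_j / (n_j + 1) * (x0 - m_j) ** 2
```

The two quantities agree exactly, confirming that the increase in within-cluster sum of squares depends only on n_j and the distance of x_0 from the old centroid.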
6.5 Multidimensional scaling (non-examinable)
The analysis of distances underpins many techniques in statistics and OR.
Multidimensional scaling (MDS) is one such technique that is less well known, but has interesting
applications. It is closely related to principal components analysis (PCA), a common dimensionality
reduction technique in statistics.
We are all familiar with the table of distances between principal cities to be found in a road
atlas. MDS allows the reconstruction of a "map" from a distance matrix.
6.5.1 Algorithm for Metric Multidimensional Scaling
Definition
A symmetric matrix D (n \times n) with d_{rr} = 0 and d_{rs} \ge 0, r \ne s, is a distance matrix.
Given an n \times n distance matrix D:
1. Construct the matrix A = \left( -\frac{1}{2} d_{rs}^2 \right).
2. Obtain the matrix B with elements b_{rs} = a_{rs} - \bar{a}_{r.} - \bar{a}_{.s} + \bar{a}_{..}
3. Find the two largest eigenvalues \lambda_1 \ge \lambda_2 of B with corresponding eigenvectors X = \left( x_{(1)}, x_{(2)} \right), normalized by x_{(i)}^T x_{(i)} = \lambda_i, i = 1, 2.
4. The coordinates of the points P_r are the rows of X:

x_r = (x_{r1}, x_{r2})^T, r = 1, \ldots, n.

The solution obtained is known as the classical MDS solution.
The generalization to Euclidean configurations in k (\le p) dimensions is straightforward; just use more eigenvectors.
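Steps 1-4 can be sketched as follows using NumPy. The eigenvalue clipping is a pragmatic guard for non-Euclidean D (where B has negative eigenvalues) and is not part of the algorithm as stated.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (metric) MDS: steps 1-4 above, using the top k eigenvalues."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    A = -0.5 * D ** 2                       # step 1
    H = np.eye(n) - np.ones((n, n)) / n     # centring matrix
    B = H @ A @ H                           # step 2 (double centring)
    evals, evecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:k]       # take the k largest
    lam = np.clip(evals[idx], 0, None)      # guard against negative eigenvalues
    X = evecs[:, idx] * np.sqrt(lam)        # step 3: x_(i)^T x_(i) = lambda_i
    return X                                # step 4: rows are the coordinates

# Usage: the distances of a 3-4-5 right triangle are recovered exactly
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
X = classical_mds(D, k=2)
D2 = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```

The recovered configuration agrees with the original only up to rotation, reflection and translation, but its interpoint distances match D whenever D is Euclidean of rank at most k.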
6.5.2 Optimality properties
(see Mardia et al., section 14.4)
Given a distance matrix D, the aim of MDS is to find a configuration in a low-dimensional Euclidean space R^k whose interpoint distances

d_{rs}^2 = (x_r - x_s)^T (x_r - x_s)

closely match D.
Definition
A distance matrix D is Euclidean if there exists a configuration of points in some Euclidean space whose interpoint distances are given by D; i.e. for some k, there exist n points x_1, \ldots, x_n \in R^k such that d_{rs}^2 = (x_r - x_s)^T (x_r - x_s).
Let H = I_n - \frac{1}{n} 1 1^T, where 1 = (1, \ldots, 1)^T \in R^n, be the centring matrix. Then with A defined as in the algorithm, we have

B = HAH.

B can be interpreted as the centred inner product matrix for the configuration obtained.
Result 1 (Mardia Th. 14.2.1)
D is Euclidean if and only if B is positive semi-definite (p.s.d.).
Result 2 (Mardia Th. 14.4.2)
If D is a distance matrix (not necessarily Euclidean) then

\phi = \operatorname{tr}\left( B - \hat{B} \right)^2

is minimized over all configurations \hat{X} in R^k when \hat{X} is the classical MDS solution. In the above expression, \hat{B} is the centred inner product matrix corresponding to \hat{X}.
6.5.3 Example (from Mardia et al., "Multivariate Analysis", 1979)
The 7 \times 7 distance matrix D gives rise to A as follows (only the upper triangle of the symmetric D is shown):

D =
[ 0   1   √3  2   √3  1   1 ]
[     0   1   √3  2   √3  1 ]
[         0   1   √3  2   1 ]
[             0   1   √3  1 ]
[                 0   1   1 ]
[                     0   1 ]
[                         0 ]

A = -(1/2) ×
[ 0  1  3  4  3  1  1 ]
[ 1  0  1  3  4  3  1 ]
[ 3  1  0  1  3  4  1 ]
[ 4  3  1  0  1  3  1 ]
[ 3  4  3  1  0  1  1 ]
[ 1  3  4  3  1  0  1 ]
[ 1  1  1  1  1  1  0 ]
Then

B = (1/2) ×
[  2   1  -1  -2  -1   1   0 ]
[  1   2   1  -1  -2  -1   0 ]
[ -1   1   2   1  -1  -2   0 ]
[ -2  -1   1   2   1  -1   0 ]
[ -1  -2  -1   1   2   1   0 ]
[  1  -1  -2  -1   1   2   0 ]
[  0   0   0   0   0   0   0 ]

is a rank-2 matrix. We see that b_{(1)} and b_{(2)} span the columns of B:

b_{(3)} = b_{(2)} - b_{(1)},  b_{(4)} = -b_{(1)},  b_{(5)} = -b_{(2)},  b_{(6)} = b_{(1)} - b_{(2)},  b_{(7)} = 0.
The eigenvalues of B are \lambda_1 = \lambda_2 = 3, \lambda_3 = \cdots = \lambda_7 = 0.
Two orthogonal eigenvectors corresponding to \lambda = 3 are

x_{(1)} = (a, a, 0, -a, -a, 0, 0)^T with a = \frac{1}{2}\sqrt{3}

x_{(2)} = (b, -b, -2b, -b, b, 2b, 0)^T with b = \frac{1}{2}.

Hence the coordinates of the seven points are

\frac{1}{2}(\sqrt{3}, 1)^T, \frac{1}{2}(\sqrt{3}, -1)^T, (0, -1)^T, \frac{1}{2}(-\sqrt{3}, -1)^T, \frac{1}{2}(-\sqrt{3}, 1)^T, (0, 1)^T, (0, 0)^T.

These are the vertices of a regular hexagon with sides of length one, together with its centre.
6.5.4 Applications
"Maps from Marriages": an interesting application to archaeology suggested by D.G. Kendall (1971).
The target was to discover "lost villages" using parish marriage records from a set of eight parishes in the Otmoor region, close to Oxford.
Assumption: the number of marriages between couples from two parishes reflects the distance between the parishes.
Kendall reconstructed the locations of "lost" parishes from medieval times using parish records.
Tobler and Wineberg (1974) tried to locate towns in Anatolia based on joint mentions in Assyrian cuneiform tablets.
(But the assumptions may break down, as the size of a town could be an important factor.)