9. Hierarchical Clustering
• Many times, clusters are not disjoint, but may have
subclusters, in turn having sub-subclusters, etc.
• Consider a sequence of partitions of the n samples
into c clusters
• The first is a partition into n clusters, each one
containing exactly one sample
• The second is a partition into n-1 clusters, the third into
n-2, and so on, until the n-th in which there is only one
cluster containing all of the samples
• At level k in the sequence, c = n-k+1.

Pattern Classification, Chapter 10


• Given any two samples x and x’, they will be grouped
together at some level, and if they are grouped at level k,
they remain grouped at all higher levels
• Such a hierarchical clustering has a natural tree
representation, called a dendrogram



• The similarity values may help to determine whether the
groupings are natural or forced, but if they are
evenly distributed no information can be gained
• Another representation is based on Venn diagrams


• Hierarchical clustering procedures can be divided into
agglomerative and divisive.

• Agglomerative (bottom up, clumping): start with
n singleton clusters and form the sequence by
successively merging clusters

• Divisive (top down, splitting): start with all of the
samples in one cluster and form the sequence
by successively splitting clusters


Agglomerative hierarchical clustering

• The procedure terminates when the specified number of
clusters has been obtained, and returns the clusters as sets
of points, rather than the mean or a representative vector
for each cluster. Terminating at c = 1 yields a dendrogram.
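
The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `agglomerative` and `d_min`, the sample points, and the choice of the nearest-neighbor distance as the default are assumptions made for the example, not part of the slides.

```python
import math

def d_min(a, b):
    """Nearest-neighbor distance between two clusters of points."""
    return min(math.dist(x, y) for x in a for y in b)

def agglomerative(samples, c, cluster_dist=d_min):
    """Merge the two nearest clusters until only c clusters remain."""
    clusters = [[tuple(x)] for x in samples]  # start with n singleton clusters
    while len(clusters) > c:
        # Find the pair of clusters with the smallest distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge D_j into D_i
        del clusters[j]
    return clusters  # sets of points, not means or representatives

# Hypothetical sample data: two tight pairs and one isolated point.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerative(samples, c=3)
print(result)
```

Calling `agglomerative(samples, c=1)` runs the merging all the way to a single cluster, which is the case that yields the full dendrogram. The brute-force pair search is quadratic per step and is chosen here for readability only.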

• At any level, the distance between nearest clusters
can provide the dissimilarity value for that level
• To find the nearest clusters, one can use
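$$d_{\min}(D_i, D_j) = \min_{x \in D_i,\; x' \in D_j} \lVert x - x' \rVert \qquad \text{(nearest neighbor)}$$

$$d_{\max}(D_i, D_j) = \max_{x \in D_i,\; x' \in D_j} \lVert x - x' \rVert \qquad \text{(farthest neighbor)}$$

$$d_{avg}(D_i, D_j) = \frac{1}{n_i n_j} \sum_{x \in D_i} \sum_{x' \in D_j} \lVert x - x' \rVert$$

$$d_{mean}(D_i, D_j) = \lVert m_i - m_j \rVert$$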
which behave quite similarly if the clusters are
hyperspherical and well separated.
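
As a small illustration, the four distance measures can be written directly from their definitions. The helper name `mean` and the two example clusters are assumptions made for this sketch; the Euclidean norm is assumed for the distance between samples.

```python
import math

def d_min(Di, Dj):
    """Nearest neighbor: smallest distance between a point of Di and one of Dj."""
    return min(math.dist(x, y) for x in Di for y in Dj)

def d_max(Di, Dj):
    """Farthest neighbor: largest distance between a point of Di and one of Dj."""
    return max(math.dist(x, y) for x in Di for y in Dj)

def d_avg(Di, Dj):
    """Average of all n_i * n_j pairwise distances."""
    total = sum(math.dist(x, y) for x in Di for y in Dj)
    return total / (len(Di) * len(Dj))

def mean(D):
    """Component-wise mean vector of a cluster."""
    n = len(D)
    return tuple(sum(coords) / n for coords in zip(*D))

def d_mean(Di, Dj):
    """Distance between the two cluster means m_i and m_j."""
    return math.dist(mean(Di), mean(Dj))

# Hypothetical clusters on a line, so the values are easy to check by hand.
Di = [(0.0, 0.0), (1.0, 0.0)]
Dj = [(4.0, 0.0), (6.0, 0.0)]
print(d_min(Di, Dj), d_max(Di, Dj), d_avg(Di, Dj), d_mean(Di, Dj))
```

For these two clusters the nearest pair is 3 apart and the farthest pair is 6 apart, while the average and mean distances both come out at 4.5, between the two extremes.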


Nearest-neighbor algorithm
• When dmin is used, the algorithm is called the
nearest neighbor algorithm
• If it is terminated when the distance between
nearest clusters exceeds an arbitrary threshold,
it is called the single-linkage algorithm
• If data points are considered nodes of a graph,
with edges forming a path between the nodes in
the same subset Di, the merging of Di and Dj
corresponds to adding an edge between the
nearest pair of nodes in Di and Dj
• The resulting graph never has any closed loops, so it is
a tree; if all subsets are linked, we have a
spanning tree

• With dmin as the distance measure,
agglomerative clustering generates a
minimal spanning tree

• Chaining effect: a defect of this distance
measure (right side of the figure on the next slide)
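
The link between nearest-neighbor merging and the minimal spanning tree can be made concrete with a short sketch. Each merge adds the shortest edge that joins two different clusters, which is the same step Kruskal's spanning-tree algorithm takes. The function name `single_linkage_edges`, the union-find bookkeeping, and the sample points are assumptions made for this illustration.

```python
import math
from itertools import combinations

def single_linkage_edges(points, c=1):
    """Return the edges (i, j, length) added while merging down to c clusters."""
    parent = list(range(len(points)))  # union-find forest, one root per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    n_clusters = len(points)
    for d, i, j in edges:
        if n_clusters == c:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # nearest pair of nodes lying in two different clusters
            parent[rj] = ri
            tree.append((i, j, d))
            n_clusters -= 1
    return tree

# Hypothetical data: four evenly spaced points and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
mst = single_linkage_edges(points)
print(mst)
```

With `c=1` the returned edges form a spanning tree of all five points. With `c=2` the first four points are linked into one chain-like cluster through their unit-length edges, even though its end points are 3 apart, which is a small instance of the chaining effect.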


Problem!


The farthest neighbor algorithm


• When dmax is used, the algorithm is called the
farthest neighbor algorithm
• If it is terminated when the distance between
nearest clusters exceeds an arbitrary threshold, it
is called the complete-linkage algorithm
• This method discourages the growth of elongated
clusters
• In the terminology of graph theory, every cluster
constitutes a complete subgraph, and the distance
between two clusters is determined by the most
distant nodes in the 2 clusters

• When two clusters are merged, the graph is
changed by adding edges between every pair
of nodes in the two clusters

• All the procedures involving minima or
maxima are sensitive to outliers. The use of
dmean or davg is a natural compromise
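
To see how the farthest-neighbor measure discourages elongated clusters, the sketch below asks which pair of clusters would be merged next under each measure. The names `linkage` and `next_merge` and the three example clusters are hypothetical, chosen only to make the contrast visible.

```python
import math
from itertools import combinations

def linkage(Di, Dj, reduce):
    """Cluster distance: reduce is min (nearest) or max (farthest neighbor)."""
    return reduce(math.dist(x, y) for x in Di for y in Dj)

def next_merge(clusters, reduce):
    """Names of the two clusters the next agglomerative step would merge."""
    return min(
        combinations(sorted(clusters), 2),
        key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]], reduce),
    )

# Hypothetical state of the clustering: A is already elongated.
clusters = {
    "A": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    "B": [(4.2, 0.0)],
    "C": [(5.8, 0.0)],
}
print(next_merge(clusters, min))  # nearest neighbor
print(next_merge(clusters, max))  # farthest neighbor
```

With the nearest-neighbor measure the pair A, B is chosen, so the long cluster keeps growing. With the farthest-neighbor measure the pair B, C is chosen, because merging B into A would create a cluster whose most distant nodes are 4.2 apart.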


The problem of the number of clusters

• Typically, the number of clusters is known
• When it is not, there are several ways to proceed
• When clustering is done by extremizing a criterion function,
a common approach is to repeat the clustering with c=1,
c=2, c=3, etc.
• Another approach is to state a threshold for the creation of
a new cluster
• These approaches are similar to model selection
procedures, typically used to determine the topology and
number of states (e.g., clusters, parameters) of a model,
given a specific application
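
One way to picture the first approach is to record a criterion value for every candidate c and look for a sharp change. The sketch below is an assumption-laden illustration: it uses the sum-of-squared-error as the criterion, takes the partitions for all c from a single nearest-neighbor agglomerative run instead of re-clustering for each c, and works on hypothetical one-dimensional data.

```python
def sse(clusters):
    """Sum-of-squared-error criterion for a list of 1-D clusters."""
    total = 0.0
    for cl in clusters:
        m = sum(cl) / len(cl)
        total += sum((x - m) ** 2 for x in cl)
    return total

def merge_nearest(clusters):
    """One agglomerative step using the nearest-neighbor distance."""
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    i, j = min(
        pairs,
        key=lambda p: min(abs(x - y) for x in clusters[p[0]] for y in clusters[p[1]]),
    )
    merged = clusters[i] + clusters[j]
    return [cl for k, cl in enumerate(clusters) if k not in (i, j)] + [merged]

# Hypothetical data with three visually obvious groups.
samples = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]
clusters = [[x] for x in samples]
criterion = {len(clusters): sse(clusters)}  # maps c to the criterion value
while len(clusters) > 1:
    clusters = merge_nearest(clusters)
    criterion[len(clusters)] = sse(clusters)

for c in sorted(criterion):
    print(c, criterion[c])
```

For this data the criterion stays at 1.5 or below for c = 3 and larger, then jumps to 101.5 at c = 2 and 401.5 at c = 1, which points to c = 3. Reading the jump is a judgment call; the sketch does not attempt to automate it.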


13. Component Analysis (also Chap 3)

• Combine features to reduce the dimension of the feature space
• Linear combinations are simple to compute and tractable
• Project high dimensional data onto a lower dimensional space
• Two classical approaches for finding an “optimal” linear
transformation
• PCA (Principal Component Analysis): “Projection that best represents the
data in a least-squares sense”
• MDA (Multiple Discriminant Analysis): “Projection that best separates the
data in a least-squares sense” (generalization of Fisher’s Linear
Discriminant for two classes)


• PCA (Principal Component Analysis): “Projection that
best represents the data in a least-squares sense”
• The scatter matrix of the cloud of samples is the same, up to a
factor of n, as the maximum-likelihood estimate of the covariance matrix
• Unlike a covariance matrix, however, the scatter matrix includes
samples from all classes!

• The least-squares projection solution (maximum scatter)
is simply the subspace defined by the d’ < d eigenvectors of
the covariance matrix that correspond to the largest d’
eigenvalues of the matrix
• Because the scatter matrix is real and symmetric, the eigenvectors
are orthogonal
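
A minimal sketch of the d’ = 1 case follows. It builds the scatter matrix and finds its leading eigenvector by power iteration, which is an assumption of this example (any symmetric eigensolver would serve); the function names and the four sample points are likewise hypothetical. Further eigenvectors, needed for d’ > 1, are not computed here.

```python
import math

def scatter_matrix(samples):
    """S = sum_k (x_k - m)(x_k - m)^T for a list of d-dimensional samples."""
    n, d = len(samples), len(samples[0])
    m = [sum(x[i] for x in samples) / n for i in range(d)]
    S = [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in samples) for j in range(d)]
         for i in range(d)]
    return S, m

def top_eigenvector(S, iterations=200):
    """Power iteration: largest eigenvalue of S and its unit eigenvector."""
    d = len(S)
    v = [1.0] + [0.0] * (d - 1)  # arbitrary start, assumed not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(S[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    lam = sum(v[i] * S[i][j] * v[j] for i in range(d) for j in range(d))
    return lam, v

# Hypothetical samples spread mostly along the diagonal direction (1, 1).
samples = [(2.0, 2.0), (-2.0, -2.0), (1.0, -1.0), (-1.0, 1.0)]
S, m = scatter_matrix(samples)
lam, e1 = top_eigenvector(S)
# One-dimensional representation: a_k = e1 . (x_k - m)
projected = [sum(e * (xi - mi) for e, xi, mi in zip(e1, x, m)) for x in samples]
print(lam, e1, projected)
```

Here the scatter matrix is [[10, 6], [6, 10]], whose largest eigenvalue is 16 with eigenvector along (1, 1). The two samples on that diagonal keep their full spread after projection, while the two off-diagonal samples project to zero.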


Fisher Linear Discriminant


• While PCA seeks directions efficient for representation, discriminant
analysis seeks directions efficient for discrimination
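
For two classes, the standard closed-form result (not stated on the slide) is that the discriminating direction is proportional to S_W^{-1}(m1 - m2), where S_W is the within-class scatter matrix. The sketch below works this out for two-dimensional data with an explicit 2x2 inverse; the function names and the sample classes are assumptions made for illustration.

```python
def class_mean(X):
    """Mean vector of a list of 2-D samples."""
    n = len(X)
    return (sum(x for x, _ in X) / n, sum(y for _, y in X) / n)

def class_scatter(X, m):
    """2x2 scatter matrix of one class, returned as (sxx, sxy, syy)."""
    sxx = sum((x - m[0]) ** 2 for x, _ in X)
    sxy = sum((x - m[0]) * (y - m[1]) for x, y in X)
    syy = sum((y - m[1]) ** 2 for _, y in X)
    return sxx, sxy, syy

def fisher_direction(X1, X2):
    """w = S_W^{-1} (m1 - m2) for two classes of 2-D samples."""
    m1, m2 = class_mean(X1), class_mean(X2)
    a1, b1, c1 = class_scatter(X1, m1)
    a2, b2, c2 = class_scatter(X2, m2)
    a, b, c = a1 + a2, b1 + b2, c1 + c2  # S_W = [[a, b], [b, c]]
    det = a * c - b * b                  # assumed non-zero (S_W invertible)
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    return ((c * dx - b * dy) / det, (-b * dx + a * dy) / det)

# Hypothetical classes: widest spread along x, but separated along y.
X1 = [(0.0, 0.0), (2.0, 1.0), (4.0, 0.0), (6.0, 1.0)]
X2 = [(0.0, 3.0), (2.0, 4.0), (4.0, 3.0), (6.0, 4.0)]
w = fisher_direction(X1, X2)
p1 = [w[0] * x + w[1] * y for x, y in X1]
p2 = [w[0] * x + w[1] * y for x, y in X2]
print(w, p1, p2)
```

The direction of greatest overall spread in this data is roughly the x axis, which does not separate the classes at all. The Fisher direction instead points mostly along y, and the projected values of the two classes do not overlap.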


Multiple Discriminant Analysis


• Generalization of Fisher’s Linear Discriminant for more than two classes
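
For reference, the usual formulation of this generalization can be written as follows. It is the standard textbook form and is not given on the slide itself; the symbols are assumptions consistent with the clustering notation used earlier (D_i is the set of samples of class i, n_i its size, m_i its mean), with m the mean of all samples and c the number of classes.

$$S_W = \sum_{i=1}^{c} \sum_{x \in D_i} (x - m_i)(x - m_i)^T, \qquad S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T$$

$$J(W) = \frac{\lvert W^T S_B W \rvert}{\lvert W^T S_W W \rvert}, \qquad S_B w_i = \lambda_i S_W w_i$$

The columns w_i of the projection matrix W are the generalized eigenvectors belonging to the largest eigenvalues. Because S_B has rank at most c - 1, the projection has at most c - 1 useful dimensions, which reduces to a single direction in the two-class case.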
