James Bailey
Introduction
Cluster analysis: group similar
objects into clusters
No single solution
Task formulation:
Number of alternatives to generate
Sequential or Simultaneous Generation
Mathematical basis
Linear algebra
Information theory
Other objective functions
Sequential Alternative
Clustering Generation
Task: Given input clusterings {C1, ..., Cn}, generate an
alternative clustering C, such that C is of high quality
and C is different from {C1, ..., Cn}
Important special case: n=1
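The task statement leaves open how "different" is measured. As a minimal sketch, assuming dissimilarity is taken as one minus the Rand index (an illustrative choice, not one named on the slide), the function names below are hypothetical:

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of object pairs on which labelings a and b agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def dissimilarity(c, existing):
    """Average of (1 - Rand index) between c and each existing clustering.
    Assumption: averaging is one simple way to compare against a set."""
    return sum(1.0 - rand_index(c, e) for e in existing) / len(existing)
```

For n=1, `dissimilarity(c, [c1])` reduces to a single pairwise comparison. The measure ignores label names, so swapping cluster ids does not count as a difference.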
[Diagram: existing clusterings C1, C2, ..., Cn --generate--> alternative clustering C]
Simultaneous Alternative
Clustering Generation
Task: Simultaneously generate n clusterings
{C1, ..., Cn}, such that each Ci is of high quality
and each pair (Ci, Cj) is different from one
another
Important special case: n=2
[Diagram: generate --> alternative clusterings C1, C2, ..., Cn]
Sequential vs. Simultaneous
Sequential (greedy)
Semi-supervised
For i = 2 to n
{generate the optimal alternative clustering with
respect to the previous i-1 clusterings}
Locally optimal at each step
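The greedy loop can be sketched as a small driver. This is an illustration only: `generate_alternative` is a hypothetical caller-supplied routine standing in for whichever alternative-clustering method is used, and the first clustering is assumed to be given.

```python
def sequential_alternatives(data, first, n, generate_alternative):
    """Greedily build n clusterings, one at a time.

    generate_alternative(data, previous) is a caller-supplied (hypothetical)
    routine returning one clustering that is of high quality and differs
    from every clustering in `previous`.
    """
    clusterings = [first]
    for i in range(2, n + 1):
        # Step i sees only the i-1 clusterings found so far. Earlier
        # choices are never revisited, hence "locally optimal".
        clusterings.append(generate_alternative(data, list(clusterings)))
    return clusterings
```

Because each step conditions on fixed earlier results, the output depends on the order in which clusterings are found.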
Simultaneous (non-greedy)
Unsupervised
In parallel, generate optimal set of n clusterings
Globally optimal clustering collection
but might miss some strong clusterings which would
be generated by a sequential technique
More difficult optimisation problem
Style of Algorithm
Projection based
Project the data into an orthogonal subspace and then
re-cluster
Appealing linear algebra formulation
Relatively efficient
Orthogonality may be too strict
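A minimal sketch of the projection idea, under strong simplifying assumptions: two clusters only, the direction to remove is the difference of the two existing cluster means, and re-clustering uses a tiny deterministic 2-means. All function names are hypothetical, and this is not a specific published method.

```python
def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def project_out(points, direction):
    """Remove each point's component along `direction`
    (i.e. project onto the orthogonal complement)."""
    norm_sq = sum(x * x for x in direction)  # assumes direction is non-zero
    out = []
    for p in points:
        coef = sum(a * b for a, b in zip(p, direction)) / norm_sq
        out.append([a - coef * b for a, b in zip(p, direction)])
    return out

def two_means(points, iters=20):
    """Tiny 2-means; seeds are point 0 and the point farthest from it."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    centers = [points[0], max(points, key=lambda p: d2(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if d2(p, centers[0]) <= d2(p, centers[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centers[k] = mean(members)
    return labels

def alternative_by_projection(points, labels):
    """Project out the direction separating the existing clusters, re-cluster."""
    group0 = [p for p, l in zip(points, labels) if l == 0]
    group1 = [p for p, l in zip(points, labels) if l == 1]
    direction = [a - b for a, b in zip(mean(group1), mean(group0))]
    return two_means(project_out(points, direction))
```

On four points at the corners of a wide rectangle, an existing left/right split yields a top/bottom alternative. The hard removal of a whole direction is what makes the approach strict: any structure partly aligned with that direction is lost.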
More complex objective function
Generate the alternative clustering, trading off
dissimilarity and quality in the objective function
More flexible
May require parameter choices
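To illustrate the trade-off, here is a hedged sketch that scores a candidate as quality plus a weight times dissimilarity. The specific choices are assumptions: quality is negative within-cluster squared error, dissimilarity is one minus the Rand index, `lam` is the hypothetical trade-off parameter, and the exhaustive search is only feasible for very small inputs.

```python
from itertools import combinations, product

def quality(points, labels):
    """Negative within-cluster sum of squared distances (higher is better)."""
    total = 0.0
    for k in set(labels):
        members = [p for p, l in zip(points, labels) if l == k]
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum((x - c) ** 2 for p in members for x, c in zip(p, center))
    return -total

def dissimilarity(a, b):
    """Fraction of object pairs grouped differently by a and b."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs) / len(pairs)

def best_alternative(points, existing, lam):
    """Exhaustively search 2-cluster labelings for the best trade-off score."""
    best, best_score = None, float("-inf")
    for rest in product((0, 1), repeat=len(points) - 1):
        labels = [0, *rest]      # fix first label to skip mirrored labelings
        if 1 not in labels:      # require two non-empty clusters
            continue
        score = quality(points, labels) + lam * dissimilarity(labels, existing)
        if score > best_score:
            best, best_score = labels, score
    return best
```

With `lam = 0` the search simply returns the highest-quality clustering, which may coincide with the existing one. Raising `lam` pushes the result away from it, which is the parameter choice this style of algorithm requires.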
Simple Example