Inter-cluster distances are maximized
Intra-cluster distances are minimized
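To make the two criteria concrete, the sketch below measures both on toy data. The points and the helper names (`mean_intra_distance`, `mean_inter_distance`) are illustrative assumptions, and average pairwise Euclidean distance is only one of several ways to summarize how tight and how well separated clusters are.

```python
from itertools import combinations
from math import dist

def mean_intra_distance(cluster):
    """Average pairwise distance between points of one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(cluster_a, cluster_b):
    """Average distance between points taken from two different clusters."""
    total = sum(dist(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

# Hypothetical toy data: two tight, well-separated groups.
a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

intra_a = mean_intra_distance(a)
intra_b = mean_intra_distance(b)
inter_ab = mean_inter_distance(a, b)
print(intra_a, intra_b, inter_ab)
```

For a good clustering of such data, both intra-cluster averages should come out much smaller than the inter-cluster average.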
What is not Cluster Analysis?
• Supervised classification
– Have class label information
• Simple segmentation
– Dividing students into different registration groups alphabetically, by last name
K-means Clustering
[Figure: three panels titled Original Points, Optimal Clustering, and Sub-optimal Clustering]
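The contrast between an optimal and a sub-optimal clustering can be reproduced on a tiny data set. The sketch below assumes the usual K-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points), which this section does not spell out. The function name `lloyd` and the four rectangle-corner points are hypothetical.

```python
from math import dist

def lloyd(points, centroids, max_iter=100):
    """Run basic K-means from the given initial centroids; return (centroids, sse)."""
    k = len(centroids)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its points.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # empty cluster: leave centroid in place
        if updated == centroids:  # no centroid moved, so the run has converged
            break
        centroids = updated
    # Sum of squared distances from each point to its nearest final centroid.
    sse = sum(min(dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, sse

# Hypothetical data: corners of a wide rectangle, clustered with K = 2.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good_centroids, good_sse = lloyd(points, [(0.0, 0.0), (4.0, 1.0)])
bad_centroids, bad_sse = lloyd(points, [(0.0, 0.0), (0.0, 1.0)])
print(good_centroids, good_sse)
print(bad_centroids, bad_sse)
```

From the first start the run settles on left and right clusters (SSE 1.0). From the second it stays on top and bottom clusters (SSE 16.0), a local minimum that the iteration cannot leave.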
Importance of Choosing Initial Centroids
[Figure: Iteration 6; axes x and y]
[Figure: two rows of three panels; axes x and y]
Evaluating K-means Clusters
– x is a data point in cluster Ci and mi is the representative point for cluster Ci
  One can show that mi corresponds to the center (mean) of the cluster
– Given two clusterings, we can choose the one with the smallest error
– One easy way to reduce SSE (sum of squared error) is to increase K, the number of clusters
  A good clustering with smaller K can have a lower SSE than a poor clustering with higher K
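The error measure itself did not survive in the text above. Assuming the standard definition, $\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}(m_i, x)^2$ with $m_i$ the mean of cluster $C_i$, the following sketch checks the last two points on made-up data. The helper name `sse` and the four points are illustrative only.

```python
from math import dist

def sse(clusters):
    """Sum of squared distances from each point to the mean of its own cluster."""
    total = 0.0
    for cluster in clusters:
        # m_i is taken to be the coordinate-wise mean of the cluster.
        centroid = tuple(sum(c) / len(cluster) for c in zip(*cluster))
        total += sum(dist(x, centroid) ** 2 for x in cluster)
    return total

# Hypothetical data: two natural groups on a line.
p0, p1, p2, p3 = (0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)

sse_k1 = sse([[p0, p1, p2, p3]])          # K = 1
sse_k2_good = sse([[p0, p1], [p2, p3]])   # K = 2, natural split
sse_k3_good = sse([[p0, p1], [p2], [p3]]) # K = 3, refines the natural split
sse_k3_poor = sse([[p0], [p1, p2], [p3]]) # K = 3, merges points across the gap
print(sse_k1, sse_k2_good, sse_k3_good, sse_k3_poor)
```

Here the best clusterings give SSE 26.0, 1.0, and 0.5 as K grows from 1 to 3, while the poor K = 3 clustering gives 8.0, which is worse than the good K = 2 clustering.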
[Figure: Iteration 5; axes x and y]
Importance of Choosing Initial Centroids …
[Figure: multi-panel plot; recoverable panel titles: Iteration 1, Iteration 2; axes x and y]
[Figure: axes x and y]
Starting with two initial centroids in one cluster of each pair of clusters
10 Clusters Example
[Figure: four panels, Iteration 1 to Iteration 4; axes x and y]
Starting with two initial centroids in one cluster of each pair of clusters
10 Clusters Example
[Figure: Iteration 4; axes x and y]
Starting with some pairs of clusters having three initial centroids, while others have only one.
10 Clusters Example
[Figure: four panels, Iteration 1 to Iteration 4; axes x and y]
Starting with some pairs of clusters having three initial centroids, while others have only one.
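The effect of how initial centroids are spread over pairs of clusters can be imitated at small scale. The sketch below is a one-dimensional, four-cluster stand-in for the ten-cluster figures, not a reproduction of them. The data, the function name `lloyd`, and both sets of starting centroids are assumptions chosen for illustration.

```python
from math import dist

def lloyd(points, centroids, max_iter=100):
    """Run basic K-means from the given initial centroids; return (centroids, sse)."""
    k = len(centroids)
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Recompute each centroid as the mean of its assigned points.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # empty cluster: leave centroid in place
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, sse

# Two pairs of clusters on a line: centers near 0 and 2, and near 10 and 12.
xs = (-0.2, 0.0, 0.2, 1.8, 2.0, 2.2, 9.8, 10.0, 10.2, 11.8, 12.0, 12.2)
points = [(x,) for x in xs]

# Two initial centroids per pair, both inside one cluster of the pair.
two_per_pair = [(-0.2,), (0.2,), (9.8,), (10.2,)]
# Three initial centroids in the first pair, only one in the second.
three_and_one = [(-0.2,), (0.2,), (2.0,), (10.0,)]

_, sse_two_per_pair = lloyd(points, two_per_pair)
_, sse_three_and_one = lloyd(points, three_and_one)
print(sse_two_per_pair, sse_three_and_one)
```

With two centroids per pair, the centroids redistribute within each pair and all four clusters are recovered (SSE about 0.32). With three centroids in one pair and one in the other, the lone centroid ends up covering two true clusters and the run stops at a much higher SSE (about 6.26).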
Limitations of K-means