Objective Function

$$W(C) = \frac{1}{2}\sum_{k=1}^{K}\sum_{C(i)=k}\sum_{C(j)=k}\|x_i - x_j\|^2 = \sum_{k=1}^{K} N_k \sum_{C(i)=k}\|x_i - m_k\|^2$$

where $m_k$ is the mean vector of cluster $k$ and $N_k$ is the number of observations assigned to cluster $k$.

[Figure: scatter plot of the sample data on a 10 × 10 grid]
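The two forms of W(C) above are algebraically identical, which can be checked numerically; a minimal sketch on hypothetical toy data (the array `X` and the assignment `C` are made up for illustration, not from the slides):

```python
import numpy as np

# Hypothetical toy data and a fixed assignment C (three clusters of 4 points)
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))
C = np.repeat([0, 1, 2], 4)

def W_pairwise(X, C):
    """Half the sum of squared distances over all within-cluster pairs."""
    total = 0.0
    for k in np.unique(C):
        P = X[C == k]
        diffs = P[:, None, :] - P[None, :, :]   # all ordered pairs in cluster k
        total += (diffs ** 2).sum()
    return total / 2.0

def W_centroid(X, C):
    """N_k times the squared scatter around each cluster mean, summed over k."""
    total = 0.0
    for k in np.unique(C):
        P = X[C == k]
        total += len(P) * ((P - P.mean(axis=0)) ** 2).sum()
    return total

print(np.isclose(W_pairwise(X, C), W_centroid(X, C)))  # prints True
```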
K-means Algorithm
• For a given assignment C, compute the cluster means m_k:

$$m_k = \frac{\sum_{i:\,C(i)=k} x_i}{N_k}, \qquad k = 1, \ldots, K.$$

• For the current set of cluster means, assign each observation as:

$$C(i) = \arg\min_{1 \le k \le K} \|x_i - m_k\|^2, \qquad i = 1, \ldots, N.$$
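The two alternating steps can be sketched in plain NumPy; a minimal illustration, not an optimized implementation (the function name `kmeans` and its arguments are our own):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0, init=None):
    """Alternate the two steps above until the assignment stops changing."""
    rng = np.random.default_rng(seed)
    if init is None:
        init = X[rng.choice(len(X), K, replace=False)]
    m = np.array(init, dtype=float)              # current cluster means
    C = None
    for _ in range(n_iter):
        # Assignment step: nearest mean in squared Euclidean distance
        d = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
        C_new = d.argmin(axis=1)
        if C is not None and np.array_equal(C_new, C):
            break                                # converged
        C = C_new
        # Update step: recompute each cluster mean (skip empty clusters)
        for k in range(K):
            if (C == k).any():
                m[k] = X[C == k].mean(axis=0)
    return C, m
```

With well-separated data and one initial center per group, this typically converges in a couple of iterations.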
1. Many variants, complex history since 1956; over 100 papers per year currently
2. Iterative, related to expectation-maximization (EM)
3. Number of iterations to converge grows slowly with n, k, d
4. No accepted method exists to discover k.
K-means Clustering: Step 1
Algorithm: k-means, Distance Metric: Euclidean Distance
[Figure: data points on a 5 × 5 grid with initial centers k1, k2, k3]
K-means Clustering: Step 2
Algorithm: k-means, Distance Metric: Euclidean Distance
[Figure: each point assigned to the nearest of k1, k2, k3]
K-means Clustering: Step 3
Algorithm: k-means, Distance Metric: Euclidean Distance
[Figure: centers k1, k2, k3 moved to the means of their assigned points]
K-means Clustering: Step 4
Algorithm: k-means, Distance Metric: Euclidean Distance
[Figure: points reassigned to the updated centers]
K-means Clustering: Step 5
Algorithm: k-means, Distance Metric: Euclidean Distance
[Figure: converged clustering; axes: expression in condition 1 vs. expression in condition 2]
Comments on the K-Means Method
• Strength
– Relatively efficient: O(tkn), where n is # objects, k is # clusters,
and t is # iterations. Normally, k, t << n.
– Often terminates at a local optimum. The global optimum may
be found using techniques such as deterministic annealing
and genetic algorithms
• Weakness
– Applicable only when a mean is defined; what about
categorical data?
– Need to specify k, the number of clusters, in advance
– Unable to handle noisy data and outliers
– Not suitable to discover clusters with non-convex shapes
Image Segmentation Results
Matlab code:
J = reshape(kmeans(I(:),3),size(I));
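A rough Python analogue of that one-liner, sketched with plain NumPy rather than a library call (the 1-D intensity k-means loop and the function name `segment` are our own):

```python
import numpy as np

def segment(I, K=3, n_iter=50):
    """Cluster the pixel intensities of a 2-D array I into K groups and
    reshape the labels back to the image shape, mimicking
    J = reshape(kmeans(I(:),3),size(I))."""
    x = I.ravel().astype(float)
    c = np.linspace(x.min(), x.max(), K)  # initial centers spread over the intensity range
    for _ in range(n_iter):
        # assign each pixel to its nearest intensity center, then update centers
        labels = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in range(K):
            if (labels == k).any():
                c[k] = x[labels == k].mean()
    return labels.reshape(I.shape)
```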
EM Algorithm
• Initialize K cluster centers
• Iterate between two steps
– Expectation step: assign points to clusters
$$P(d_i \in c_k) = \frac{w_k \Pr(d_i \mid c_k)}{\sum_j w_j \Pr(d_i \mid c_j)}, \qquad w_k = \frac{\sum_i P(d_i \in c_k)}{N}$$

– Maximization step: estimate model parameters

$$\mu_k = \frac{1}{m}\sum_{i=1}^{m} \frac{d_i\, P(d_i \in c_k)}{\sum_j P(d_i \in c_j)}$$
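The two steps can be sketched for a 1-D Gaussian mixture; a minimal illustration (the function name, the `mu0` initializer, and the variance update are our additions beyond the slide's equations):

```python
import numpy as np

def em_gmm_1d(d, K=2, n_iter=50, mu0=None, seed=0):
    """EM for a 1-D Gaussian mixture: the E-step computes P(d_i in c_k),
    the M-step re-estimates weights w_k, means mu_k, and variances."""
    rng = np.random.default_rng(seed)
    N = len(d)
    mu = np.array(mu0, dtype=float) if mu0 is not None else rng.choice(d, K, replace=False).astype(float)
    var = np.full(K, d.var())
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities P(d_i in c_k) proportional to w_k * Normal(d_i | mu_k, var_k)
        pdf = np.exp(-0.5 * (d[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = w * pdf
        r /= r.sum(axis=1, keepdims=True)
        # M-step: w_k = sum_i P(d_i in c_k) / N, then means and variances
        Nk = r.sum(axis=0)
        w = Nk / N
        mu = (r * d[:, None]).sum(axis=0) / Nk
        var = (r * (d[:, None] - mu) ** 2).sum(axis=0) / Nk
    return w, mu, var
```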
Iteration 1: the cluster means are randomly assigned
Iteration 2
Iteration 5
Iteration 25
What happens if the data is streaming…
New data point arrives… It is within the threshold for cluster 1, so add it to the cluster and update the cluster center.

[Figure: point joining cluster 1 on a 10 × 10 grid; clusters 1–3 shown]

New data point arrives… It is not within the threshold for cluster 1, so create a new cluster, and so on…

[Figure: a new cluster 4 created on the 10 × 10 grid]
It is difficult to determine t in advance…
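The streaming procedure above is essentially the classic leader algorithm; a minimal sketch, assuming a fixed distance threshold `t` (the function name and the running-mean center update are our own):

```python
import numpy as np

def leader_cluster(stream, t):
    """Threshold-based online clustering: each arriving point joins the nearest
    existing cluster if it is within threshold t, otherwise it starts a new
    cluster. Centers are maintained as running means."""
    centers, counts, labels = [], [], []
    for x in stream:
        x = np.asarray(x, dtype=float)
        if centers:
            d = [np.linalg.norm(x - c) for c in centers]
            k = int(np.argmin(d))
            if d[k] <= t:
                counts[k] += 1
                centers[k] += (x - centers[k]) / counts[k]  # incremental mean update
                labels.append(k)
                continue
        # no center close enough: the point becomes the leader of a new cluster
        centers.append(x.copy())
        counts.append(1)
        labels.append(len(centers) - 1)
    return labels, centers
```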
How can we tell the right number of clusters?
In general, this is an unsolved problem; however, there are many approximate methods. In the
next few slides we will see an example.
For our example, we will use the familiar katydid/grasshopper dataset. However, in this case we are imagining that we do NOT know the class labels; we are only clustering on the X and Y axis values.

[Figure: katydid/grasshopper data plotted on a 10 × 10 grid]
When k = 1, the objective function is 873.0
When k = 2, the objective function is 173.1
When k = 3, the objective function is 133.6
We can plot the objective function values for k equals 1 to 6…
The abrupt change at k = 2 is highly suggestive of two clusters in the data. This
technique for determining the number of clusters is known as “knee finding” or
“elbow finding”.
[Figure: objective function value vs. k, for k = 1 to 6]
Note that the results are not always as clear cut as in this toy example
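The elbow can be reproduced on synthetic data; a sketch assuming two well-separated blobs (the Lloyd-iteration helper and all names here are our own, not the slide's dataset):

```python
import numpy as np

def lloyd(X, K, n_iter=100, seed=0):
    """A few Lloyd iterations of k-means, enough for this sketch."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(n_iter):
        C = ((X[:, None] - m[None]) ** 2).sum(-1).argmin(1)
        m = np.array([X[C == k].mean(0) if (C == k).any() else m[k] for k in range(K)])
    return C, m

def W(X, C, m):
    """Objective: within-cluster sum of squared distances to the cluster mean."""
    return float(sum(((X[C == k] - m[k]) ** 2).sum() for k in range(len(m))))

# Two well-separated blobs: W drops sharply from k = 1 to k = 2, then flattens
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(6, 0.3, (30, 2))])
scores = [W(X, *lloyd(X, k)) for k in range(1, 7)]
```

Plotting `scores` against k reproduces the elbow shape: the big drop happens at k = 2, after which further clusters buy little.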
Summary
• K-means converges, but it finds a local minimum of
the cost function
• Works only for numerical observations (for
categorical and mixed observations, K-medoids is an
alternative clustering method)
• Fine-tuning is required when applied to image
segmentation, mostly because the k-means algorithm
imposes no spatial coherency
• Often works as a starting point for sophisticated
image segmentation algorithms