K-MEANS CLUSTERING
In K-means clustering, the number of clusters is fixed in advance and the center of each
cluster is chosen at random. The Euclidean distance between each data point and every
cluster center is computed, and each data point is assigned to the cluster at minimum
distance. A new center is then computed for each cluster and the Euclidean distances are
recalculated. This procedure iterates until convergence is reached.
The objective of K-means clustering is to minimize the sum of squared distances
between all points and their cluster centers.
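As an illustration of this procedure, here is a minimal K-means sketch in Python/NumPy; the function name and the random initialization scheme are illustrative choices, not taken from the text:

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-means: returns (centers, labels) for an (n, d) array."""
    rng = np.random.default_rng(seed)
    # Randomly choose k data points as the initial cluster centers
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(max_iter):
        # Euclidean distance from every point to every center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        # Assign each point to the cluster at minimum distance
        labels = dists.argmin(axis=1)
        # New center = mean position of the points assigned to each cluster
        new_centers = np.array([points[labels == j].mean(axis=0)
                                for j in range(k)])
        if np.allclose(new_centers, centers):  # converged: centers unchanged
            break
        centers = new_centers
    return centers, labels
```

Each iteration performs exactly the two steps described above: assignment by minimum Euclidean distance, then recomputation of the centers from the new assignment.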
In K-means, the correct choice of K is often ambiguous: interpretations depend on the
shape and scale of the distribution of points in the data set and on the clustering
resolution desired by the user. In addition, increasing K without penalty will always
reduce the amount of error in the resulting clustering, to the extreme case of zero error
when each data point is considered its own cluster (i.e., when K equals the number of
data points, n).
Intuitively then, the optimal choice of K will strike a balance between maximum
compression of the data using a single cluster, and maximum accuracy by assigning each
data point to its own cluster.
If an appropriate value of K is not apparent from prior knowledge of the properties of the
data set, it must be chosen somehow. There are several categories of methods for making
this decision, and the Elbow method is one of them.
ELBOW METHOD
The basic idea behind partitioning methods, such as K-means clustering, is to define clusters
such that the total intra-cluster variation, in other words the total within-cluster sum of
squares (WCSS), is minimized. The total WCSS measures the compactness of the clustering,
and we want it to be as small as possible.
The Elbow method looks at the total WCSS as a function of the number of clusters: one
should choose a number of clusters such that adding another cluster does not noticeably
improve the total WCSS.
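This can be sketched in Python/NumPy as follows. The helper runs K-means several times and keeps the best result (the same idea as the 'NumAttempts' option used in the program later); the function name and details are illustrative:

```python
import numpy as np

def wcss_for_k(points, k, n_attempts=10, max_iter=100):
    """Run K-means n_attempts times; return the lowest total WCSS found."""
    best = np.inf
    for seed in range(n_attempts):
        rng = np.random.default_rng(seed)
        centers = points[rng.choice(len(points), k, replace=False)]
        for _ in range(max_iter):
            d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            new_centers = np.array([points[labels == j].mean(axis=0)
                                    for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Total WCSS: squared distance of each point to its own center
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        best = min(best, ((points - centers[labels]) ** 2).sum())
    return best

# Elbow method: compute wcss_for_k(points, k) for k = 1, 2, 3, ... and pick
# the k after which the curve stops dropping sharply (the "elbow").
```

Plotting these WCSS values against K produces the elbow curve described above: it always decreases as K grows, and the bend marks the candidate number of clusters.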
Consider a data point at (X, Y) = (13, 20), cluster 1 at position X = 8, Y = 19, and
cluster 2 at position X = 13, Y = 15.
1. Find the Euclidean distance:
2. Find the Euclidean distance (D1) between the data point and cluster 1; similarly, find
the Euclidean distance (D2) between the data point and cluster 2.
3. Distance D1 = sqrt((13-8)^2 + (20-19)^2) = 5.0990
4. Distance D2 = sqrt((13-13)^2 + (20-15)^2) = 5.0000
5. Find the minimum and assign the data point to a cluster.
6. Here the minimum of the two distances corresponds to cluster 2.
7. So the data point at (X, Y) = (13, 20) is assigned to cluster/group 2.
8. Perform steps 1 and 2 for all the data points and assign each to a group accordingly.
9. Assign a new position to each cluster based on the clustering just done.
10. Find the average position of the data points newly assigned to a particular cluster and
use that average as the cluster's new position.
11. Iterate this procedure until the positions of the clusters no longer change.
For the program below, the number of clusters is set to 3.
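The distance and assignment steps of the worked example can be checked in a few lines of Python/NumPy (coordinates are the ones used in the example above):

```python
import numpy as np

point = np.array([13.0, 20.0])   # data point (X, Y)
c1 = np.array([8.0, 19.0])       # cluster 1 center
c2 = np.array([13.0, 15.0])      # cluster 2 center

D1 = np.linalg.norm(point - c1)  # sqrt(5^2 + 1^2) = 5.0990
D2 = np.linalg.norm(point - c2)  # sqrt(0^2 + 5^2) = 5.0000
group = 1 if D1 <= D2 else 2     # assign to the nearer center
```

Since D2 < D1, the point is assigned to group 2, matching the result of the worked example.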
PROGRAM:
% Read the H&E-stained tissue image and convert it to L*a*b* color space
he = imread('hestain.png');
lab_he = rgb2lab(he);
% Cluster on the a* and b* channels only (color information, not lightness)
ab = lab_he(:,:,2:3);
ab = im2single(ab);
nColors = 3;
% Segment the pixels into 3 clusters, repeating k-means 3 times to avoid
% a poor random initialization
pixel_labels = imsegkmeans(ab,nColors,'NumAttempts',3);
imshow(pixel_labels,[])
% Mask the original image with each cluster and display the clusters separately
mask1 = pixel_labels==1;
cluster1 = he .* uint8(mask1);
imshow(cluster1)
mask2 = pixel_labels==2;
cluster2 = he .* uint8(mask2);
imshow(cluster2)
mask3 = pixel_labels==3;
cluster3 = he .* uint8(mask3);
imshow(cluster3)
% Use the L* (lightness) channel to separate the dark blue nuclei from the
% light blue pixels within cluster 3
L = lab_he(:,:,1);
L_blue = L .* double(mask3);
L_blue = rescale(L_blue);
idx_light_blue = imbinarize(nonzeros(L_blue));
blue_idx = find(mask3);
mask_dark_blue = mask3;
mask_dark_blue(blue_idx(idx_light_blue)) = 0;
blue_nuclei = he .* uint8(mask_dark_blue);
imshow(blue_nuclei)
title('Blue Nuclei');
RESULT