Algorithm
Data Clustering
Conventional K-means
Main steps:
Step 1: Choose any k objects arbitrarily as the k initial cluster centers.
Step 2: Assign each object in the training set to the closest cluster and update the centers of all clusters.
Step 3: If the clustering criterion is satisfied, i.e. no cluster center moves, the algorithm stops; else go to step 2.
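The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names `kmeans`, `sq_dist`, `mean`, `max_iter` and `seed` are assumptions introduced here, points are assumed to be tuples of numbers, and an empty cluster is assumed to keep its previous center.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(data, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k arbitrary objects as the initial centers.
    centers = rng.sample(data, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign each object to its closest center ...
        clusters = [[] for _ in range(k)]
        for p in data:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # ... and update every center (assumption: empty clusters keep theirs).
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Step 3: stop as soon as no center has moved.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

Calling `kmeans(data, 2)` on a list of 2-D tuples returns the final centers and the list of points assigned to each one.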
Conventional K-means
Pros:
>> Simple to implement.
>> Computationally efficient (computation time grows linearly with data set size).
Cons:
>> Can converge at a local minimum.
Cluster Parameters
Cluster information is represented as: <w, N, S>
Here,
w: center of the cluster
N: number of elements in the cluster
S: sum of the squared distances between the objects in the cluster and the center of the Euclidean space
$x_0$: center of the Euclidean space
Distortion error: $I = S - N\,[d(w, x_0)]^2$
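A short numerical check can make the distortion formula concrete. The sketch below assumes that S accumulates the squared distances of the cluster's objects from x0 (the reading under which the formula equals the sum of squared distances from the objects to the center w). The function names are illustrative only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster_summary(points, x0):
    """Build the <w, N, S> triple for a non-empty cluster."""
    n = len(points)
    w = tuple(sum(col) / n for col in zip(*points))   # cluster center
    s = sum(sq_dist(p, x0) for p in points)           # assumption: summed over objects
    return w, n, s

def distortion_from_summary(w, n, s, x0):
    """I = S - N * d(w, x0)^2, using only the stored triple."""
    return s - n * sq_dist(w, x0)

def distortion_direct(points, w):
    """Sum of squared distances from each object to the center w."""
    return sum(sq_dist(p, w) for p in points)

points = [(1, 2), (3, 4), (5, 0)]
x0 = (0, 0)
w, n, s = cluster_summary(points, x0)
print(distortion_from_summary(w, n, s, x0), distortion_direct(points, w))
```

Both routes give the same value, which is why the triple is enough to track a cluster's distortion without revisiting its objects.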
Modified K-means
Involves a jumping operation.
Moves a cluster center from the cluster with the least distortion error to the cluster with the most distortion error.
Main steps:
Step 1: Choose k objects arbitrarily as the k initial cluster centers.
Step 2: Assign each object to its closest cluster and update the cluster centers.
Step 3: If the clustering criterion is satisfied, go to step 4; else go to step 2.
Step 4: If there is a cluster center that could be moved to a better position to reduce the total distortion, move it and go to step 2; else stop.
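The jump in step 4 could look roughly like the sketch below. The slides do not say where the moved center lands or how the benefit of a move is tested, so two choices here are assumptions: the center of the lowest-distortion cluster is placed on the member of the highest-distortion cluster that lies farthest from that cluster's center, and the check that the move actually lowers total distortion is left to the caller (re-run step 2 and compare). `jump_once` is a hypothetical name.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def jump_once(centers, clusters):
    """Return a new center list after one jump; the inputs are not modified."""
    distortions = [sum(sq_dist(p, c) for p in pts)
                   for c, pts in zip(centers, clusters)]
    low = min(range(len(centers)), key=distortions.__getitem__)
    high = max(range(len(centers)), key=distortions.__getitem__)
    if low == high or not clusters[high]:
        return list(centers)          # nothing sensible to move
    # Assumption: land on the member farthest from the high-distortion center.
    target = max(clusters[high], key=lambda p: sq_dist(p, centers[high]))
    moved = list(centers)
    moved[low] = target
    return moved
```

After a jump, the assignment and update of step 2 would be repeated, and the move kept only if the total distortion dropped.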
Modified K-means
Objective:
>> To obtain the least distortion error in the clustering.
Limitation of this algorithm:
>> The change in distortion error caused by shifting a cluster center is difficult to calculate.
Overcoming the limitation:
>> The change in distortion error can be calculated for a special case using two procedures.
Incremental K-means
Assign K = 1.
Phase 1. Normal training
Step 1. If K = 1, choose an arbitrary point as the cluster center.
If K > 1, insert the center of the new cluster into the cluster with the greatest distortion.
Step 2. Assign each object in the training set to the closest cluster and update the cluster centers.
Step 3. If no cluster center moves, go to phase 2. Else, go to phase 1, step 2.
Phase 2. Increasing the number of clusters
If K is smaller than a specified value, increase K by 1 and go to phase 1, step 1. Else, stop.
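The two phases above can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: the first data point serves as the arbitrary initial center, the new center is seeded on the member of the highest-distortion cluster that lies farthest from its center (the slides do not fix the insertion point), and the names `incremental_kmeans`, `assign`, `k_max` and `max_iter` are introduced here.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def assign(data, centers):
    """Group every object with its closest center."""
    clusters = [[] for _ in centers]
    for p in data:
        j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[j].append(p)
    return clusters

def incremental_kmeans(data, k_max, max_iter=100):
    # K = 1: an arbitrary point (assumption: the first one) is the center.
    centers = [tuple(float(v) for v in data[0])]
    while True:
        # Phase 1, steps 2-3: ordinary K-means until no center moves.
        for _ in range(max_iter):
            clusters = assign(data, centers)
            new_centers = [mean(c) if c else centers[i]
                           for i, c in enumerate(clusters)]
            if new_centers == centers:
                break
            centers = new_centers
        clusters = assign(data, centers)
        # Phase 2: stop once the requested number of clusters is reached.
        if len(centers) >= k_max:
            return centers, clusters
        distortions = [sum(sq_dist(p, c) for p in pts)
                       for c, pts in zip(centers, clusters)]
        worst = max(range(len(centers)), key=distortions.__getitem__)
        if distortions[worst] == 0:
            return centers, clusters   # nothing left to split
        # Phase 1, step 1 for K > 1 (assumption: seed on the farthest member).
        seed = max(clusters[worst], key=lambda p: sq_dist(p, centers[worst]))
        centers = centers + [tuple(float(v) for v in seed)]
```

Because each stage starts from the converged centers of the previous stage, no random initialization of all K centers is needed.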
Incremental K-means
Time complexity of incremental K-means = $O(K^2 \cdot N \cdot \mathrm{num\_iter})$,
where num_iter is the number of iterations for final convergence.
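One way to see where the quadratic factor in K may come from is to add up the cost of the successive stages. The derivation below is only a sketch and rests on two assumptions not stated in the slides: one pass with k clusters costs on the order of k·N distance evaluations, and each stage k = 1, ..., K needs at most num_iter passes.

```latex
\sum_{k=1}^{K} k \cdot N \cdot \mathrm{num\_iter}
  = \frac{K(K+1)}{2} \cdot N \cdot \mathrm{num\_iter}
  = O\!\left(K^{2} \cdot N \cdot \mathrm{num\_iter}\right)
```

Under the same assumptions, a single run of conventional K-means with K clusters costs on the order of K·N·num_iter, which is consistent with the higher complexity listed for the incremental variant.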
Advantages of incremental K-means:
>> Largely avoids converging at a local minimum.
>> No need to shift a cluster center from a low-distortion cluster to a high-distortion cluster and calculate the effect on the overall distortion error and on the individual distortion errors of the other clusters.
Disadvantages:
>> Higher time complexity than the simple K-means algorithm.
Performance comparison
Data sets used: 6 artificial and 6 real data sets from the UCI repository.
Stopping criterion: maximum number of iterations (20 in this case) and % reduction in distortion error in one iteration ($10^{-7}$ in this case).
Number of clusters: K = 1 to 15.
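The stopping criterion above could be coded as a small helper. This is a sketch under assumptions: training is taken to stop when either condition is met, and the 10^-7 threshold is read as a relative (fractional) reduction in distortion between consecutive iterations. The name `should_stop` and its parameters are illustrative.

```python
def should_stop(iteration, prev_distortion, curr_distortion,
                max_iter=20, min_rel_reduction=1e-7):
    """Return True when training should end after the given iteration."""
    if iteration >= max_iter:
        return True                      # iteration budget used up
    if prev_distortion <= 0:
        return True                      # nothing left to reduce
    reduction = (prev_distortion - curr_distortion) / prev_distortion
    return reduction < min_rel_reduction # improvement has become negligible
```

A training loop would call it once per iteration with the total distortion before and after the update.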
Results:
>> On almost all data sets, K-means with the jumping operation outperforms the original K-means.
>> For incremental K-means, $I_a / I_{min}$ is almost equal to 1 in most cases, showing that it is largely independent of K and of the nature of the data set and provides reliable, near-optimal clustering.
Thank you.