, Michihiko Minoh

† Now, he is with NEC, Information Technology Research Labs., Pattern Recognition Research Lab.
Abstract

Partitional clustering methods such as C-Means classify all samples into clusters. Even a noise sample that is distant from every cluster is assigned to one of the clusters. Noise samples included in clusters bias the clustering result and tend to produce meaningless clusters. Our clustering method repeatedly extracts mutually close samples as a cluster and leaves isolated noise samples unclustered. Thus, the produced clusters are less affected by noise than those of C-Means. Because clusters can be obtained analytically by our method, repeated trials to avoid local minima are unnecessary. In the experiments, the method is shown to be effective for extracting straight lines from images.
1 Introduction
The purpose of clustering is to find clusters in a set of samples, where a cluster comprises a number of similar samples grouped together [1]. In general settings, a sample $i$ ($i = 1, \ldots, n$) is represented by an $s$-dimensional feature vector $x_i \in R^s$. The dissimilarity between two samples $i$ and $j$ is represented by the distance $d_{ij}$ between them. The product of a clustering algorithm is the membership matrix $U = [u_{ki}]$, where the membership value $u_{ki}$ stands for the degree to which sample $i$ belongs to cluster $k$. When membership values are limited to 0 or 1, the algorithm is called a "hard clustering algorithm". On the other hand, when membership values are allowed to be any real value between 0 and 1, the algorithm is called a "fuzzy clustering algorithm". The tasks for which clustering is used can be classified into two categories.
Compression Tasks: In these tasks, the samples in a cluster are replaced by a representative sample. The number of samples is reduced to the number of clusters, so the storage is compressed. Clustering methods are evaluated by the compression rate and the compression quality, where the compression rate describes how much the storage is reduced, and the compression quality is defined by the average distance between each sample in a cluster and the representative sample. For example, vector quantization [2] is a typical data compression task.

Extraction Tasks: In these tasks, a cluster is expected to correspond to an entity in the real world. For example, in clustering for image segmentation [3], a cluster of pixels is expected to correspond to a region in the image. Clustering methods are evaluated by the correspondence between a cluster and a real entity.
In extraction tasks, the presence of noise makes clustering difficult. Conceptually, noise samples are defined as the samples that do not comprise the entity that we want. In the task of line extraction, where pixels are clustered into lines [4], all the pixels that are not aligned in a line are considered noise. When a sample is described as a point in the feature space, noise samples are quantitatively defined as isolated points whose neighborhood density is low [5]. In this paper, we distinguish "noise" from "outliers" [6] in statistics: whereas outliers are points far outside the expected area, noise includes the samples in the low-density areas among clusters. The C-Means algorithm [7] is the most frequently used fuzzy clustering algorithm, but it is very easily affected by noise. It often produces clusters that consist of noise only, and it
is likely to merge two clusters when there is noise between them. The weakness against noise is mainly due to the constraint that the sum of the membership values of a sample over all clusters is 1 ($\sum_{k=1}^{c} u_{ki} = 1$). This constraint imposes that a sample has to be assigned to one of the clusters. Accordingly, even a noise sample is assigned to one of the clusters and biases the whole result.

In the early works on noise detection, Zahn [8] detected low-density regions by histogram analysis, but this method is not effective unless there is a large difference in density between the noise and the points that comprise clusters. There are several studies that modify C-Means to be robust against noise. Jolion et al. [5] tried to obtain noiseless clusters by assigning small weights to noise samples. Dave et al. [9] prepared a noise cluster so that noise samples are assigned to it. Since these methods do not remove the constraint mentioned above, they have only limited effects.

In this paper, we propose the Sequential Fuzzy Cluster Extraction (SFCE) method. Clusters are extracted sequentially by repeatedly extracting a subset of samples that are close to each other. The sequential extraction ends when almost all samples have been extracted as clusters. The areas of high density are extracted early, and the areas of low density (i.e., noise) remain unclustered. Another strong point of SFCE is that a cluster can be obtained by analytical computation without an iterative optimization process; therefore, SFCE has no local-minimum problem.

The rest of this paper is organized as follows. In Sec. 2, the concept of sequential cluster extraction is described. In Sec. 3, the procedure of SFCE is described in detail. In Sec. 4, the scale parameter that determines the size of clusters is explained. In Sec. 5, it is shown that clusters can be extracted analytically. In Sec. 6, the robustness of SFCE against noise is tested in experiments. Sec. 7 concludes the paper.
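The effect of the membership constraint discussed above can be seen in a small numerical sketch. The following computes standard fuzzy C-Means membership values for fixed cluster centers (fuzzifier fixed to 2; the function name and data are illustrative, not part of the paper):

```python
import numpy as np

def fcm_memberships(points, centers, m=2.0):
    """Fuzzy C-Means membership values u[k, i] for fixed centers.

    Because each sample's memberships are normalized to sum to 1,
    every sample, including an isolated noise point, receives full
    total membership across the clusters.
    """
    # distances d[k, i] between center k and sample i
    d = np.linalg.norm(centers[:, None, :] - points[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero at a center
    # u_ki = 1 / sum_l (d_ki / d_li)^(2 / (m - 1))
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
    u = 1.0 / ratio.sum(axis=1)
    return u

# two tight clusters and one distant noise point
pts = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0], [5.0, 5.0]])
ctr = np.array([[0.05, 0.0], [1.05, 0.0]])
u = fcm_memberships(pts, ctr)
print(u[:, -1].sum())  # the noise point's memberships still sum to 1
```

The last column shows that the far-away point at (5, 5) is nevertheless split between the two clusters with full total weight, which is exactly the bias SFCE is designed to avoid.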
2 Cluster Extraction
In this section, we present the sequential hard cluster extraction (SHCE) method and then fuzzify it into the sequential fuzzy cluster extraction (SFCE) method. In SHCE, the number of samples in a cluster is restricted to $K$. The $K$ samples whose sum of pairwise distances is the smallest are chosen as a cluster and removed. This process is repeated until no samples remain. Since the clusters extracted early are not likely to include noise, we can obtain noiseless clusters by ignoring the clusters extracted late.

Let $w_i \in \{0, 1\}$ denote the membership of sample $i$ to the cluster under consideration and $d_{ij}$ denote the distance between $i$ and $j$. The hard cluster can be obtained as an optimal solution of the following optimization problem:

Minimize
$$\sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} d_{ij} w_i w_j, \qquad (1)$$

subject to the constraints
$$\sum_{i=1}^{n} w_i = K, \qquad w_i \in \{0, 1\}.$$

It is clear that SHCE fails to extract clusters when the clusters contain different numbers of samples. This problem can be resolved by fuzzifying SHCE into SFCE, where "fuzzification" means allowing the membership value to be an arbitrary value in $[0, 1]$. We produced 100 one-dimensional points with a normal distribution and applied SHCE and SFCE to them; the results are shown in Fig. 1. Whereas the membership values of SHCE are binary, the membership values of SFCE decrease smoothly from the center to the edges. By thresholding the membership values, a cluster with an arbitrary number of samples can be obtained; that is, in SFCE, clusters with different numbers of samples are obtained by adjusting the threshold of membership values. SFCE is formulated as follows:

Minimize
$$Q(w) = \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} d_{ij} (w_i w_j)^m, \qquad (2)$$

subject to the constraints
$$\sum_{i=1}^{n} w_i = 1, \qquad w_i \ge 0,$$

where $m$ is a scale parameter that affects the spread of membership values; it will be explained later in Sec. 4.
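For small $n$, the SHCE problem of Eq. 1 can be solved exactly by exhaustive search, which is enough to illustrate the sequential extraction loop. The sketch below is illustrative only (the exhaustive search and all names are ours; the paper does not prescribe a solver for Eq. 1):

```python
import numpy as np
from itertools import combinations

def shce(points, K):
    """Sequential hard cluster extraction by brute force.

    Repeatedly picks the K remaining samples whose sum of pairwise
    distances (Eq. 1) is smallest, removes them as a cluster, and
    stops when fewer than K samples remain. Exponential in n,
    so this is for illustration only.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    remaining = set(range(len(points)))
    clusters = []
    while len(remaining) >= K:
        best = min(combinations(sorted(remaining), K),
                   key=lambda idx: sum(d[i, j] for i, j in combinations(idx, 2)))
        clusters.append(list(best))
        remaining -= set(best)
    return clusters

pts = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],   # tight cluster
                [3.0, 3.0], [3.1, 3.0], [3.0, 3.2],   # tight cluster
                [10.0, -5.0]])                        # isolated noise point
print(shce(pts, 3))  # the two tight clusters come out; the noise point is left over
```

Note how the isolated point is never placed in a cluster: after the two tight triples are removed, fewer than $K$ samples remain and the loop stops, which is the behavior the sequential scheme relies on.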
3 Procedure of SFCE

The first cluster $w_1$ is obtained by solving:

Minimize $Q(w_1)$,

subject to the constraints
$$\sum_{i=1}^{n} w_{1i} = 1, \qquad w_{1i} \ge 0.$$

The members of the $k$-th cluster $w_k$ ($k \ge 2$) need to be extracted from the samples that do not belong to the already extracted clusters $w_1, \ldots, w_{k-1}$. So, the duplication between clusters should be penalized in the optimization problem. $w_k$ is obtained by solving:

Minimize
$$Q(w_k) + \frac{1}{k-1} \sum_{t=1}^{k-1} \alpha_t \, \mathrm{dup}(w_t, w_k), \qquad (3)$$

subject to the constraints
$$\sum_{i=1}^{n} w_{ki} = 1, \qquad w_{ki} \ge 0,$$

where $\mathrm{dup}(w_t, w_k)$ characterizes the duplication of the two clusters $w_t$ and $w_k$:

$$\mathrm{dup}(w_t, w_k) = w_t^{T} w_k. \qquad (4)$$

The parameter $\alpha_i$ ($> 0$) controls the degree of duplication of the $i$-th cluster with the other clusters. Since we do not have an established method for setting the $\alpha_i$'s, these parameters have to be set empirically. We assume that every $\alpha_i$ is proportional to $Q(w_i)$, which reduces the number of parameters to one:

$$\alpha_i = \beta \, Q(w_i), \qquad (5)$$

where the parameter $\beta$ represents the duplication ratio.

Since the sum of the membership values of the samples in a cluster is the same over all clusters, the membership values are high on the whole in a cluster with a small number of samples; on the contrary, in a cluster with a large number of samples, the membership values are low. Therefore, it is not appropriate to use the same threshold value over all clusters. So, a fuzzy cluster $w_k$ is defuzzified to a hard cluster $C_k$ according to the following procedure. Initially, $C_k$ is the empty set. The samples are added to $C_k$ one by one in descending order of membership value $w_{ki}$. The adding process continues until

$$\sum_{i \in C_k} w_{ki} > \theta \sum_{i=1}^{n} w_{ki}, \qquad (6)$$

where $\theta$ is the parameter that controls the boundary of the cluster.
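The defuzzification procedure of Eq. 6 is a short cumulative-sum loop. A minimal sketch (function and variable names are ours; the threshold parameter plays the role of the boundary parameter in Eq. 6):

```python
def defuzzify(w, theta):
    """Defuzzify a fuzzy cluster into a hard cluster (Eq. 6).

    Samples are added in descending order of membership value until
    the accumulated membership exceeds theta times the total
    membership of the cluster.
    """
    total = sum(w)
    order = sorted(range(len(w)), key=lambda i: w[i], reverse=True)
    hard, acc = [], 0.0
    for i in order:
        hard.append(i)
        acc += w[i]
        if acc > theta * total:
            break
    return sorted(hard)

w = [0.05, 0.40, 0.30, 0.20, 0.05]
print(defuzzify(w, 0.85))  # -> [1, 2, 3]
```

Because the stopping rule compares against a fraction of the cluster's own total membership, the same threshold behaves consistently for large clusters (many low memberships) and small clusters (few high memberships), which is the point of Eq. 6.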
4 Scale Parameter
It is very difficult to determine the optimal number of clusters, although many heuristics have been proposed [10]. One reason for this difficulty is that the optimal number that a human perceives changes according to the scale at which the data are observed. There are several methods in which the scale for observing the data is determined in advance instead of the number of clusters; these methods are called "scale-based clustering" [11]. The melting algorithm [12] and RBF-based clustering fall into this category. SFCE is also a scale-based clustering method, with scale parameter $m$. The effect of the scale parameter $m$ on the membership values is illustrated in Fig. 2, where SFCE was applied to 500 test points with a two-dimensional normal distribution, and the dissimilarity of two samples was the Euclidean distance between them. When $m$ is small, the membership values concentrate on a small set of samples. As $m$ becomes large, the membership values are smoothed over all samples.
5 Analytical Extraction

The optimization problems of SFCE (Eq. 2 and Eq. 3) can be solved analytically if the following conditions are satisfied: the scale parameter $m$ is fixed to 1/2, and the relation between samples is represented as a similarity $e_{ij}$, so that the objective is maximized instead of minimized. The problem for the first cluster then becomes:

Maximize
$$\sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} e_{ij} \, (w_{1i} w_{1j})^{1/2}, \qquad (7)$$

subject to the constraints
$$\sum_{i=1}^{n} w_{1i} = 1, \qquad w_{1i} \ge 0.$$

Consider the first eigenvector $z_1$ (i.e., the eigenvector corresponding to the largest eigenvalue) of the $n \times n$ matrix $H = [h_{ij}]$. It is the optimal solution of the problem [13]:

Maximize
$$\sum_{i=1}^{n} \sum_{j=1}^{n} h_{ij} \, z_{1i} z_{1j}, \qquad (8)$$

subject to the constraint
$$\sum_{i=1}^{n} z_{1i}^2 = 1,$$

where
$$h_{ij} = \begin{cases} e_{ij} & (i \ne j) \\ 0 & (i = j) \end{cases}. \qquad (9)$$

Since the off-diagonal elements $h_{ij}$ are all positive, the elements of $z_1$ are either all positive or all negative. Since $-z_1$ is also a first eigenvector, we can choose the first eigenvector that is all nonnegative ($z_{1i} \ge 0$). Then the two optimization problems of Eq. 7 and Eq. 8 are equivalent when $w_{1i} = z_{1i}^2$. Therefore, the optimal solution of Eq. 7 can be obtained from the first eigenvector of $H$. Similarly, the optimization problem of Eq. 3 for the $k$-th cluster can be solved with the first eigenvector of the following matrix:

$$h_{ij} = \begin{cases} e_{ij} & (i \ne j) \\ -\dfrac{1}{k-1} \displaystyle\sum_{t=1}^{k-1} \alpha_t \, w_{ti} & (i = j) \end{cases}. \qquad (10)$$

When the relation between two samples is represented as a dissimilarity, it should be converted to a similarity to extract clusters analytically. Here, the conversion is performed as follows:

$$e_{ij} = \exp\!\left(-\frac{d_{ij}}{\gamma}\right). \qquad (11)$$

To extract clusters analytically, the scale parameter $m$ is fixed to 1/2. Instead, the parameter $\gamma$ can be used to control the scale: when $\gamma$ is small, many small clusters are extracted; on the contrary, when $\gamma$ is large, a few large clusters are extracted.
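Under these conditions, one extraction step reduces to a leading-eigenvector computation. The following is a minimal NumPy sketch of the scheme (the function name, parameter defaults, and the use of `numpy.linalg.eigh` are our choices, not the authors' implementation; the loop follows Eqs. 5, 10, and 11 as reconstructed above):

```python
import numpy as np

def sfce(points, n_clusters, gamma=0.3, beta=10.0):
    """Sequential fuzzy cluster extraction via leading eigenvectors.

    Builds the similarity matrix e_ij = exp(-d_ij / gamma) (Eq. 11),
    then extracts each fuzzy cluster as the squared first eigenvector
    of H (Eqs. 9 and 10), penalizing overlap with earlier clusters.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    e = np.exp(-d / gamma)
    np.fill_diagonal(e, 0.0)          # h_ij = e_ij (i != j), 0 on the diagonal (Eq. 9)

    clusters, alphas = [], []
    for _ in range(n_clusters):
        h = e.copy()
        if clusters:                  # diagonal penalty of Eq. 10
            penalty = sum(a * w for a, w in zip(alphas, clusters)) / len(clusters)
            np.fill_diagonal(h, -penalty)
        vals, vecs = np.linalg.eigh(h)
        z = np.abs(vecs[:, -1])       # first eigenvector, nonnegative sign choice
        w = z ** 2                    # memberships w_ki = z_i^2, summing to 1
        clusters.append(w)
        alphas.append(beta * float(z @ e @ z))  # alpha_k = beta * Q(w_k) (Eq. 5)
    return np.array(clusters)

rng = np.random.default_rng(0)
group_a = rng.normal([0.0, 0.0], 0.05, size=(20, 2))
group_b = rng.normal([1.0, 1.0], 0.05, size=(20, 2))
outliers = rng.uniform(-1.0, 2.0, size=(5, 2))
pts = np.vstack([group_a, group_b, outliers])
W = sfce(pts, 2, gamma=0.1)  # two fuzzy clusters, one per dense group
```

The diagonal penalty suppresses the samples already claimed by earlier clusters, so the second eigenvector computation concentrates on the remaining dense group while the scattered points keep negligible membership.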
6 Experiments
In this section, SFCE is applied to artificial data that contain a number of noise points, and its robustness against noise is compared with those of C-Means and Noise-Resistant C-Means (NR C-Means) [9]. NR C-Means assigns a sample to the noise cluster when its distance to the nearest cluster center is larger than a certain threshold. It is a slightly modified version of the C-Means algorithm; the modification is as follows. First, the number of clusters is increased from $c$ to $c + 1$, and the $(c+1)$-th cluster is called the "noise cluster". When the distances between a sample and the cluster centers are measured, the distances to the $1, \ldots, c$-th clusters are obtained as usual, but the distance to the $(c+1)$-th cluster is set to $\delta$ no matter where the sample is. When all the distances to the $1, \ldots, c$-th clusters are more than $\delta$, the nearest cluster is the noise cluster, so isolated samples are gathered into the noise cluster.

We produced 500 clustered points and 300 noise points on the two-dimensional region $\{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}$. The clustered points are placed at 7 positions using the Neyman-Scott process [1]:

1. Determine the number of points contained in each cluster.
2. Set the central position of each cluster so that the distance between every two centers is more than 0.2.
3. Produce the points of each cluster so that the distances from the points to the center have a normal distribution with expectation 0 and variance $\sigma^2$.

We prepared five kinds of test data with $\sigma = 0.02, 0.04, 0.06, 0.08, 0.1$. The number of points of each cluster is determined randomly so that each cluster contains more than 5% of all points. The noise points are scattered over the region according to the uniform distribution. This kind of test data is frequently used for evaluating robustness against noise [5, 9].

In C-Means and NR C-Means, the number of clusters was set to the true number of clusters in advance. The initial cluster centers were determined as follows:

1. The first center is chosen as the point nearest to the arithmetic mean.
2. The other centers are randomly chosen among the points so that the distance between every two centers is more than 0.2.

Ten trials were made, and the most successful one, which achieved the minimum value of the objective function, was selected.

In the evaluation of the results, the defuzzified clusters $C_k$ ($k = 1, \ldots, c$) are matched against the prepared clusters $B_j$ ($j = 1, \ldots, b$). The cluster $C_k$ is judged as correctly extracted if:

1. There is a prepared cluster $B_j$ that contains 90% of the non-noise points of $C_k$.
2. No other cluster $C_i$ ($i \ne k$) satisfies the first condition.

In SFCE, the fuzzy clusters were defuzzified with $\theta = 0.90$. In C-Means and NR C-Means, each sample was assigned to the cluster with the largest membership value. We call the ratio of the number of correctly extracted clusters to the number of prepared ones "the extraction rate". In SFCE, the parameters were set as $\beta = 1000$ and $\gamma = 0.03$. In NR C-Means, the value of $\delta$ was set empirically to 0.12, as follows: clustering was performed at $\delta = 0.1, 0.11, \ldots, 0.2$, and we chose the value that achieved the largest sum of the extraction rates over $\sigma = 0.02, 0.04, \ldots, 0.1$.

The average extraction rate over 15 trials is shown in Fig. 3. As $\sigma$ increases, the overlap between clusters increases, so clustering becomes difficult and the extraction rate decreases. The performance of C-Means is much worse than the other two because C-Means does not assume the existence of noise. When $\sigma$ is large, SFCE performs better than NR C-Means. The reason is that NR C-Means discriminates a point as noise only when the distance between the point and its nearest cluster center is more than $\delta$: it can exclude the noise points that are far from all clusters, but cannot exclude the noise points between clusters.

An example of the clusters produced by SFCE is shown in Fig. 4 ($\sigma = 0.06$). The marks show the hard clusters defuzzified at $\theta = 0.99$. The noise points are excluded, and only the high-density areas are extracted. Since SFCE uses a quadratic objective function, as C-Means does, the clusters are spherical; a cluster that is not spherical is extracted as a union of spherical clusters. Variants of C-Means with modified distance measures can produce non-spherical clusters [14]. With such modifications, SFCE will also be able to extract non-spherical clusters.
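The test-data recipe above (steps 1 to 3) can be sketched as follows. This is a paraphrase, not the authors' generator: the equal split of cluster sizes and the rejection loop for the 0.2 center separation are simplifying assumptions, and all names are ours:

```python
import numpy as np

def make_testdata(n_clustered=500, n_noise=300, n_centers=7,
                  sigma=0.06, min_sep=0.2, seed=0):
    """Clustered points at 7 centers (pairwise separation > 0.2)
    plus uniform noise on the unit square, as in the experiments."""
    rng = np.random.default_rng(seed)

    # step 2: rejection-sample centers with pairwise distance > min_sep
    centers = []
    while len(centers) < n_centers:
        c = rng.uniform(0, 1, size=2)
        if all(np.linalg.norm(c - p) > min_sep for p in centers):
            centers.append(c)

    # step 1: split the clustered points across the centers
    # (the paper draws random sizes with a 5% minimum; equal split here)
    sizes = np.full(n_centers, n_clustered // n_centers)
    sizes[:n_clustered % n_centers] += 1

    # step 3: isotropic normal scatter with std sigma around each center
    clustered = np.vstack([rng.normal(c, sigma, size=(s, 2))
                           for c, s in zip(centers, sizes)])
    noise = rng.uniform(0, 1, size=(n_noise, 2))
    return clustered, noise

clustered, noise = make_testdata()
print(clustered.shape, noise.shape)  # (500, 2) (300, 2)
```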
In addition, SFCE is applied to a line extraction task, where a sample is a line segment. The similarity between two line segments (Fig. 5) is defined as

$$e = \exp\!\left(-\frac{d}{w_d}\right) + \exp\!\left(-\frac{\theta_s}{w_s}\right) + \exp\!\left(-\frac{\theta_{c1} + \theta_{c2}}{2 w_c}\right), \qquad (12)$$

where $d$ is the distance between the nearest end points, $\theta_s$ is the angle between the line segments, $\theta_{c1}$ and $\theta_{c2}$ are the angles between the line that connects the centers and the line segments, and $w_d$, $w_s$, and $w_c$ are the weights. Here, the weights are set as follows: $w_d = 80$ (pixels), $w_s = w_c = \pi/9$.

The image used for the experiment is a snapshot of a square sheet of paper placed on small pebbles (Fig. 6(a)). The purpose of the experiment is to extract the four lines that comprise the boundary of the paper. In line extraction tasks from natural scenes, such textured images occur frequently. Edge detection, binarization, thinning, and line-segment fitting are performed on the image. The obtained line segments are shown in Fig. 6(b); line segments shorter than 3 pixels are omitted. The texture of the pebbles produces noisy line segments. Figs. 6(c) and (d) show the clusters extracted by SFCE and NR C-Means, where each cluster is approximated by a single line. The parameters of SFCE were $\theta = 0.99$, $\beta = 1000$, $\gamma = 0.6$. In NR C-Means, the experiments were repeated with $\delta$ changing from 1 to 10 in steps of 1; at $\delta = 4$, the extracted lines fitted the boundary best. NR C-Means correctly extracted only two lines of the boundary; the other two were wrongly extracted due to noisy line segments. On the other hand, SFCE succeeded in extracting all the lines of the boundary, which suggests the effectiveness of SFCE in image processing tasks.
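Eq. 12 can be computed directly from the two segments' end points. In the sketch below, the treatment of segments as undirected (angles folded into $[0, \pi/2]$) and the weight values are our assumptions where the scan is ambiguous:

```python
import math

def angle_between(u, v):
    """Acute angle between two undirected directions u and v."""
    dot = abs(u[0] * v[0] + u[1] * v[1])
    nu = math.hypot(*u)
    nv = math.hypot(*v)
    return math.acos(min(1.0, dot / (nu * nv)))

def segment_similarity(p1, p2, q1, q2, wd=80.0, ws=math.pi / 9, wc=math.pi / 9):
    """Similarity of segments (p1, p2) and (q1, q2) following Eq. 12."""
    # d: distance between the nearest pair of end points
    d = min(math.dist(a, b) for a in (p1, p2) for b in (q1, q2))
    # theta_s: angle between the two segments
    u = (p2[0] - p1[0], p2[1] - p1[1])
    v = (q2[0] - q1[0], q2[1] - q1[1])
    theta_s = angle_between(u, v)
    # theta_c1, theta_c2: angles between the center line and each segment
    cp = ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)
    cq = ((q1[0] + q2[0]) / 2, (q1[1] + q2[1]) / 2)
    center = (cq[0] - cp[0], cq[1] - cp[1])
    theta_c1 = angle_between(center, u)
    theta_c2 = angle_between(center, v)
    return (math.exp(-d / wd) + math.exp(-theta_s / ws)
            + math.exp(-(theta_c1 + theta_c2) / (2 * wc)))

# two collinear, nearly touching segments score close to the maximum of 3
print(segment_similarity((0, 0), (10, 0), (12, 0), (22, 0)))
```

Collinear segments with a small end-point gap score near the maximum of 3, while distant perpendicular segments score much lower, so Eq. 11's exponential conversion is unnecessary here: Eq. 12 is already a similarity.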
7 Conclusion
In this paper, we proposed a new clustering method called "sequential fuzzy cluster extraction". Ordinary fuzzy clustering methods are easily affected by noise due to the constraint that a sample must be assigned to one of the clusters. Our method sequentially extracts subsets of samples that are close to each other, and is highly robust against noise. The application areas of SFCE include image segmentation, where noise is easily included and the number of clusters cannot be predicted in advance.
References
[1] A. K. Jain and R. C. Dubes: "Algorithms for Clustering Data", Prentice Hall (1988).
[2] N. Ueda and R. Nakano: "A competitive & selective learning method for designing vector quantizers", 1993 IEEE Int. Conf. Neural Netw., pp. 1444-1449 (1993).
[3] M. M. Trivedi and J. C. Bezdek: "Low-level segmentation of aerial images with fuzzy clustering", IEEE Trans. Syst. Man Cybern., SMC-16, 4, pp. 589-598 (1986).
[4] P. F. M. Nacken: "A metric for line segments", IEEE Trans. Patt. Anal. Mach. Intell., 15 (1993).
[5] J.-M. Jolion and A. Rosenfeld: "Cluster detection in background noise", Pattern Recognition, 22, 5, pp. 603-607 (1989).
[6] P. J. Rousseeuw and A. M. Leroy: "Robust Regression and Outlier Detection", John Wiley & Sons (1987).
[7] J. C. Bezdek: "A convergence theorem for the fuzzy ISODATA clustering algorithms", IEEE Trans. Patt. Anal. Mach. Intell., 2, 1, pp. 1-8 (1980).
Figure 1: Comparing a fuzzy cluster with a hard cluster

[8] C. T. Zahn: "Graph-theoretical methods for detecting and describing gestalt clusters", IEEE Trans. Computers, C-20, 1, pp. 68-86 (1971).
[9] R. N. Dave: "Characterization and detection of noise in clustering", Pattern Recognition Letters, 12, 11, pp. 657-664 (1991).
[10] G. W. Milligan and M. C. Cooper: "An examination of procedures for determining the number of clusters in a data set", Psychometrika, 50, pp. 159-179 (1985).
[11] S. V. Chakravarthy and J. Ghosh: "Scale-based clustering using radial basis function network", IEEE ICNN'94, 2, pp. 897-902 (1994).
[12] Y.-F. Wong: "Clustering data by melting", Neural Computation, 5, 1, pp. 89-104 (1993).
[13] R. Bellman: "Introduction to Matrix Analysis: Second Edition", McGraw-Hill (1970).
[14] R. N. Dave: "Generalized fuzzy c-shells clustering and detection of circular and elliptical boundaries", Pattern Recognition, 25, 7, pp. 713-721 (1992).
[15] R. J. Hathaway, J. C. Bezdek and J. W. Davenport: "On relational data versions of c-means algorithms", Pattern Recognition Letters, 17, 6, pp. 607-612 (1996).
Figure 2: Membership value of the cluster extracted from the test example versus scale parameter m
Figure 3: Cluster extraction rates of SFCE, Noise-Resistant C-Means and C-Means on noisy data
Figure 4: Example of clusters extracted by SFCE from the noisy test data ($\sigma = 0.06$)
Figure 5: Similarity between two line segments