
Sequential Fuzzy Cluster Extraction and Its Robustness Against Noise

Koji Tsuda, Shuji Senda†, Michihiko Minoh and Katsuo Ikeda

Faculty of Engineering, Kyoto University, Kyoto-shi, 606-01, Japan

† Now with NEC, Information Technology Research Labs., Pattern Recognition Research Lab.

Abstract

Partitional clustering methods such as C-Means classify all samples into clusters: even a noise sample that is distant from every cluster is assigned to one of them. Noise samples included in clusters bias the clustering result and tend to produce meaningless clusters. Our clustering method repeatedly extracts mutually close samples as clusters and leaves isolated noise samples unclustered. Thus, the produced clusters are less affected by noise than those of C-Means. Because clusters can be obtained analytically by our method, repeated trials to avoid local minima are unnecessary. Experiments show that the method is effective for extracting straight lines from images.

Keywords: Cluster Extraction, Clustering, Eigenvalue Problem, Scale, Noise

1 Introduction
The purpose of clustering is to find clusters in a set of samples, where a cluster is a group of mutually similar samples [1]. In general settings, a sample i (i = 1, ..., n) is represented by an s-dimensional feature vector $x_i \in R^s$. The dissimilarity between two samples i and j is represented by the distance $d_{ij}$ between them. The product of a clustering algorithm is the membership matrix $U = [u_{ki}]$, where the membership value $u_{ki}$ stands for the degree to which sample i belongs to cluster k. When membership values are restricted to 0 or 1, the algorithm is called a "hard clustering algorithm"; when they are allowed to take any real value between 0 and 1, it is called a "fuzzy clustering algorithm". The tasks for which clustering is used can be classified into two categories.

Compression tasks: In these tasks, the samples in a cluster are replaced by a representative sample. The number of samples is reduced to the number of clusters, and so the storage is compressed. Clustering methods are evaluated by the compression rate and the compression quality, where the compression rate describes how much the storage is reduced and the compression quality is defined as the average distance between each sample in a cluster and the representative sample. Vector quantization [2] is a typical compression task.

Extraction tasks: In these tasks, a cluster is expected to correspond to an entity in the real world. For example, in clustering for image segmentation [3], a cluster of pixels is expected to correspond to a region in the image. Clustering methods are evaluated by the correspondence between a cluster and a real entity.

In extraction tasks, the presence of noise makes clustering difficult. Conceptually, noise samples are samples that do not belong to any entity that we want to extract. In the task of line extraction, where pixels are clustered into lines [4], all the pixels that are not aligned on a line are considered noise. When a sample is described as a point in the feature space, noise samples are quantitatively defined as isolated points whose neighborhood density is low [5]. In this paper, we distinguish "noise" from "outliers" [6] in statistics: whereas outliers are points far outside the expected area, noise includes the samples in the low-density areas among clusters.

The C-Means algorithm [7] is the most frequently used fuzzy clustering algorithm, but it is easily affected by noise. It often produces clusters that consist of noise samples only, and it is likely to merge two clusters when there is noise between them. The weakness against noise is mainly due to the constraint that the sum of the membership values of a sample over all clusters is 1 ($\sum_{k=1}^{c} u_{ki} = 1$). This constraint forces every sample to be assigned to one of the clusters; accordingly, even a noise sample is assigned to a cluster and biases the whole result.

In early work on noise detection, Zahn [8] detected low-density regions by histogram analysis, but this method is not effective unless there is a large difference in density between the noise points and the points that form clusters. Several studies modify C-Means to be robust against noise: Jolion et al. [5] tried to obtain noiseless clusters by assigning small weights to noise points, and Dave et al. [9] prepared a noise cluster into which noise points are assigned. Since these methods do not remove the constraint mentioned above, they have only limited effect.

In this paper, we propose the Sequential Fuzzy Cluster Extraction (SFCE) method. Clusters are extracted sequentially by repeatedly extracting a subset of samples that are close to each other; the sequential extraction ends when almost all samples have been extracted as clusters. High-density areas are extracted early, and low-density areas (i.e., noise) remain unclustered. Another strong point of SFCE is that a cluster can be obtained by an analytical computation without an iterative optimization process, so SFCE has no local-minimum problem.

The rest of this paper is organized as follows. In Sec. 2, the concept of sequential cluster extraction is described. In Sec. 3, the procedure of SFCE is described in detail. In Sec. 4, the scale parameter that determines the size of clusters is explained. In Sec. 5, it is shown that clusters can be extracted analytically. In Sec. 6, the robustness of SFCE against noise is examined in experiments. Sec. 7 concludes the paper.

2 Cluster Extraction
In this section, we present the sequential hard cluster extraction (SHCE) method and then fuzzify it into the sequential fuzzy cluster extraction (SFCE) method. In SHCE, the number of samples in a cluster is fixed to K. The K samples whose sum of pairwise distances is the smallest are chosen as a cluster and removed; this process is repeated until no samples remain. Since the clusters extracted early are unlikely to include noise, we can obtain noise-free clusters by ignoring the clusters extracted late. Let $w_i \in \{0, 1\}$ denote the membership of sample i in the cluster under consideration and $d_{ij}$ denote the distance between samples i and j. A hard cluster is obtained as an optimal solution of the following optimization problem:

Minimize
$$\sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} d_{ij}\, w_i w_j, \qquad (1)$$
subject to the constraints
$$\sum_{i=1}^{n} w_i = K, \qquad w_i \in \{0, 1\}.$$

Clearly, SHCE fails to extract clusters when the numbers of samples in the clusters differ. This problem can be resolved by fuzzifying SHCE into SFCE, where "fuzzification" means allowing the membership value to be an arbitrary value in [0, 1]. We produced 100 one-dimensional points that follow a normal distribution and applied SHCE and SFCE to them; the results are shown in Fig. 1. Whereas the membership values of SHCE are binary, the membership values of SFCE decrease smoothly from the center to the edges. By thresholding the membership values, a cluster with an arbitrary number of samples can be obtained; that is, in SFCE, clusters with different numbers of samples can be obtained by adjusting the threshold on the membership values. SFCE is formulated as follows:

Minimize
$$Q(w) = \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} d_{ij}\, (w_i w_j)^m, \qquad (2)$$
subject to the constraints
$$\sum_{i=1}^{n} w_i = 1, \qquad w_i \ge 0,$$
where m is a scale parameter that affects the spread of the membership values; it will be explained in Sec. 4.
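For concreteness, the following minimal Python sketch evaluates the objective Q(w) of Eq. (2) for a given membership vector; the function name and the NumPy-based interface are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def sfce_objective(D, w, m):
    """Evaluate Q(w) = sum_{i != j} d_ij * (w_i * w_j)^m  (Eq. 2).

    D : (n, n) symmetric matrix of pairwise dissimilarities d_ij
    w : (n,) membership vector with w_i >= 0 and sum(w) = 1
    m : scale parameter
    """
    W = np.outer(w, w) ** m      # (w_i w_j)^m for every pair (i, j)
    np.fill_diagonal(W, 0.0)     # exclude the i == j terms
    return float(np.sum(D * W))
```

With m = 1 and a binary membership vector containing exactly K ones, the same function evaluates the SHCE objective of Eq. (1).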

3 Sequential Extraction of Clusters


To obtain c clusters, the cluster extraction is repeated c times. First, the first cluster $w_1$ is obtained as the solution of the following optimization problem:

Maximize
$$Q(w_1),$$
subject to the constraints
$$\sum_{i=1}^{n} w_{1i} = 1, \qquad w_{1i} \ge 0.$$

The members of the k-th cluster $w_k$ ($k \ge 2$) must be extracted from the samples that do not belong to the already extracted clusters $w_1, \dots, w_{k-1}$, so the duplication between clusters should be minimized in the optimization problem. $w_k$ is obtained by solving the following problem:

Maximize
$$Q(w_k) - \frac{1}{k-1} \sum_{t=1}^{k-1} \lambda_t\, \mathrm{dup}(w_t, w_k), \qquad (3)$$
subject to the constraints
$$\sum_{i=1}^{n} w_{ki} = 1, \qquad w_{ki} \ge 0,$$
where $\mathrm{dup}(w_t, w_k)$ characterizes the duplication of the two clusters $w_t$ and $w_k$:
$$\mathrm{dup}(w_t, w_k) = w_t^{T} w_k. \qquad (4)$$

The parameter $\lambda_i$ (> 0) controls the degree of duplication of the i-th cluster with the other clusters. Since we do not have an established method for setting the $\lambda_i$'s, these parameters have to be set empirically. We assume that every $\lambda_i$ is proportional to $Q(w_i)$, which reduces the number of parameters to one:
$$\lambda_i = \kappa\, Q(w_i), \qquad (5)$$
where the parameter $\kappa$ represents the duplication ratio.

Since the sum of the membership values of the samples in a cluster is the same for all clusters, the membership values are high on the whole in a cluster with a small number of samples; conversely, in a cluster with a large number of samples, the membership values are low. Therefore, it is not appropriate to use the same threshold value for all clusters. Instead, a fuzzy cluster $w_k$ is defuzzified to a hard cluster $C_k$ by the following procedure. Initially, $C_k$ is the empty set. Samples are added to $C_k$ one by one in descending order of membership value $w_{ki}$. The adding process continues until
$$\sum_{i \in C_k} w_{ki} > \beta \sum_{i=1}^{n} w_{ki}, \qquad (6)$$
where $\beta$ is the parameter that controls the boundary of the cluster.
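A minimal Python sketch of this defuzzification procedure follows; the function name and the NumPy-based interface are illustrative assumptions.

```python
import numpy as np

def defuzzify(w_k, beta=0.9):
    """Convert a fuzzy cluster w_k into a hard cluster C_k (Eq. 6).

    Samples are added in descending order of membership until the
    accumulated membership exceeds beta times the total membership.
    Returns the indices of the samples assigned to the hard cluster.
    """
    order = np.argsort(-w_k)          # indices by decreasing membership
    total = w_k.sum()
    C_k, acc = [], 0.0
    for i in order:
        C_k.append(int(i))
        acc += w_k[i]
        if acc > beta * total:        # stopping criterion of Eq. (6)
            break
    return C_k
```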

4 Scale Parameter
It is very difficult to determine the optimal number of clusters, although many heuristics have been proposed [10]. One reason for this difficulty is that the optimal number perceived by a human changes with the scale at which the data are observed. There are several methods in which the scale of observation, rather than the number of clusters, is determined in advance; these methods are called "scale-based clustering" [11]. The melting algorithm [12] and RBF-based clustering belong to this category. SFCE is also a scale-based clustering method, with scale parameter m. The effect of the scale parameter m on the membership values is illustrated in Fig. 2, where SFCE was applied to 500 test points drawn from a two-dimensional normal distribution and the dissimilarity between two samples was the Euclidean distance between them. When m is small, the membership values concentrate on a small set of samples; as m becomes larger, the membership values are smoothed over all samples.

5 Cluster Extraction by Analytical Computations


In most clustering methods based on an optimization process, the objective function is not convex and has many local minima. The clustering result therefore depends on the initial values, and repeated trials with diverse initial values are necessary to ensure an appropriate solution. In contrast, the optimization problems of SFCE can be solved analytically, as the following theorem shows.
Theorem 1. The optimization problems of SFCE (Eq. 2 and Eq. 3) can be solved analytically if the following conditions are satisfied:

- the relation between two samples is described as a positive similarity;
- m = 1/2.

(Proof) Let $e_{ij}$ be the similarity between samples i and j and let m = 1/2. Then Eq. 2 is rewritten as:

Maximize
$$\sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} e_{ij}\, (w_{1i} w_{1j})^{1/2}, \qquad (7)$$
subject to the constraints
$$\sum_{i=1}^{n} w_{1i} = 1, \qquad w_{1i} \ge 0.$$

The following theorem is well known[13].


Theorem 2. The first eigenvector (i.e., the eigenvector corresponding to the largest eigenvalue) of the $n \times n$ matrix H is an optimal solution of the following optimization problem:

Maximize
$$\sum_{i=1}^{n} \sum_{j=1}^{n} h_{ij}\, z_{1i} z_{1j}, \qquad (8)$$
subject to the constraint
$$\sum_{i=1}^{n} z_{1i}^{2} = 1.$$

Since the signs of $h_{ij}$ are all positive, the signs of the elements of $z_1$ are either all positive or all negative. Since $-z_1$ is also a first eigenvector, we can choose the first eigenvector whose elements are all non-negative ($z_{1i} \ge 0$). Then the two optimization problems of Eq. 7 and Eq. 8 are equivalent when $w_{1i} = z_{1i}^2$ and
$$h_{ij} = \begin{cases} e_{ij} & (i \neq j) \\ 0 & (i = j). \end{cases} \qquad (9)$$

Therefore, the optimal solution of Eq. 7 can be obtained from the first eigenvector of H. Similarly, the optimization problem of Eq. 3 for the k-th cluster can be solved with the first eigenvector of the following matrix:
$$h_{ij} = \begin{cases} e_{ij} & (i \neq j) \\ -\dfrac{1}{k-1} \displaystyle\sum_{t=1}^{k-1} \lambda_t\, w_{ti} & (i = j). \end{cases} \qquad (10)$$
∎

When the relation between two samples is given as a dissimilarity, it must be converted into a similarity before clusters can be extracted analytically. Here, the conversion is performed as
$$e_{ij} = \exp\!\left(-\frac{d_{ij}}{\alpha}\right). \qquad (11)$$
To extract clusters analytically, the scale parameter m is fixed to 1/2; instead, the parameter $\alpha$ of Eq. (11) can be used to control the scale. When $\alpha$ is small, many small clusters are extracted; when $\alpha$ is large, a few large clusters are extracted.
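To illustrate how Eqs. (9)-(11) reduce cluster extraction to an eigenvalue problem, here is a minimal Python sketch of the sequential analytical extraction; the function name, the NumPy interface and the parameter handling are assumptions made for this sketch rather than the authors' implementation.

```python
import numpy as np

def sfce_extract(D, c, alpha, kappa):
    """Sequentially extract c fuzzy clusters (Theorem 1, Eqs. 9-11).

    D     : (n, n) matrix of pairwise dissimilarities
    alpha : scale of the dissimilarity-to-similarity conversion (Eq. 11)
    kappa : duplication ratio (Eq. 5)
    Returns a list of membership vectors w_1, ..., w_c.
    """
    E = np.exp(-D / alpha)                      # similarities (Eq. 11)
    np.fill_diagonal(E, 0.0)
    clusters, lambdas = [], []
    for k in range(c):
        H = E.copy()
        if k > 0:
            # duplication penalty on the diagonal (Eq. 10)
            penalty = sum(l * w for l, w in zip(lambdas, clusters)) / k
            np.fill_diagonal(H, -penalty)
        _, vecs = np.linalg.eigh(H)
        z = vecs[:, -1]                         # eigenvector of the largest eigenvalue
        w = z ** 2                              # memberships: non-negative, sum to 1
        Q = float(np.sum(E * np.sqrt(np.outer(w, w))))   # Q(w) with m = 1/2
        lambdas.append(kappa * Q)               # lambda_k = kappa * Q(w_k)  (Eq. 5)
        clusters.append(w)
    return clusters
```

Each returned membership vector $w_k$ can then be defuzzified into a hard cluster $C_k$ by the procedure of Eq. (6).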

6 Experiments
In this section, SFCE is applied to artificial data that contain many noise points, and its robustness against noise is compared with that of C-Means and of Noise-Resistant C-Means (NR C-Means) [9]. NR C-Means assigns every sample whose distance to the nearest cluster center is larger than a threshold $\delta$ to a noise cluster. It is a slightly modified version of the C-Means algorithm: the number of clusters is increased from c to c + 1, and the (c + 1)-th cluster is called the "noise cluster". When the distances between a sample and the cluster centers are computed, the distances to the 1st, ..., c-th clusters are obtained as usual, but the distance to the (c + 1)-th cluster is set to $\delta$ regardless of where the sample is. When all the distances to the 1st, ..., c-th clusters exceed $\delta$, the nearest cluster is the noise cluster, so isolated samples are gathered into the noise cluster.
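This assignment rule can be summarized by the following minimal sketch (hard-assignment view only; the function name and the NumPy interface are illustrative, and the full fuzzy update of [9] is omitted):

```python
import numpy as np

def nr_cmeans_assign(X, centers, delta):
    """Assign each sample to its nearest cluster or to the noise cluster.

    X       : (n, s) array of samples
    centers : (c, s) array of the c cluster centers
    delta   : fixed distance to the virtual (c+1)-th noise cluster
    Returns labels in {0, ..., c-1}, with the value c meaning 'noise'.
    """
    # distances to the c ordinary cluster centers
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # append the constant distance delta to the noise cluster
    dists = np.hstack([dists, np.full((X.shape[0], 1), delta)])
    return np.argmin(dists, axis=1)
```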

6.1 Experiments on Artificial Data

We produced 500 clustered points and 300 noise points in the two-dimensional region {(x, y) | 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. The clustered points were placed at 7 positions using the Neyman-Scott process [1]:

1. Determine the number of points contained in each cluster.
2. Set the central position of each cluster so that the distance between every two centers is more than 0.2.
3. Produce the points of each cluster so that the distances from the points to the center follow a normal distribution with expectation 0 and variance $\sigma^2$.

We prepared five kinds of test data with $\sigma$ = 0.02, 0.04, 0.06, 0.08, 0.1. The number of points in each cluster was determined randomly so that each cluster contains more than 5% of all points. The noise points were scattered over the region according to the uniform distribution. This kind of test data is frequently used for evaluating robustness against noise [5, 9].

In C-Means and NR C-Means, the number of clusters was set to the true number of clusters in advance. The initial cluster centers were determined as follows:

1. The first center is the point nearest to the arithmetic mean of all points.
2. The other centers are chosen randomly among the points so that the distance between every two centers is more than 0.2.

Ten trials were made, and the most successful one, i.e. the one achieving the minimum value of the objective function, was selected. For the evaluation, the defuzzified clusters $C_k$ (k = 1, ..., c) are matched against the prepared clusters $B_j$ (j = 1, ..., b). The cluster $C_k$ is judged as correctly extracted if

- there is a prepared cluster $B_j$ that contains 90% of the non-noise points of $C_k$, and
- no other cluster $C_i$ (i ≠ k) satisfies the first condition.

In SFCE, the fuzzy clusters were defuzzified with $\beta$ = 0.90. In C-Means and NR C-Means, each sample was assigned to the cluster with the largest membership value. We call the ratio of the number of correctly extracted clusters to the number of prepared clusters the "extraction rate". The SFCE parameters were set to $\kappa$ = 1000 and $\alpha$ = 0.03. In NR C-Means, the value of $\delta$ was set empirically to 0.12 as follows: clustering was performed for $\delta$ = 0.10, 0.11, ..., 0.20, and we chose the value that achieved the largest sum of the extraction rates over $\sigma$ = 0.02, 0.04, ..., 0.1.

The average extraction rate over 15 trials is shown in Fig. 3. As $\sigma$ increases, the overlap between clusters increases, so clustering becomes more difficult and the extraction rate decreases. The performance of C-Means is much worse than that of the other two methods, because C-Means does not assume the existence of noise. When $\sigma$ is large, SFCE performs better than NR C-Means. The reason is that NR C-Means labels a point as noise only when the distance between the point and its nearest cluster center exceeds $\delta$; it can exclude the noise points that are far from all clusters, but not the noise points between clusters. An example of the clusters produced by SFCE is shown in Fig. 4 ($\sigma$ = 0.06); the marked points show the hard clusters defuzzified at $\beta$ = 0.99. One can see that the noise points are excluded and only high-density areas are extracted. Since SFCE uses a quadratic objective function, as C-Means does, the extracted clusters are spherical; a non-spherical cluster is extracted as a union of spherical clusters. Variants of C-Means that modify the distance measure can produce non-spherical clusters [14]; with similar modifications, SFCE should also be able to extract non-spherical clusters.

6.2 Experiment on Line Extraction


In line extraction by clustering [4], an image is first processed by an edge detector, and a number of small line segments are fitted to the edges. The dissimilarities between the line segments are defined so that the dissimilarity between two segments aligned on a common line is small. A cluster of line segments that form a line is then extracted by a clustering method, and finally each cluster of segments is approximated by a longer line. We compare SFCE with NR C-Means on this line extraction task. Here a sample is not represented as a point in a feature space (object data); only the dissimilarities between samples (relational data) are available. NR C-Means cannot be applied directly, because it is based on object data, so the relational version of NR C-Means [15] is used. The dissimilarity between two line segments is defined as follows (Fig. 5):
$$\exp\!\left(\frac{d}{w_d}\right) + \exp\!\left(\frac{\theta_s}{w_s}\right) + \exp\!\left(\frac{\theta_{c1} + \theta_{c2}}{2 w_c}\right), \qquad (12)$$
where d is the distance between the nearest end points, $\theta_s$ is the angle between the two line segments, $\theta_{c1}$ and $\theta_{c2}$ are the angles between each segment and the line connecting the two segment centers, and $w_d$, $w_s$ and $w_c$ are weights. In the experiment the weights were set to $w_d$ = 80 (pixels) and $w_s$ = $w_c$ = 9/π.
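The following Python sketch computes this dissimilarity from the segment end points, assuming non-degenerate segments with distinct centers; the function name, the angle convention and the default weight values are assumptions made for illustration.

```python
import numpy as np

def segment_dissimilarity(seg1, seg2, w_d=80.0, w_s=9/np.pi, w_c=9/np.pi):
    """Dissimilarity between two line segments in the spirit of Eq. (12).

    seg1, seg2 : ((x1, y1), (x2, y2)) end points of each segment
    Angles are measured in radians; the default weights follow the values
    assumed in the text above (w_d in pixels).
    """
    p = np.asarray(seg1, dtype=float)
    q = np.asarray(seg2, dtype=float)

    def angle(u, v):
        # acute angle between the directions of u and v
        cos = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(np.clip(cos, 0.0, 1.0))

    d = min(np.linalg.norm(a - b) for a in p for b in q)   # nearest end points
    u, v = p[1] - p[0], q[1] - q[0]                        # segment directions
    theta_s = angle(u, v)                                  # angle between segments
    m = q.mean(axis=0) - p.mean(axis=0)                    # line joining the centers
    theta_c1, theta_c2 = angle(u, m), angle(v, m)

    return (np.exp(d / w_d) + np.exp(theta_s / w_s)
            + np.exp((theta_c1 + theta_c2) / (2.0 * w_c)))
```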

The image used for the experiment is a snapshot of a square sheet of paper placed on small pebbles (Fig. 6(a)). The purpose of the experiment is to extract the four lines that form the boundary of the paper; in line extraction from natural scenes, such textured images occur frequently. Edge detection, binarization, thinning and line segment fitting were performed on the image, and the obtained line segments are shown in Fig. 6(b); segments shorter than 3 pixels were discarded. The texture of the pebbles produces many noisy line segments. Figs. 6(c) and (d) show the clusters extracted by SFCE and by NR C-Means, where each cluster is approximated by a single line. The parameters of SFCE were $\beta$ = 0.99, $\kappa$ = 1000 and $\alpha$ = 0.6. For NR C-Means, the experiment was repeated with $\delta$ ranging from 1 to 10 in steps of 1; at $\delta$ = 4 the extracted lines fitted the boundary best. NR C-Means correctly extracted only two of the boundary lines; the other two were extracted wrongly because of the noisy line segments. SFCE, on the other hand, succeeded in extracting all four boundary lines, which suggests the effectiveness of SFCE in image processing tasks.

7 Conclusion
In this paper, we proposed a new clustering method called Sequential Fuzzy Cluster Extraction. Ordinary fuzzy clustering methods are easily affected by noise because of the constraint that every sample must be assigned to one of the clusters. Our method sequentially extracts subsets of mutually close samples and is therefore highly robust against noise. Promising application areas of SFCE include image segmentation, where noise is easily included and the number of clusters cannot be predicted in advance.

References
[1] A. K. Jain and R. C. Dubes: "Algorithms for Clustering Data", Prentice Hall (1988).
[2] N. Ueda and R. Nakano: "A competitive & selective learning method for designing vector quantizers", 1993 IEEE Int. Conf. Neural Networks, pp. 1444-1449 (1993).
[3] M. M. Trivedi and J. C. Bezdek: "Low-level segmentation of aerial images with fuzzy clustering", IEEE Trans. Syst. Man Cybern., SMC-16, 4, pp. 589-598 (1986).
[4] P. F. M. Nacken: "A metric for line segments", IEEE Trans. Patt. Anal. Mach. Intell., 15, 12, pp. 1312-1318 (1993).
[5] J.-M. Jolion and A. Rosenfeld: "Cluster detection in background noise", Pattern Recognition, 22, 5, pp. 603-607 (1989).
[6] P. J. Rousseeuw and A. M. Leroy: "Robust Regression and Outlier Detection", John Wiley & Sons (1987).
[7] J. C. Bezdek: "A convergence theorem for the fuzzy ISODATA clustering algorithms", IEEE Trans. Patt. Anal. Mach. Intell., 2, 1, pp. 1-8 (1980).

[8] C. T. Zahn: "Graph-theoretical methods for detecting and describing gestalt clusters", IEEE Trans. Computers, C-20, 1, pp. 68-86 (1971).
[9] R. N. Dave: "Characterization and detection of noise in clustering", Pattern Recognition Letters, 12, 11, pp. 657-664 (1991).
[10] G. W. Milligan and M. C. Cooper: "An examination of procedures for determining the number of clusters in a data set", Psychometrika, 50, pp. 159-179 (1985).
[11] S. V. Chakravarthy and J. Ghosh: "Scale-based clustering using radial basis function network", IEEE ICNN'94, 2, pp. 897-902 (1994).
[12] Y.-F. Wong: "Clustering data by melting", Neural Computation, 5, 1, pp. 89-104 (1993).
[13] R. Bellman: "Introduction to Matrix Analysis: Second Edition", McGraw-Hill (1970).
[14] R. N. Dave: "Generalized fuzzy c-shells clustering and detection of circular and elliptical boundaries", Pattern Recognition, 25, 7, pp. 713-721 (1992).
[15] R. J. Hathaway, J. C. Bezdek and J. W. Davenport: "On relational data versions of c-means algorithms", Pattern Recognition Letters, 17, 6, pp. 607-612 (1996).

Figure 1: Comparing a fuzzy cluster with a hard cluster

Figure 2: Membership values of the cluster extracted from the test example versus the scale parameter m (membership value plotted against the distance from the cluster center, for m = 0.5, 0.75 and 1)

Figure 3: Cluster extraction rates of SFCE, Noise-Resistant C-Means and C-Means on noisy data (cluster extraction rate versus cluster spread σ)

Figure 4: An example of cluster extraction

Figure 5: Similarity between two line segments

(a) Original Image

(b) Line Segments Fitted to the Edge

(c) Lines Extracted by SFCE

(d) Lines Extracted by NR C-Means (δ = 4)

Figure 6: Extracting straight lines from a textured image

