
Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, 18-21 August 2005

FUZZY SUPPORT VECTOR MACHINES BASED ON FCM CLUSTERING


SHENG-WU XIONG, HONG-BING LIU, XIAO-XIAO NIU
School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
E-MAIL: xiongsw@mail.whut.edu.cn, liuhbing@sohu.com, archernxx@hotmail.com

Abstract:
Fuzzy support vector machines based on fuzzy c-means clustering are proposed in this paper. They apply the fuzzy c-means clustering technique to each class of the training set. During clustering with a suitable fuzziness parameter q, the more important samples, such as support vectors, become cluster centers. All the cluster centers generated by fuzzy c-means clustering are selected as representatives of the similar samples close to them, and the new training set consisting of these centers is used to form the fuzzy support vector machines. Experimental results on benchmark data sets show that the proposed fuzzy support vector machines require fewer training data and less quadratic programming time than the conventional fuzzy support vector machines, while their classification accuracy rates remain acceptable.

Keywords:
Support vector machines; fuzzy support vector machines; fuzzy c-means clustering; membership functions

1. Introduction

Support vector machines (SVMs) are a powerful machine learning method for both regression and classification problems. They embody the structural risk minimization principle, solve the overfitting problem effectively, and apply to a wide range of problems thanks to their higher generalization ability and better classification precision compared with conventional methods [1]. SVMs have become a new research focus following pattern recognition and neural networks, and have promoted significant developments in the related theory and technology. At present, SVMs have already been applied to problems ranging from hand-written character recognition, face recognition, and speech recognition to medical diagnosis [1,2,3].

When SVMs are extended to multi-class problems, unclassifiable regions exist. To solve this problem, Abe and Inoue proposed fuzzy support vector machines (FSVMs) for one-against-all classification [4]. They used the decision functions obtained by training SVMs for pairs of classes and, for each class, defined a truncated polyhedral pyramidal membership function. Treating every data point equally may cause overfitting in SVMs, so Han-Pang Huang and Yi-Hung Liu presented another type of FSVMs [5]: in the training process they assigned a higher penalty to errors by using the membership degree of each datum in its own class, thereby reducing the number of misclassified data points, and the overfitting problem was solved by an outlier detection method (ODM). However, all FSVMs must solve the constrained optimization problem exactly, which is difficult for a large number of training data, as in hand-written digit recognition [6].

In this paper, we propose new FSVMs based on fuzzy c-means (FCM) clustering. FCM clustering reduces the number of training data and the time spent on the constrained optimization problem for a larger-scale training set, and it causes little loss of information in the training data. If the performance of the FSVMs built with the new method is acceptable, the proposed FSVMs are regarded as feasible. In Section 2 we review the two-class FSVMs of [4,5]; in Section 3 we present the new FSVMs based on FCM clustering; in Section 4 we verify the proposed FSVMs on two typical data sets; and we draw conclusions in Section 5.

2. Fuzzy support vector machines

Like SVMs, FSVMs are classifiers based on statistical learning theory and the idea of the maximum-margin hyperplane. The difference is that FSVMs attach to every training datum a membership degree u_i in its own class, so as to assign a higher penalty to errors and thereby reduce the number of misclassified data points. For linearly separable data, the maximum-margin hyperplane is defined by w and b, which are determined by the constrained optimization problem

\min_{\alpha} W(\alpha) = \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j x_i^{T} x_j - \sum_{i=1}^{l} \alpha_i \qquad (1)


s.t.

\sum_{i=1}^{l} \alpha_i y_i = 0 \qquad (2)

0 \le \alpha_i \le u_i C, \quad i = 1, 2, \ldots, l \qquad (3)

where the scalar C controls the complexity of the FSVMs and determines the trade-off between maximization of the margin and minimization of the classification error. The decision function of the two-class problem is

g(x) = \mathrm{sign}\Big\{ \sum_{x_i \in \mathrm{SVs}} \alpha_i y_i x_i^{T} x + b \Big\} \qquad (4)
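As a concrete illustration of the quadratic programme (1)-(3) with the membership-weighted box constraint, a minimal Python sketch is given below. It assumes the third-party cvxopt QP solver is available; the function and variable names (train_fsvm_dual, u) are ours rather than the paper's.

```python
import numpy as np
from cvxopt import matrix, solvers  # assumed available convex QP solver

solvers.options["show_progress"] = False


def train_fsvm_dual(X, y, u, C, kernel=lambda a, b: a @ b.T):
    """Solve the FSVM dual (1)-(3): min 0.5*a'Qa - sum(a)  s.t.  y'a = 0,  0 <= a_i <= u_i*C.

    X: (l, m) training data, y: (l,) labels in {-1, +1}, u: (l,) membership degrees.
    The default kernel is the linear one of (1); passing another Gram function
    gives the kernelized problem (5)-(7) below."""
    y = y.astype(float)
    u = np.asarray(u, dtype=float)
    l = len(y)
    K = kernel(X, X)                                    # Gram matrix K(x_i, x_j)
    Q = (y[:, None] * y[None, :]) * K                   # Q_ij = y_i y_j K(x_i, x_j)
    P, q = matrix(Q), matrix(-np.ones(l))
    G = matrix(np.vstack([-np.eye(l), np.eye(l)]))      # encodes -a_i <= 0 and a_i <= u_i*C
    h = matrix(np.hstack([np.zeros(l), u * C]))
    A, b = matrix(y.reshape(1, -1)), matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
    sv = alpha > 1e-6                                   # support vectors
    k = int(np.argmax(sv & (alpha < 0.99 * u * C)))     # an unbounded support vector
    bias = y[k] - np.sum(alpha[sv] * y[sv] * K[sv, k])  # from y_k = sum_i a_i y_i K(x_i, x_k) + b
    return alpha, bias, sv
```

The decision function (4) is then the sign of the support-vector expansion evaluated with the same kernel and the returned bias.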

What makes SVMs and FSVMs particularly powerful is the generalization to an arbitrary kernel function K(x, y). The kernel obeys Mercer's theorem and implicitly maps the training data into a higher-dimensional space, called the feature space, in which FSVMs may achieve better performance. The hyperplane is then determined by the constrained optimization problem

\min_{\alpha} W(\alpha) = \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{l} \alpha_i \qquad (5)

s.t.

\sum_{i=1}^{l} \alpha_i y_i = 0 \qquad (6)

0 \le \alpha_i \le u_i C, \quad i = 1, 2, \ldots, l \qquad (7)

The decision function is

g(x) = \mathrm{sign}\Big\{ \sum_{x_i \in \mathrm{SVs}} \alpha_i y_i K(x_i, x) + b \Big\} \qquad (8)

For a two-class problem, we calculate the decision function (4) or (8) and classify x into class g(x). For a multi-class problem, one-against-one FSVMs are adopted in this paper. There are n(n-1)/2 two-class FSVMs for an n-class problem, and their decision functions are

g_{ij}(x) = \mathrm{sign}(f_{ij}(x)) \qquad (9)

where

f_{ij}(x) = \sum_{x_k \in \mathrm{SVs}} \alpha_k y_k K(x_k, x) + b \qquad (10)

There are many decision strategies, such as the one-against-all method in [1] and DDAGSVMs [7]. In the decision phase of [4], the membership of x in class i is defined as

m_i(x) = \min\Big( 1,\ \min_{j=1,\ldots,n,\ j \neq i} f_{ij}(x) \Big) \qquad (11)

and an unknown datum x is classified into the class

\arg\max_{i=1,\ldots,n} m_i(x) \qquad (12)
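A short sketch of the decision phase (9)-(12) follows; it presumes the pairwise values f_ij(x) have already been computed for one sample (for instance with the dual sketch above), and the helper name classify_by_membership is illustrative.

```python
import numpy as np


def classify_by_membership(f):
    """One-against-one decision of (9)-(12) for a single sample.

    f is an (n, n) array with f[i, j] = f_ij(x) for i != j and f[j, i] = -f[i, j];
    the diagonal is ignored. Returns the predicted class arg max_i m_i(x)."""
    n = f.shape[0]
    m = np.empty(n)
    for i in range(n):
        others = np.delete(f[i], i)          # f_ij(x) for all j != i
        m[i] = min(1.0, others.min())        # membership m_i(x) of (11)
    return int(np.argmax(m)), m              # decision rule (12)
```

For the ten-class problem of Section 4 this rule combines the outputs of the 45 pairwise machines.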

3. FSVMs based on FCM clustering

3.1. FCM clustering algorithm

FCM clustering was originally introduced by Jim Bezdek in 1981 [8] as an improvement on earlier clustering methods. It is a data clustering technique in which each data point belongs to a cluster to a degree specified by a membership grade, and it provides a method for grouping data points that populate some multidimensional space into a specified number of clusters. FCM clustering starts with an initial guess for the cluster centers, which is intended to mark the mean location of each cluster; this initial guess is most likely incorrect. Additionally, FCM assigns every data point a membership grade for each cluster. By iteratively updating the cluster centers and the membership grades, FCM gradually moves the cluster centers to the "right" locations within the data set. The iteration is based on minimizing an objective function that represents the distance from any given data point to a cluster center, weighted by that data point's membership grade, namely

\min J(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{q} \, d(x_j, v_i) \qquad (13)

s.t.

\sum_{i=1}^{c} u_{ij} = 1, \quad 0 \le u_{ij} \le 1 \qquad (14)

where u_{ij} denotes the degree to which the element x_j belongs to the ith cluster, d(x_j, v_i) denotes the distance between the point x_j and the cluster center v_i, q > 1 is the fuzziness parameter, which controls the speed and quality of the clustering, and c denotes the number of clusters.

Algorithm 1. Suppose the data set X is one class of the training set and contains n m-dimensional data.
Step 1: estimate the cluster centers.
Step 1.1: select the number of clusters c, the maximal iteration count N_max and the fuzziness parameter q, and randomly generate the distance matrix M = (m_{ij})_{c \times n}, whose entries represent the distances between the data X_j and the cluster centers V_i.
Step 1.2: normalize M to obtain the matrix F:

f_{ij} = m_{ij} \Big/ \sum_{k=1}^{c} m_{kj}

Step 1.3: fuzzify F by the fuzziness parameter q to obtain


the fuzzy matrix U(0):

u_{ij}(0) = f_{ij}^{q}

Step 1.4: defuzzify the fuzzy matrix U(0); the cluster centers are then obtained as

V_{ik}(0) = \sum_{j=1}^{n} u_{ij}(0) X_{jk} \Big/ \sum_{j=1}^{n} u_{ij}(0)

where X_{jk} denotes the kth component of the jth sample, and V_i(0) = (V_{i1}(0), V_{i2}(0), \ldots, V_{im}(0)), i = 1, 2, \ldots, c, are the center vectors with the same dimension as the samples X_j.

Step 2: calculate the distance between each datum X_j and each cluster center V_i(0); the distance matrix is D(0) with

D_{ij}(0) = \| V_i(0) - X_j \|

where \| \cdot \| denotes the Euclidean distance.

Step 3: calculate the objective function

\mathrm{obj}(0) = \sum_{i=1}^{c} \sum_{j=1}^{n} D_{ij}(0) \, u_{ij}^{q}(0)

Step 4: for t = 1, 2, \ldots, N_max:
Step 4.1: fuzzify the distance matrix D(t-1) by q and obtain the fuzzy degree u_{ij}(t) to which the element X_j belongs to the ith cluster; U(t) is the fuzzy matrix, where

u_{ij}(t) = 1 \Big/ \big( D_{ij}(t-1) \big)^{2/(q-1)}

Step 4.2: calculate the new cluster centers

V_{ik}(t) = \sum_{j=1}^{n} u_{ij}(t) X_{jk} \Big/ \sum_{j=1}^{n} u_{ij}(t)

Step 4.3: calculate the distance D_{ij}(t) between each datum X_j and every new center V_i(t):

D_{ij}(t) = \| V_i(t) - X_j \|

Step 4.4: calculate the objective function

\mathrm{obj}(t) = \sum_{i=1}^{c} \sum_{j=1}^{n} D_{ij}(t) \, u_{ij}^{q}(t)

Step 4.5: when | \mathrm{obj}(t) - \mathrm{obj}(t-1) | < \varepsilon or t = N_max, stop the iteration and return the cluster centers V_i(t).

The fuzzifications in Step 1.3 and Step 4.1 serve different purposes: the aim in Step 1.3 is to obtain the initial centers, while the aim in Step 4.1 is to update the centers [9].
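The following minimal sketch implements fuzzy c-means in the spirit of Algorithm 1. It follows the textbook conventions of [8] (squared Euclidean distances and memberships normalized over the clusters so that (14) holds); the statement of Step 4.1 above omits the normalization, so that detail, like the function name fcm_reduce, should be read as our assumption.

```python
import numpy as np


def fcm_reduce(X, c, q=1.2, n_max=100, eps=1e-5, seed=0):
    """Fuzzy c-means on one class of the training set (in the spirit of Algorithm 1).

    X: (n, m) array holding a single class; returns the c cluster centers that
    serve as the reduced training set for that class."""
    rng = np.random.default_rng(seed)
    # Steps 1.1-1.3: random membership matrix, normalized over the c clusters
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0, keepdims=True)
    obj_prev = np.inf
    for t in range(n_max):                                            # Step 4 main loop
        Uq = U ** q                                                   # fuzzified memberships
        V = (Uq @ X) / Uq.sum(axis=1, keepdims=True)                  # Steps 1.4 / 4.2: centers
        D2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(-1) + 1e-12   # squared distances (cf. Steps 2 / 4.3)
        U = D2 ** (-1.0 / (q - 1.0))                                  # Step 4.1: u_ij ~ D_ij^(-2/(q-1))
        U /= U.sum(axis=0, keepdims=True)                             # enforce constraint (14)
        obj = np.sum((U ** q) * D2)                                   # objective (13), Steps 3 / 4.4
        if abs(obj - obj_prev) < eps:                                 # Step 4.5: convergence test
            break
        obj_prev = obj
    return V
```

Running fcm_reduce on each class with a suitably small q keeps the returned centers dispersed, which is the property exploited in Section 3.3.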

3.2. FSVMs based on FCM clustering

In order to speed up the optimization problem of FSVMs on a larger-scale training set, we seek a method that reduces the size of the training set but hardly influences the performance of the FSVMs. The FCM clustering technique described above is one such effective method: it discards the less important data and selects the cluster centers as representatives of the entire training data. In fact, the easily misclassified data points are dispersed and mainly lie on the exterior of the class to which they belong, so during clustering these data points become single clusters respectively. If a cluster center is classified correctly, all the data in its cluster will be classified correctly too.

Algorithm 2. FSVMs on a training set consisting of class 1 and class 2.
Step 1: generate the new training set N_trset using FCM clustering.
Step 2: calculate the centers of the two classes in N_trset:

x_1^{*} = \bar{x}_1 = \frac{1}{l_1} \sum_{i=1}^{l_1} x_{1i}, \qquad x_2^{*} = \bar{x}_2 = \frac{1}{l_2} \sum_{i=1}^{l_2} x_{2i}

Step 3: calculate the membership degree to which each element x_j belongs to the more important samples of its class in N_trset:

\mu_{1j} = \frac{\| x_{1j} - x_1^{*} \|}{\max_{1 \le j \le l_1} \| x_{1j} - x_1^{*} \|}, \qquad \mu_{2j} = \frac{\| x_{2j} - x_2^{*} \|}{\max_{1 \le j \le l_2} \| x_{2j} - x_2^{*} \|}

Step 4: form the FSVMs using the fuzzy set S_f = S_{1f} \cup S_{2f}, where

S_{1f} = \{ (x_{1j}, y_{1j}, \mu_{1j}) \mid x_{1j} \in R^{m},\ y_{1j} = +1,\ \mu_{1j} \in [0, 1],\ j = 1, 2, \ldots, l_1 \}

S_{2f} = \{ (x_{2j}, y_{2j}, \mu_{2j}) \mid x_{2j} \in R^{m},\ y_{2j} = -1,\ \mu_{2j} \in [0, 1],\ j = 1, 2, \ldots, l_2 \}

Here x_{1j} is the jth training datum in class 1 and y_{1j} is its label, and x_{2j} is the jth training datum in class 2 and y_{2j} is its label. In Step 4, the elements of S_{1f} and S_{2f} are those of the new training set.
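A sketch of Steps 2-4 of Algorithm 2 is given below; the two reduced classes are assumed to come from the FCM sketch above, and the names build_fuzzy_training_set and mu are illustrative.

```python
import numpy as np


def build_fuzzy_training_set(X1, X2):
    """Algorithm 2, Steps 2-4: attach membership degrees to the FCM-reduced data.

    X1, X2: (l1, m) and (l2, m) arrays of cluster centers for class 1 (label +1)
    and class 2 (label -1). Returns the fuzzy set S_f as parallel arrays."""
    # Step 2: class centers as the means of the reduced classes
    x1_star, x2_star = X1.mean(axis=0), X2.mean(axis=0)
    # Step 3: membership of each point, scaled by the largest distance within its class
    d1 = np.linalg.norm(X1 - x1_star, axis=1)
    d2 = np.linalg.norm(X2 - x2_star, axis=1)
    mu1, mu2 = d1 / d1.max(), d2 / d2.max()
    # Step 4: S_f = S_1f U S_2f represented as (x, y, mu) triples
    X = np.vstack([X1, X2])
    y = np.hstack([np.ones(len(X1)), -np.ones(len(X2))])
    mu = np.hstack([mu1, mu2])
    return X, y, mu
```

The returned memberships then play the role of u_i in the box constraints (3) and (7) when the dual of Section 2 is solved on the reduced set.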

3.3. Influence of the parameter q on the proposed FSVMs

The aim of the proposed method is to generate sparse cluster centers so that the data near the hyperplane become cluster centers respectively. Formula (13) represents the dispersion measure of the cluster centers [9]. For the same cluster number c, when the


fuzziness parameter q increases, the value of the objective function (13) decreases and the cluster centers have smaller dispersion. In other words, the smaller the q, the more dispersed the cluster centers; conversely, the larger the q, the more concentrated the cluster centers. In this respect, FCM clustering for FSVMs differs from the general methods of fuzzy pattern recognition. In order to make the easily misclassified data points become centers, the cluster centers should be kept as sparse as possible by selecting a suitable fuzziness parameter q, as in Fig. 1.


Figure 1. The result of clustering 100 two-dimensional normal data points with mean vector u = [1, 1] and covariance matrix \Sigma = [1  0.3; 0.3  1] into 70 clusters with q = 1.2. The solid points denote the cluster centers, and the others denote the original points.

In order to illustrate the proposed FSVMs clearly, we discuss a two-class problem in 2-D space. In Fig. 2(a), 100 positive and 50 negative data points are generated from two normal distributions with mean vectors u_1 = [1, 1] and u_2 = [4, 4] and covariance matrices \Sigma_1 = \Sigma_2 = [1  0.3; 0.3  1]. Obviously, they are linearly separable. The two hyperplanes for C = 5000 are shown in Fig. 2(a): one is obtained by the conventional FSVMs and the other by the proposed FSVMs, which assign 70 and 35 cluster centers to the positive and negative classes respectively with q = 1.2. The two hyperplanes are identical, and both training accuracies reach 100%; naturally, we prefer the faster FSVMs trained on the smaller set. To verify the influence of the parameter q, the hyperplane of the proposed FSVMs is also formed with a larger value q = 3; Fig. 2(b) indicates that the performance is comparatively worse.

Figure 2. Comparison of the proposed FSVMs and the conventional FSVMs for different q.
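As a usage example, the sketch below reconstructs the set-up of Fig. 2 under our assumptions: the random seed, and the helper names fcm_reduce, build_fuzzy_training_set and train_fsvm_dual from the earlier sketches, are illustrative rather than the paper's own.

```python
import numpy as np

rng = np.random.default_rng(1)
cov = [[1.0, 0.3], [0.3, 1.0]]
X_pos = rng.multivariate_normal([1.0, 1.0], cov, size=100)   # 100 positive points
X_neg = rng.multivariate_normal([4.0, 4.0], cov, size=50)    #  50 negative points

# Replace each class by its FCM cluster centers (70 and 35 centers, q = 1.2);
# with q = 3 the centers concentrate near the class means and the boundary degrades.
C_pos = fcm_reduce(X_pos, c=70, q=1.2)
C_neg = fcm_reduce(X_neg, c=35, q=1.2)
X_new, y_new, mu_new = build_fuzzy_training_set(C_pos, C_neg)
alpha, bias, sv = train_fsvm_dual(X_new, y_new, mu_new, C=5000)
```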

4. Experiments

To verify the performance of the FSVMs proposed in this paper, we evaluate our method on typical classification problems. Two experiments are performed on an Intel PC with a 2.4 GHz CPU and 256 MB of memory, running Matlab 6.5.

4.2. Experiment 1

The two-spiral problem is a typical two-class


classification problem, whose goal is to separate two highly intertwined spirals. It is a difficult classification task and often serves as a touchstone, so it is used here to compare the performance of the proposed FSVMs with the conventional FSVMs. Both types of FSVMs use the RBF kernel. To verify the feasibility of the proposed method, the two-spiral problem with noise is chosen as the classification task; the noise follows a normal distribution N(0, 0.01) in the directions of the X and Y axes. In order to discuss the influence of the fuzziness parameter q on the FSVMs, the RBF kernel parameter 0.05 and C = 5000, which give the best FSVM performance, are selected. Fig. 3 shows the entire training data and the resulting boundary; the training precision is 100%. Fig. 4 and Fig. 5 show the 120 cluster centers obtained with q = 1.2 and q = 2 and the corresponding boundaries; in both cases the training precision on the entire training set is 100%.
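The exact two-spiral generator used in the experiment is not specified, so the following sketch is only a plausible reconstruction: the radius/angle schedule and the number of points per spiral are assumptions, and the noise term treats the quoted N(0, 0.01) as zero mean with variance 0.01.

```python
import numpy as np


def two_spirals(n_per_class=97, noise_std=0.1, seed=2):
    """Generate a noisy two-spiral data set in the spirit of Experiment 1."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 3.0 * np.pi, n_per_class)               # angle along the spiral
    r = 0.5 + 2.0 * t / (3.0 * np.pi)                            # slowly growing radius
    spiral1 = np.column_stack([r * np.cos(t), r * np.sin(t)])
    spiral2 = -spiral1                                           # second spiral, rotated by 180 degrees
    noise = rng.normal(0.0, noise_std, (2 * n_per_class, 2))     # noise on both axes
    X = np.vstack([spiral1, spiral2]) + noise
    y = np.hstack([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y
```

The 120 cluster centers of Fig. 4 and Fig. 5 would then be obtained by running fcm_reduce on each class and training the dual sketch with an RBF Gram function; since the text does not state whether the quoted 0.05 is the kernel width or its inverse, we do not fix a concrete RBF formula here.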

Figure 3. Boundaries with the RBF kernel (parameter 0.05)

Figure 4. Boundaries with the RBF kernel (parameter 0.05) and 120 cluster centers when q = 1.2

Figure 5. Boundaries with the RBF kernel (parameter 0.05) and 120 cluster centers when q = 2

4.3. Experiment 2

The larger-scale data set optdigits (Optical Recognition of Handwritten Digits) from the UCI machine learning databases [6] is a multi-class problem. All of the 64-dimensional data, comprising 3823 training data and 1797 testing data, are distributed across 10 classes. The integration of the FSVMs in [4] with the FSVMs in [5] is adopted in this experiment. There are three steps in the experiment. Firstly, every class in the training set is assigned the same cluster number, and the new training set is extracted by FCM clustering. Secondly, FSVMs based on FCM clustering are formed on the new training set, and the strategy of [4] is used in the decision phase. Thirdly, the cluster number and the fuzziness parameter q are modified in order to find the trade-off between running speed and recognition rate. During the training process, 45 proposed FSVMs and 45 conventional FSVMs are formed and run in the same operating environment. In Table 1, #clr represents the number of clusters, Tr(%) and Ts(%) denote the training and testing accuracy rates respectively, and RT represents the total run time of the constrained quadratic programming over the pairwise FSVMs during training. From the table, the proposed FSVMs perform better than the conventional FSVMs: for the optdigits data, the accuracy rates of the proposed FSVMs are better than or equal to those of the conventional FSVMs, and the total optimization time is reduced. The results illustrate that the proposed FSVMs have higher generalization ability than the conventional ones. Overfitting emerges in the conventional FSVMs and is overcome through FCM clustering, which illustrates that not only noises and outliers but also similar training data can lead to the overfitting problem.
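The three experimental steps can be outlined as below, reusing the earlier sketches (fcm_reduce, build_fuzzy_training_set, train_fsvm_dual); the data loading, the per-class cluster count, and the polynomial kernel form are placeholders of ours.

```python
import numpy as np
from itertools import combinations


def train_pairwise_fsvms(X_train, y_train, c_per_class, q=1.1, C=5000,
                         kernel=lambda a, b: (a @ b.T + 1.0) ** 2):  # placeholder "Poly2"
    """Steps 1-2 of Experiment 2: FCM reduction per class, then pairwise FSVMs."""
    classes = np.unique(y_train)
    # Step 1: the same cluster number for every class; FCM extracts the new training set
    reduced = {k: fcm_reduce(X_train[y_train == k], c=c_per_class, q=q) for k in classes}
    # Step 2: one FSVM for each pair of classes (45 machines for the 10 digit classes)
    models = {}
    for i, j in combinations(classes, 2):
        Xij, yij, muij = build_fuzzy_training_set(reduced[i], reduced[j])
        models[(i, j)] = (train_fsvm_dual(Xij, yij, muij, C, kernel), Xij, yij)
    return models
```

Step 3 then varies c_per_class and q, and the decision phase evaluates the pairwise values f_ij(x) and applies classify_by_membership as in (11)-(12).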


Table 1. Comparison of the proposed FSVMs with the conventional FSVMs on optdigits for different kernels

Dot kernel
Classifiers         #clr   Tr(%)    Ts(%)    RT(s)
FCM+FSVMs, q=1.1    1000   99.058   95.938    1.639
                    1500   99.215   95.715    2.564
                    2000   99.32    95.826    4.911
                    2500   99.608   96.55     7.033
                    3000   99.738   96.661    9.717
FSVMs               -      100      96.661   15.456

Poly2 kernel
Classifiers         #clr   Tr(%)    Ts(%)    RT(s)
FCM+FSVMs, q=1.1    1000   99.503   96.494    0.951
                    1500   99.529   97.051    1.908
                    2000   99.791   96.995    4.359
                    2500   99.895   97.496    6.848
                    3000   99.948   97.44    10.234
FSVMs               -      100      97.44    16.598

Poly4 kernel
Classifiers         #clr   Tr(%)    Ts(%)    RT(s)
FCM+FSVMs, q=1.1    1000   99.608   97.385    1.152
                    1500   99.765   97.329    2.551
                    2000   99.974   97.663    5.828
                    2500   99.948   97.551    8.965
                    3000   100      97.83    13.199
FSVMs               -      100      97.718   22.001

Poly6 kernel
Classifiers         #clr   Tr(%)    Ts(%)    RT(s)
FCM+FSVMs, q=1.1    1000   99.294   97.106    1.268
                    1500   99.555   97.106    2.842
                    2000   99.686   97.496    6.436
                    2500   99.817   97.997   10.405
                    3000   99.948   97.551   15.78
FSVMs               -      100      97.718   25.985

5. Conclusions

FSVMs based on FCM clustering are proposed in this paper. They select the cluster centers as representatives of the whole data in their clusters, and the training data extracted in this way from the entire training set are used to form the proposed FSVMs. The superiority of the proposed FSVMs is demonstrated on two typical pattern recognition problems. The results on the benchmark data of the machine learning databases indicate that the proposed FSVMs have acceptable accuracy rates while the run time of the quadratic programming is reduced. Although it takes time to perform FCM clustering on a large number of training data, techniques such as parallelization can be adopted to alleviate this problem. Many other clustering algorithms, such as hierarchical clustering and linear clustering, could also be used with SVMs; they require further research. Additionally, the constrained optimization problem of SVMs can in theory be solved by a genetic algorithm, but this would increase the time spent.

References

[1] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, 1998.
[2] G. D. Guo, S. Z. Li, and K. L. Chan, "Face recognition by support vector machines", Proceedings of the International Conference on Automatic Face and Gesture Recognition, Grenoble, France, pp. 196-201, 2000.
[3] G. D. Guo and S. Z. Li, "Content-Based Audio Classification and Retrieval by Support Vector Machines", IEEE Transactions on Neural Networks, Vol. 14, No. 1, pp. 209-215, January 2003.
[4] T. Inoue and S. Abe, "Fuzzy Support Vector Machines for Pattern Classification", Proceedings of the International Joint Conference on Neural Networks, Washington, DC, Vol. 2, pp. 1449-1454, July 2001.
[5] Han-Pang Huang and Yi-Hung Liu, "Fuzzy Support Vector Machines for Pattern Recognition and Data Mining", International Journal of Fuzzy Systems, Vol. 4, No. 3, pp. 826-835, September 2002.
[6] ftp://ftp.ics.uci.edu/pub/machine-learning-databases.
[7] B. Kijsirikul and N. Ussivakul, "Multiclass Support Vector Machines Using Adaptive Directed Acyclic Graph", Proceedings of the International Joint Conference on Neural Networks, Honolulu, Hawaii, pp. 980-985, 2002.
[8] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[9] S. Theodoridis and K. Koutroumbas, Pattern Recognition, Second Edition, Publishing House of Electronics Industry, Beijing, 2004.

