
A NEW APPROACH FOR DATA CLASSIFICATION USING FUZZY LOGIC


Shweta Taneja¹, Dr. Bhawna Suri²
Department of Computer Science
BPIT, GGSIPU
New Delhi, India
¹ shweta_taneja08@yahoo.co.in
² suri_bhawna@yahoo.com

Himanshu Narwal¹, Anchit Jain², Akshay Kathuria³, Sachin Gupta⁴
Department of Computer Science
BPIT, GGSIPU
New Delhi, India
¹ himanshunarwal@yahoo.co.in
² anchitjain1994@gmail.com
³ akshay.kathuria@gmail.com
⁴ guptasachin579@gmail.com

Abstract: Data mining is the process of discovering useful patterns in a large set of data. It is mostly used in large information processing applications. The classification technique of data mining assigns data to a set of classes based on some attributes for further processing. We have developed a new algorithm that handles classification by applying fuzzy rules to a real-world data set. The proposed algorithm handles the admission of students to various universities by classifying them into three clusters: admitted, rejected, and those who would probably get admission. A fuzzy logic based approach is appropriate for the third cluster. Our algorithm predicts admission on the basis of ranking and fuzzy rules generated from the numerical data, and gives its output in linguistic terms. We have compared our algorithm with state-of-the-art algorithms such as KNN and Fuzzy C-means. In these comparisons our algorithm outperformed the others in terms of accuracy.

Keywords: Fuzzy C-means (FCM) algorithm, fuzzy logic, classification technique.

I. INTRODUCTION

Data mining [1] is the process of extracting useful information or knowledge from a huge amount of data. This information is used to perform strategic operations in effective decision making. Decision making generally needs different and new methodologies to make sure that the decision made is accurate and valid. Many problems arise in data analysis, so it is necessary to develop new techniques to handle the ever-increasing data.

To handle uncertainty in data, we have used the concept of fuzzy logic along with the classification technique of data mining. As suggested by Zadeh [2], fuzzy set theory deals with ambiguity and uncertainty, and seeks to overcome most of the problems generally found in classical set theory. We have taken as a case study the admission of students to various universities. According to classical set theory, a student will either be admitted or rejected. However, there is a probability that, on the basis of ranking, he or she might get admission. To handle such an ambiguous situation, we have applied fuzzy logic to define the membership functions for the admission of a student. Our algorithm can be used to perform fuzzy classification wherever uncertainty or fuzziness can be resolved using membership functions.

This paper is organized as follows. Section II describes the approaches that are currently in practice, and highlights their problems and how the proposed method overcomes them. The proposed algorithm is stated and explained in Section III. Section IV gives the implementation results obtained by applying the proposed algorithm to a student admissions data set. Section V compares the proposed algorithm with other existing algorithms. Section VI concludes the paper.

II. RELATED WORK

Classification [3] is one of the data mining techniques; it assigns data to various classes. We have used the concept of fuzzy logic to classify uncertain or ambiguous data.

There are some existing approaches to handle uncertainty. In [4], the authors use linguistic terms for database queries and show the advantages of using linguistic terms as well as the difference between classical and fuzzy approaches. In the case of many-valued logics, however, this is not sufficient. A new approach to fuzzy classification is discussed by the authors in [5]. They state that, by using fuzzy discretization, the results can be represented in linguistic terms, which is superior to other classification techniques. However, it cannot be applied to data mining in every case. In another technique, presented in [6], the author suggests the use of Data Envelopment Analysis for graduate admissions using GMAT scores and GPA. The advantage of this technique is that it does not require expert participation for obtaining membership functions. In [11], the authors define a new ranking function based on fuzzy logic, but the method is untested on large datasets.

978-1-4673-8203-8/16/$31.00 ©2016 IEEE
In this paper, we have tried to develop a better fuzzy data classification algorithm in terms of efficiency as well as universal applicability. Our algorithm uses fuzzy logic [2] because admissions data is not crisp, and we need to analyze it to get more accurate predictions [1] and to obtain effective as well as efficient results.

The proposed algorithm helps both institutes/universities and students, as it predicts the probability of admission in linguistic terms, which cannot easily be done by most of the methods mentioned above. Applying the proposed algorithm requires the following assumptions:

1. Every student is given a rank in the entrance exam to be considered for admission.
2. A student must get at least a passing grade in the higher secondary standard exam.
3. The institution must consider the entrance rank as well as the higher secondary marks to provide admission.

III. PROPOSED ALGORITHM

Fig. 1 shows the proposed algorithm.

STEPS:
1. Division of the dataset into training, validation and test sets.
2. Generation of fuzzy rules using association rules and classification techniques in WEKA.
3. Identification of the Acceptance, Fuzzy and Rejection regions on the basis of the rules.
4. For the Fuzzy region, fuzzy c-means clustering is applied and outliers are obtained, which are used to calculate the rank factors.
5. Quantifiers are applied on the fuzzy region by using the following formula for AP_i (Admission Points):

\[ AP_i = a \cdot (M_i / Max_i) + b \cdot RF \tag{1} \]

where
AP_i = Admission Points of the ith student
M_i = Marks obtained in 12th standard by the ith student
Max_i = Maximum marks that could be scored in class 12th
RF = Rank Factor
a, b = Constants that can vary, as they depend on the weightage the institute gives to marks or rank.

Fig. 1. Proposed algorithm

The detailed version of the algorithm is given in Fig. 2.

3.1 DETAILED ALGORITHM

Input: Dataset D
Output: Probability of admission in terms of quantifiers
Method:
1. Suppose we have a dataset D that is divided into three datasets, the training dataset (d_t), the validation dataset (d_v) and the test dataset (d_test), such that they satisfy the following properties:

\[ d_t \cap d_v = \emptyset \tag{2} \]
\[ d_v \cap d_{test} = \emptyset \tag{3} \]
\[ d_t \cap d_{test} = \emptyset \tag{4} \]
\[ d_t \cup d_v \cup d_{test} = D \tag{5} \]

2. Fuzzy rules are generated using association rules or a classification technique in the WEKA tool.
   • The generated fuzzy rules are in if-else form.
   • E.g., if rank < 3000, then the student is admitted.

3. On the basis of the rules generated above, the Acceptance region, Fuzzy region and Rejection region are identified.

Fig. 3. Regions obtained from rules (X-axis: Ranks; Y-axis: Marks)

4. For the Fuzzy region:
4.1 The Fuzzy C-Means (FCM) algorithm is applied, and each cluster is assigned a rank factor as shown in Table I.

TABLE I. Rank Factors assigned.
CLUSTER      Rank Factor (RF)
Cluster 1    RF1
Cluster 2    RF2
...          ...
Cluster n    RFn
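Step 1 of the detailed algorithm requires the three subsets to be pairwise disjoint and to cover D (equations 2 to 5). The paper does not say how the split is made, so the following is only a minimal sketch under assumptions: a random shuffle, hypothetical 60/20/20 proportions, and a hypothetical helper name `split_dataset`. The sample rows are the first five (ID, marks, rank) tuples of the admissions data.

```python
import random


def split_dataset(records, train_frac=0.6, val_frac=0.2, seed=0):
    """Partition records into disjoint training, validation and test lists.

    The fractions and the seed are illustrative assumptions, not values
    taken from the paper.
    """
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = int(len(indices) * train_frac)
    n_val = int(len(indices) * val_frac)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]  # everything that is left over
    return (
        [records[i] for i in train_idx],
        [records[i] for i in val_idx],
        [records[i] for i in test_idx],
    )


# (ID, marks, rank) for the first five students of the admissions data
students = [(1, 67, 2618), (2, 69, 5346), (3, 98, 5053), (4, 73, 6088), (5, 98, 6655)]
d_t, d_v, d_test = split_dataset(students)

# Equations (2) to (5): pairwise disjoint subsets whose union is D
assert set(d_t).isdisjoint(d_v)
assert set(d_v).isdisjoint(d_test)
assert set(d_t).isdisjoint(d_test)
assert set(d_t) | set(d_v) | set(d_test) == set(students)
```

Slicing one shuffled index list guarantees the disjointness and coverage properties by construction, whatever proportions are chosen.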

2016 6th International Conference - Cloud System and Big Data Engineering (Confluence)
In Table I, n is the cluster index and RFn is the rank factor for the nth cluster.

4.2 Apply quantifiers on the fuzzy region by using the formula for AP_i (Admission Points) given in equation (1).

4.3 For the general case,

\[ NOR = \frac{AP_i(\max) - AP_i(\min)}{5} \tag{6} \]

where
NOR = Number of Regions
AP_i(max) = maximum value of AP_i
AP_i(min) = minimum value of AP_i

5. On the basis of the values calculated above, the following quantifiers are assigned to AP_i values as shown in Table II.

TABLE II. Quantifiers assigned to ranges of AP_i.
AP_i                 Quantifier
[0]                  None
[ 0.325, 0.442 )     Very Low
[ 0.442, 0.559 )     Low
[ 0.559, 0.676 )     Moderate
[ 0.676, 0.793 )     High
[ 0.793, 0.91 ]      Very High
[2]                  All

The probability of a student getting admission is shown in Fig. 4.

Fig. 4. Probability of getting Admission.

Fig. 2. Detailed Algorithm

IV. EXPERIMENTS CONDUCTED AND RESULTS OBTAINED

In this section, we illustrate the working of our algorithm for admission to an undergraduate course at a university. Undergraduate admission decisions are usually taken by an admission committee, which considers several factors including the student's higher secondary marks and the rank achieved by the student in the entrance exam conducted by the university.

4.1 DATASET

We have used a dataset for undergraduate course admission in a university [11]. The undergraduate admissions committee uses the scoring function given in equation (1) to decide whether or not to admit a candidate. Candidates with a rank less than or equal to 7034 are checked by the admissions committee one by one; in case of additional supporting evidence (high higher secondary marks) a candidate is accepted, otherwise the candidate is rejected. The admission data contains the attributes student ID, Gender, Marks, Rank and Admit. A subset of the dataset with 20 tuples is shown in Table III.

Table III. Dataset used
ID   Gender   Marks   Rank   Admit
1    Male     67      2618   Yes
2    Male     69      5346   No
3    Female   98      5053   Yes
4    Female   73      6088   No
5    Male     98      6655   No
6    Male     96      1185   Yes
7    Female   64      9290   No
8    Female   50      1182   Yes
9    Male     95      7253   No
10   Male     70      4841   Yes
11   Female   59      7331   No
12   Female   100     4807   Yes
13   Male     79      7983   No
14   Male     85      490    Yes
15   Female   76      5527   No
16   Female   78      8948   No
17   Male     61      4032   No
18   Male     99      9327   No
19   Female   100     2956   Yes
20   Female   60      1087   Yes

4.2 WEKA TOOL

We have used the Weka tool [7], specifically Weka 3.6, the latest stable version at the time of writing. It contains algorithms that implement different data mining techniques such as classification, prediction, clustering and association rules.

4.3 RESULTS OBTAINED

The admission dataset is loaded into the Weka tool. Using the Apriori algorithm [8], the following association rules are generated.

if Rank > 7034 Admitted=No
if Rank < 3499 Admitted=Yes

According to these rules, the data is divided into 3 regions, which are shown in Fig. 5 and Fig. 6.

Fig. 5. The regions obtained shown in 2-D (Red: Rejected; Green: Fuzzy; Blue: Accepted)

Fig. 6. The regions obtained shown in 3-D (Red: Rejected; Green: Fuzzy; Blue: Accepted)

After division into regions, the fuzzy region is operated upon to remove the uncertainty. FCM [9] is applied to it, and clusters are formed so that different RF values can be assigned for later use. FCM assigns to each data point a membership in each cluster on the basis of the distance between the cluster center and the data point. The nearer a data point is to a specific cluster center, the higher its membership towards that cluster center. The memberships of each data point must sum to one. After each iteration, the memberships and cluster centers are updated according to the formulas given below:

\[ \mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}} \tag{7} \]

\[ v_j = \frac{\sum_{i=1}^{n} (\mu_{ij})^m x_i}{\sum_{i=1}^{n} (\mu_{ij})^m}, \quad \forall j = 1, 2, \ldots, c \tag{8} \]

where
'n' is the number of data points.
'v_j' is the jth cluster center.
'm' is the fuzziness index, m ∈ (1, ∞); m = 1 is excluded because the exponent 2/(m-1) in equation (7) is undefined there.
'c' is the number of cluster centers.
'μ_ij' is the membership of the ith data point in the jth cluster.
'd_ij' is the Euclidean distance between the ith data point and the jth cluster center.

The main objective of the fuzzy c-means algorithm is to minimize:

\[ J(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} (\mu_{ij})^m \lVert x_i - v_j \rVert^2 \tag{9} \]

where '||x_i - v_j||' is the Euclidean distance between the ith data point and the jth cluster center.

Fig. 7. FCM centers (Red: Rejected; Green: Fuzzy; Blue: Selected; Black: Centers)

From FCM we obtained 5 centers, shown as black dots in Fig. 7, which are used further in our algorithm. After calculating the cluster centers and the distance of each data point from them, the graph shown in Fig. 8 is obtained.
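The membership and center updates of equations (7) and (8) can be illustrated with a small pure-Python sketch. It is not the implementation used in the paper: the function names, the fuzziness index m = 2, the two hypothetical starting centers and the fixed 10 iterations are all assumptions made for illustration. The points are (rank, marks) pairs of six fuzzy-region students from Table III.

```python
import math


def update_memberships(points, centers, m=2.0):
    """Equation (7): mu_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, v) for v in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to that center
            # (shared equally if several centers coincide with it).
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            row = [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
        memberships.append(row)
    return memberships


def update_centers(points, memberships, m=2.0):
    """Equation (8): v_j = sum_i mu_ij^m * x_i / sum_i mu_ij^m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


# (rank, marks) of six fuzzy-region students from Table III
points = [(5346, 69), (5053, 98), (6088, 73), (6655, 98), (4841, 70), (4807, 100)]
centers = [(4800.0, 80.0), (6400.0, 80.0)]  # hypothetical starting centers
for _ in range(10):  # fixed iteration count, chosen only for the sketch
    memberships = update_memberships(points, centers)
    centers = update_centers(points, memberships)
```

Each row of `memberships` sums to one, as the text requires. Because rank and marks are on very different scales, rank dominates the Euclidean distance here; whether the features were rescaled before clustering is not stated in the paper.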

Fig. 8. Clusters with different RF values.

From Fig. 8, the range of ranks for each rank factor (RF) is calculated as shown in Table IV.

Table IV. Assignment of Rank Factor
Rank Range              Rank Factor
Rank < 3470             1
3470 <= Rank < 4119     0.83
4119 <= Rank < 4813     0.66
4813 <= Rank < 5585     0.49
5585 <= Rank < 6364     0.32
6364 <= Rank < 6920     0.15
6920 <= Rank            0

After obtaining the rank factors from FCM, we calculate AP_i using formula (1), and NOR using formula (6). For our data, the ranges of AP_i obtained, which are to be reported in linguistic terms, are shown in Table V.

Table V. Quantifiers for Data
AP_i                 Quantifier
[0]                  None
[ 0.325, 0.442 )     Very Low
[ 0.442, 0.559 )     Low
[ 0.559, 0.676 )     Moderate
[ 0.676, 0.793 )     High
[ 0.793, 0.91 ]      Very High
[2]                  All

For the general case, the range of AP_i can be calculated as displayed in Fig. 9.

Fig. 9. Quantifiers for general case

V. COMPARISON OF PROPOSED ALGORITHM WITH OTHER CLASSIFICATION ALGORITHMS

We have compared our proposed algorithm with the KNN classifier [10] and the Fuzzy C-Means algorithm. This is done on two datasets: Admissions and UCI University. Table VI shows the accuracy obtained.

Table VI. Comparison based on accuracy.
Data Set                  Crisp KNN   FCM       Proposed
Admission Data Set        82.59%      85.63%    92.47%
UCI University Data Set   80.32%      83.36%    88.72%

Our proposed algorithm obtains the best accuracy of the three on both datasets.
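The path from a student's marks and rank to a linguistic quantifier, as described in Section IV, can be sketched end to end using the association rules of Section 4.3 and Tables IV and V. The following is an illustration under assumptions, not the authors' code: the weights a = b = 0.5 and the maximum marks of 100 are hypothetical (the paper leaves a and b to the institute), all function names are invented, and AP values outside the intervals of Table V are returned as `None`. Note that equation (6) gives (0.91 - 0.325) / 5 = 0.117, which is the width of each interval in Table V.

```python
import bisect

# Association rules reported in Section 4.3
REJECT_ABOVE = 7034  # if Rank > 7034 then Admitted = No
ACCEPT_BELOW = 3499  # if Rank < 3499 then Admitted = Yes

# Table IV: boundaries between rank ranges and the rank factor of each range
RANK_BOUNDS = [3470, 4119, 4813, 5585, 6364, 6920]
RANK_FACTORS = [1.0, 0.83, 0.66, 0.49, 0.32, 0.15, 0.0]

# Table V: lower edge of each interval; the last interval ends at AP_MAX
AP_EDGES = [0.325, 0.442, 0.559, 0.676, 0.793]
AP_MAX = 0.91
QUANTIFIERS = ["Very Low", "Low", "Moderate", "High", "Very High"]


def region(rank):
    """Assign a rank to the Rejected, Accepted or Fuzzy region."""
    if rank > REJECT_ABOVE:
        return "Rejected"
    if rank < ACCEPT_BELOW:
        return "Accepted"
    return "Fuzzy"


def rank_factor(rank):
    """Look up RF in Table IV (each range includes its lower bound)."""
    return RANK_FACTORS[bisect.bisect_right(RANK_BOUNDS, rank)]


def admission_points(marks, rank, max_marks=100.0, a=0.5, b=0.5):
    """Equation (1); a, b and max_marks are illustrative assumptions."""
    return a * (marks / max_marks) + b * rank_factor(rank)


def quantifier(ap):
    """Map an AP value to its Table V quantifier, or None if outside the table."""
    if ap < AP_EDGES[0] or ap > AP_MAX:
        return None
    return QUANTIFIERS[bisect.bisect_right(AP_EDGES, ap) - 1]


def classify(marks, rank):
    """Return (region, quantifier); the quantifier applies only to the Fuzzy region."""
    where = region(rank)
    if where != "Fuzzy":
        return where, None
    return where, quantifier(admission_points(marks, rank))


interval_width = (AP_MAX - AP_EDGES[0]) / 5  # equation (6), about 0.117
result = classify(69, 5346)  # student 2 of Table III
```

With these assumed weights, student 2 (marks 69, rank 5346) falls in the Fuzzy region with RF = 0.49 and AP = 0.5 * 0.69 + 0.5 * 0.49 = 0.59, which lies in the "Moderate" interval. Different values of a and b would shift the AP range, so the interval edges would have to be recomputed from equation (6) for that choice.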

VI. CONCLUSION AND FUTURE DIRECTIONS

Data mining is an intelligent technique to extract knowledge from a large set of data. We have proposed a new classification algorithm using the concept of fuzzy logic. The algorithm deals with the admission of students to various universities. The possible outcomes are that the student will get admission, will not get admission, or may probably be admitted or rejected. We have considered the third case, where the probability of getting admission has to be handled. To do so, we have defined the degree of membership to the respective cluster using quantifiers, based on ranking and fuzzy rules.

Comparison with the standard KNN and Fuzzy C-Means algorithms shows that the proposed algorithm achieves better accuracy.

We would like to extend our algorithm so that it can be applied to other applications where fuzzy logic can solve real-world problems, such as signal processing. Moreover, the proposed algorithm can be applied to large datasets, and its performance can be compared in terms of execution time as well as accuracy with other algorithms of a similar nature in the near future.

REFERENCES

[1] J. Han, M. Kamber, and J. Pei, "Data mining concepts and techniques," 3rd ed., ISSN 1238-1489, Morgan Kaufmann Publishers, 2012, pp. 285-370.
[2] L.A. Zadeh, "Fuzzy sets," in Information and Control, vol. 8, Issue 3, ISSN 0019-9958, 1965, pp. 338-353.
[3] P. Kromer, J. Platos, V. Snasel, and A. Abraham, "Fuzzy classification by evolutionary algorithms," in Systems, Man, and Cybernetics (SMC), 2011 IEEE International Conference, ISSN 1062-922X, 2011, pp. 313-318.
[4] M. Hudec and M. Vujošević, "Integration of data selection and classification by fuzzy logic," in Expert Systems with Applications, vol. 39, Issue 10, ISSN 0957-4174, 2012, pp. 8817-8823.
[5] R.G. Mehta, D.P. Rana, and M.A. Zaveri, "A novel fuzzy based classification for data mining using fuzzy discretization," in Computer Science and Information Engineering, 2009 WRI World Congress, vol. 3, ISSN 2250-3676, 2009, pp. 713-717.
[6] P. Pendharkar, "Fuzzy classification using the data envelopment analysis," in Knowledge-Based Systems, vol. 31, ISSN 0950-7051, 2012, pp. 183-192.
[7] I.H. Witten and E. Frank, "Data mining: Practical machine learning tools and techniques - tutorial exercises for the weka explorer," vol. 33, ISSN 0808-9035, Morgan Kaufmann Publishers, 2011, pp. 559-575.
[8] S. Chai, J. Yang, and Y. Cheng, "The research of improved apriori algorithm for mining association rules," in Service Systems and Service Management, 2007 International Conference, ISSN 4244-0882, 2007, pp. 1-4.
[9] J.C. Bezdek, R. Ehrlich, and W. Full, "FCM: The fuzzy c-means clustering algorithm," in Computers & Geosciences, vol. 10, Issue 2, ISSN 0098-3004, 1984, pp. 191-203.
[10] J.M. Keller, M.R. Gray, and J.A. Givens, "A fuzzy K-nearest neighbor algorithm," in Systems, Man and Cybernetics, vol. SMC-15, no. 4, ISSN 0018-9472, 1985, pp. 580-585.
[11] Y. Gupta, A. Saini, and A.K. Saxena, "A new fuzzy logic based ranking function for efficient information retrieval system," in Expert Systems with Applications, vol. 42, Issue 3, ISSN 0957-4174, 2015, pp. 1223-1234.
[12] https://archive.ics.uci.edu/ml/datasets/University.

