
IPASJ International Journal of Computer Science (IIJCS)

Web Site: http://www.ipasj.org/IIJCS/IIJCS.htm


A Publisher for Research Motivation ........ Email:editoriijcs@ipasj.org
Volume 6, Issue 9, September 2018 ISSN 2321-5992

A STUDY ON STUDENT'S ACADEMIC PERFORMANCE ANALYSIS
USING CLASSIFICATION AND PREDICTION TECHNIQUES USING
DATA MINING TECHNIQUES IN ARAKKONAM HIGHER
SECONDARY SCHOOL
B. J. MYTHILI¹, Dr. N. R. ANANTHA NARAYANAN²
¹Research Scholar, Department of Computer Science and Applications
²Associate Professor, Department of CSA,
Sri Chandrasekharendra Saraswathi Viswa Maha Vidyala, Kanchipuram, India

ABSTRACT
In recent years, analyzing and evaluating student performance while maintaining the quality of education has become an important challenge for all educational institutions. The main objective of this paper is to analyze and evaluate school students' performance by applying data mining classification algorithms in the Weka tool. Data mining tools have been widely accepted as decision-making tools that facilitate better resource utilization in terms of student performance. The classification algorithms considered are J48, Random Forest, Naive Bayes, and Decision Table; a multidimensional language is also to be used. The results of these classification models are reported in terms of accuracy, confusion matrices, and execution time. The conclusion reached is that Random Forest performs better than the other algorithms.
Keywords: Data Mining, Rule-Based Classification, Random Forest, J48, Knowledge Discovery

I. INTRODUCTION
Educational Data Mining is an emerging discipline concerned with developing methods for exploring the unique types of data that come from educational settings and the settings in which students learn. Whether educational data are taken from students' use of interactive learning environments, computer-supported collaborative learning, or administrative data from schools and universities, they often have multiple levels of meaningful hierarchy, which often need to be determined from properties of the data themselves rather than in advance. Issues of time, sequence, and context also play important roles in the study of educational data. This research is a step toward analyzing the factors that affect students' academic performance, using the data mining technique of classification, in order to evaluate current performance and take effective steps to enhance the quality of education. Every year, data are collected on the performance of a large number of school students, and classification is applied to these data. The aim is also to predict the most likely relationships between the various aspects of learning, to enhance the quality of education in the future, and to help educational planners plan accordingly.

II. REVIEW OF LITERATURE
Brijesh Kumar Baradwaj [2011], "Mining Educational Data to Analyze Students Performance": The main objective of higher education institutions is to provide quality education to their students. In this research, the classification task is used to evaluate students' performance; of the many approaches available for data classification, the decision tree method is used. This task extracts knowledge that describes students' performance in the end-of-semester examination. It helps to identify dropouts and students who need special attention at an early stage, and allows the teacher to provide appropriate advising or counseling.
Shruthi P, Chaitra B [2013], "Student Performance Prediction in Education Sector Using Data Mining": In this study, the performance of students is predicted using the behaviours and results of previously graduated students stored in the database, together with the behaviours of the present students. Many classification algorithms exist, such as Decision Tree, Neural Networks, K-Nearest Neighbour, and Naïve Bayes. In this paper the Naïve Bayes classification algorithm is used, as it had the highest accuracy compared with the other classification algorithms.

Dorina Kabakchieva [2013], "Predicting Student Performance by Using Data Mining Methods for Classification": This paper focuses on the implementation of data mining techniques and methods for acquiring new knowledge from data collected by universities. The main goal of the research is to reveal the high potential of data mining applications for university management. The research seeks to find out whether there are any patterns in the available data that could be useful for predicting students' performance at the university based on their personal and pre-university characteristics.
Abeer Badr El Din Ahmed, Ibrahim Sayed Elaraby [2014], "A Prediction for Student's Performance Using Classification Method": A huge amount of data is currently stored in educational databases, and these databases contain useful information for predicting students' performance. The most useful data mining technique for educational databases is classification. In this paper, the classification task is used to predict the final grade of students; of the many approaches available for data classification, the decision tree (ID3) method is used.
Suchita Borkar, K. Rajeswari [2013], "Predicting Students Academic Performance Using Education Data Mining": The past several decades have witnessed rapid growth in the use of data and knowledge mining as a tool by which academic institutions extract useful, previously unknown information from student result repositories in order to improve students' learning processes. The main objective of this paper is to predict students' performance in the university result on the basis of their performance in unit tests, assignments, graduation percentage, and attendance.

III. PROPOSED WORK

In this section, secondary school student data refers to data that were collected by someone other than the user. Common sources of such student data are marks information collected by government departments, organizational records, and data originally collected for other research purposes. This section presents the proposed framework for producing a prediction model using selected classification techniques. The framework shows the steps involved in developing models for Student Analysis Processing, and illustrates the three main stages involved in this study, including data collection and integration, and pattern extraction.

IV. EXPERIMENTS AND RESULTS

The accuracy value obtained shows how well the extracted model can predict new data. Two test options were used: percentages and fold cross-validation. For the percentages option, the training data set is 100%, whereas the testing data set is 90% of the full data. In fold cross-validation, the data were divided into 3, 5, or 10 subsets. Based on the experiments, metallic element shows the best accuracy value of 92.1% in the percentages test option compared with the other techniques. NaiveBayes and Random Forest show the best accuracy value of 99.7% in the percentages test option. Whereas the displays the simplest accuracy value in the percentages test option. From the three selected algorithms for the experiment, the prediction model extracted by Random Forest displays the best accuracy value. The confusion matrix table is then constructed, containing the data on actual and predicted classifications. The matrix shows that the prediction is successful for the good, average, and poor classes. The description of the data set is shown in Table 1.
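
The fold cross-validation option described above can be sketched in a few lines of Python. This is a minimal illustration, not the Weka implementation used in the study: the function names `k_fold_indices` and `train_test_splits`, the shuffling seed, and the round-robin assignment of records to folds are all assumptions made for the example. The record count of 457 is the figure the paper gives for the student database.

```python
import random


def k_fold_indices(n_records, k, seed=0):
    """Split record indices 0..n_records-1 into k folds of near-equal size."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_records, k, seed=0):
    """Yield (train, test) index lists, using each fold once as the test set."""
    folds = k_fold_indices(n_records, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Report the test-fold sizes for the three fold counts mentioned in the text.
for k in (3, 5, 10):
    sizes = [len(test) for _, test in train_test_splits(457, k)]
    print(k, sizes)
```

Each record appears in exactly one test fold, so every record is used for testing once and for training in the remaining folds.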

Table 1: Data set of student predictors

Attribute       Description                    Categorical values
Register no     The student ID                 Numeric
Name            The name of the student        String
Medium          The medium of the student      Tamil
Community       Student community by caste     OC, BC, MBC, SC, ST
Date Of Birth   The student's date of birth    DD:MM:YYYY
Attendance      Regular class attendance       Numeric
Gender          The gender of the student      Male, Female
Total           The subject marks obtained     Tamil, English, Maths, Science, Social Science
Result          Result of the student          Pass, Fail

Table 2 shows the student database. There are 457 student records, each giving the name, register number, community, sex, attendance, medium, marks in Tamil, English, Maths, Science, and Social Science, the total, and the result.

Table 2: Student database

Reg no    Name               Comm  Gen     Med  Atten  Tot  Result
1814656   AISHA SIDDIQA V N  BC    FEMALE  T    100    453  Pass
1814657   AKSHAYA S          SC    FEMALE  T    84     376  Pass
1814731   AJAI               ST    MALE    T    75     256  Pass
1814732   ARUN               BC    MALE    T    90     383  Pass

A. CLASS LABEL ACCURACY FOR CLASSIFIERS

Table 3 reveals that the confusion matrices are very helpful for analyzing the classifiers.

Table 3: Class label accuracy for classifiers

Classifier      TP Rate  FP Rate  Precision  Recall  Class
NaiveBayes      0.935    0.099    0.717      0.935   Fail
                0.901    0.065    0.981      0.901   Pass
Random Forest   0.989    0        1          0.989   Fail
                1        0.011    0.997      1       Pass
J48             0.946    0.003    0.989      0.946   Fail
                0.997    0.054    0.986      0.997   Pass
Decision Table  0.967    0.006    0.978      0.967   Fail
                0.994    0.003    0.991      0.994   Pass
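
The per-class columns of Table 3 are all derived from a confusion matrix, and the sketch below shows one way to compute them. The paper does not list the confusion matrices themselves, so the counts used here are hypothetical: they were chosen so that the resulting rates agree, after rounding to three decimals, with the J48 rows of Table 3 for a data set of 457 records. The function name `class_metrics` is likewise an assumption for illustration.

```python
def class_metrics(confusion, labels):
    """Per-class TP rate (recall), FP rate and precision from a confusion matrix.

    confusion[i][j] is the number of records whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for c, label in enumerate(labels):
        tp = confusion[c][c]                          # actual c, predicted c
        fn = sum(confusion[c]) - tp                   # actual c, predicted other
        fp = sum(row[c] for row in confusion) - tp    # actual other, predicted c
        tn = total - tp - fn - fp
        metrics[label] = {
            "tp_rate": tp / (tp + fn),    # identical to recall
            "fp_rate": fp / (fp + tn),
            "precision": tp / (tp + fp),
        }
    return metrics


# Hypothetical counts (not taken from the paper): rows are actual Fail / Pass,
# columns are predicted Fail / Pass.
confusion = [[87, 5],
             [1, 364]]
for label, values in class_metrics(confusion, ["Fail", "Pass"]).items():
    print(label, {name: round(value, 3) for name, value in values.items()})
```

In a two-class problem the FP rate of one class equals one minus the TP rate of the other, which is why the Fail and Pass rows of each classifier in Table 3 mirror each other.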

B. ERROR MEASUREMENT TABLE

Table 4 gives the error measurements for each classifier. The time taken to build the Random Forest model is less than that of the remaining classifiers. The kappa statistic measures the degree of non-random agreement between observers or measurements of the same categorical variable. The root mean squared error of Random Forest is the lowest among the classifiers and its kappa statistic is the highest, although J48 records the lowest mean absolute error in Table 4. Therefore Random Forest is the most efficient classification technique among those compared.
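
To make the kappa statistic concrete, the following sketch computes Cohen's kappa from a confusion matrix as the observed agreement corrected for the agreement expected by chance. The function name `cohen_kappa` and the example counts are assumptions; the counts are hypothetical values for a 457-record data set and are not taken from the paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of records on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column totals for each class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts: rows are actual Fail / Pass, columns predicted Fail / Pass.
confusion = [[87, 5],
             [1, 364]]
print(round(cohen_kappa(confusion), 4))
```

A value of 1 means perfect agreement between actual and predicted classes, while a value of 0 means the agreement is no better than chance.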

Table 4: Error measurement table

Evaluation criteria               NaiveBayes  Random Forest  J48     Decision Table
Kappa statistic                   0.7517      0.9931         0.958   0.9654
Mean absolute error               0.0901      0.0682         0.0278  0.0706
Root mean squared error           0.2674      0.1057         0.1167  0.1325
Relative absolute error (%)       26.922      20.3528        8.131   21.0919
Root relative squared error (%)   65.413      25.8605        28.547  32.4176
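
The four error measures in Table 4 can be illustrated with a short sketch. It assumes that each prediction is a probability for the Pass class and that the actual value is 1 for Pass and 0 for Fail. The relative measures below use the mean of the actual values as the baseline predictor, which is a simplifying assumption; the baseline that Weka uses may differ, so the sketch is not expected to reproduce the figures in Table 4. The function name and the sample values are hypothetical.

```python
import math


def error_measures(actual, predicted):
    """MAE, RMSE and their relative versions against a mean-value baseline."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Errors of the baseline that always predicts the mean actual value.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "rae_percent": 100 * sum(abs_err) / base_abs,
        "rrse_percent": 100 * math.sqrt(sum(sq_err) / base_sq),
    }


# Hypothetical values: 1 = Pass, 0 = Fail, with predicted Pass probabilities.
actual = [1, 1, 0, 1, 0]
predicted = [0.9, 0.8, 0.3, 0.6, 0.1]
print(error_measures(actual, predicted))
```

A relative error below 100% indicates that the classifier does better than the baseline, which is how the percentages in the last two rows of Table 4 can be read.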

C. MULTIPLE ROC CURVES


The ROC curves are plotted using various classifiers, namely J48, Naïve Bayes, and Random Forest, as base classifiers on the students data set. The depicted ROC curves indicate that the performance of the classifiers differs greatly from that of normal classification. With the normal classification method, the number of correctly classified positive instances is low and the number of misclassified negative instances is high, whereas the classifiers improve the result by producing very few misclassifications on both positive and negative instances. The steepness of the curve toward the point (0, 1) for the proposed method confirms this. This is the key evidence that the proposed model substantially enhances the performance of the classifier.

Figure I: ROC curves using Random Forest, J48, and NaiveBayes algorithms
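
As a rough illustration of how an ROC curve of this kind is obtained, the sketch below ranks records by a classifier's score for the positive class and traces the (FP rate, TP rate) points as the decision threshold is lowered. The labels, scores, and function names are hypothetical and are not taken from the study. A curve that rises quickly toward the point (0, 1) gives an area under the curve close to 1.

```python
def roc_points(labels, scores):
    """Return (FP rate, TP rate) points obtained by lowering the score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last record of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical data: 1 = positive class, with the classifier's positive-class score.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
points = roc_points(labels, scores)
print(points)
print(auc(points))
```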


D. TREE VISUALIZATION OF J48

At each node of the tree, J48 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy). The attribute with the highest normalized information gain is chosen to make the decision. The J48 algorithm then recurses on the smaller subsets.
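
The splitting criterion described above can be sketched as follows. The normalized information gain is taken here to be the gain ratio, that is, the information gain divided by the entropy of the attribute's own value distribution. The sketch handles nominal attributes only and omits the other refinements of J48, such as numeric thresholds and pruning. The function names and the small data set are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(attribute_values, class_labels):
    """Information gain of a nominal attribute, normalized by its split information."""
    total = len(class_labels)
    subsets = {}
    for value, label in zip(attribute_values, class_labels):
        subsets.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after splitting on the attribute.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    info_gain = entropy(class_labels) - remainder
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical records: an attendance band and a gender attribute against the result.
result = ["Pass", "Pass", "Fail", "Fail"]
attendance = ["high", "high", "low", "low"]
gender = ["Male", "Female", "Male", "Female"]
print(gain_ratio(attendance, result), gain_ratio(gender, result))
```

In this toy data the attendance band separates Pass from Fail completely while gender carries no information about the result, so attendance would be chosen for the split.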

Figure II: Tree Visualization of J48

V. CONCLUSION
In this paper, classification is applied to student data to predict students' performance in the future examination on the basis of previous students' data. As there are several approaches to data classification, the Naïve Bayes algorithm is used here. Information such as attendance and marks was collected from the students' previous records to predict performance at the end of the examination. The other attributes were collected from the students and their respective schools, who know the behaviour of the students. This study can help students and teachers to improve the results of students who are at risk of failure.
In future work, it is possible to extend the analysis by using different clustering techniques and association rule mining on the students' data set. We plan to expand the data set with university data and more distinctive attributes to obtain more accurate results. This proposal can be used in future for similar kinds of research work.

REFERENCES
[1] Brijesh Kumar Baradwaj and Saurabh Pal (2011), "Mining Educational Data to Analyze Students Performance", International Journal of Advanced Computer Science and Applications, vol. 2, no. 6.
[2] Chandra, E. and Nandhini, K. (2010), "Knowledge Mining from Student Data", European Journal of Scientific Research, vol. 47, no. 1.
[3] Suchita Borkar and K. Rajeswari (2013), "Predicting Students Academic Performance Using Education Data Mining", International Journal of Computer Science and Mobile Computing, vol. 2, no. 7.
[4] Ajay Kumar Pal and Saurabh Pal (2015), "Analysis and Mining of Educational Data for Predicting the Performance of Students", vol. 4, no. 5.
[5] Kumar, V. and Chadha, A. (2011), "An Empirical Study of the Applications of Data Mining Techniques in Higher Education", International Journal of Advanced Computer Science and Applications, vol. 2, no. 3.
