© 2013. Dharm Singh, Naveen Choudhary & Jully Samota. This is a research/review paper, distributed under the terms of the Creative Commons Attribution-Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Analysis of Data Mining Classification with
Decision Tree Technique
Dharm Singh α, Naveen Choudhary σ & Jully Samota ρ
Abstract- The diversity and applicability of data mining are increasing day by day, so there is a need to extract hidden patterns from massive data. This paper states the problem of attribute bias. The decision tree technique based on attribute information is biased toward multi-valued attributes, which have more, but insignificant, information content. Attributes that have additional values can be less important for various applications of decision trees. The problem affects the accuracy of the ID3 classifier and generates an unclassified region. The performance of ID3 classification and of a cascaded model of an RBF network with ID3 is evaluated.

I. Introduction

With the rapid development of information technology and network technology, different trades produce large amounts of data every year. The data itself cannot bring direct benefits, so hidden patterns need to be extracted from it.

With the rise of data mining, the decision tree plays an important role in the process of data mining and data analysis. Decision tree learning involves using a set of training data to generate a decision tree that correctly classifies the training data itself. If the learning process works, this decision tree will then correctly classify new input data as well. Decision trees differ along several dimensions, such as the splitting criterion, the stopping rules, and the branch condition (univariate or multivariate). Information gain serves as the attribute selection measure in ID3. ID3 is famous for the merits of easy construction and strong learning ability. There is, however, a problem with this method: it is biased toward selecting attributes with more taken values, even when those values carry insignificant information content.
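This bias can be made concrete with a small, self-contained calculation. The following Python sketch is our own illustration, not code from the paper; the attribute names record_id and outlook and the toy table are hypothetical. It computes the entropy-based information gain that ID3 uses and, for comparison, the gain ratio used by C4.5: the unique identifier receives the same maximal information gain as a genuinely predictive attribute, which is exactly the multi-valued bias discussed above, while the gain ratio penalises the many-valued split.

# Toy illustration of ID3's bias toward multi-valued attributes: information
# gain cannot distinguish a useless unique identifier from a predictive
# attribute, while the gain ratio penalises the many-valued split.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, labels):
    n = len(labels)
    gain = entropy(labels)
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def gain_ratio(rows, attr, labels):
    split_info = entropy([r[attr] for r in rows])  # intrinsic information of the split
    g = information_gain(rows, attr, labels)
    return g / split_info if split_info > 0 else 0.0

# Hypothetical records: record_id is unique per row, outlook is a real attribute.
rows = [{"record_id": i, "outlook": o}
        for i, o in enumerate(["sunny", "sunny", "rain", "rain", "overcast", "overcast"])]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

for attr in ("record_id", "outlook"):
    print(attr,
          "gain=%.3f" % information_gain(rows, attr, labels),
          "gain_ratio=%.3f" % gain_ratio(rows, attr, labels))

On this toy table both attributes obtain the same information gain of about 0.918 bits, so ID3 has no reason to prefer outlook over the identifier, whereas the gain ratio (about 0.58 versus 0.36) does.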
A method that improves efficiency and precision is presented (Li & Zhang, 2010). A comparison of attribute selection techniques with a ranking of attributes is also available: if irrelevant, redundant and noisy attributes are added during model construction, predictive performance suffers, so useful attributes need to be chosen along with background knowledge (Hall & Holmes, 2003). To improve the accuracy rate of classification and the depth of the tree, the adaptive step forward/decision tree (ASF/DT) has been proposed; the method considers not only one attribute but two, so that it can find a bigger information gain ratio (Tan & Liang, 2012). A new heuristic technique for the attribute selection criterion has been introduced, in which the best attribute, the one with the least heuristic function value, is taken; the method can be extended to larger databases with the best splitting criteria for attribute selection (Raghu, Venkata Raju & Raja Jacob, 2012). An interval-based algorithm has also been proposed; it has two phases for attribute selection, the first phase ranking the attributes and the second phase selecting the subset of attributes with the highest accuracy, and the method is applied to a real-life data set (Salama, El Bendary, Hassanien, Revett & Fahmy, 2011).

Large training data sets have millions of tuples. Decision tree techniques have the restriction that the tuples should reside in memory, and the construction process becomes inefficient due to swapping of tuples in and out of memory, so more scalable approaches are required to handle such data (Changala, Gummadi, Yedukondalu & Raju, 2012). An improved learning scheme has been described in which additional connections can improve the speed. Artificial neural networks (ANNs) can find internal representations of dependencies within the data that are not explicitly given. The short response time and the simplicity of building ANNs encouraged their application to the task of attribute selection.

IV. Process Method

1. Sample the data using a sampling technique.
2. Split the data into two parts, a training part and a testing part.
3. Apply the CRBF function for training on the sample values.
4. Using 2/3 of the sample, fit a tree with a split at each node. For each tree, predict the classification of the remaining 1/3 using the tree and calculate its misclassification rate out of CRBF.
5. For each variable in the tree:
6. Compute the error rate, i.e. the overall percentage of misclassification. Variable selection: average the increase in CRBF error over all trees and, assuming a normal division of the increase among the trees, decide an associated value of the feature.
7. The resulting classifier set is classified; finally, estimate the misclassification of the entire model.
8. Decode the feature variable into the result class.
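As a concrete reading of these steps, the following Python sketch is a minimal, hedged illustration rather than the authors' implementation: the cascaded RBF stage (CRBF) is approximated by a Gaussian RBF feature mapping built around k-means centres, the data are synthetic, and the variable-selection step is realised as a permutation-style increase in error; the 2/3 versus 1/3 split, the per-tree misclassification, the overall error rate and the final classification follow steps 1-8.

# Sketch of the process method under the assumptions stated above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# 1-2. Sample data and split it into a training part and a testing part.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 3. RBF stage: Gaussian activations around k-means centres (a stand-in for CRBF).
centres = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_train).cluster_centers_

def rbf_features(X, centres, gamma=0.1):
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

Phi_train, Phi_test = rbf_features(X_train, centres), rbf_features(X_test, centres)

# 4-6. Fit trees on 2/3 subsamples and measure misclassification on the held-out 1/3.
trees, errors = [], []
for _ in range(25):
    idx = rng.permutation(len(Phi_train))
    fit_idx, oob_idx = idx[: 2 * len(idx) // 3], idx[2 * len(idx) // 3:]
    tree = DecisionTreeClassifier(random_state=0).fit(Phi_train[fit_idx], y_train[fit_idx])
    errors.append(np.mean(tree.predict(Phi_train[oob_idx]) != y_train[oob_idx]))
    trees.append(tree)
print("mean held-out misclassification: %.3f" % np.mean(errors))

# Variable selection: average increase in error over all trees when a feature
# (an RBF unit here) is permuted, i.e. a permutation-importance style score.
importance = np.zeros(Phi_test.shape[1])
for tree in trees:
    base = np.mean(tree.predict(Phi_test) != y_test)
    for j in range(Phi_test.shape[1]):
        Phi_perm = Phi_test.copy()
        Phi_perm[:, j] = rng.permutation(Phi_perm[:, j])
        importance[j] += np.mean(tree.predict(Phi_perm) != y_test) - base
importance /= len(trees)
print("top RBF units by error increase:", np.argsort(importance)[::-1][:3])

# 7-8. Final classification of the test part by majority vote over the trees.
votes = np.mean([t.predict(Phi_test) for t in trees], axis=0) >= 0.5
print("ensemble test misclassification: %.3f" % np.mean(votes.astype(int) != y_test))

The number of RBF centres, the Gaussian width gamma and the number of trees are arbitrary choices for the sketch; swapping the synthetic data for the cancer data of Section V changes only the first two lines.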
[Figure: flow of the proposed process (load data; split into train and test data; apply CRBF; choose a node variable as feature; select variable; sample data; assign the most common class label to the tree).]
V. Experimental Results
For the performance evaluation, the cancer dataset from the UCI Machine Learning Repository is used.
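The exact UCI cancer data file and preprocessing are not spelled out in this section, so as an illustration only: scikit-learn bundles the UCI breast cancer Wisconsin (diagnostic) data, and an entropy-criterion decision tree (scikit-learn's CART, used here in place of ID3) gives the kind of baseline figure against which the cascaded model can be compared.

# Hedged baseline on the bundled UCI breast cancer Wisconsin data: a single
# entropy-criterion decision tree, reporting test-set accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

tree = DecisionTreeClassifier(criterion="entropy", random_state=1).fit(X_tr, y_tr)
print("decision-tree test accuracy: %.3f" % accuracy_score(y_te, tree.predict(X_te)))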
VI. Conclusion

In this paper, we have experimented with a cascaded model of an RBF network with ID3 classification. The standard presentation of each attribute on the selected ID3 is calculated and the given data are classified. We can say from the experiments that the cascaded model of RBF with ID3 provides better accuracy and reduces the unclassified region. The increased classification region improves the performance of the classifier.

References

1. Lakshmi, B.N., Raghunandhan, G.H. (2011). A conceptual overview of data mining. Proceedings of the National Conference on Innovations in Emerging Technology, 27-32.
2. Han, J., Kamber, M., Pei, J. (2012). Data Mining: Concepts and Techniques.
13. Kubat, M. (1998). Decision trees can initialize radial-basis function networks. IEEE Transactions on Neural Networks, vol. 9, no. 5.
14. Fu, X., Wang, L. (2003). Data dimensionality reduction with application to simplifying RBF network structure and improving classification performance. IEEE Transactions on Systems, Man, and Cybernetics, vol. 33, no. 3.