TUTORIAL-1
PROBLEM 1:
Objective: To construct a decision tree for the abalone dataset, draw the tree used to
classify a new record, and report the accuracy of the tree.
Total classes in dataset:
There are 29 classes in the given dataset, based on Rings.
Training Data Set: 75% of the data set (contains 34781 instances)
Test Data: 25% of the data set
Methodology:
i.
ii.
Tools used: Weka is a collection of machine learning algorithms for data mining tasks.
Weka contains tools for data pre-processing, classification, regression, clustering,
association rules, and visualization. Here Weka version 3.6.13 is used.
Features (if any)/Preprocess:
The classification algorithm works on nominal data, so the continuous attributes
(Length, Diameter, Height, Whole weight, Shucked weight, Viscera weight, Shell
weight) are rounded to 2 decimal places to improve efficiency.
RINGS Attribute Preprocessing: The Rings attribute is divided into 3 classes:
0-9 is classified as Young
10-14 is classified as Adult
>14 is classified as Old
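
The preprocessing rules above can be sketched in a few lines of Python. This is only an illustration of the rounding and the three ring intervals, not the Weka filter actually used; the function names and the sample record are hypothetical.

```python
def round_measurement(value, digits=2):
    """Round a continuous measurement to a fixed number of decimal places."""
    return round(value, digits)


def ring_class(rings):
    """Map a ring count to the three age classes used in this report."""
    if rings <= 9:
        return "Young"
    if rings <= 14:
        return "Adult"
    return "Old"


# Hypothetical record: (shell length, ring count)
length, rings = 0.4553, 15
print(round_measurement(length), ring_class(rings))
```

Replacing the raw ring count with one of three labels turns the 29-class problem into a 3-class problem.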
Results:
Decision Tree for the Training Data Set:
Number of Leaves : 28
Size of the tree : 51
Accuracy:
Test Option                                         Classifier Accuracy %
(not stated)                                        75.6705 %
Supplied Test Set (25 % of the Test data
generated from the training data for validation)    80.9706 %
Percentage Split - 66%                              74.0845 %
Results of Clustering:
Value of K    Cluster
2             96%, 4%
3             39%, 4%, 57%
4             37%, 4%, 57%, 2%
5             35%, 4%, 55%, 2%, 3%
=== Run information ===
K=2
Clustered Instances
33395 ( 96%)
1386 ( 4%)
K=3
Clustered Instances
13410 ( 39%)
1549 ( 4%)
19822 ( 57%)
K=5
Clustered Instances
12346 ( 35%)
1549 ( 4%)
19252 ( 55%)
606 ( 2%)
1028 ( 3%)
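
As a quick consistency check, the percentages can be recomputed from the instance counts listed in the run information. The sketch below uses only the counts reported for K=2, K=3 and K=5; the helper name `cluster_shares` is hypothetical.

```python
def cluster_shares(counts):
    """Return each cluster's share of all instances, as a whole percentage."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


# Instance counts copied from the run information
runs = {
    2: [33395, 1386],
    3: [13410, 1549, 19822],
    5: [12346, 1549, 19252, 606, 1028],
}

for k, counts in runs.items():
    print(f"K={k}: total={sum(counts)}, shares={cluster_shares(counts)}")
```

Each run sums to 34781 instances, and the rounded shares match the percentages reported for each value of K.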
PROBLEM 3:
1. Objective: To create a classifier for the balance-scale dataset using the K Nearest
Neighbor algorithm.
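
For orientation, a minimal K Nearest Neighbor classifier can be written as follows. This is a generic sketch assuming Euclidean distance and simple majority voting; it is not the Weka implementation used for the results in this report, and the toy training rows are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training rows.

    train: list of (features, label) pairs; query: a feature tuple.
    """
    neighbours = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical rows: (left-weight, left-distance, right-weight, right-distance)
toy_train = [
    ((5, 5, 1, 1), "L"),
    ((4, 5, 1, 2), "L"),
    ((1, 1, 5, 5), "R"),
    ((1, 2, 4, 5), "R"),
    ((2, 2, 2, 2), "B"),
]

print(knn_predict(toy_train, (5, 4, 1, 1), k=3))
print(knn_predict(toy_train, (1, 1, 5, 4), k=3))
```

The value of K controls how many neighbours vote, which is why the accuracy is reported for several values of K.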
2. Total classes in dataset:
Number of Instances: 625 (49 balanced, 288 left, 288 right)
Attribute Information:
1. Class Name: 3 (L, B, R)
2. Left-Weight: 5 (1, 2, 3, 4, 5)
3. Left-Distance: 5 (1, 2, 3, 4, 5)
4. Right-Weight: 5 (1, 2, 3, 4, 5)
5. Right-Distance: 5 (1, 2, 3, 4, 5)
So, there are a total of 3 classes (49 balanced, 288 left, 288 right) in the given dataset.
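
The class counts can be reproduced by enumerating every combination of the four attributes. The sketch assumes the usual rule from the dataset's description (compare left weight × left distance against right weight × right distance); the function name `balance_class` is hypothetical.

```python
from collections import Counter
from itertools import product


def balance_class(lw, ld, rw, rd):
    """Label one configuration by comparing the left and right products."""
    left, right = lw * ld, rw * rd
    if left > right:
        return "L"
    if left < right:
        return "R"
    return "B"


levels = range(1, 6)  # each attribute takes the values 1..5
dataset = [
    ((lw, ld, rw, rd), balance_class(lw, ld, rw, rd))
    for lw, ld, rw, rd in product(levels, repeat=4)
]
counts = Counter(label for _, label in dataset)
print(len(dataset), counts["B"], counts["L"], counts["R"])
```

Under that assumption the enumeration yields 625 instances with 49 balanced, 288 left and 288 right, matching the figures above.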
2.1 Training Data Set- 70 % of total instances (438)
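
A 70% split of 625 instances gives 438 training rows and 187 test rows. The sketch below shows one way to produce such a split; the shuffling, the seed and the rounding rule are assumptions for illustration and do not reproduce Weka's exact partition.

```python
import random


def percentage_split(rows, train_percent=70, seed=1):
    """Shuffle the rows and split them into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic, rounding half up: 625 * 70% = 437.5 -> 438
    n_train = (len(shuffled) * train_percent + 50) // 100
    return shuffled[:n_train], shuffled[n_train:]


train_rows, test_rows = percentage_split(range(625))
print(len(train_rows), len(test_rows))
```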
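Correctly Classified Instances         150        80.2139 %
Incorrectly Classified Instances        37        19.7861 %
Kappa statistic                          0.593
Mean absolute error                      0.1563
Root mean squared error                  0.2936
Relative absolute error                 37.4503 %
Root relative squared error             60.5805 %
Total Number of Instances              187

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
               0         0         0           0        0           0.609      B
               1         0.248     0.507       1        0.673       0.97       R
               0.83      0         1           0.83     0.907       0.982      L
Weighted Avg.  0.802     0.05      0.825       0.802    0.791       0.951

a b c <-- classified as
0 14 0 | a = B
0 38 0 | b = R
0 23 112 | c = L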
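
The accuracy and the per-class figures follow directly from the confusion matrix (rows are the actual classes B, R, L; columns are the predicted classes). The sketch below recomputes them using the standard definitions; the helper name `per_class_metrics` is hypothetical.

```python
labels = ["B", "R", "L"]
matrix = [
    [0, 14, 0],    # actual B
    [0, 38, 0],    # actual R
    [0, 23, 112],  # actual L
]


def per_class_metrics(matrix, i):
    """Return (recall, fp_rate, precision, f_measure) for class index i."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[i][i]
    actual = sum(matrix[i])                    # row total
    predicted = sum(row[i] for row in matrix)  # column total
    recall = tp / actual if actual else 0.0
    precision = tp / predicted if predicted else 0.0
    fp_rate = (predicted - tp) / (total - actual)
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return recall, fp_rate, precision, f_measure


total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / total
print(f"accuracy = {100 * accuracy:.4f} %")
for i, name in enumerate(labels):
    print(name, [round(v, 3) for v in per_class_metrics(matrix, i)])
```

Note that every B instance is classified as R, which is why all measures for B except ROC Area are 0.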
Accuracy Chart:
The accuracy for various values of K is given in the table below.
Test Option    Value of K    Classifier Accuracy %
                             84.8 %
                             79.1444 %
                             81.2834 %
                             80.2139 %