Report File Dataset

MACHINE LEARNING
TUTORIAL-1
PROBLEM 1:
Objective: To construct a decision tree for the abalone dataset, draw the decision tree to
classify a new record, and show the accuracy of the tree.
Total classes in dataset:
There are 29 classes in the given dataset, based on Rings.
Training Data Set: 75 % of the data set (contains 34781 instances)
Test Data Set: 25 % of the data set
Methodology:
i. Tools used: Weka is a collection of machine learning algorithms for data mining tasks.
Weka contains tools for data pre-processing, classification, regression, clustering,
association rules, and visualization. Here Weka version 3.6.13 is used.
ii. Features (if any)/Preprocess:
The classification algorithm works on nominal data, so the continuous attributes
(Length, Diameter, Height, Whole weight, Shucked weight, Viscera weight, Shell
weight) are rounded to 2 decimal places to improve efficiency.
RINGS Attribute Preprocessing: The Rings attribute is divided into 3 classes:
0-9 is classified as Young
10-14 is classified as Adult
>14 is classified as Old
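The preprocessing above can be sketched in a few lines of Python. This is only an illustration of the rounding and binning rules, not the Weka filter actually used; the function names and the sample record are hypothetical.

```python
# Continuous attributes named in the report.
CONTINUOUS = ["Length", "Diameter", "Height", "Whole weight",
              "Shucked weight", "Viscera weight", "Shell weight"]


def ring_class(rings):
    """Map a ring count to the three classes: 0-9, 10-14, >14."""
    if rings <= 9:
        return "Young"
    if rings <= 14:
        return "Adult"
    return "Old"


def preprocess(record):
    """Round continuous attributes to 2 decimals and bin the Rings value."""
    out = dict(record)
    for name in CONTINUOUS:
        out[name] = round(record[name], 2)
    out["Rings"] = ring_class(record["Rings"])
    return out


# Hypothetical record, for illustration only.
sample = {"Sex": "M", "Length": 0.531, "Diameter": 0.412, "Height": 0.138,
          "Whole weight": 0.6772, "Shucked weight": 0.2564,
          "Viscera weight": 0.1413, "Shell weight": 0.2108, "Rings": 11}
cleaned = preprocess(sample)
print(cleaned["Length"], cleaned["Rings"])
```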
Results:
Decision Tree for the Training Data Set:
Number of Leaves : 28
Size of the tree : 51
Accuracy
Test Option                                            Classifier Accuracy %
Cross Validation 10 Fold                               75.6705 %
Supplied Test Set (25 % of the Test data generated
from the training data for validation)                 80.9706 %
Percentage Split - 66 %                                74.0845 %
PROBLEM 2: Perform Clustering using K means on the following Data set.

Objective: To cluster the plant dataset using the K-Means clustering algorithm.
Total classes in dataset:
The data is in transactional form. It contains the Latin names (species or genus) and state
abbreviations.
The total number of classes in the data set depends on the value of K provided by the user.
Number of classes = K.
Training data set: 100 % (As it is a clustering task we do not need to create test data set)
Methodology:
Tools used: Weka is a collection of machine learning algorithms for data mining tasks. Weka
contains tools for data pre-processing, classification, regression, clustering, association rules, and
visualization. Here Weka version 3.6.13 is used.
Building Cluster:
The Simple K-Means algorithm is used to cluster the whole data set. The clustering process
partitions the whole data set into K clusters.
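The idea behind K-Means can be sketched as follows. This is a minimal illustration, not Weka's implementation: it assumes each transaction is encoded as a 0/1 vector over state abbreviations, uses squared Euclidean distance, and takes the first K points as initial centroids (Weka chooses them with a random seed). The state list and transactions are hypothetical.

```python
def kmeans(points, k, max_iter=100):
    """Plain K-Means: assign each point to its nearest centroid, then
    recompute centroids as cluster means, until assignments stop changing."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points
    assignment = None
    for _ in range(max_iter):
        new_assignment = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_assignment.append(dists.index(min(dists)))
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical transactional data: each row lists the states for one plant.
STATES = ["ca", "or", "ny", "nj"]
transactions = [["ca", "or"], ["ny", "nj"], ["ca"], ["ny"]]
points = [[1.0 if s in t else 0.0 for s in STATES] for t in transactions]

assignment, centroids = kmeans(points, k=2)
print(assignment)
```

In this toy run the two west-coast rows end up in one cluster and the two east-coast rows in the other.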
Results of Clustering:
Value of K    Cluster
2             96%, 4%
3             39%, 4%, 57%
4             37%, 4%, 57%, 2%
5             35%, 4%, 55%, 2%, 3%
=== Run information ===
K=2
Clustered Instances
33395 ( 96%)
 1386 (  4%)
K=3
Clustered Instances
13410 ( 39%)
 1549 (  4%)
19822 ( 57%)
K=5
Clustered Instances
12346 ( 35%)
 1549 (  4%)
19252 ( 55%)
  606 (  2%)
 1028 (  3%)
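The percentages in the table follow directly from the clustered instance counts. A short check, using only the counts reported above (the function name is hypothetical):

```python
def cluster_shares(counts):
    """Return each cluster's share of all instances, as a whole percentage."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


runs = {
    2: [33395, 1386],
    3: [13410, 1549, 19822],
    5: [12346, 1549, 19252, 606, 1028],
}
for k, counts in runs.items():
    print(k, sum(counts), cluster_shares(counts))
```

Each run sums to the same 34781 instances, which confirms that the whole data set was clustered every time.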
PROBLEM 3:
1. Objective: To create a classifier for the balance-scale dataset using the K Nearest Neighbor
algorithm.
2. Total classes in dataset:
Number of Instances: 625 (49 balanced, 288 left, 288 right)
Attribute Information:
1. Class Name: 3 (L, B, R)
2. Left-Weight: 5 (1, 2, 3, 4, 5)
3. Left-Distance: 5 (1, 2, 3, 4, 5)
4. Right-Weight: 5 (1, 2, 3, 4, 5)
5. Right-Distance: 5 (1, 2, 3, 4, 5)
So there are 3 classes in total (49 balanced, 288 left, 288 right) in the given dataset.
2.1 Training Data Set: 70 % of the total instances (438)
2.2 Test Data Set: the remaining 30 % of the instances (187), removed from the data set.
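The class counts can be reproduced by enumerating all attribute combinations. The sketch below assumes the labelling rule given in the UCI description of the balance-scale dataset (compare left-weight x left-distance with right-weight x right-distance); that rule is not stated in this report, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def balance_class(lw, ld, rw, rd):
    """Label one instance by comparing the two torques (assumed UCI rule)."""
    left, right = lw * ld, rw * rd
    if left > right:
        return "L"
    if left < right:
        return "R"
    return "B"


# Every combination of the four attributes, each taking values 1..5.
instances = list(product(range(1, 6), repeat=4))
counts = Counter(balance_class(*row) for row in instances)
print(len(instances), dict(counts))
```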

Methodology:
Tools used: Weka is a collection of machine learning algorithms for data mining tasks. Weka
contains tools for data pre-processing, classification, regression, clustering, association rules, and
visualization. Here Weka version 3.6.13 is used.
Building and Testing Classifier
First, the model is trained on the complete data set of 625 instances and tested with 10-fold
cross-validation (K=1). 10-fold cross-validation divides the data set into 10 equal parts, trains
the model on 9 parts and tests it on the remaining part, so that each part is used once for testing.
Then the model is trained on the training data only (70% of the data) and tested on the
remaining 30% as the test set. Accuracy for various values of K is given in the table under the
heading Accuracy Chart.
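The procedure described above can be sketched in plain Python. This is a simplified illustration, not Weka's k-NN classifier: it uses squared Euclidean distance, a simple majority vote, and unshuffled, unstratified folds, so its accuracy will not match the figures reported here. The data is regenerated under the assumed torque rule from the UCI dataset description, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product


def knn_predict(train, query, k):
    """Majority vote among the k training rows closest to the query."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(data, k, folds=10):
    """Hold out every folds-th row in turn; return the overall accuracy."""
    correct = 0
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


def label(lw, ld, rw, rd):
    """Assumed labelling rule: compare left and right torque."""
    left, right = lw * ld, rw * rd
    return "L" if left > right else "R" if left < right else "B"


data = [(row, label(*row)) for row in product(range(1, 6), repeat=4)]
accuracy = cross_validate(data, k=1)
print(f"10-fold accuracy with K=1: {accuracy:.4f}")
```

Changing the `k` argument gives the accuracy for other neighbourhood sizes in the same way.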
Results:
Run information for case: Supplied Test Set and K=5
=== Run information ===
=== Summary ===
Correctly Classified Instances         150        80.2139 %
Incorrectly Classified Instances        37        19.7861 %
Kappa statistic                          0.593
Mean absolute error                      0.1563
Root mean squared error                  0.2936
Relative absolute error                 37.4503 %
Root relative squared error             60.5805 %
Total Number of Instances              187
=== Detailed Accuracy By Class ===
               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
               0         0         0           0        0           0.609      B
               1         0.248     0.507       1        0.673       0.97       R
               0.83      0         1           0.83     0.907       0.982      L
Weighted Avg.  0.802     0.05      0.825       0.802    0.791       0.951
=== Confusion Matrix ===
   a   b   c   <-- classified as
   0  14   0 |   a = B
   0  38   0 |   b = R
   0  23 112 |   c = L
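The summary figures can be re-derived from the confusion matrix alone. The sketch below uses the standard definitions of accuracy, Cohen's kappa, precision and recall; the function name is hypothetical.

```python
def confusion_metrics(matrix, labels):
    """Accuracy, Cohen's kappa and per-class (precision, recall).
    Rows are actual classes, columns are predicted classes."""
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    accuracy = sum(matrix[i][i] for i in range(len(labels))) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    per_class = {}
    for i, name in enumerate(labels):
        tp = matrix[i][i]
        precision = tp / col_totals[i] if col_totals[i] else 0.0
        recall = tp / row_totals[i] if row_totals[i] else 0.0
        per_class[name] = (precision, recall)
    return accuracy, kappa, per_class


matrix = [[0, 14, 0],    # actual B
          [0, 38, 0],    # actual R
          [0, 23, 112]]  # actual L
accuracy, kappa, per_class = confusion_metrics(matrix, ["B", "R", "L"])
print(round(100 * accuracy, 4), round(kappa, 3))
print({name: (round(p, 3), round(r, 3)) for name, (p, r) in per_class.items()})
```

The matrix also shows why class B scores 0 throughout: no instance was predicted as B, and all 14 balanced instances were classified as R.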
Accuracy Chart:
The accuracy for various values of K is given in the table below.
Test Option                                   Value of K   Classifier Accuracy %
Cross Validation 10 Fold                      1            84.8 %
Supplied Test Set (30 % of the split data)                 79.1444 %
Supplied Test Set (30 % of the split data)                 81.2834 %
Supplied Test Set (30 % of the split data)    5            80.2139 %
