EEL 709
DEEPALI JAIN
2012ee10082
Motive :
Approach
Cross-validation, varying C and the kernel parameter initially in steps of log2c = 3, using 10-fold cross-validation.
The best parameters are obtained, and then varied again in steps of log2c = 1 around the previously obtained values to get more accurate estimates.
The final parameters are then tested on the data without any cross-validation.
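The coarse-then-fine grid search above can be sketched with scikit-learn's SVC standing in for the libsvm command-line tools. This is a minimal sketch: the report's own data files are not reproduced, so scikit-learn's digits dataset is a placeholder, and 5-fold CV is used here for speed (the report used 10-fold).

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder data; the report uses its own 15-feature dataset.
X, y = load_digits(return_X_y=True)
X, y = X[:400], y[:400]

def grid_search(X, y, log2c_range, log2g_range, folds=5):
    """Pick (log2c, log2g) maximising mean cross-validation accuracy."""
    best = (None, None, -1.0)
    for lc in log2c_range:
        for lg in log2g_range:
            clf = SVC(kernel="rbf", C=2.0 ** lc, gamma=2.0 ** lg)
            acc = cross_val_score(clf, X, y, cv=folds).mean()
            if acc > best[2]:
                best = (lc, lg, acc)
    return best

# Coarse pass: steps of 3 in the exponents, as in the report.
lc, lg, acc = grid_search(X, y, range(-5, 16, 3), range(-15, 4, 3))
# Fine pass: steps of 1 around the coarse optimum.
lc, lg, acc = grid_search(X, y, range(lc - 2, lc + 3), range(lg - 2, lg + 3))
print(f"best C = 2^{lc}, best gamma = 2^{lg}, CV accuracy = {acc:.4f}")
```

The fine pass only refines the exponent by one step in each direction, which is why the coarse step of 3 is enough to start with.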
I]. Classes : 0,1
[Figure: linear kernel, 10-fold cross-validation accuracy (%) vs. log2c]
Radial Kernel :
[Figure: contour plot of cross-validation accuracy (65-95%) over log2c and log2gamma, radial kernel]
Polynomial Kernel :
[Figure: cross-validation accuracy vs. log2c and degree, polynomial kernel; underfit at log2c < -4]
Features | Linear            | Radial                     | Polynomial
1-15     | c=2^4, Acc=99.2   | c=2^13, g=2^-9, Acc=99.4   | c=2^1, deg=2, Acc=99.6
1-10     | c=2^1, Acc=99     | c=2^4, g=2^-6, Acc=99      | c=2^1, deg=1, Acc=99.2
1-5      | c=2^4, Acc=99.2   | c=2^13, g=2^0, Acc=99.4    | c=2^-2, deg=5, Acc=99.2
5-10     | c=2^7, Acc=93.2   | c=2^1, g=2^0, Acc=91.8     | c=2^10, deg=2, Acc=92.3
10-15    | c=2^4, Acc=98.6   | c=2^1, g=2^0, Acc=99       | c=2^1, deg=3, Acc=98.6
Observations:
Kernel:
Changing the kernel function has no drastic effect on accuracy for any number of features. Hence the sigmoid kernel was not used, and the other parameters were analysed in more depth.
A low-degree polynomial in general gives good enough results.
The radial kernel's parameters are affected the most by a change in the number of features; the best C of the other two kernels is usually the same (2^4 and 2^1).
Going from 15 features down to 5, the radial kernel shows the largest variation.
Fitting:
With the radial basis kernel, the hard-margin case can be approximated when all features are used.
For the linear and polynomial kernels, the critical C values for underfitting and overfitting occur at approximately 2^(-5) and 2^(+5) respectively in all cases. Overfitting and underfitting are less prominent for the radial kernel (judging by the shape of the graph).
For the polynomial kernel, under- and overfitting are prominent at low degree; at high degrees, accuracy is lower for all C values.
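The under/overfit behaviour can be probed by sweeping C and watching cross-validation accuracy. A minimal sketch on stand-in data (scikit-learn's digits; the critical C values will differ from the report's):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder data; the report's 15-feature set is not reproduced here.
X, y = load_digits(return_X_y=True)
X, y = X[:400], y[:400]

cv_acc = {}
for log2c in (-9, -5, -1, 3, 7, 11):
    clf = SVC(kernel="poly", degree=2, C=2.0 ** log2c)
    cv_acc[log2c] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"log2c = {log2c:3d}: CV accuracy = {cv_acc[log2c]:.3f}")
# A sharp drop at the low end of the sweep marks the underfit region;
# a milder decline at the high end (if any) marks overfitting.
```

Plotting these values against log2c reproduces the shape of the curves shown in the figures above.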
Features:
The best parameters do not vary much as the feature count decreases from 15 to 10. However, decreasing it further to 5 produces a more significant change.
Ignoring the effect of combinations of features on accuracy (i.e., using a filter rather than a wrapper approach), features 1-5 and 10-15 appear to be the more important ones.
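The filter-style comparison (score each feature group on its own, with no wrapper search over combinations) might look like this; the column slices are hypothetical stand-ins for the report's feature groups 1-5, 5-10 and 10-15:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder data and placeholder column groups; the real dataset
# has 15 named features, which are not reproduced here.
X, y = load_digits(return_X_y=True)
X, y = X[:400], y[:400]

groups = {"1-5": slice(0, 21), "5-10": slice(21, 42), "10-15": slice(42, 64)}
scores = {}
for name, cols in groups.items():
    clf = SVC(kernel="linear", C=2.0 ** 4)  # best linear C found above
    scores[name] = cross_val_score(clf, X[:, cols], y, cv=5).mean()
    print(f"features {name}: CV accuracy = {scores[name]:.3f}")
```

Each group is scored independently, so interactions between groups are ignored, which is exactly the filter (not wrapper) simplification noted above.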
Results
Feature subsets tried: 2-5, 1-4, 10-14, 11-15, {11,14,15}
Observations :
Amongst features [1,5], removing even a single feature substantially reduces accuracy.
Features {11,14,15} alone give very high accuracy, and the best parameter settings come close to those of the larger feature sets.
When accuracy is good, the best C for the linear kernel is 2^4 irrespective of the features included.
No prominent overfitting: with increasing C, there is never a substantial decrease in accuracy.
Polynomial : overfit at C > 2^4, underfit at C < 2^(-4)
II]. Classes : 4,5

Features | Linear              | Radial                      | Polynomial
1-15     | c=2^1, Acc=98.6028  | c=2^1, g=2^-3, Acc=98.8024  | c=2^-2, deg=3, Acc=98.8028
1-10     | c=2^4, Acc=98.4032  | c=2^13, g=2^-3, Acc=98.6028 | c=2^4, deg=4, Acc=98.004
1-5      | c=2^4, Acc=75.8483  | c=2^7, g=2^-6, Acc=76.6467  | poor accuracy
5-10     | c=2^4, Acc=94.61    | c=2^7, g=2^-3, Acc=89.2216  | poor accuracy
10-15    | c=2^1, Acc=98.60    | c=2^1, g=2^-3, Acc=98.6028  | c=2^3, deg=2, Acc=96.8064
Other Observations :
In this case, the best parameters for features [10,15] were closer to the all-feature ones than those for [1,10], i.e. [10,15] are the most essential.
When further subsets were taken, it was again found that {11,14,15} give accuracy 98.2%, with best c = 2^(-2) for the linear case.
III]. Classes : 8,9
All features:
Linear: overfit at C > 2^4, underfit at C < 2^(-4)
Polynomial: overfit at C > 2^5, underfit at C < 2^(-6)
1-10 features:
Linear: overfit at C > 2^6, underfit at C < 2^(-4)
Features | Linear              | Radial                      | Polynomial
1-15     | c=2^4, Acc=95.01    | c=2^10, g=2^-9, Acc=95.6088 | c=2^4, deg=2, Acc=96.2076
1-10     | c=2^4, Acc=95.8084  | c=2^4, g=2^-3, Acc=95.01    | c=2^7, deg=2, Acc=94.8104
1-5      | c=2^1, Acc=72.6547  | c=2^4, g=2^-3, Acc=72.2555  | deg=2, Acc=68.0639 (c not recorded)
5-10     | c=2^10, Acc=80.4391 | c=2^7, g=2^-3, Acc=81.6367  | poor accuracy
10-15    | c=2^1, Acc=94.6088  | c=2^4, g=2^-3, Acc=95.2096  | c=2^3, deg=3, Acc=94.2116
Other Observations :
Again, the accuracy using all 15 features and the last 5 features is almost the same; however, in the latter case we get a more complex model.
Again features {11,14,15} in linear kernel give : Best c 2^-2 , Accuracy=92.2156.
IV]. Classes : 4,8

Features | Linear              | Radial                      | Polynomial
1-15     | c=2^1, Acc=97.4052  | c=2^4, g=2^-3, Acc=97.6048  | c=2^7, deg=2, Acc=97.2056
1-10     | c=2^4, Acc=96.6068  | c=2^13, g=2^-6, Acc=96.8064 | c=2^1, deg=1, Acc=96.008
1-5      | c=2^1, Acc=81.6367  | c=2^-2, g=2^0, Acc=81.6367  | poor accuracy
5-10     | c=2^1, Acc=83.8323  | (not recorded)              | (not recorded)
10-15    | c=2^4, Acc=96.4072  | c=2^3, g=2^-3, Acc=97.6096  | c=2^6, deg=2, Acc=96.2151
Other Observations :
Features 10-15 are most important.
Again features {11,14,15} in linear kernel give Best c 2^4 , Accuracy=95.4092
ANALYSES OF PARAMETERS FOR DIFFERENT PAIRS OF CLASSES:
Features     | Linear              | Radial                      | Polynomial
1-15 {0,1}   | c=2^4, Acc=99.2     | c=2^13, g=2^-9, Acc=99.4    | c=2^1, deg=2, Acc=99.6
1-15 {4,5}   | c=2^1, Acc=98.6028  | c=2^1, g=2^-3, Acc=98.8024  | c=2^-2, deg=3, Acc=98.8028
1-15 {8,9}   | c=2^4, Acc=95.01    | c=2^10, g=2^-9, Acc=95.6088 | c=2^4, deg=2, Acc=96.2076
1-15 {4,8}   | c=2^1, Acc=97.4052  | c=2^4, g=2^-3, Acc=97.6048  | c=2^7, deg=2, Acc=97.2056
1-10 {0,1}   | c=2^1, Acc=99       | c=2^4, g=2^-6, Acc=99       | c=2^1, deg=1, Acc=99.2
1-10 {4,5}   | c=2^4, Acc=98.4032  | c=2^13, g=2^-3, Acc=98.6028 | c=2^4, deg=4, Acc=98.004
1-10 {8,9}   | c=2^4, Acc=95.8084  | c=2^4, g=2^-3, Acc=95.01    | c=2^7, deg=2, Acc=94.8104
1-10 {4,8}   | c=2^4, Acc=96.6068  | c=2^13, g=2^-6, Acc=96.8064 | c=2^1, deg=1, Acc=96.008
10-15 {0,1}  | c=2^4, Acc=98.6     | c=2^1, g=2^0, Acc=99        | c=2^1, deg=3, Acc=98.6
10-15 {4,5}  | c=2^1, Acc=98.60    | c=2^1, g=2^-3, Acc=98.6028  | c=2^3, deg=2, Acc=96.8064
10-15 {8,9}  | c=2^1, Acc=94.6088  | c=2^4, g=2^-3, Acc=95.2096  | c=2^3, deg=3, Acc=94.2116
10-15 {4,8}  | c=2^4, Acc=96.4072  | c=2^3, g=2^-3, Acc=97.6096  | c=2^6, deg=2, Acc=96.2151
Observations :
There is some correlation between different pairs of classes; with fewer features, the similarity is higher.
Linear :
For any number of features, the best c is very similar across all class pairs.
Radial :
A higher C is usually preferred, while gamma varies somewhat arbitrarily.
Polynomial :
The obtained degree is low, and the best C shows more variation than for linear but less than for radial.
MULTICLASS :
ONE v/s ONE
Features  | Linear             | Radial                     | Polynomial
1-15      | c=2^6, Acc=87.32   | c=2^14, g=2^-9, Acc=89     | c=2^4, deg=2, Acc=89.12
1-10      | c=2^6, Acc=80.24   | poor accuracy              | poor accuracy
10-15     | c=2^6, Acc=79.6    | c=2^6, g=2^-3, Acc=71.52   | c=2^4, deg=2, Acc=80.72
11,14,15  | c=2^1, Acc=47.6    |                            |
Observations :
Multiclass classification gives almost 10% lower accuracy.
Also, no set of features seems strongly preferred.
Although features 1-10 and features 10-15 give approximately the same accuracy, unlike in the binary case these accuracies are well below the all-feature accuracy. Hence leaving out features does not make sense here.
There is not a very large difference between the average binary-class parameters and the multiclass ones, e.g. linear gives a best c near 2^4, radial gives a gamma near 2^-6, and polynomial gives a low degree.
One versus all :

Class            |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | Accuracy
Lin: best log2c  |  4 |  4 |  4 |  7 |  4 |  7 |  7 |  7 |  4 |  4 | 80.8%
Rad: best log2c  |  7 |  7 |  7 | 13 | 13 | 10 | 13 | 13 | 10 | 13 | 76.8%
Rad: best log2g  | -6 | -6 | -6 | -6 | -9 | -6 | -9 | -9 | -6 | -6 |
Poly: best log2c |  4 |  4 |  4 |  7 |  7 |  7 |  7 |  7 |  4 |  4 | 78.24%
Poly: best deg   |  7 |  7 |  7 |  7 |  7 |  7 |  7 |  7 |  7 |  7 |
OVO vs OVA :
Accuracy with one-versus-all is lower than with one-versus-one.
One-vs-one takes less computational time, since the dataset for each binary classifier is reduced.
However, one-vs-all gives insight into the parameters for each individual class.
Feature selection in OVA follows the same pattern as in OVO, but the accuracy is lower.
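The OVO/OVA comparison can be reproduced with scikit-learn's multiclass wrappers. A sketch on stand-in data (digits as a placeholder; C = 2^4 is borrowed from the binary linear results above):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Placeholder data; the report uses its own 15-feature dataset.
X, y = load_digits(return_X_y=True)
X, y = X[:400], y[:400]

results = {}
for name, wrap in (("one-vs-one", OneVsOneClassifier),
                   ("one-vs-rest", OneVsRestClassifier)):
    clf = wrap(SVC(kernel="linear", C=2.0 ** 4))
    results[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: CV accuracy = {results[name]:.3f}")
```

One-vs-one trains k(k-1)/2 binary classifiers, each on only the two classes involved, which is why each individual fit is cheaper than the k full-dataset fits of one-vs-rest.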
Idea:
Suppose features F1 and F2 lie in the ranges (1, 2) and (1000, 2000) respectively. For the hyperplane to depend on the distribution of the points rather than on raw magnitudes, we need to normalize. Consider the points (1.1, 1100) and (1.4, 1100): without scaling, the classifier will either be largely insensitive to F1 or need a very large coefficient for it.
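The point about ranges can be made concrete with min-max scaling (a sketch; the report does not say which scaler was used):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy points mirroring the text: F1 in (1, 2), F2 in (1000, 2000).
X = np.array([[1.1, 1100.0],
              [1.4, 1100.0],
              [1.9, 1900.0]])

# Unscaled, distances are dominated by F2's large range:
# the first two points look nearly identical even though F1 differs.
d_raw = np.linalg.norm(X[0] - X[1])       # 0.3, entirely due to F1

# Min-max scaling maps each feature to [0, 1] independently.
Xs = MinMaxScaler().fit_transform(X)
d_scaled = np.linalg.norm(Xs[0] - Xs[1])  # 0.375: F1's difference now counts

print(d_raw, d_scaled)
```

After scaling, F1's 0.3 difference is 0.375 of its range instead of a negligible fraction of F2's, so a kernel built on distances treats the two features comparably.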
15 features, all features, multiclass:

Non-scaled | Scaled
87.32      | 88.12
89         | 88.68
89.12      | 81.96
93.4       | 94.4
91.2       | 94.6
92.3       | 93.4
80.4       | 83.2669
82.27      | 86.0558
81.051     | 83.4661
Observations :
When the accuracy is already high, scaling gives only a very slight increase (hence results for the binary, all-feature cases are not shown).
If the accuracy is low, scaling is a good option.
For multiclass, scaling is less consequential.