
ASSIGNMENT 2

EEL 709
DEEPALI JAIN
2012ee10082
Motive:

Grid search for linear, radial and polynomial kernel functions.
Effect of varying the features used (15 -> 10 -> 5 -> further subsets within the best range).
Binary classification: 4 pairs of classes analysed: {0,1}; {4,5}; {8,9}; {4,8}.
Analysis of overfitting, best fit and underfitting.
One-vs-one and one-vs-all multiclass classification.
Scaling of feature values.

Approach

Cross-validation by varying C and the kernel parameter, initially in steps of log2c = 3, with 10-fold cross-validation.
The best parameters are then refined by varying them again in steps of log2c = 1 around the previously obtained values, to get more accurate values.
Finally, the best parameters are tested on the data without any cross-validation.
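The coarse-then-fine search described above can be sketched as follows. This is a minimal illustration only: scikit-learn's GridSearchCV and its bundled digits data are assumptions standing in for the report's (unnamed) SVM toolchain and dataset.

```python
# Sketch of the coarse-then-fine grid search described above.
# Assumptions: scikit-learn and its digits data stand in for the
# report's toolchain and dataset.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# A small binary problem standing in for one class pair (digits 0 vs 1).
X, y = load_digits(n_class=2, return_X_y=True)

# Stage 1: coarse sweep, log2(C) varied in steps of 3, 10-fold CV.
coarse = GridSearchCV(SVC(kernel="linear"),
                      {"C": [2.0 ** e for e in range(-5, 16, 3)]}, cv=10)
coarse.fit(X, y)
best_e = int(np.log2(coarse.best_params_["C"]))

# Stage 2: fine sweep, log2(C) in steps of 1 around the coarse optimum.
fine = GridSearchCV(SVC(kernel="linear"),
                    {"C": [2.0 ** e for e in range(best_e - 2, best_e + 3)]},
                    cv=10)
fine.fit(X, y)
print("best C = 2^%d, CV accuracy = %.3f"
      % (int(np.log2(fine.best_params_["C"])), fine.best_score_))
```

The same two-stage loop is then repeated per kernel, adding gamma or degree to the stage-1 grid.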

ANALYSIS FOR EACH PAIR OF CLASSES:


I]. Classes : 0,1
Features : 1-15
Linear Kernel

[Plot: 10-fold CV accuracy (98.2-99.2%) vs log2c in (-5, 15)]

Overfit occurs at C > 2^6; underfit occurs at C < 2^(-5).

Radial Kernel:

[Plot: 10-fold CV accuracy (65-95%) over the grid log2c in (-4, 12) x log2gamma in (-14, 2)]

Best c: 2^13, Best g: 2^-9, Accuracy=99.4

Overfitting is not easily distinguished: for a given gamma, accuracy keeps increasing asymptotically with C over a very large range.
This means that with the radial basis kernel, the hard-margin case can be approximated when all features are used.
E.g. even for log2gamma = -6 and log2c = 190, accuracy is 99.2%.
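This flat accuracy-versus-C behaviour is easy to reproduce. The sketch below uses scikit-learn on its digits 0-vs-1 data (an assumption; the report's dataset is not included here), evaluating an RBF SVM at a fixed small gamma while C grows by orders of magnitude:

```python
# Accuracy vs C for an RBF kernel at fixed small gamma: the CV score stays
# essentially flat over a huge range of C, as observed above.
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(n_class=2, return_X_y=True)

scores = {}
for e in (1, 5, 10, 15):                       # log2(C) from 1 to 15
    clf = SVC(kernel="rbf", C=2.0 ** e, gamma=2.0 ** -9)
    scores[e] = cross_val_score(clf, X, y, cv=10).mean()
print(scores)
```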

Polynomial Kernel:

[Plot: 10-fold CV accuracy (95.5-98.5%) vs log2c in (-4, 12) and degree]

Best c: 2^1, Best deg: 2, Accuracy=99.6
Overfit at log2c > 4; underfit at log2c < -4.

Features | Linear           | Radial                    | Polynomial
1-15     | c=2^4, Acc=99.2  | c=2^13, g=2^-9, Acc=99.4  | c=2^1, deg=2, Acc=99.6
1-10     | c=2^1, Acc=99    | c=2^4, g=2^-6, Acc=99     | c=2^1, deg=1, Acc=99.2
1-5      | c=2^4, Acc=99.2  | c=2^13, g=2^0, Acc=99.4   | c=2^-2, deg=5, Acc=99.2
5-10     | c=2^7, Acc=93.2  | c=2^1, g=2^0, Acc=91.8    | c=2^10, deg=2, Acc=92.3
10-15    | c=2^4, Acc=98.6  | c=2^1, g=2^0, Acc=99      | c=2^1, deg=3, Acc=98.6

Observations:
Kernel:
Changing the kernel function has no drastic effect on accuracy for any number of features. Hence the sigmoid kernel was not used, and the other parameters were analysed more closely.
A low-degree polynomial in general gives good enough results.
The radial kernel's parameters are affected the most by a change in the number of features; the best C of the other two kernels is usually the same (2^4 and 2^1). As we go from 15 to 5 features, wide variation occurs in the radial kernel.
Fitting:
With the radial basis kernel, the hard-margin case can be approximated when all features are used.
For linear and polynomial, the critical underfit and overfit C occur at approximately 2^(-5) and 2^(+5) respectively in all cases. Overfit and underfit are less prominent for the radial kernel (from the shape of the graph).
For polynomial, over- and underfit are prominent at lower degrees; at high degrees, accuracy is lower for all C values.
Features:
The best parameters do not vary much as the features are reduced from 15 to 10. However, if we reduce further to 5, the change is more significant.
Ignoring the effect of combinations of two features on accuracy (i.e. using a filter rather than a wrapper approach), it can be seen that features 1-5 and 10-15 are more important.
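The claim that a low polynomial degree already gives good results can be checked with a quick degree sweep at a fixed C. A sketch on scikit-learn's digits 0-vs-1 data (an assumption, since the report's dataset is not available here):

```python
# 10-fold CV accuracy of a polynomial-kernel SVM for several degrees at a
# fixed, moderate C. Low degrees are expected to do well, matching the
# observation above.
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(n_class=2, return_X_y=True)

deg_acc = {d: cross_val_score(SVC(kernel="poly", degree=d, C=2.0),
                              X, y, cv=10).mean()
           for d in (1, 2, 3, 5)}
print(deg_acc)
```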

Now, we look at subsets of features within [1,5] and [10,15].

We use the linear kernel to get the accuracy on each subset.

Features  | Results
2-5       | Best c 2^4, Accuracy=93.8
1-4       | Best c 2^7, Accuracy=95.6
10-14     | Best c 2^10, Accuracy=97.2
11-15     | Best c 2^4, Accuracy=98.8
11,14,15  | Best c 2^4, Accuracy=98.6

Observations:
Within [1,5], removing even a single feature reduces accuracy considerably.
Features 11, 14, 15 alone give very high accuracy, and the best parameter setting comes close to the full-feature case.
When accuracy is good, the best C for the linear kernel is 2^4 irrespective of which features are included.
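This subset comparison amounts to a filter-style loop: train a linear-kernel SVM on each candidate column subset independently and compare cross-validated accuracy. A sketch on synthetic data (the dataset and column indices here are illustrative, not the report's):

```python
# Filter-style feature-subset evaluation with a linear-kernel SVM.
# make_classification with shuffle=False puts the 5 informative features
# first, so subset "1-5" should clearly beat the pure-noise subset "11-15".
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=15, n_informative=5,
                           n_redundant=0, n_clusters_per_class=1,
                           shuffle=False, random_state=0)

subsets = {"1-15": list(range(15)),
           "1-5": list(range(5)),
           "11-15": list(range(10, 15))}
accs = {name: cross_val_score(SVC(kernel="linear", C=2.0 ** 4),
                              X[:, cols], y, cv=10).mean()
        for name, cols in subsets.items()}
print(accs)
```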

II]. Classes: 4,5

Features: 1-15
Linear: overfit at C > 2^6, underfit at C < 2^(-4)
Radial: no prominent overfitting; with increasing C, there is never a substantial decrease in accuracy.
Polynomial: overfit at C > 2^4, underfit at C < 2^(-4)

Features | Linear              | Radial                      | Polynomial
1-15     | c=2^1, Acc=98.6028  | c=2^1, g=2^-3, Acc=98.8024  | c=2^-2, deg=3, Acc=98.8028
1-10     | c=2^4, Acc=98.4032  | c=2^13, g=2^-3, Acc=98.6028 | c=2^4, deg=4, Acc=98.004
1-5      | c=2^4, Acc=75.8483  | c=2^7, g=2^-6, Acc=76.6467  | Poor acc
5-10     | c=2^4, Acc=94.61    | c=2^7, g=2^-3, Acc=89.2216  | Poor acc
10-15    | c=2^1, Acc=98.60    | c=2^1, g=2^-3, Acc=98.6028  | c=2^3, deg=2, Acc=96.8064

Other Observations:
In this case, the best parameters for features [10,15] were closer to the full-feature ones than those for [1,10], i.e. [10,15] are the most essential.
When further subsets were taken, it was again found that {11,14,15} give accuracy 98.2%, with best c = 2^(-2) for the linear case.
III]. Classes: 8,9
All features:
Linear: overfit at C > 2^4, underfit at C < 2^(-4)
Polynomial: overfit at C > 2^5, underfit at C < 2^(-6)
Features 1-10:
Linear: overfit at C > 2^6, underfit at C < 2^(-4)
Features | Linear              | Radial                      | Polynomial
1-15     | c=2^4, Acc=95.01    | c=2^10, g=2^-9, Acc=95.6088 | c=2^4, deg=2, Acc=96.2076
1-10     | c=2^4, Acc=95.8084  | c=2^4, g=2^-3, Acc=95.01    | c=2^7, deg=2, Acc=94.8104
1-5      | c=2^1, Acc=72.6547  | c=2^4, g=2^-3, Acc=72.2555  | c=2^4, deg=2, Acc=68.0639
5-10     | c=2^10, Acc=80.4391 | c=2^7, g=2^-3, Acc=81.6367  | Poor acc
10-15    | c=2^1, Acc=94.6088  | c=2^4, g=2^-3, Acc=95.2096  | c=2^3, deg=3, Acc=94.2116

Other Observations:
Again, accuracy using all 15 features and using only the last 5 is almost the same; however, in the latter case we get a more complex model.
Again, features {11,14,15} with the linear kernel give: Best c 2^-2, Accuracy=92.2156.

IV]. Classes: 4,8

Features | Linear              | Radial                       | Polynomial
1-15     | c=2^1, Acc=97.4052  | c=2^4, g=2^-3, Acc=97.6048   | c=2^7, deg=2, Acc=97.2056
1-10     | c=2^4, Acc=96.6068  | c=2^13, g=2^-6, Acc=96.8064  | c=2^1, deg=1, Acc=96.008
1-5      | c=2^1, Acc=81.6367  | c=2^-2, g=2^0, Acc=81.6367   | Poor acc
5-10     | c=2^1, Acc=83.8323  | -                            | -
10-15    | c=2^4, Acc=96.4072  | c=2^3, g=2^-3, Acc=97.6096   | c=2^6, deg=2, Acc=96.2151

Other Observations:
Features 10-15 are the most important.
Again, features {11,14,15} with the linear kernel give Best c 2^4, Accuracy=95.4092.
ANALYSIS OF PARAMETERS FOR DIFFERENT PAIRS OF CLASSES:

Features (classes) | Linear              | Radial                      | Polynomial
1-15 {0,1}         | c=2^4, Acc=99.2     | c=2^13, g=2^-9, Acc=99.4    | c=2^1, deg=2, Acc=99.6
1-15 {4,5}         | c=2^1, Acc=98.6028  | c=2^1, g=2^-3, Acc=98.8024  | c=2^-2, deg=3, Acc=98.8028
1-15 {8,9}         | c=2^4, Acc=95.01    | c=2^10, g=2^-9, Acc=95.6088 | c=2^4, deg=2, Acc=96.2076
1-15 {4,8}         | c=2^1, Acc=97.4052  | c=2^4, g=2^-3, Acc=97.6048  | c=2^7, deg=2, Acc=97.2056
1-10 {0,1}         | c=2^1, Acc=99       | c=2^4, g=2^-6, Acc=99       | c=2^1, deg=1, Acc=99.2
1-10 {4,5}         | c=2^4, Acc=98.4032  | c=2^13, g=2^-3, Acc=98.6028 | c=2^4, deg=4, Acc=98.004
1-10 {8,9}         | c=2^4, Acc=95.8084  | c=2^4, g=2^-3, Acc=95.01    | c=2^7, deg=2, Acc=94.8104
1-10 {4,8}         | c=2^4, Acc=96.6068  | c=2^13, g=2^-6, Acc=96.8064 | c=2^1, deg=1, Acc=96.008
10-15 {0,1}        | c=2^4, Acc=98.6     | c=2^1, g=2^0, Acc=99        | c=2^1, deg=3, Acc=98.6
10-15 {4,5}        | c=2^1, Acc=98.60    | c=2^1, g=2^-3, Acc=98.6028  | c=2^3, deg=2, Acc=96.8064
10-15 {8,9}        | c=2^1, Acc=94.6088  | c=2^4, g=2^-3, Acc=95.2096  | c=2^3, deg=3, Acc=94.2116
10-15 {4,8}        | c=2^4, Acc=96.4072  | c=2^3, g=2^-3, Acc=97.6096  | c=2^6, deg=2, Acc=96.2151

Observations:
There is some correlation between the different pairs of classes; with fewer features, the similarity is higher.
Linear:
With any number of features, the value of the best C is very similar across all class pairs.
Radial:
A higher C is usually preferred, while gamma changes somewhat arbitrarily.
Polynomial:
The obtained degree is low, and the best C shows more variation than for linear but less than for radial.

MULTICLASS:
ONE v/s ONE

Features  | Linear            | Radial                    | Polynomial
1-15      | c=2^6, Acc=87.32  | c=2^14, g=2^-9, Acc=89    | c=2^4, deg=2, Acc=89.12
1-10      | c=2^6, Acc=80.24  | Poor acc                  | Poor acc
10-15     | c=2^6, Acc=79.6   | c=2^6, g=2^-3, Acc=71.52  | c=2^4, deg=2, Acc=80.72
11,14,15  | c=2^1, Acc=47.6   | -                         | -

Observations:
Multiclass classification gives almost 10% lower accuracy.
Also, no set of features is strongly preferred.
Although features 1-10 and features 10-15 give approximately the same accuracy, unlike in the binary case these accuracies are well below the all-feature accuracy; hence leaving out features does not make sense here.
There is not a very large difference between the average binary-class parameters and the multiclass ones, e.g. linear gives best c near 2^4, radial gives gamma near 2^-6, and polynomial gives a low degree.
One versus all (all values are log2 exponents):

Class             | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | Accuracy
Lin: best log2c   | 4  | 4  | 4  | 7  | 4  | 7  | 7  | 7  | 4  | 4  | 80.8%
Rad: best log2g   | -6 | -6 | -6 | -6 | -9 | -6 | -9 | -9 | -6 | -6 | 76.8%
Rad: best log2c   | 7  | 7  | 7  | 13 | 13 | 10 | 13 | 13 | 10 | 13 |
Poly: best log2c  | 4  | 4  | 4  | 7  | 7  | 7  | 7  | 7  | 4  | 4  | 78.24%
Poly: best degree | 7  | 7  | 7  | 7  | 7  | 7  | 7  | 7  | 7  | 7  |

OVO vs OVA:
Accuracy in one-versus-all is less than in one-versus-one.
One-vs-one takes less computational time, since the dataset for each binary classifier is reduced.
However, one-vs-all gives insight into the parameters for each class.
Feature selection in OVA follows the same pattern as in OVO, but the accuracy is lower.

ANOTHER IDEA TO IMPROVE ACCURACY:

Scaling:

Idea:
Suppose feature F1 lies in the range (1,2) and F2 in (1000,2000) for a dataset. If we want the hyperplane to depend on the distribution of the points, we need to normalize. For example, consider the points (1.1, 1100) and (1.4, 1100): without scaling, the classifier will either be far less sensitive to F1 or need a very large coefficient for it.
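A minimal sketch of this idea, assuming scikit-learn and synthetic data with the ranges from the example above (F1 informative in roughly (1,2), F2 uninformative in (1000,2000)):

```python
# Without scaling, the RBF kernel's distances are dominated by the
# large-range but uninformative F2, so the informative F1 is ignored.
# Standardizing both features first restores F1's influence.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.RandomState(0)
n = 200
y = rng.randint(0, 2, n)
f1 = 1.0 + 0.5 * y + 0.2 * rng.randn(n)    # informative, range roughly (1, 2)
f2 = 1000.0 + 1000.0 * rng.rand(n)         # uninformative, range (1000, 2000)
X = np.column_stack([f1, f2])

raw = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
scaled = cross_val_score(make_pipeline(StandardScaler(), SVC(kernel="rbf")),
                         X, y, cv=5).mean()
print("unscaled = %.3f, scaled = %.3f" % (raw, scaled))
```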
Setting                     | Kernel     | Non-scaled | Scaled
All features, multiclass    | Linear     | 87.32      | 88.12
All features, multiclass    | Radial     | 89         | 88.68
All features, multiclass    | Polynomial | 89.12      | 81.96
Features 5-10, classes 0,1  | Linear     | 93.4       | 94.4
Features 5-10, classes 0,1  | Radial     | 91.2       | 94.6
Features 5-10, classes 0,1  | Polynomial | 92.3       | 93.4
Features 5-10, classes 4,8  | Linear     | 80.4       | 83.2669
Features 5-10, classes 4,8  | Radial     | 82.27      | 86.0558
Features 5-10, classes 4,8  | Polynomial | 81.051     | 83.4661

Observations:
When accuracy is already high, scaling gives only a slight increase (hence results for the all-feature binary cases are not shown).
If the accuracy is low, scaling is a good option.
For multiclass, scaling is less consequential.
