Importing libraries
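In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn.metrics
# test value: row index of the sample to display
x = 5
In [2]: df = pd.read_csv("handw.csv")
In [3]: target = df['target'].values
# pixel columns (every column except 'target')
labels = df.iloc[:, df.columns != 'target'].values
test = df.iloc[[x], df.columns != 'target'].values
In [4]: test = np.reshape(test, (28, 28))  # 784 pixel values -> 28x28 image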
Visualising data
Target Value 3
In [7]: len(X_train)
Out[7]: 22823
In [8]: len(X_test)
Out[8]: 6438
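The lengths above (22823 training samples and 6438 test samples) are consistent with holding out roughly 22% of 29261 samples. A minimal sketch of such a split follows, assuming scikit-learn's `train_test_split` was used. The `test_size=0.22` and `random_state=0` values are assumptions inferred from the counts, not taken from the notebook, and the placeholder arrays only stand in for the contents of handw.csv.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for handw.csv (assumption): 29261 samples,
# each a flattened 28x28 image, with a digit label from 0 to 9.
n_samples = 22823 + 6438
rng = np.random.default_rng(0)
X = np.zeros((n_samples, 28 * 28), dtype=np.uint8)
y = rng.integers(0, 10, size=n_samples)

# test_size=0.22 is an assumption chosen to match the reported lengths.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.22, random_state=0
)

print(len(X_train), len(X_test))
```

Fixing `random_state` keeps the split reproducible, so the same samples land in the test set on every run.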
Generalisation
The evaluation above is based on the test data, which the classifier did not see during training. It shows that
training was effective: the model generalised to this unseen data and achieved a high accuracy score. The test
set was held out specifically to measure how well the training data prepared the model. The accuracy on the
test set is 90%, which suggests that the model will generalise to new data and make accurate predictions.
Overtraining (overfitting) on the training data causes problems when the model is presented with new data: it
fits the details of the training data so closely that it performs well there but poorly on data it has not seen
before, which makes the model incapable of generalising. The aim is for the model to generalise well to new
data after being trained on a set of training data.
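One common way to check for overtraining is to compare accuracy on the training data with accuracy on the held-out test data. The sketch below illustrates the idea under stated assumptions: it uses scikit-learn's bundled 8x8 digits dataset rather than handw.csv, and a `DecisionTreeClassifier` chosen purely for illustration, since the notebook's classifier is not shown here.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, not handw.csv.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.22, random_state=0
)

# An unconstrained decision tree tends to fit its training data very closely;
# it is used here for illustration only.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # accuracy on data seen in training
test_acc = model.score(X_test, y_test)     # accuracy on held-out data
gap = train_acc - test_acc                 # a large gap suggests overfitting

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"gap:            {gap:.3f}")
```

A small gap between the two scores suggests the model generalises; a training score far above the test score suggests it has memorised the training data.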
In [11]: print(sklearn.metrics.classification_report(y_test,pred))
num=0
print("Image: ",num)
plt.imshow(test_2)
plt.show()
Image: 1 to Image: 10 (the ten misclassified test images)
These 10 images were misclassified by the handwriting classifier. The generated report shows how accurately
each digit was classified in the test data, which contains 6438 samples in total; the rest of the data was used
to train the classifier. Not all of the misclassified images are clearly written. For example, image 1 shows a 9
but the predicted value was 5, and given the handwriting style it could also be read as a 4. Image 5 clearly
shows a 3, yet the classifier again predicted 5. These images show that the digit 5 is regularly involved in
misclassifications: all three images showing a 5 were predicted as a 3. Different handwriting styles cause the
classifier to make errors on some of the images shown above.
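The observations above can be checked systematically by locating the misclassified samples and counting which true digit is confused with which prediction. The following is a minimal sketch with toy arrays; the values of `y_test` and `pred` are made up for illustration and are not the notebook's results.

```python
from collections import Counter

import numpy as np

# Toy labels and predictions (assumptions, not the notebook's results).
y_test = np.array([9, 3, 5, 5, 5, 1, 7, 3])
pred = np.array([5, 5, 3, 3, 3, 1, 7, 3])

# Positions in the test set where the prediction differs from the true label.
wrong = np.flatnonzero(y_test != pred)

# Count each (true digit, predicted digit) pair among the errors.
pairs = Counter(zip(y_test[wrong].tolist(), pred[wrong].tolist()))

for (true_digit, predicted_digit), count in pairs.most_common():
    print(f"true {true_digit} predicted as {predicted_digit}: {count}")
```

With the real data, each index in `wrong` could be used to select a test row, reshape it to 28x28 and display it with `plt.imshow`, and the most frequent pairs would point to the digits the classifier confuses most often.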