19/05/2019 workshop 3 Part 2

1608275 Tamur Khan

Importing libraries

In [1]: import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from PIL import Image
import sklearn
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn.metrics import accuracy_score

# test value: row index of the sample to visualise
x=5

Loading custom dataset

In [2]: df = pd.read_csv("handw.csv")

Creating NumPy arrays for the required data

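In [3]: target=df['target'].values  # digit label for every row
labels=df.iloc[:,df.columns != 'target'].values  # pixel values of every row (the model inputs, despite the name)
test=df.iloc[[x],df.columns != 'target'].values  # pixel values of row x only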

Reshaping the data to a 28x28 matrix to allow visualising

In [4]: test=np.reshape(test,(28,28))
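
As a minimal sketch of what the reshape does, the snippet below uses a made-up 1x784 array in place of the real row from `handw.csv` (the values are placeholders, not dataset pixels). NumPy fills the 28x28 matrix row by row, so flat position 28 becomes the first pixel of the second row.

```python
import numpy as np

# Stand-in for one dataset row: shape (1, 784), like df.iloc[[x], ...].values
flat_row = np.arange(784).reshape(1, 784)

# Same call as in cell 4: 784 values become 28 rows of 28 pixels
image = np.reshape(flat_row, (28, 28))

print(image.shape)   # shape of the reshaped array
print(image[0, 27])  # last pixel of the first row (flat position 27)
print(image[1, 0])   # first pixel of the second row (flat position 28)
```

The reshape only works if the row holds exactly 28 * 28 = 784 values, which is why the `target` column has to be excluded first.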

Visualising data

In [5]: plt.imshow(test, interpolation='nearest')
plt.show()
print("Target Value",target[x])

Target Value 3

Splitting data into training and test sets

In [6]: X_train, X_test, y_train, y_test = train_test_split(
    labels, target, test_size=0.22)

In [7]: len(X_train)

Out[7]: 22823

In [8]: len(X_test)

Out[8]: 6438
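
The two lengths follow from `test_size=0.22`. A quick arithmetic check, assuming scikit-learn rounds the test share up to a whole sample and gives the remainder to the training set (an assumption that matches the counts above):

```python
import math

n_train, n_test = 22823, 6438   # lengths reported in Out[7] and Out[8]
n_total = n_train + n_test      # rows in the whole dataset

# Test share rounded up to a whole sample; the rest goes to training
expected_test = math.ceil(0.22 * n_total)
expected_train = n_total - expected_test

print(n_total, expected_train, expected_test)
```

Because cell 6 passes no `random_state`, each run produces a different split, so the accuracy reported further down can vary slightly between runs.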

Creating the decision tree classifier and running it on training data

In [9]: clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)

Checking accuracy of the fit

In [10]: pred = clf.predict(X_test)
print("DecisionTreeClassifier accuracy score is ",accuracy_score(y_test, pred))

DecisionTreeClassifier accuracy score is 0.9068033550792172

Generalisation

The evaluation above uses the held-out test data. It shows that training was effective: the classifier
generalised to data it had not seen during training and still achieved a high accuracy score. The test set
exists for exactly this purpose, to measure how well the training data prepared the model. Accuracy on the
test set is about 91% (0.907), so it is reasonable to expect similar accuracy on new data from the same
source. Overtraining (overfitting) on the training data causes problems when the model is presented with new
data: it learns the training examples in detail, including their noise, but fails on examples it has not seen
before, which makes it unable to generalise. The aim is a model that generalises well to new data after being
trained on a set of training data.
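
One way to look for overtraining is to compare accuracy on the training set with accuracy on the test set. The sketch below is not the notebook's experiment: it assumes scikit-learn's bundled 8x8 `load_digits` data as a stand-in for `handw.csv`, and the fixed `random_state` values and `max_depth=5` are arbitrary illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): small 8x8 digit images shipped with scikit-learn
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.22, random_state=0)

results = {}
for depth in (None, 5):  # None = grow the tree until the leaves are pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    train_acc = accuracy_score(y_tr, clf.predict(X_tr))
    test_acc = accuracy_score(y_te, clf.predict(X_te))
    results[depth] = (train_acc, test_acc)
    print("max_depth =", depth, "train:", round(train_acc, 3), "test:", round(test_acc, 3))
```

A training accuracy that sits well above the test accuracy, as is typical for an unrestricted tree, is the symptom of overtraining described above. The test accuracy is the figure that says something about new data.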

Generating a report on the classifier

In [11]: print(sklearn.metrics.classification_report(y_test,pred))

              precision    recall  f1-score   support

           3       0.89      0.92      0.90      1339
           4       0.91      0.92      0.91      1243
           5       0.88      0.88      0.88      1141
           6       0.95      0.94      0.95      1350
           9       0.91      0.88      0.89      1365

   micro avg       0.91      0.91      0.91      6438
   macro avg       0.91      0.91      0.91      6438
weighted avg       0.91      0.91      0.91      6438
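
To make the columns of the report concrete, here is a small worked sketch that computes precision, recall, F1-score and support by hand. The label lists are made up for illustration and are not taken from the notebook's test set; the helper name `per_class_scores` is hypothetical.

```python
# Toy labels (made up for illustration, not the notebook's data)
y_true = [3, 3, 3, 5, 5, 5, 5, 9]
y_pred = [3, 3, 5, 5, 5, 3, 5, 9]

def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted as label
    fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
    fn = sum(t == label and p != label for t, p in pairs)  # label missed by the model
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    support = tp + fn  # number of true samples of this class
    return precision, recall, f1, support

for label in sorted(set(y_true)):
    p, r, f1, s = per_class_scores(y_true, y_pred, label)
    print(label, round(p, 2), round(r, 2), round(f1, 2), s)
```

This toy version does not guard against a class that is never predicted (division by zero). In the report above, the support column sums to 6438, the size of the test set.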

Checking for misclassified images

In [12]: mis_index=np.where(y_test != pred)[0]  # test-set positions where the prediction differs from the target
print("Number of miscalculated data points is ",mis_index.shape[0])

Number of miscalculated data points is 600
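
The misclassified count and the accuracy score are two views of the same number. As a quick consistency check using only figures printed in this notebook:

```python
n_test = 6438    # len(X_test), from Out[8]
n_wrong = 600    # misclassified count printed by cell 12

# Accuracy is the share of test samples that were predicted correctly
accuracy = (n_test - n_wrong) / n_test
print(accuracy)
```

The result agrees with the accuracy score printed by cell 10 (about 0.9068).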

Visualising 10 misclassified images picked at random from the mismatched indices

In [13]: random_arr=np.random.choice(mis_index, 10)
num=0
for random in random_arr:
    num=num+1
    test_2=X_test[random]
    test_2=np.reshape(test_2,(28,28))
    print("Image: ",num)
    plt.imshow(test_2)
    plt.show()
    print("Random Index: ",random)
    print("Predicted Value",pred[random])
    print("Target Value: ",y_test[random])
    print("\n")

Image: 1

Random Index: 2946


Predicted Value 5
Target Value: 9

Image: 2

Random Index: 5391


Predicted Value 6
Target Value: 3

Image: 3

Random Index: 2814


Predicted Value 4
Target Value: 6

Image: 4

Random Index: 3422


Predicted Value 4
Target Value: 6

Image: 5

Random Index: 2096


Predicted Value 5
Target Value: 3

Image: 6

Random Index: 6056


Predicted Value 3
Target Value: 5

Image: 7

Random Index: 619


Predicted Value 4
Target Value: 9

Image: 8

Random Index: 736


Predicted Value 3
Target Value: 5

Image: 9

Random Index: 5131


Predicted Value 3
Target Value: 5

Image: 10

Random Index: 5615


Predicted Value 4
Target Value: 3
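
Note that `np.random.choice` samples with replacement by default, so cell 13 can in principle show the same test image twice. A hedged variant is sketched below, using a made-up `mis_index` array (placeholder values, not the notebook's real indices) and a seeded generator so the draw is repeatable; the seed value is arbitrary.

```python
import numpy as np

# Placeholder for the 600 misclassified positions (assumption, not real data)
mis_index = np.arange(0, 6000, 10)

rng = np.random.default_rng(seed=0)  # seeded so the same 10 indices come back each run
sample = rng.choice(mis_index, size=10, replace=False)  # no index drawn twice

print(sample)
print(len(set(sample.tolist())))  # number of distinct indices in the sample
```

Ten draws out of 600 rarely collide, so the original cell usually behaves the same way; `replace=False` simply rules the duplicate case out.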

These 10 images were misclassified by the handwriting classifier. The generated report shows precision, recall
and F1-score for each digit in the test data, which contains 6438 images in total; the rest of the data was
used to train the classifier. Not all of the misclassified images are clearly written: image 1 shows a 9 and
was predicted as a 5, but given the handwriting style it could also be read as a 4. Image 5 clearly shows a 3,
yet the classifier again predicted 5. Among these images the digit 5 is a recurring misclassification, as all
three images showing a 5 (images 6, 8 and 9) were predicted as a 3. Differences in handwriting style lead the
classifier to make errors on some of the images shown above.
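
The pattern in these ten samples can be tallied directly from the (target, predicted) pairs printed in the output of cell 13. A small sketch using only the standard library:

```python
from collections import Counter

# (target, predicted) for images 1 to 10, copied from the output of cell 13
pairs = [(9, 5), (3, 6), (6, 4), (6, 4), (3, 5),
         (5, 3), (9, 4), (5, 3), (5, 3), (3, 4)]

confusions = Counter(pairs)
for (target, predicted), count in confusions.most_common(2):
    print("target", target, "predicted", predicted, "count", count)
```

With only ten random samples this is anecdotal. A full confusion matrix over all 600 errors would be needed to confirm which digit pairs the classifier confuses most often.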
