Workshop 3 Part 2

19/05/2019 workshop 3 Part 2
1608275 Tamur Khan
Importing libraries
In [1]: import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from PIL import Image
import sklearn
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn.metrics import accuracy_score
# row index of the sample used for the visualisation check below
x=5
Loading custom dataset
In [2]: df = pd.read_csv("handw.csv")
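Creating NumPy arrays for the required data
In [3]: target=df['target'].values  # class label (digit) of each row
labels=df.iloc[:,df.columns != 'target'].values  # pixel columns; despite the name, these are the input features
test=df.iloc[[x],df.columns != 'target'].values  # pixel values of row x only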
Reshaping the data to a 28x28 matrix to allow visualising
In [4]: test=np.reshape(test,(28,28))
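The reshape works because each row holds 28 x 28 = 784 pixel values stored one after another. A minimal sketch of that mapping follows, using a made-up row (`flat_row` is a hypothetical stand-in, not data from handw.csv):

```python
import numpy as np

# Hypothetical stand-in for one row of pixel columns: 784 values, shape (1, 784).
flat_row = np.arange(784).reshape(1, 784)

# Row-major reshape: flat position k lands at row k // 28, column k % 28.
image = np.reshape(flat_row, (28, 28))

print(image.shape)   # (28, 28)
print(image[1, 0])   # value that sat at flat position 28
```

If the CSV stored pixels column by column instead, the image would appear transposed; the plot in the next cell is the quickest way to check.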
Visualising data
In [5]: plt.imshow(test, interpolation='nearest')

plt.show()
print("Target Value",target[x])
Target Value 3
Splitting data into training and test sets
localhost:8888/notebooks/Desktop/AI %26 Machine Learning/workshop 3 Part 2.ipynb 1/7

In [6]: X_train, X_test, y_train, y_test = train_test_split(
    labels, target, test_size=0.22)
In [7]: len(X_train)
Out[7]: 22823
In [8]: len(X_test)
Out[8]: 6438
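The two lengths above follow from `test_size=0.22`. The sketch below reproduces them with plain arithmetic, assuming the test count is the fraction of the total rounded up (which is how scikit-learn appears to handle a fractional `test_size`) and that the total row count is the sum of the two outputs:

```python
import math

# Total rows implied by Out[7] and Out[8]; an assumption, since the CSV itself is not shown.
n_samples = 22823 + 6438
test_size = 0.22

n_test = math.ceil(test_size * n_samples)  # fractional test size, rounded up
n_train = n_samples - n_test               # everything else goes to training

print(n_test, n_train)  # 6438 22823
```

Because no `random_state` is passed in the notebook, the rows that end up in each set differ from run to run, so the accuracy and the misclassified indices reported later will vary slightly on re-execution.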
Creating the decision tree classifier and fitting it to the training data
In [9]: clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
Checking accuracy on the test set
In [10]: pred = clf.predict(X_test)
print("DecisionTreeClassifier accuracy score is ",accuracy_score(y_test, pred))
DecisionTreeClassifier accuracy score is 0.9068033550792172
Generalisation
The evaluation above uses the held-out test data, which the classifier did not see during training. It shows
that training was effective: the model generalises to this unseen data and achieves a high accuracy score.
The accuracy on the test set is about 90.7%, which suggests that the model will generalise to new data from
the same source and make accurate predictions. Overtraining (overfitting) on the training data causes
problems when the model is presented with new data: it fits the details of the training examples closely but
fails to pick out the important information in data it has not seen before, so it cannot generalise. The aim
is a model that generalises well to new data after being trained on a set of training data.
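One common way to check for overtraining is to compare accuracy on the training set with accuracy on the test set; a large gap suggests the tree has memorised the training data. The sketch below illustrates this on scikit-learn's bundled 8x8 digits dataset, which is an assumption standing in for handw.csv. The `max_depth` values and `random_state=0` are illustrative choices, not settings from the notebook.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: 8x8 digit images shipped with scikit-learn (not the workshop CSV).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.22, random_state=0)

results = {}
for depth in (3, None):  # a shallow tree versus an unrestricted one
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, clf.predict(X_train))
    test_acc = accuracy_score(y_test, clf.predict(X_test))
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

The same two-line comparison could be run in this notebook with `clf.predict(X_train)` to see how far the training accuracy sits above the reported test accuracy.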
Generating a report on the classifier
In [11]: print(sklearn.metrics.classification_report(y_test,pred))
              precision    recall  f1-score   support

           3       0.89      0.92      0.90      1339
           4       0.91      0.92      0.91      1243
           5       0.88      0.88      0.88      1141
           6       0.95      0.94      0.95      1350
           9       0.91      0.88      0.89      1365

   micro avg       0.91      0.91      0.91      6438
   macro avg       0.91      0.91      0.91      6438
weighted avg       0.91      0.91      0.91      6438
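The precision and recall columns in the report can be derived from a confusion matrix. The following sketch shows the calculation on a small made-up set of labels (`y_true` and `y_pred` here are hypothetical, not the notebook's arrays) and checks it against scikit-learn's own scores:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical toy labels using three of the digit classes from the report.
y_true = [3, 3, 3, 5, 5, 5, 9, 9]
y_pred = [3, 3, 5, 3, 5, 5, 9, 5]
labels = [3, 5, 9]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Precision: of everything predicted as class c, the share that really is c.
precision = np.diag(cm) / cm.sum(axis=0)
# Recall: of everything that really is class c, the share predicted as c.
recall = np.diag(cm) / cm.sum(axis=1)

for label, p, r in zip(labels, precision, recall):
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

Read this way, the row for digit 9 in the report (recall 0.88) says that about 12% of the true 9s in the test set were assigned to another digit.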
Checking for misclassified images
In [12]: mis_index=np.where(y_test != pred)[0]  # test-set positions where prediction and target differ
print("Number of miscalculated data points is ",mis_index.shape[0])
Number of miscalculated data points is 600
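This count is consistent with the accuracy score reported earlier. A quick arithmetic check, using only the figures printed in the notebook:

```python
n_test = 6438    # size of the test set (Out[8])
n_wrong = 600    # misclassified test samples counted above

# Accuracy is the share of test samples predicted correctly.
accuracy = (n_test - n_wrong) / n_test
print(f"{accuracy:.4f}")  # 0.9068
```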
Visualising 10 misclassified images
In [13]: # replace=False so the same image is not drawn twice
random_arr=np.random.choice(mis_index, 10, replace=False)
num=0
for random in random_arr:
    num=num+1
    # reshape the flat pixel row back into a 28x28 image
    test_2=X_test[random]
    test_2=np.reshape(test_2,(28,28))
    print("Image: ",num)
    plt.imshow(test_2)
    plt.show()
    print("Random Index: ",random)
    print("Predicted Value",pred[random])
    print("Target Value: ",y_test[random])
    print("\n")
Image: 1
Random Index: 2946
Predicted Value 5
Target Value: 9

Image: 2
Random Index: 5391
Predicted Value 6
Target Value: 3

Image: 3
Random Index: 2814
Predicted Value 4
Target Value: 6

Image: 4
Random Index: 3422
Predicted Value 4
Target Value: 6

Image: 5
Random Index: 2096
Predicted Value 5
Target Value: 3

Image: 6
Random Index: 6056
Predicted Value 3
Target Value: 5

Image: 7
Random Index: 619
Predicted Value 4
Target Value: 9

Image: 8
Random Index: 736
Predicted Value 3
Target Value: 5

Image: 9
Random Index: 5131
Predicted Value 3
Target Value: 5

Image: 10
Random Index: 5615
Predicted Value 4
Target Value: 3
The handwriting classifier misclassified these 10 images. The generated report gives the per-digit precision,
recall and F1-score on the test data, which contains 6438 images in total; the rest of the data was used to
train the classifier. Not all of the misclassified images are clearly written. For example, image 1 shows a 9
that was predicted as a 5, although given the handwriting style it could also be read as a 4. Image 5 clearly
shows a 3, yet the classifier again predicted 5. In this sample the digit 5 is misclassified repeatedly: all
three images showing a 5 (images 6, 8 and 9) were predicted as a 3. Different handwriting styles lead the
classifier to make errors on some of the images shown above.
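The pattern described above can be tallied directly from the ten outputs. The sketch below copies the (target, predicted) pairs listed for images 1 to 10 and counts how often each confusion occurs; with only ten samples it is an illustration rather than a reliable estimate.

```python
from collections import Counter

# (target, predicted) pairs copied from the ten outputs above, in order.
pairs = [(9, 5), (3, 6), (6, 4), (6, 4), (3, 5),
         (5, 3), (9, 4), (5, 3), (5, 3), (3, 4)]

confusions = Counter(pairs)
for (target, predicted), count in confusions.most_common():
    print(f"target {target} predicted as {predicted}: {count}")
```

Running the same tally over all of `mis_index` (for example with `zip(y_test[mis_index], pred[mis_index])`) would show whether 5 predicted as 3 is the dominant confusion across the whole test set or only in this sample.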
