
Matthew McTaggart EE554 Report Due: April 29, 2018

CNN - Transfer Learning Final Project

Names of Group Members:


Matthew McTaggart

Summary:
This project introduced me to applying transfer learning to my own two-class
classification problem. I performed transfer learning with GoogLeNet to evaluate a change of
classification domain from everyday images to bronchoscopy images. I expected to achieve a
classification accuracy greater than 80%. More specifically, I expected to achieve a sensitivity of 95% or
greater, because for my data the priority is to avoid misclassifying informative frames as
uninformative ones. This relates to our course material because we extensively studied machine
learning and applications of convolutional neural networks. My results have exceeded my expectations!

Quick Overview of My Classification Problem:


This project uses transfer learning in MATLAB to classify informative and uninformative frames
from bronchoscopy procedures. The convolutional neural network (CNN) used for transfer
learning was GoogLeNet. The original GoogLeNet is trained on the ImageNet Large-Scale Visual
Recognition Challenge (ILSVRC14) dataset, which contains more than a million images across 1,000 object
categories. My domain instead consists of two classes, informative and uninformative, over 6,905
images captured from 22 video sequences of 9 patients and one phantom. A phantom refers to a
plastic mold of the typical airways of the lungs. There is a total of 3,398 informative frames and 3,507
uninformative frames, so the support for the two classes is well balanced.

An informative frame is one that is completely clear, from which a doctor can assess the image and
extract meaningful information. An uninformative frame is the opposite: a frame becomes
uninformative when it is too blurry, out of focus, too dark, or obscured by a build-up of
mucus, blood, or water.

In this report I test and evaluate how well transfer learning works for datasets from completely
different domains. More specifically, I examine how transfer learning performs when I remove the last
three layers of GoogLeNet ('loss3-classifier', 'prob', and 'output') and replace them with a new fully
connected layer, softmax layer, and classification layer. This allows me to train the classifier for my
data without changing the feature vector extracted by the CNN. From these results, we can assess how
the change of image domain affects classification performance, and then discuss whether this method
is appropriate or whether deeper transfer learning is required to produce an acceptable classification model.
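
As an illustration, the MATLAB sketch below shows one way to perform this layer swap. It assumes the standard layer names of MATLAB's pretrained googlenet model, including 'pool5-drop_7x7_s1' as the layer that feeds the original classifier; the names of the new layers are my own placeholders.

    % Minimal sketch: swap GoogLeNet's 1,000-class head for a 2-class head.
    net = googlenet;                 % requires the GoogLeNet support package
    lgraph = layerGraph(net);

    % Remove the original classifier, softmax, and output layers.
    lgraph = removeLayers(lgraph, {'loss3-classifier','prob','output'});

    % Append a new two-class head (layer names are placeholders).
    newLayers = [
        fullyConnectedLayer(2, 'Name', 'fc_2class')
        softmaxLayer('Name', 'softmax_2class')
        classificationLayer('Name', 'output_2class')];
    lgraph = addLayers(lgraph, newLayers);

    % Reconnect the new head to the last remaining backbone layer.
    lgraph = connectLayers(lgraph, 'pool5-drop_7x7_s1', 'fc_2class');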

It is important to note that 100% classification accuracy is impossible to achieve because there is
inherent ambiguity. The decision of whether a frame is informative depends on who ground-truths the
data. If I ground-truth the data, the decisions are based on my experience. If a doctor ground-truths
the data, the labels most likely won't match mine, because doctors have different expectations of what
they would call informative. Finally, if you were to ground-truth the data, your labels would differ from
both mine and the doctor's. The most important factor for performance is minimizing the
misclassification of informative frames as uninformative frames. Given the subjectivity, it is reasonable
to accept a fuzzy boundary where some uninformative frames are classified as informative. As a
thought, it might be appropriate to consider frames as informative, ambiguously informative, and
uninformative. The fuzzy boundary groups some ambiguously informative frames with the informative
ones, but still completely rejects the uninformative ones. Unfortunately, we don't have the luxury of
clean class distinctions like cat, dog, sheep, pencil, or car, as with normal images.

The following figure 1 shows examples of video frames that I ground-truthed.

Figure 1 – Examples of Informative and Uninformative Frames

By observation, we can see that the informative frames are quite clear, while the uninformative frames
are blurry, dark, and hard to distinguish. It is important to note that all of these images come from
video sequences with a frame rate of 30 frames per second, so it is likely that many training images are
similar to one another, which does not pose any issues. By chance of the random sampling, three of the
informative frames happen to be very similar even though they are not the same image.

Transfer Learning:
I performed transfer learning by extracting each frame from the 22 video sequences and, using its
ground-truth classification as informative or uninformative, saving the image file to the correct folder.
There are two folders, one named 'Informative' and the other 'Uninformative'. I used 80% of the data
for training and 20% for testing and validation. From my own research analysis, the data is quite
self-similar: the more data introduced for training, the less the accuracy improves, because the feature
vectors are quite limited in their range. Another way to think about this is that all frames lie within the
airways of some patient, and it is impossible to tell which patient it is, where in the lungs the frame is
located, and so on. When we analyze the image frames, all we can say is that it is some airway; there is
not much variation. The only variation is in the quality of the frame being informative or uninformative.
There is no multi-class structure as seen in other datasets, which may contain 1,000 categories. Finally,
the airways occupy a quite restricted range of the color space, skewed heavily towards red, so color
analysis adds no extra dimensionality.
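
A minimal sketch of this folder-based labeling and the 80%/20% split, assuming a hypothetical root folder named 'frames' that contains the 'Informative' and 'Uninformative' subfolders:

    % Label each image by the name of the folder it sits in.
    imds = imageDatastore('frames', ...
        'IncludeSubfolders', true, 'LabelSource', 'foldernames');

    % Randomized 80%/20% train/test split, applied per class.
    [imdsTrain, imdsTest] = splitEachLabel(imds, 0.8, 'randomized');

    countEachLabel(imdsTrain)    % sanity check of the class balance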

As mentioned previously, I removed the last three layers ('loss3-classifier', 'prob', and 'output') and
replaced them with a new fully connected layer, softmax layer, and classification layer. Training ran for
1,695 iterations with a maximum of 3 epochs. After about 600 iterations, the loss and accuracy began
to converge to constant values. The learning rate was fixed at 0.0001 using stochastic gradient descent
with a mini-batch size of 10. The following figure 2 shows a timeline of the training process.

Figure 2 – The training process of fine tuning the last three layers of GoogLeNet.

Although it is hard to see the details in the figure, the important part is the general trend of the
training process.
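
Under those settings, the training call might look like the sketch below. Here 'sgdm' is MATLAB's stochastic gradient descent (with momentum) solver, and the augmented datastore is used only to resize the frames to GoogLeNet's 224x224 input; it performs no actual augmentation.

    % Resize frames to GoogLeNet's expected input size; no augmentation.
    inputSize = [224 224 3];
    augTrain = augmentedImageDatastore(inputSize, imdsTrain);

    options = trainingOptions('sgdm', ...
        'InitialLearnRate', 1e-4, ...       % fixed learning rate of 0.0001
        'MaxEpochs', 3, ...
        'MiniBatchSize', 10, ...
        'Plots', 'training-progress');      % produces the timeline in figure 2

    netTuned = trainNetwork(augTrain, lgraph, options);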

During the training process, the images were normalized before being fed to the CNN; in fact, all
inputs are normalized even during classification. I did not do any data augmentation because I had a
fair amount of data, which included all of the ways a frame might be informative or uninformative. A
good way to evaluate the performance of this system would be to run a few Monte-Carlo simulations
and average the results across them: each simulation draws a different random sample of images for
the same train/test split percentage, verifying the robustness. It might also be beneficial to plot the
performance of the transfer learning over different train/test splits, perhaps from 10%/90% to
90%/10%. Unfortunately, due to the time each training run takes (about 54 minutes on my GPU), I was
not able to evaluate this extensively. However, I can provide some insight from my research to verify
the consistency of the training results over different train/test splits across a few Monte-Carlo
simulations.
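
As a sketch only, since each run would cost roughly an hour on my GPU, such a Monte-Carlo evaluation over random splits could look like the following; trainAndTest is a hypothetical helper that wraps the training and classification steps shown above and returns the test accuracy.

    % Hypothetical sketch: average accuracy over a few random 80/20 splits.
    nRuns = 3;
    acc = zeros(nRuns, 1);
    for k = 1:nRuns
        [imdsTrain, imdsTest] = splitEachLabel(imds, 0.8, 'randomized');
        acc(k) = trainAndTest(imdsTrain, imdsTest, lgraph);  % hypothetical helper
    end
    fprintf('Mean accuracy over %d runs: %.3f\n', nRuns, mean(acc));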

Figure 3 shows the performance measures from my research when splitting the training and testing
data for an SVM classifier trained on a feature vector extracted through my own methods. We can see
that the classification performance of the SVM does not vary much across the different train/test splits
from 10%/90% to 90%/10%. Each percentage split is averaged over three Monte-Carlo simulations.

Figure 3 – Performance of my SVM for classification from a separate feature vector than the CNN
feature vector.

Results:
The following figure 4 shows the performance evaluation of the transfer learning: the balanced
accuracy, the accuracy, the sensitivity, and the specificity. Sensitivity refers to the percentage of all
informative frames that are correctly classified as informative, and specificity refers to the percentage
of all uninformative frames that are correctly classified as uninformative.
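
All four metrics can be read off the test-set confusion matrix. A minimal sketch, assuming 'Informative' is the first class in the label order:

    % Classify the held-out test set (resized to the network input size).
    augTest = augmentedImageDatastore([224 224 3], imdsTest);
    YPred = classify(netTuned, augTest);

    C = confusionmat(imdsTest.Labels, YPred);  % rows: true, columns: predicted
    TP = C(1,1); FN = C(1,2); FP = C(2,1); TN = C(2,2);

    sensitivity = TP / (TP + FN);    % informative frames correctly kept
    specificity = TN / (TN + FP);    % uninformative frames correctly rejected
    accuracy    = (TP + TN) / sum(C(:));
    balancedAcc = (sensitivity + specificity) / 2;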

Figure 4 – Performance measures of the transfer learning with 80% training data and 20% testing data.

I am quite surprised by the results; they were much better than I expected. The network has a high
sensitivity of 97.1%, which is the most important metric, but it also has a relatively high specificity of
90.6%. In my research I am able to get a sensitivity of 99.6%, but with the trade-off of only 50%
specificity. With some more fine-tuning of this network, the use of a CNN for my problem seems to be
an extremely valid approach. The CNN worked best for the informative class, as shown by the results,
which signifies a step in the right direction. The inaccuracies in the CNN classification can stem from
what was discussed in the overview: ground truth is subjective to the person performing it, so there is
no clear distinction between informative and uninformative, only a subjective, fuzzy boundary.

Results from the CNN are shown in the following figure 5. For informative frames that are definitely
clear to us, the CNN correctly classifies them with high certainty. An interesting observation is the third
image (row 1, column 3). This frame has a low score for uninformative because, while it appears to be
in focus, it is not really looking at anything; it is in focus because you can see some details within the
wall of the airway. Another observation is image 13 (row 4, column 1). This image is labeled as
uninformative, but the CNN is quite certain it is informative. On closer observation, the frame is
uninformative because it is slightly blurred by motion; the texture details along the walls are too
blurred for any diagnosis. The last observation is image 9 (row 3, column 1). This image is scored as
highly uninformative: most of the frame looks down an airway, and although there is some information
about the airway towards the bottom of the image, it is a bit dark, so it is understandable why this
frame is classified as uninformative with a relatively high percentage.

Figure 5 – Results of the classification from the CNN classifier. The top label specifies the CNN
classification and percentage certainty, while the bottom label specifies the ground-truth label.
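
The per-frame labels and certainty percentages in the montage come from the classifier's softmax scores. A sketch of how a single frame could be scored (the file name is hypothetical):

    % Score one frame; 'scores' holds the softmax output for both classes.
    img = imresize(imread('frame_0013.png'), [224 224]);  % hypothetical file
    [label, scores] = classify(netTuned, img);
    fprintf('%s (%.1f%%)\n', char(label), 100 * max(scores));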

I think the introduction of fine-tuning deeper layers during transfer learning would definitely improve
the accuracy of the CNN. One detail that causes an image to be labeled as uninformative is small
motion of the endoscope: as the shutter speed is relatively low, shaky movements can blur the texture
details of the image without globally blurring it. Fine-tuning the CNN might create activation maps that
highlight this behavior, which in turn could improve the decision for frames that are more difficult to
classify. I am interested in looking further into CNNs to accompany my research project. With time and
some fine-tuning, I am sure the accuracies should exceed those of my current method.

Who did what in my group?


As I am the only person in my group, I did it all! With help from MATLAB's documentation and
resources, of course.
