
Design and Development of Convolutional Neural

Network for Early Tumor Detection in Brain

Project Team

Sl. No. Reg. No. Student Name


1. 15ETEC004403 ANIMESH SAHU
2. 15ETEC004046 KAKANURU LIKHITHA
3. 15ETEC004302 AKSHAY.V
4. 15ETEC004021 MOUNUSHA.S
5. 15ETEC004303 ARCHANA VISHNU NAIK

Supervisor: Mr. Abdul Imran Rasheed


Co-Supervisor: Dr. Ugra Mohan Roy

May – 2019
B-TECH ECE 2015
FACULTY OF ENGINEERING AND TECHNOLOGY
M. S. RAMAIAH UNIVERSITY OF APPLIED SCIENCES
Bengaluru - 560 054

FACULTY OF ENGINEERING AND TECHNOLOGY

Certificate
This is to certify that the project titled "Design and Development of Convolutional Neural
Network for Early Tumor Detection in Brain" is a bona fide work carried out in the
Department of Electronics and Communication Engineering by Mr. Animesh Sahu
(15ETEC004403), Ms. Kakanuru Likhitha (15ETEC004046), Mr. Akshay V.
(15ETEC004302), Ms. Mounusha S. (15ETEC004021) and Ms. Archana Vishnu Naik
(15ETEC004303) in partial fulfilment of the requirements for the award of the B.Tech. degree
in Electronics and Communication Engineering of Ramaiah University of Applied
Sciences.

May – 2019
Supervisors

Supervisor: Mr. Abdul Imran Rasheed


Co-Supervisor: Dr. Ugra Mohan Roy

Dr. Raghavendra Kulkarni                    Prof. Dr. Arulanantham
Head, Dept. of ECE, RUAS                    Dean, FET, RUAS


Declaration

Design and Development of Convolutional Neural Network for Early Tumor Detection in Brain
The project work is submitted in partial fulfilment of the academic requirements for the
award of the B.Tech. degree in the Department of Electronics and Communication
Engineering of the Faculty of Engineering and Technology of Ramaiah University of
Applied Sciences. The project report submitted herewith is a result of our own work and
is in conformance with the guidelines on plagiarism as laid out in the University Student
Handbook. All sections of the text and results which have been obtained from other
sources are fully referenced. We understand that cheating and plagiarism constitute a
breach of University regulations; hence this project report has been passed through a
plagiarism check and has been submitted to the supervisor.

Sl. No. Reg. No. Student Name Signature


1. 15ETEC004403 ANIMESH SAHU
2. 15ETEC004046 KAKANURU LIKHITHA
3. 15ETEC004302 AKSHAY.V
4. 15ETEC004021 MOUNUSHA.S
5. 15ETEC004303 ARCHANA VISHNU NAIK

Date: 14th May 2019

Acknowledgements

We take this opportunity to express our sincere gratitude and appreciation to Ramaiah
University of Applied Sciences, Bengaluru for providing us an opportunity to carry out
our group project work.

Most prominently, we would like to thank Dr. Raghavendra Kulkarni, H.O.D., Electronics
and Communication Department, Faculty of Engineering and Technology, RUAS for his
encouragement and support throughout the group project work.

With a profound sense of appreciation, we acknowledge the guidance and support
extended by Mr. Abdul Imran Rasheed, Professor, Electronics and Communication
Department, Faculty of Engineering & Technology, RUAS. His encouragement and
invaluable technical support have helped us to complete our group project successfully
within the stipulated time.

We also extend our thanks to Dr. Ugra Mohan Roy, Professor, Electronics and
Communication Department, Faculty of Engineering & Technology, RUAS and Control
Systems Laboratory who have helped us towards successful completion of our group
project.

We would like to express our thanks and gratitude to Dr. Arulanantham, Dean, Ramaiah
University of Applied Sciences for giving us the opportunity to study this course and for
his continuous support.

We also thank all the faculty and supporting staff of the Electronics and Communication
Department, Faculty of Engineering & Technology, who have helped us towards the
successful completion of this dissertation work. Finally, we thank our friends and
families for standing by us through the many challenges of the project; their blessings
have been a constant source of inspiration.

Abstract

Image classification finds its suitability in applications ranging from medical diagnostics
to autonomous vehicles. Different approaches have been developed to perform image
classification based on the deep belief network (DBN), the stacked auto-encoder (SAE)
and the neural network (NN).
The existing architectures are computationally exhaustive, complex and less accurate; an
accurate, simple and hardware-efficient architecture is therefore required for image
classification. In this thesis, a CNN model for image classification is proposed.
The developed CNN model is a 13-layer network with a kernel size of 28×28 for an input
image size of 28×28. The 13-layer CNN comprises convolution, sub-sampling and
fully-connected layers; in the convolution layer, a sliding-filter 2D convolution approach
is adopted to reduce the number of arithmetic operations. The IEEE 754 single-precision
floating-point format has been adopted for processing. The RTL architecture of the CNN
has been developed using a pipelined-parallel approach to achieve better speed. To
optimize power, an average-pooling approach has been utilized for sub-sampling.
The developed CNN architecture utilizes 14,400 multipliers and 13,824 adders, and
provides the output with a computational complexity of only 1,540 multiply-accumulate
operations.
The field of machine learning has taken a dramatic twist in recent times with the rise of
the Artificial Neural Network (ANN). These biologically inspired computational models
are able to far exceed the performance of previous forms of artificial intelligence in
common machine learning tasks.
One of the most impressive forms of ANN architecture is the Convolutional Neural
Network (CNN). CNNs are primarily used to solve difficult image-driven pattern
recognition tasks and, with their precise yet simple architecture, offer a simplified
method of getting started with ANNs.

Table of Contents

Certificate
Declaration
Acknowledgements
Abstract
Table of Contents
List of Tables
List of Figures
Abbreviations and Acronyms
Chapter-1: Introduction
Preamble to the Chapter
1.1 Introduction to Image Classification
1.2 Motivation of the Project
1.3 Scope of the Project
1.4 Applications of Image Classification
1.5 Thesis Organization

Chapter-2: Background Theory
Preamble to the Chapter
2.1 Biological Inspiration
2.2 Introduction to Convolutional Neural Networks
2.3 Network Topologies for Image Classification
2.4 Compression of Neural Network Models

Chapter-3: Aim and Objectives
Preamble to the Chapter
3.1 Title of the Project
3.2 Aim of the Project
3.3 Objectives of the Project
3.4 Project Specification
3.5 Methods and Methodology

Chapter-4: Development of CNN-Based Image Classification
Preamble to the Chapter
4.1 Dataset
4.2 Design of Convolutional Neural Network Algorithm
4.2.1 Types of Layers
4.2.2 Training CNN

Chapter-5: Results
Preamble to the Chapter
5.1 CNN Performance on MATLAB
5.2 Segmentation of the MRI Images
5.3 Final Output for Benign and Malignant Images

Chapter-6: Conclusions
References

List of Tables


Table 2.1: Comparison of Different CNN Topologies for Image Classification on ImageNet
Table 2.2: Comparison of Different CNN Topologies for Image Classification on ImageNet
Table 3.1: Methods and Methodologies to Attain Each Objective
Table 4.1: BRATS Dataset Specifications

List of Figures
________________________________________________________________________

Figure 2.1: A Biological Neuron and its Artificial Counterpart
Figure 2.2: Example of a Neural Network with Fully-Connected Layers, Inputs and Outputs
Figure 2.3: Illustration of CNN Layer Architecture
Figure 2.4: The Non-Linear Activation Functions: Sigmoid, tanh, ReLU, PReLU and ELU
Figure 2.5: Block Diagram of a Convolutional Neural Network
Figure 2.6: An Illustration of the SegNet Architecture
Figure 2.7: Illustration of AlexNet
Figure 2.8: Illustration of VGG
Figure 2.9: Illustration of GoogLeNet
Figure 2.10: Illustration of ResNet
Figure 4.1: Brain MRI Images of Benign and Malignant Tumors
Figure 5.1: Training Progress
Figure 5.2: Status of the Training Performance
Figure 5.3: Segmentation of MRI Image (Benign)
Figure 5.4: Segmentation of MRI Image (Malignant)

Abbreviations and Acronyms
________________________________________________________________________

CNN – Convolutional Neural Network


1. Introduction

Preamble
This chapter introduces image classification, the motivation and scope of the project,
the applications of image classification techniques, and the organization of the thesis.
1.1 Introduction to Image Classification

Image classification is the assignment of pixels in an image to categories or classes of
interest. In order to classify a set of data into different classes or categories, the
relationship between the data and the classes into which they are classified must be well
understood. To achieve this by computer, the computer must be trained, and training is
key to the success of classification. Classification techniques were originally developed
out of research in the field of pattern recognition. Computer classification of remotely
sensed images involves the computer program learning the relationship between the
data and the information classes.
Image understanding/classification is becoming a vital feature in ever more applications,
ranging from medical diagnostics to autonomous vehicles. Many applications demand
embedded solutions that integrate into existing systems with tight real-time and
power constraints. Convolutional Neural Networks (CNNs) presently achieve record-
breaking accuracies in all image understanding benchmarks, but have a very high
computational complexity. CNNs thus call for small and efficient, yet very powerful
computing platforms.
1.2 Motivation of Project

Image understanding is a very difficult task for computers. Nevertheless, advanced


Computer Vision (CV) systems capable of image classification, object recognition and

scene labeling are becoming increasingly important in many applications in robotics,
surveillance, smart factories and medical diagnostics. Unmanned aerial vehicles and
autonomous cars, which need to perceive their surroundings, are further key
applications. In the last few years, significant progress has been made regarding the
performance of these advanced CV systems. The availability of powerful computing
platforms and the strong market pull have shaped a very fast-paced and dynamic field of
research. Former approaches to image understanding, which mainly relied on hand-
engineered features and hard-coded algorithms, are increasingly being replaced by
machine learning concepts, where computers learn to understand images by looking at
thousands of examples. These advanced learning algorithms, which are based on recent
high-performance computing platforms as well as the abundance of training data
available today, are commonly referred to as deep learning. Convolutional Neural
Networks (CNNs) currently represent the most promising approach to image
understanding in CV systems. These brain-inspired algorithms consist of multiple layers
of feature detectors and classifiers, which are adapted and optimized using techniques
from machine learning.
The idea of neural networks has been around for almost 80 years, yet only the latest
generations of high-performance computing hardware have allowed the evaluation and
training of CNNs deep and wide enough for good performance in image understanding
applications. The progress in these last years has been amazing though, and state-of-
the-art convolutional neural networks already rival the accuracy of humans when it
comes to the classification of images.
This exceptional performance of CNNs comes at the cost of an enormous computational
complexity. The real-time evaluation of a CNN for image classification on a live video
stream can require billions or trillions of operations per second. The effort for image
segmentation and scene labeling is even significantly higher. While this level of
performance can be reached with the most recent Graphics Processing Units (GPUs),
there is the simultaneous wish to embed such solutions into other systems, such as cars,

drones, or even wearable devices, which exhibit strict limitations regarding physical size
and energy consumption. Future CNN ASICs thus call for small and efficient, yet very
powerful computing platforms.
Different platforms have been considered for efficient high-performance
implementations of CNNs, and Field-Programmable Gate Arrays (FPGAs) are among the
most promising of them. These versatile integrated circuits provide hundreds of
thousands of programmable logic blocks and a configurable interconnect, which enables
the construction of custom-tailored accelerator architectures in hardware. These have
the potential to deliver the computational power required by CNNs within the size and
power envelopes dictated by their respective applications.
1.3 Scope of the Project

In the last decade, computational image understanding systems have drawn
considerable attention from researchers. The prime goal has been to develop a reliable
image classification system that can be utilized in a wide range of applications.
So, in this dissertation work, feature extraction using convolution and the classification
of images are accomplished by implementing a suitable image classification algorithm on
the MATLAB platform. The results achieved in MATLAB are validated by comparing the
results of both phases.
1.4 Applications of Image Classification
1. Improving the efficiency of Google Street View maps.
2. Devices that provide assistive vision for blind and low-vision people, such as
vision-to-speech systems analogous to text-to-speech on Android phones.
3. Unmanned aerial vehicles and autonomous cars, which need to perceive their
surroundings, are further key applications.

1.5 Thesis Organization

The purpose of this thesis is to implement an image classification system using a
convolutional neural network. Image classification systems depend on many
characteristics, so to simplify the implementation, this work focuses on sets of images
drawn from different datasets. The system is verified with standard datasets such as
MNIST and ImageNet. The system gives an understanding of how image classification
can be implemented, and how convolutional neural networks can be used to solve
advanced artificial intelligence problems.
The thesis is organized into six chapters:
Chapter 1 deals with the introduction of the project, the motivation and scope of the
project, applications of image processing techniques, and brief details of the thesis
organization.
Chapter 2 deals with background theory; a literature review was carried out to
understand other approaches to image classification.
Chapter 3 describes the problem statement derived from the literature review and the
research gaps identified through the critical review of relevant papers (as discussed in
Chapter 2). The title, aim, objectives, scope, and the methods and methodologies
carried out for finding the problem solution are discussed in this chapter.
Chapter 4 deals with problem solving through the realization of the objectives (as
stated in Chapter 3). Problem solving encapsulates part of the literature review, the
identification of CNN algorithms, their brief architectural description, and the MATLAB
and HDL architectures.
Chapter 5 describes the results and discussions of the dissertation work carried out in
Chapter 4 (problem solving). Tables, photographs (input-output images), justification
of the realization of objectives, validations (substantiation) and recommendations of the
dissertation work are discussed in this chapter.

Chapter 6 deals with the conclusions and future directions of the current dissertation
work, based on interpretations of the results presented in the previous chapters.

2. Background Theory

Preamble
This chapter introduces the main concepts behind the thesis: neural networks and
Convolutional Neural Networks (CNNs).
The following sections provide a brief overview of neural networks in general, and of
convolutional neural networks in particular. First, an intuitive explanation of the
inner workings of neural networks is presented, together with a high-level description of
the network training process (section 2.1). The following section dives into convolutional
neural networks, an architecture particularly suited to processing images, and provides an
overview of the construction of these networks (section 2.2).
2.1 Biological Inspiration: Neural networks are a family of computation architectures
originally inspired by biological nervous systems. The human brain contains
approximately 86 billion neurons connected by $10^{14}$–$10^{15}$ synapses. Each neuron
receives input signals at its dendrites and produces output signals along its axon, which
branches out and connects to the dendrites of other neurons via synapses. These
synapses influence the transfer of information from one neuron to another by
amplifying or attenuating the signals or even inhibiting the transfer of the signal at other
synapses. Together, the billions of conceptually simple neurons form an incredibly
complex interacting network that enables us humans to see, hear, move, communicate,
remember, analyze, understand and even fantasize and dream.

Figure 2.1: A Biological Neuron and its Artificial Counterpart.
Artificial Neurons: The artificial neuron, depicted in Figure 2.1, is the basic building block of
artificial neural networks. The artificial neuron receives several input signals from
other neurons. These input signals are multiplied by weights to simulate the synaptic
interaction at the dendrites. The weighted input signals are summed, biased with a fixed
offset $b$, and fed into a non-linear activation function, which produces the output signal
of the neuron, $y = f\left(\sum_i x_i w_i + b\right)$. The weights $w$ can be seen as the
tuning knobs that define the neuron's reaction to a given input signal, and their values
can be adjusted to learn to approximate a desired output signal (Dong et al. 2017)
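As a concrete illustration, the following is a minimal MATLAB sketch of this forward pass for a single neuron; the input, weight and bias values are illustrative assumptions, and ReLU is chosen here as one possible activation $f$.

```matlab
% Forward pass of one artificial neuron: y = f(sum_i(x_i * w_i) + b).
x = [0.5; -1.2; 3.0];   % input signals from other neurons (illustrative)
w = [0.8;  0.1; -0.4];  % synaptic weights, the "tuning knobs"
b = 0.2;                % fixed bias
f = @(z) max(0, z);     % ReLU activation (one possible choice of f)
y = f(sum(x .* w) + b)  % output signal of the neuron
```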

7
Figure 2.2: Example of a Neural Network with Fully-Connected Layers, Inputs and Outputs
Organization of the Neural Network: A neural network is formed by interconnecting
many artificial neurons. The neurons are usually arranged in a directed acyclic graph to
form a feed-forward neural network. The neurons are further grouped into layers, and
connections are allowed only between neurons of adjacent layers. Figure 2.2 shows an
example of a four-layer feed-forward neural network with fully-connected layers and
two outputs.
Network Training: The parameters in a neural network are not chosen manually, but
learned during a training phase. The most popular training approach is called supervised
learning and requires a set of labelled training examples. One optimization pass through
all training examples is called an epoch. Depending on the type of data and the capacity
of the neural network, a complete training session can take anywhere from one to a few
hundred epochs. Training begins with small, randomly initialized weights. The examples
are fed through the network one by one (the so-called forward pass). The resulting
outputs are compared to ground-truth labels using a loss function, which measures how
much the output deviates from the expected outcome. The goal of the learning process
is then to minimize this loss (or error) on the training set by optimizing the weight
parameters. Stochastic Gradient Descent is the most popular optimization method
currently used to train neural networks. The gradient descent algorithm calculates a
gradient vector that describes the influence of each weight on the error. These gradients
can be calculated efficiently by propagating the output error back through the network
(the so-called backward pass). The optimization loop repeatedly takes a training
example, calculates the current loss (forward pass), derives the gradient vector
(backward pass) and adjusts all weights by a small amount in the opposite direction of
their respective gradients (update phase). The magnitude of these updates is determined
by the so-called learning rate. An alternative version of the algorithm, called Batch
Gradient Descent, defers the weight updates and first calculates and averages the
gradients over a batch of training examples. This allows the computation to be
vectorized more efficiently and executed on platforms that support vector instructions,
including GPUs, DSPs and most CPUs.
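To make the loop concrete, here is a minimal MATLAB sketch of a single stochastic-gradient-descent step for the lone ReLU neuron from the sketch above, using a squared-error loss; the learning rate, input, weights and target are illustrative assumptions, and a real network would backpropagate through all of its layers.

```matlab
% One SGD step for a single ReLU neuron with loss E = 0.5*(y - t)^2.
x = [0.5; -1.2; 3.0]; w = [0.8; 0.1; -0.4]; b = 0.2;  % input, weights, bias
t = 1.0;                       % ground-truth label (target output)
eta = 0.01;                    % learning rate
z = sum(x .* w) + b;           % forward pass: pre-activation
y = max(0, z);                 % forward pass: ReLU activation
dE_dy = y - t;                 % backward pass: dE/dy
dy_dz = double(z > 0);         % backward pass: derivative of ReLU
grad_w = dE_dy * dy_dz * x;    % chain rule: dE/dw for each weight
w = w - eta * grad_w;          % update phase: step against the gradient
b = b - eta * dE_dy * dy_dz;   % update the bias the same way
```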
Performance Validation: By iteratively adjusting the weights, the network ideally
converges to a solution with minimal loss, and therefore a good approximation of the
desired output on the training set. The model's performance is verified every few epochs
with a set of validation examples that were not used during training. If the training
set is representative of the actual "real-world" data, the network also provides good
estimates for previously unseen examples. However, if the training set is too small or the
learning capacity of the network is too high, the neural network can memorize training
examples "by heart" and lose its ability to generalize. Such overfitting can be counteracted
by extending the training set (possibly using data augmentation strategies such as mirroring,
rotation and colour transformation) as well as by changes to the network structure (such as
adding regularization methods) (Zhang n.d.).
2.2 Introduction to Convolutional Neural Networks:
A Convolutional Neural Network is a deep learning algorithm which can take in an input
image, assign importance (learnable weights and biases) to various aspects or objects in the
image, and differentiate one from the other. Convolutional Neural Networks
(CNNs) are a special class of neural networks particularly suitable for operation on 2D
input data such as images. They are widely used for image classification, object
recognition and scene labelling tasks.

The pre-processing required in a ConvNet is much lower compared to other
classification algorithms. While in primitive methods filters are hand-engineered, with
enough training, ConvNets are able to learn these filters/characteristics.
The architecture of a ConvNet is analogous to the connectivity pattern of neurons
in the human brain and was inspired by the organization of the visual cortex.
Individual neurons respond to stimuli only in a restricted region of the visual field known
as the receptive field. A collection of such fields overlap to cover the entire visual area.
A ConvNet is able to successfully capture the spatial and temporal dependencies in an
image through the application of relevant filters. The architecture achieves a better
fit to the image dataset due to the reduction in the number of parameters involved
and the reusability of weights. In other words, the network can be trained to understand
the sophistication of the image better.
The role of the ConvNet is to reduce the images into a form which is easier to process,
without losing features which are critical for a good prediction. This is important
when designing an architecture which is not only good at learning features but is
also scalable to massive datasets.
Nomenclature: The input to each layer of a convolutional neural network consists of a
stack of $d_{in}$ two-dimensional images of size $(h_{in} \times w_{in})$, the so-called input
images or input feature maps. Each layer produces a stack of $d_{out}$ 2D images of size
$(h_{out} \times w_{out})$, called output feature maps.

Figure 2.3: Illustration of CNN Layer Architecture
Motivation: When neural networks are used for image-related tasks, their input usually
consists of pixel data. Even for an image with a modest resolution of 256×256 RGB
pixels, the input consists of 256×256×3 = 196,608 elements, and a subsequent
fully-connected neural network layer would require billions of weights. Fortunately,
there is no need for full connectivity when dealing with pixel data, thanks to the locality
of information in images. In order to decide whether there is a car in the centre of an
image, one does not need to consider the colour of the top-right corner pixel, and the
bottom-right pixels usually do not influence the class assigned to the top-left pixels. The
important information in images can be captured from local neighbourhood
relations: strong contrasts indicate edges, aligned edges result in lines, combined lines
may result in circles and contours, circles may outline a wheel, and multiple nearby
wheels may point to the presence of a car. This locality of information in images is
exploited in convolutional neural networks by replacing the fully-connected layers with
convolution layers (Abdelouahab et al. 2017).
Weight Sharing by Convolution:
A convolution layer contains a $(d_{in} \times d_{out})$ array of kernels, which are small
filters of size $k \times k$ (typically 1×1, 3×3, 5×5, 7×7 or 11×11). These kernels are
applied by 2D convolution to the input feature maps, so each output pixel is generated
from a small local receptive field in the input image. The filter kernels are slid over each
feature map of the input, which results in $d_{out}$ partial output feature maps for each
input feature map. The final output feature maps are formed by summing the partial
output feature maps contributed by all $d_{in}$ input channels. Instead of requiring
$(h_{in} \times w_{in} \times d_{in}) \times (h_{out} \times w_{out} \times d_{out})$ weights,
the number of parameters in a convolution layer is thus reduced to
$(k \cdot k) \times (d_{in} \times d_{out})$. The independence from the input image
dimensions also enables large images to be processed without an exploding number of
weights.
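The following is a minimal MATLAB sketch of this sliding-filter computation for a single input/output feature-map pair (stride 1, no padding); the image, kernel and bias values are illustrative assumptions.

```matlab
% Sliding-filter 2D convolution of one feature map with one k x k kernel.
I = rand(28, 28);    % input feature map (illustrative)
K = rand(3, 3);      % one 3x3 filter kernel
B = 0.1;             % bias for this output map
k = size(K, 1);
O = zeros(size(I, 1) - k + 1, size(I, 2) - k + 1);
for y = 1:size(O, 1)
    for x = 1:size(O, 2)
        patch = I(y:y+k-1, x:x+k-1);          % local receptive field
        O(y, x) = sum(patch(:) .* K(:)) + B;  % multiply-accumulate
    end
end
% Built-in check: conv2 flips the kernel, so pre-flipping K with rot90
% makes it match the sliding-filter (correlation) result above.
O_check = conv2(I, rot90(K, 2), 'valid') + B;
```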

Layer Types: Convolutional neural networks are built by stacking a number of generic
network layers that transform three-dimensional input volumes of size
$(h_{in} \times w_{in} \times d_{in})$ into output feature maps of size
$(h_{out} \times w_{out} \times d_{out})$.
A typical CNN consists of the following types of layers (a pooling sketch in MATLAB
follows this list):
 Convolution Layers: In this layer, filters of size $(k \times k)$ are applied to generate
the output feature maps. For filters larger than 1×1, border effects reduce the output
dimensions. To avoid this effect, the input image is typically padded with
$p = \lfloor k/2 \rfloor$ zeros on each side. The filters can be applied with a stride $s$,
which reduces the output dimensions to $w_{out} = w_{in}/s$ and $h_{out} = h_{in}/s$.
 Nonlinearity Layers: This layer applies a non-linear activation function to each input
pixel. The most popular activation function is the Rectified Linear Unit (ReLU),
which computes $f(x) = \max(0, x)$ and clips all negative elements to zero. Early
networks used sigmoidal functions such as $f(x) = 1/(1 + e^{-x})$ or the hyperbolic
tangent $f(x) = (e^x - e^{-x})/(e^x + e^{-x})$, but these are no longer used
because of their computational complexity and their slowing effect on
convergence during training. More recent ideas include the Parametric ReLU
(PReLU), $f(x) = \max(\alpha x, x)$ with learnable parameter $\alpha$, Maxout, and
Exponential Linear Units (ELU). Figure 2.4 shows a comparison of some of these options.
 Pooling Layers: This layer reduces the spatial dimensions of the input by
summarizing multiple input pixels into one output pixel. Two popular choices are
max-pooling and average-pooling, which summarize their local receptive field by
taking the maximum or the average value of the pixels, respectively. They are
usually applied to a patch of 2×2 or 3×3 input pixels with a stride s = 2, but can
also be applied as global pooling to the whole input image, in order to reduce
the spatial output dimensions to 1×1 pixels.

 Fully-Connected Layers: These are often used as the last layers in a CNN to compute
the class scores in image classification applications. Even though the spatial
dimensions $h_{in}$ and $w_{in}$ in the last layers are typically heavily reduced, the
fully-connected layers often account for most of the weights in these CNNs.
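As an illustration of the pooling layer described above, here is a minimal MATLAB sketch of 2×2 average-pooling with stride 2 on one feature map; blockproc is from the Image Processing Toolbox, and the input values are illustrative.

```matlab
% 2x2 average-pooling with stride 2: each non-overlapping block of four
% pixels is summarized by its mean, halving both spatial dimensions.
F = rand(28, 28);                               % input feature map
P = blockproc(F, [2 2], @(b) mean(b.data(:)));  % 14x14 pooled output
% Max-pooling is the same pattern with max(b.data(:)) instead of the mean.
```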

Figure 2.4: The Non-Linear Activation Functions: Sigmoid, tanh, ReLU, PReLU and ELU

Convolutional Neural Network Training Frameworks: There are many popular
software frameworks built specifically for the design and training of neural networks,
including the Neural Network Toolbox for MATLAB, Theano with the Lasagne and Keras
extensions, Torch, TensorFlow and Caffe. Most of these frameworks can use one or
more GPUs to heavily accelerate training.
Network Specification: In order to fully describe a convolutional neural network, the
following information is required:

1. A topological description of the network graph
2. A list of layers and their parameters
3. The weights and biases in each layer
4. The training parameters (only needed if the network is to be trained or fine-tuned)

In MATLAB, the network description and the layer design are very simple, and it is easy to
add or remove layers as required. The weights are saved in .MAT files. The
back-propagation algorithm performs supervised learning, applying the stochastic
gradient descent technique to minimize the error; the weights are updated using the
online training principle. Other parameters, such as the base learning rate, the learning
rate schedule, the batch size, the optimization algorithm, as well as the random seeds for
training initialization, have to be set manually. These parameters and techniques are only
needed if the network is to be trained from scratch or fine-tuned, which refers to the
process of adapting a trained network to a different dataset. For inference, where a fully
trained network is utilized for forward computation on new input data, the network
description and the trained weights are sufficient (Iandola et al. n.d.).
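To illustrate how compact such a description can be, the following is a minimal MATLAB (Neural Network / Deep Learning Toolbox) sketch of defining and training a small CNN; the exact layer list, the 'dataset' folder layout and the hyperparameters are illustrative assumptions, not the precise network developed in this work.

```matlab
% Describe a small CNN as a simple list of layers, then train it.
layers = [
    imageInputLayer([28 28 1])                  % 28x28 grayscale input
    convolution2dLayer(3, 8, 'Padding', 'same') % 8 filters of size 3x3
    reluLayer                                   % nonlinearity layer
    averagePooling2dLayer(2, 'Stride', 2)       % 2x2 average pooling
    fullyConnectedLayer(2)                      % two classes, e.g. benign/malignant
    softmaxLayer
    classificationLayer];
options = trainingOptions('sgdm', ...           % stochastic gradient descent
    'InitialLearnRate', 0.01, ...
    'MaxEpochs', 10, ...
    'Plots', 'training-progress');              % live training plot
imds = imageDatastore('dataset', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');              % labels taken from folder names
net = trainNetwork(imds, layers, options);
save('trainedNet.mat', 'net');                  % weights stored in a .MAT file
```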

Figure 2.5: Block Diagram of a Convolutional Neural Network

2.3 Network Topologies for Image Classification:

One of computer vision's most interesting, but also hardest, problems is image
classification: the task of correctly assigning one out of several possible labels to a given
image. Examples of this problem include yes-or-no decisions (Is there a person in front
of the car? Is this tissue sample cancerous?) but also recognition tasks with a large
number of labels (Which dog breed is shown? Who is in this photo?). Scene
labelling extends image classification by assigning a class to each pixel of the input
image.

SegNet :

Figure 2.6: An illustration of the SegNet architecture


SegNet has no fully-connected layers and is hence fully convolutional. A decoder
up-samples its input using the transferred pool indices from its encoder to produce
sparse feature maps. It then performs convolution with a trainable filter bank to densify
the feature maps. The final decoder output feature maps are fed to a soft-max classifier
for pixel-wise classification.
SegNet has an encoder network and a corresponding decoder network, followed by a
final pixel-wise classification layer. This architecture is illustrated in Figure 2.6. The encoder
network consists of 13 convolution layers which correspond to the first 13 convolution
layers in the VGG16 network designed for object classification. The training process can
therefore be initialized from weights trained for classification on large datasets. The
fully-connected layers can also be discarded in favour of retaining higher-resolution
feature maps at the deepest encoder output; this also reduces the number of
parameters in the SegNet encoder network significantly (from 134M to 14.7M) as
compared to other recent architectures. Each encoder layer has a corresponding
decoder layer, and hence the decoder network has 13 layers. The final decoder output is
fed to a multi-class soft-max classifier to produce class probabilities for each pixel
independently. Each encoder in the encoder network performs convolution with a filter
bank to produce a set of feature maps. These are then batch-normalized, and an
element-wise rectified linear non-linearity (ReLU), max(0, x), is applied. Following that,
max-pooling with a 2×2 window and stride 2 (non-overlapping windows) is performed,
and the resulting output is sub-sampled by a factor of 2. Max-pooling is used to achieve
translation invariance over small spatial shifts in the input image, and sub-sampling
results in a large input image context (spatial window) for each pixel in the feature map.
While several layers of max-pooling and sub-sampling can achieve more translation
invariance for robust classification, there is a corresponding loss of spatial resolution in
the feature maps. This increasingly lossy (boundary detail) image representation is not
beneficial for segmentation, where boundary delineation is vital. Therefore, it is
necessary to capture and store boundary information in the encoder feature maps
before sub-sampling is performed. If memory during inference is not constrained, then
all the encoder feature maps (after sub-sampling) can be stored. This is usually not the
case in practical applications, hence a more efficient way to store this information is
proposed: only the max-pooling indices are stored, i.e., the location of the maximum
feature value in each pooling window is memorized for each encoder feature map. In
principle, this can be done using 2 bits for each 2×2 pooling window, and is thus much
more efficient to store than memorizing feature maps in float precision.
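A minimal MATLAB sketch of this idea, for a single feature map with non-overlapping 2×2 windows; the feature values are illustrative. Each stored index is a number from 1 to 4, so it fits in the 2 bits mentioned above.

```matlab
% SegNet-style max-pooling that also records the argmax position inside
% each 2x2 window, so the decoder can later place values back sparsely.
F = rand(4, 4);                                   % encoder feature map
pooled = zeros(2, 2);
idx = zeros(2, 2, 'uint8');                       % values 1..4, fit in 2 bits
for r = 1:2
    for c = 1:2
        win = F(2*r-1:2*r, 2*c-1:2*c);            % one pooling window
        [pooled(r, c), idx(r, c)] = max(win(:));  % value and its location
    end
end
```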

ImageNet Challenge:
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual
competition in which participants develop algorithms to classify images from a subset of
the ImageNet database. The ImageNet database consists of more than 14 million
photographs collected from the Internet, each labelled with one ground-truth class. The
ILSVRC training set consists of approximately 1.2 million images in 1000 different classes,
covering a wide variety of objects (from toilet paper, bananas and kimonos to fire trucks,
space shuttles and volcanoes), scenes (from valleys and seashores to libraries and
monasteries) and animals (120 breeds of dogs, but also axolotls, sharks and triceratops
dinosaurs). Participants are permitted to make five predictions per image: the top-1
accuracy tracks the percentage of correct labels assigned at the first guess, and the top-5
accuracy takes all five predictions into account. With explicit training and concentrated
effort, humans can reach a top-5 error rate of about 5 percent (Mishkin et al. n.d.).
CNN Topologies for Image Classification on ImageNet:
The huge number of training samples and the problem's difficulty make the ImageNet
challenge an ideal playground for machine learning algorithms. Starting with AlexNet in
2012, convolutional neural networks took the lead in the ILSVRC competition, and the
top-1 and top-5 error rates of the winning entries have dropped significantly since then.
1. AlexNet by Alex Krizhevsky et al. from the University of Toronto was the first
CNN to win the ILSVRC in 2012.
AlexNet was the winning entry in ILSVRC 2012. It solves the problem of image
classification where the input is an image of one of 1000 different classes (e.g.
cats, dogs, etc.) and the output is a vector of 1000 numbers. The i-th element of
the output vector is interpreted as the probability that the input image belongs
to the i-th class; therefore, the sum of all elements of the output vector is 1. The
input to AlexNet is an RGB image of size 256×256, which means all images in the
training set and all test images need to be of size 256×256.
If an input image is not 256×256, it needs to be converted before being used for
training the network: the smaller dimension is resized to 256 and the resulting
image is then cropped to obtain a 256×256 image.
AlexNet consists of 5 convolution layers, has 60 million parameters and requires
approximately 1.1 billion multiply-accumulate (MACC) operations for one
forward pass. The network achieved a groundbreaking top-5 error rate of 15.3
percent on ILSVRC 2012, with the second-best entry left behind at
26.2% (Krizhevsky & Hinton n.d.).

Figure 2.7: Illustration of AlexNet


2. Network-in-Network (NiN), by Min Lin et al. of the National University of Singapore,
was published in 2013 as a novel CNN architecture. The NiN architecture consists
of small, stacked multilayer perceptrons that are slid over the respective input
just like convolutional filters. In addition, the authors use global average pooling
in the classifier instead of fully-connected layers, which makes the network much
smaller in terms of parameters. NiN never officially participated in ILSVRC,
but can be trained on the ImageNet dataset and reaches approximately
AlexNet-level accuracy (Qiao et al. 2016).
3. VGG stands for the Visual Geometry Group, University of Oxford, and also names
this group's CNN architecture. The runner-up of the ILSVRC 2014 competition,
dubbed VGGNet by the community, was developed by Simonyan and
Zisserman. VGGNet consists of 16 convolution layers and is very appealing
because of its very uniform architecture: similar to AlexNet it uses only 3×3
convolutions, but with many more filters, and it was trained on 4 GPUs for
2–3 weeks. It is currently the community's most preferred choice for extracting
features from images. The weight configuration of VGGNet is publicly available
and has been used in many other applications and challenges as a baseline
feature extractor. The network reached a 7.3% top-5 error. However, VGG-16 has
nearly 140 million parameters, which can be challenging to handle, and one
forward pass requires almost 16 billion MACC operations (Fasih & Chedjou 2015).

Figure 2.8: Illustration of VGG


4. GoogLeNet, by Christian Szegedy et al. from Google, is a milestone CNN architecture
published just a few days after the VGG architecture. GoogLeNet's 22 layers set a
new ILSVRC classification record with a top-5 error rate of 6.67 percent, while
requiring only 1.2 million parameters and 0.86 billion MACC operations. The
savings are achieved through a more complex architecture that uses so-called
Inception modules. These modules are a network-in-network sub-architecture
that first uses a 1×1 convolution layer to reduce the number of channels, before
expanding this compressed representation again using parallel convolution layers
with kernel sizes 1×1, 3×3 and 5×5. The reduction in the channel dimension
decreases the number of parameters and MACC operations in both the reduction
and expansion layers, and the composition of multiple layers increases the
network's non-linear expressiveness. GoogLeNet uses LRN layers to improve
training convergence (Ghaffari 2016).

19
Figure 2.9: Illustration of GoogLeNet

5. ResNet: At the ILSVRC 2015, the so-called Residual Neural Network
(ResNet) by Kaiming He et al. introduced a novel architecture with "skip
connections" and heavy batch normalization. Such skip connections are
also known as gated units or gated recurrent units and have a strong similarity to
recent successful elements applied in RNNs. Thanks to this technique, the
authors were able to train networks with up to 152 layers while still having lower
complexity than VGGNet. The winning entry achieved a top-5 error rate of 3.57%,
which beats human-level performance on this dataset, and the very deep
ResNet-152 model, with its 152 convolution layers, achieved a top-5 error rate of
less than 5.7 percent. Models with a depth of more than 20 convolution layers
were previously very difficult to train. The researchers solved this problem by
including detours around each batch of two subsequent convolution layers,
summing the detoured original and the filtered representation together at the
junction points. This topology resembles a function $y = F(x) + x$, where the
network only needs to learn the residual function $F(x)$ and can simply "add
information" instead of reinventing the wheel every two layers. The smaller
version, ResNet-50, uses 50 convolution layers and batch normalization, has 47
million parameters and needs 3.9 billion MACC operations per forward pass to
achieve a top-5 error of 6.7 percent.

Figure 2.10: Illustration of ResNet
6. Inception v3 and v4, by Christian Szegedy et al., are Google's latest published
image classification CNNs. The GoogLeNet architecture was thoroughly
studied and optimized in the Inception v3 paper, which gives valuable hints on how
to design and modify CNNs for efficiency. The Inception v4 paper, published in
February 2016, studies the positive effects of residual connections in
Inception-module-based architectures and presents Inception-ResNet-v2, which
reaches a top-5 error rate of 4.1 percent on the ILSVRC dataset. All recent
Inception architectures make heavy use of Batch Normalization layers.
7. SqueezeNet, by Forrest Iandola et al. from UC Berkeley, also published in
February 2016, differs from the other CNN architectures in this list because its
design objective was not record-breaking accuracy. Instead, the authors
developed a network with similar accuracy to AlexNet, but with 50× fewer
parameters. This parameter reduction has been achieved by using Fire modules,
a reduce-expand micro-architecture comparable to the Inception modules, and
careful balancing of the architecture. The 18-layer SqueezeNet uses 7×7, 3×3 and
1×1 convolutions, 3×3 max-pooling, dropout and global average pooling, but
neither fully-connected, nor LRN, nor Batch Normalization layers. One forward
pass requires only 860 million MACC operations, and the 1.24 million parameters
are enough to achieve a single-crop top-5 error of less than 19.7%.

Tables 2.1 and 2.2: Comparison of Different CNN Topologies for Image
Classification on ImageNet. The top-5 error rate is listed for single-net, single-crop
evaluation. No. of MACCs is the number of multiply-accumulate operations in one
forward pass. No. of activations is the total pixel count across all output feature
maps.

Table 2.1

Table 2.2
2.4 Compression of Neural Network Models:
State-of-the-art CNNs require significant amounts of memory for their weights
(e.g. 560 MB for VGG-16 with 32-bit weights), which can be problematic, for
instance with over-the-air updates or deployment on embedded systems.
Researchers have therefore looked for ways to reduce both the number of
weights and the memory required per weight.
Kernel Decomposition and Pruning: Denil et al. show that up to 95 percent of all
weights in their CNN can be predicted rather than learned, without a drop in
accuracy. Denton et al. approximate fully trained convolution kernels using
singular value decomposition (SVD), while Jin et al. replace the 3D convolution
operation with three consecutive one-dimensional convolutions (cross-channel,
horizontal, vertical). Similar methods have been used to deploy CNNs effectively
on smartphones. A final idea is network pruning, where small or otherwise
unimportant weights are set to zero, effectively removing the corresponding
connections.
Limited Numerical Precision: Reducing the memory consumption of each weight
is possible by replacing the typical 32-bit floating-point weights with either 16-bit
floating-point weights or fixed-point approximations of fewer than 32 bits. Neural
networks have been shown to tolerate this type of quantization very well. Hwang
et al. successfully quantized most layers in their CNN to three bits; Sung et al.
restricted their network to ternary values (-1, 0, 1) with a negligible drop in
accuracy; and Courbariaux et al. even trained CNNs with binary weights and
activations. Gysel et al. recently published Ristretto, an automated CNN
approximation tool that analyzes floating-point networks and condenses their
weights to compact fixed-point formats, while respecting a maximally allowed
accuracy drop.
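The following is a minimal MATLAB sketch of such a fixed-point approximation, condensing weights to a signed 8-bit format; the weight values and the 8-bit target width are illustrative assumptions.

```matlab
% Quantize floating-point weights to signed 8-bit fixed point: pick the
% number of fractional bits that still covers the largest weight, then
% round each weight to the nearest representable step of 2^-fracBits.
w = randn(1, 10) * 0.5;                    % trained 32-bit weights (illustrative)
fracBits = 7 - ceil(log2(max(abs(w))));    % 1 sign bit, rest split int/frac
step = 2^(-fracBits);
q = max(min(round(w / step), 127), -128);  % signed 8-bit codes, saturated
w_q = q * step;                            % de-quantized weights
max_err = max(abs(w - w_q))                % worst-case quantization error
```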
Deep Compression: Finally, Han et al. combine pruning, trained quantization and
Huffman coding to reduce AlexNet's storage requirement by a factor of 39, and
that of VGG-16 even by a factor of 49, without any drop in accuracy. For all of the
mentioned methods, fine-tuning the network with the compressed weights helps
to recover most of the initial loss of accuracy.

3. Aim and Objectives

This chapter presents the problem statement derived from the literature survey and the
research gaps identified through the critical review of relevant papers, as discussed in
Chapter 2. The title, aim, objectives, scope, and the methods and methodologies carried
out for finding the problem solution are discussed in this chapter.
3.1 Title

Design and Development of Convolutional Neural Network for Early Tumor Detection in
Brain
3.2 Aim

To design a convolutional neural network for early tumor detection in the brain, and to
develop tumor segmentation and classification for brain MRI.
3.3 Objectives
1. To carry out a literature survey on various brain tumor cells and convolutional
neural networks.

2. To identify and collect datasets for brain tumor cells.

3. To model and analyze a convolutional neural network for tumor cell detection.

4. To test the functionality of the algorithm against the dataset.

5. To verify, validate and compare the performance of the trained algorithm with an
existing model.

3.4 Project Specification


The following requirements and constraints were the guidelines during the work on this
project:
Primary Goal: Design and implementation of a real-time CNN demonstrator
1. Implementation of best practices from prior work and recent research
2. Optimization of a CNN for demonstration purposes
a) Image classification on a 3-layer CNN
b) Selection and training of a suitable CNN topology (existing or custom-built)
c) Optimization of the CNN for implementation
d) Optimization of the CNN for accuracy
3. Verification and evaluation of the CNN demonstrator system
The remainder of this report details the implementation of these specifications.
3.5 Methods and Methodologies to Attain Each Objective

In the current dissertation work, contemporary methods and methodologies are
employed to attain each objective, as listed in Table 3.1.
Objective 1 - To carry out a literature survey on various brain tumor cells and
convolutional neural networks.
Method/Methodology:
1.1 Survey of various segmentation algorithms.
1.2 Survey of various neural networks.
1.3 Survey of convolutional neural networks.

Objective 2 - To identify and collect datasets for brain tumor cells.
Method/Methodology:
2.1 Collection of datasets: the BraTS dataset.
2.2 BraTS dataset in two dimensions.
2.3 MRI images: gray level n = 75 and n = 210
(n = 75 indicates non-cancerous cells; n = 210 indicates cancerous cells).

Objective 3 - To model and analyze a convolutional neural network for tumor cell
detection.
Method/Methodology:
3.1 Modelling and development of the different layers.
3.2 Modelling of the decision-making layer.
3.3 Modelling of the output layer.

Objective 4 - To test the functionality of the algorithm against the dataset.
Method/Methodology:
4.1 Training of the CNN with the BraTS dataset.
4.2 Verification of the efficiency of the trained network.
4.3 Analysis and detection of network weight losses.

Objective 5 - To verify, validate and compare the performance of the trained algorithm
with an existing model.
Method/Methodology:
5.1 Integration of the developed modules of the algorithm.
5.2 Testing, verification and validation of the functionality of the CNN.
5.3 Measurement of the efficiency of the developed convolutional neural network.

Table 3.1: Methods and Methodologies to Attain Each Objective

4. Development of CNN-Based Image Classification

Preamble to the Chapter


This chapter deals with problem solving through the realization of the objectives stated
in Chapter 3. Problem solving encapsulates part of the literature review and the
identification of image processing algorithms, with their brief architectural description.
The MATLAB and CNN architectures for the identified algorithms are discussed in this
chapter.
4.1 Dataset
The BRATS database of brain MRI images used in this work contains a total of 1,000
examples with their respective labels, split into a training set of 750 examples and a test
set of 250 examples (Table 4.1). The images have been size-normalized and centred in a
fixed-size 28×28 grayscale format. It is a good database for trying learning techniques
and pattern recognition methods on real-world medical data. The data is stored in a very
simple file format designed for storing vectors and multidimensional matrices
(Andri et al. 2016)(LeCun et al. 1998).
Training set    Testing set    Image type    Image size
750             250            Grayscale     28×28

Table 4.1: BRATS Dataset Specifications

Figure 4.1: Brain MRI images of benign and malignant tumors

4.2 Design of Convolutional Neural Network Algorithm

In machine learning, a convolutional neural network (CNN) is a type of feed-forward
artificial neural network in which the connectivity pattern between its neurons is
inspired by the organization of the visual cortex. Convolutional networks were inspired
by biological processes and are variations of multilayer perceptrons designed to use
minimal amounts of preprocessing. They have wide applications in image and video
recognition, recommender systems and natural language processing. The convolutional
neural network is also known as a shift-invariant or space-invariant artificial neural
network (SIANN), named for its shared-weights architecture and translation invariance
characteristics. Convolutional neural networks (CNNs) consist of multiple layers of
receptive fields. These are small neuron collections which process portions of the input
image. The outputs of these collections are then tiled so that their input regions overlap,
to obtain a better representation of the original image; this is repeated for every such
layer. Tiling allows CNNs to tolerate translation of the input image. Convolutional
networks may include local or global pooling layers which combine the outputs of
neuron clusters. They also consist of various combinations of convolutional and
fully-connected layers, with pointwise nonlinearity applied at the end of or after each
layer. A convolution operation on small regions of the input is introduced to reduce the
number of free parameters and improve generalization. One major advantage of
convolutional networks is the use of shared weights in the convolutional layers, which
means that the same filter (weight bank) is used for each pixel in the layer; this both
reduces the memory footprint and improves performance.
The network follows the SegNet encoder-decoder structure described in section 2.3: an
encoder network of 13 convolution layers (corresponding to the first 13 convolution
layers of VGG16, and initializable from weights trained for classification on large
datasets), a matching 13-layer decoder network, and a final multi-class soft-max
classifier that produces class probabilities for each pixel independently. Each encoder
performs convolution with a filter bank, followed by batch normalization, an
element-wise ReLU non-linearity and 2×2 max-pooling with stride 2, which sub-samples
the feature maps by a factor of 2 and provides translation invariance at the cost of some
spatial resolution.
4.2.1 Types of Layers
 Convolutional Layers: In this layer, filters of size $(k \times k)$ are applied to generate
the output feature maps. For filters larger than 1×1, border effects reduce the output
dimensions. To avoid this effect, the input image is typically padded with
$p = \lfloor k/2 \rfloor$ zeros on each side. The filters can be applied with a stride $s$,
which reduces the output dimensions to $w_{out} = w_{in}/s$ and $h_{out} = h_{in}/s$.
Each output pixel is a biased sum of products over the local receptive field:

$$O(y, x) = \sum_{j=0}^{K-1} \sum_{i=0}^{K-1} I(j, i) \cdot K(y - j, x - i) + B$$

 Nonlinearity Layers: In this layer, a non-linear activation function is applied to each
input pixel. The most popular activation function is the Rectified Linear Unit (ReLU),
which computes $f(x) = \max(0, x)$ and clips all negative elements to zero. Early
networks used sigmoidal functions such as $f(x) = 1/(1 + e^{-x})$ or the hyperbolic
tangent $f(x) = (e^x - e^{-x})/(e^x + e^{-x})$, but these are no longer used because of
their computational complexity and their slowing effect on convergence during
training. More recent ideas include the Parametric ReLU (PReLU),
$f(x) = \max(\alpha x, x)$ with learnable parameter $\alpha$, Maxout, and Exponential
Linear Units (ELU).

 Pooling Layers: This layer reduces the spatial dimensions of the input by
summarizing multiple input pixels into one output pixel. Two popular choices are
max-pooling and average-pooling, which summarize their local receptive field by
taking the maximum or the average value of the pixels, respectively. They are
usually applied to a patch of 2×2 or 3×3 input pixels with a stride s = 2, but can
also be applied as global pooling to the whole input image, in order to reduce the
spatial output dimensions to 1×1 pixels. For a 2×2 window with pixels a, b, c and d,
average-pooling computes

$$\mathrm{Average} = \frac{a + b + c + d}{4}$$

 Fully-Connected Layers: These are often used as the last layers in a CNN to compute
the class scores in image classification applications. Even though the spatial
dimensions $h_{in}$ and $w_{in}$ in the last layers are typically heavily reduced, the
fully-connected layers often account for most of the weights in these CNNs.

4.2.2 Training CNN


Gradient-based Learning
The gradient-based learning algorithm is a generalization of the back-propagation
algorithm, which iterates to adjust the weights in order to minimize an error function $E$.
Starting from an initial weight vector $W$, it updates the weights at each iteration as

$$W \leftarrow W - \eta \frac{\partial E}{\partial W}$$

where $\eta$ is the learning rate. Denoting the output and the weight vectors of the
$n$-th layer as $X^n$ and $W^n$, respectively, and applying the chain rule, the gradient at
the $n$-th layer, $\partial E / \partial W^n$, is expanded as

$$\frac{\partial E}{\partial W^n} = \frac{\partial E}{\partial X^n} \frac{\partial X^n}{\partial W^n} = \frac{\partial E}{\partial X^n} \frac{\partial X^n}{\partial NET^n} \frac{\partial NET^n}{\partial W^n}$$
Different Approaches to Calculating 2D Convolutions
When it comes to the calculation of the convolution layers, there are two other
approaches besides the direct "sliding-filter" method described above.
Matrix Multiplication: The first approach transforms the 2D convolution into one large
matrix multiplication. For this, each local input region (the image region underneath
each possible filter location) is stretched out into a column vector, and all the column
vectors are concatenated to form a matrix C. Since the filters' receptive fields usually
overlap, every image pixel is replicated into multiple columns of C. The filter weights are
similarly unrolled into rows, forming the matrix R. The 2D convolution is then equivalent
to the matrix product RC, which can be calculated very efficiently using highly optimized
linear algebra (BLAS) routines available for CPUs, GPUs and DSPs. The disadvantage of
this approach is the exploding memory consumption of the column matrix: for a small
3×3 filter, matrix C is already blown up by a factor of 9 compared to the original input
image. This makes it necessary to split the problem into a number of overlapping tiles
and later stitch the results back together, which artificially increases the complexity of
the problem.
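A minimal MATLAB sketch of this unrolling for one input map and one filter; im2col is from the Image Processing Toolbox, and the image and kernel values are illustrative.

```matlab
% Matrix-multiplication (im2col) view of 2D convolution: unroll every
% 3x3 receptive field into a column of C, the filter into a row R, and
% compute all output pixels at once as the product R*C.
I = rand(28, 28);
K = rand(3, 3);
C = im2col(I, [3 3], 'sliding');  % one column per filter position
R = K(:)';                        % filter weights unrolled into a row
O = reshape(R * C, 26, 26);       % matches the sliding-filter result
```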
Fast Fourier Transformation: The second approach to 2D convolution makes use of the
fact that a convolution in the spatial domain corresponds to a simple element-wise
multiplication in the Fourier domain. This approach can be implemented using the Fast
Fourier Transformation (FFT) and is especially suited for large kernels and large batch
sizes, where it can provide speedups of more than 20× compared to the matrix
multiplication method.
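A minimal MATLAB sketch of the idea, computing a 'valid' 2D convolution through zero-padded FFTs; the sizes and values are illustrative.

```matlab
% FFT-based 2D convolution: multiply zero-padded spectra element-wise,
% invert, and crop to the 'valid' region. A full linear convolution of a
% 28x28 image with a 3x3 kernel has size 28+3-1 = 30 in each dimension.
I = rand(28, 28);
K = rand(3, 3);
full = real(ifft2(fft2(I, 30, 30) .* fft2(K, 30, 30)));
O = full(3:28, 3:28);             % equals conv2(I, K, 'valid')
```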
Advantages of the Sliding-Filter 2D Convolution Approach: Both the matrix multiplication
and the FFT approach are well suited to general-purpose architectures such as GPUs,
and they are especially efficient for large problem sizes and batched computation.
However, their additional memory consumption and the resulting need for tiling and
re-stitching introduce artificial memory and computation requirements, which reduce
the resource efficiency of the architecture. Our focus on a regular, well-optimized CNN
further eliminates the need to support all kinds of different parameter combinations.

5. Results

Preamble
This chapter deals with the results and discussions of the dissertation work carried
out in Chapter 4 (problem solving). Tables, photographs (input-output images),
justification of the realization of objectives, validations (substantiation) and
recommendations of the dissertation work are discussed in this chapter.
5.1 CNN Performance on MATLAB
The CNN on MATLAB performed well compared to other image classification
algorithms, giving an accuracy of 87.6% after 348 iterations. The same network has been
verified with a different dataset, and it maintains its classification accuracy as well as its
training time. The training progress is shown in Figure 5.1.

Figure 5.1: Training progress

The status of the training performance is shown in Figure 5.2.

Figure 5.2: Status of the training performance

5.2 Segmentation of the MRI Images

Using the proposed CNN architecture, the output is verified on the MRI images. After an
MRI image is presented to the network, the image is segmented. Figures 5.3 and 5.4
show the final output for a benign and a malignant image, respectively.
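For illustration, a minimal MATLAB sketch of this step; the file name, the gray-level threshold (cf. the n = 75 / n = 210 levels in Table 3.1) and the variable net from the training sketch in section 2.2 are illustrative assumptions.

```matlab
% Classify a new MRI slice with the trained network and show a crude
% gray-level segmentation next to the original image.
img = imread('mri_slice.png');    % 28x28 grayscale MRI slice (assumed)
label = classify(net, img)        % predicted class: benign or malignant
mask = img > 75;                  % simple threshold segmentation
imshowpair(img, mask, 'montage')  % original and segmented view side by side
```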

36
Figure 5.3: Segmentation of MRI image (benign)

Figure 5.4: Segmentation of an MRI image (malignant)

6. Conclusions

Among brain tumors, malignant tumors are the most common and aggressive, leading to
a very short life expectancy in their highest grade. Treatment planning is therefore a key
stage in improving the quality of life of oncological patients. Magnetic resonance
imaging (MRI) is a widely used imaging technique for assessing these tumors, but the
large amount of data produced by MRI prevents manual segmentation in a reasonable
time, limiting the use of precise quantitative measurements in clinical practice.
Automatic and reliable segmentation methods are therefore required; however, the
large spatial and structural variability among brain tumors makes automatic
segmentation a challenging problem. Here we propose an automatic segmentation
method based on Convolutional Neural Networks (CNN), exploring small 3×3 kernels.
The use of small kernels allows a deeper architecture to be designed, besides having a
positive effect against overfitting, given the smaller number of weights in the network.
We also investigated the use of intensity normalization as a pre-processing step which,
though not common in CNN-based segmentation methods, proved together with data
augmentation to be very effective for brain tumor segmentation in MRI images.

References
1. V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A Deep Convolutional
Encoder-Decoder Architecture for Image Segmentation," vol. 3, 10 Oct. 2016.
2. J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic
segmentation," in CVPR, pp. 3431–3440, 2015.
3. V. Badrinarayanan, A. Handa, and R. Cipolla, "SegNet: A deep convolutional
encoder-decoder architecture for robust semantic pixel-wise labelling," CoRR,
vol. abs/1505.07293, 2015.
4. H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic
segmentation," in ICCV, pp. 1520–1528, 2015.
5. L. Bottou, "Large-scale machine learning with stochastic gradient descent," in
Proceedings of COMPSTAT'2010, pp. 177–186, Springer, 2010.
6. https://www.cbica.upenn.edu/sbia/Spyridon.Bakas/MICCAI_BraTS/MICCAI_BraTS_2018_proceedings_shortPapers.pdf
7. NPTEL online course: Machine Learning for Engineering and Science Applications

