
B.E.

PROJECT
ON

PID Controller tuning using soft computing


techniques
Submitted by
Pulak Malik (513/IC/14)
Sarthak Garg (526/IC/14)
Shashwat Bhageria (531/IC/14)
Shubham Singh (536/IC/14)
(In partial fulfillment of the B.E. (Instrumentation and Control
Engineering) degree of the University of Delhi)

Under the Guidance of


Dr. J.R.P. Gupta and Prof. A.N. Jha

DIVISION OF INSTRUMENTATION AND CONTROL ENGINEERING


NETAJI SUBHAS INSTITUTE OF TECHNOLOGY
UNIVERSITY OF DELHI, DELHI
JUNE 2018
DEDICATION

This thesis is dedicated to Professor Andrew Ng, Director of the
Stanford Artificial Intelligence Lab, and Chairman and Co-founder of
Coursera. His passion for teaching, and his mission to give everyone in
the world access to a great education for free, have set a new standard
for anyone involved in education, training and development, especially
in the Artificial Intelligence field. This thesis uses the basics from
his esteemed course, Machine Learning, on Coursera.

Pulak Malik Sarthak Garg Shashwat Bhageria Shubham Singh


Roll No. 513/IC/14 Roll No. 526/IC/14 Roll No. 531/IC/14 Roll No. 536/IC/14

Instrumentation and Control Engineering Department


Netaji Subhas Institute of Technology (NSIT)
Azad Hind Fauj Marg
Sector-3, Dwarka, New Delhi
PIN - 110078

ACKNOWLEDGEMENTS

We would like to express our deepest gratitude to our advisers Dr. J.R.P.

Gupta and Prof. A.N. Jha for giving us the opportunity to work under their able

supervision. We are indebted towards the faculty of NSIT, whose guidance and

teaching for the past four years have helped us in understanding the theory and

practices of Instrumentation and Control Engineering. We also thank our parents

and families who always supported us unconditionally through thick and thin. Their

belief in our capabilities always pushed us to give our best in this project. Without

these invaluable contributions, this project could not have been completed.

Pulak Malik Sarthak Garg Shashwat Bhageria Shubham Singh


513/IC/14 526/IC/14 531/IC/14 536/IC/14

Instrumentation and Control Engineering Department


Netaji Subhas Institute of Technology (NSIT)
Azad Hind Fauj Marg
Sector-3, Dwarka, New Delhi
PIN - 110078

DECLARATION

This is to certify that the project entitled “PID controller tuning


using soft computing techniques” by Pulak Malik, Sarthak
Garg, Shashwat Bhageria and Shubham Singh is a record of
bona fide work carried out by us in the Department of
Instrumentation and Control Engineering, Netaji Subhas
Institute of Technology, University of Delhi, New Delhi, in partial
fulfillment of requirements for the award of the degree of
Bachelor of Engineering in Instrumentation and Control
Engineering, University of Delhi in the academic year 2017-
2018. The results presented in this thesis have not been
submitted to any other university in any form for the award of
any other degree.

Pulak Malik Sarthak Garg Shashwat Bhageria Shubham Singh


513/IC/14 526/IC/14 531/IC/14 536/IC/14

Instrumentation and Control Engineering Department


Netaji Subhas Institute of Technology (NSIT)
Azad Hind Fauj Marg
Sector-3, Dwarka, New Delhi
PIN - 110078
CERTIFICATE

This is to certify that the project entitled “PID controller tuning using

soft computing techniques” by Pulak Malik, Sarthak Garg, Shashwat

Bhageria and Shubham Singh is a record of bona fide work carried

out by us, in the department of Instrumentation and Control

Engineering, Netaji Subhas Institute of Technology, University of Delhi,

New Delhi, under our supervision and guidance in partial fulfillment of

requirements for the award of the degree of Bachelor of Engineering in

Instrumentation and Control Engineering, University of Delhi in the

academic year 2017-2018.

The results presented in this thesis have not been submitted to

any other university in any form for the award of any other degree.

Dr. J.R.P. Gupta Mr. A.N. Jha


(Professor Emeritus) (Associate Professor)

Instrumentation and Control Engineering Department


Netaji Subhas Institute of Technology (NSIT)
Azad Hind Fauj Marg
Sector-3, Dwarka, New Delhi
PIN - 110078

CERTIFICATE
This is to certify that the project entitled “PID controller tuning using

soft computing techniques” by Pulak Malik, Sarthak Garg, Shashwat

Bhageria and Shubham Singh is a record of bona fide work carried

out by us, in the department of Instrumentation and Control

Engineering, Netaji Subhas Institute of Technology, University of Delhi,

New Delhi, under our supervision and guidance in partial fulfillment of

requirements for the award of the degree of Bachelor of Engineering in

Instrumentation and Control Engineering, University of Delhi in the

academic year 2017-2018.

Prof. Smriti Srivastava


Head of the Department
Department of Instrumentation and Control Engineering
Netaji Subhas Institute of Technology (NSIT)
Azad Hind Fauj Marg
Sector-3, Dwarka, New Delhi
PIN - 110078

PLAGIARISM REPORT

ABSTRACT

The artificial bee colony (ABC) algorithm has proved its importance in
solving a number of problems, including engineering optimization
problems. ABC is one of the youngest and most popular members of the
family of population-based, nature-inspired, meta-heuristic swarm
intelligence methods. It has demonstrated its superiority over several
other nature-inspired algorithms (NIA) when applied to both benchmark
functions and real-world problems. The performance of the ABC search
process depends on a random value that tries to balance the exploration
and exploitation phases. To increase performance, the exploration of
the search space and the exploitation of the optimal solution must be
balanced. This report outlines a new hybrid of the ABC algorithm with
the Genetic Algorithm (GA). It first reviews ABC and GA, two powerful
meta-heuristics, explains some major defects of these two algorithms,
and then proposes a new hybrid model. Experimental results show that
the proposed hybrid algorithm is effective and that its performance,
in both speed and accuracy, beats the other versions.

LIST OF TABLES

Table 6.1: Log loss Error of Neural Network, CNN and XGBOOST ..................................67

LIST OF FIGURES

Figure 4.1: Image Classification of cat..............................................................................................10

Figure 4.2: Image Classification on the basis of different factors...................................11

Figure 4.3: User's Interest with different step functions ...........................................................14

Figure 4.4: Perceptron Input and Output.........................................................................................15

Figure 4.5: Representation of a Perceptron...................................................................................16

Figure 4.6: Network of Perceptron's...................................................................................................17

Figure 4.7: Sigmoid Function..................................................................................................................18

Figure 4.8: Representation of Perceptron including hidden layer. ..................................21

Figure 4.9: A regular 3-layer Neural Network.................................................................................21

Figure 4.10: Local connectivity representation of Perceptron ..............................................23

Figure 4.11: Representation of Perceptron body........................................................................24

Figure 4.12: Spatial Arrangement of Neurons...............................................................................25

Figure 4.13: General Pooling....................................................................................................................26

Figure 4.14: Illustration of Pooling.......................................................................................................27

Figure 5.1: Flow Diagram of Classification ...................................................................................33

Figure 5.2: Hounsfield Unit of different matter..............................................................................35

Figure 5.3: Frequency vs. Hounsfield Unit.......................................................................................36

Figure 5.4: Lung Slice..................................................................................................................................36

Figure 5.5: Z-Axis Sweep...........................................................................................................................39

Figure 5.6: X-Axis Sweep...........................................................................................................................40

Figure 5.7: Z-Axis Sweep...........................................................................................................................41

Figure 5.8: Distribution of Pixels in Image......................................................................................42

Figure 5.9: Edge Detection 1....................................................................................................................43

Figure 5.10: Edge Detection 2.................................................................................................................43

Figure 5.11: Edge Detection 3.................................................................................................................44

Figure 5.12: 3D Plot of lung......................................................................................................................45

Figure 5.13: Original Input Slice............................................................................................................46

Figure 5.14: Internal Marked Slice........................................................................................................47

Figure 5.15: External Marked Slice.......................................................................................................47

Figure 5.16: Watershed Marked Slice.................................................................................................48

Figure 5.17: Sobel Gradient......................................................................................................................49

Figure 5.18: Watershed Image................................................................................................................49

Figure 5.19: Outline After Re-Inclusion.............................................................................................50

Figure 5.20: Lung Filter After Closing................................................................................................50

Figure 5.21: Segmented Lung..................................................................................................................51

Figure 5.22: 3D Segmented Lung..........................................................................................................52

Figure 5.23: 3D Lung-2.................................................................................................................................53

Figure 5.24: 3D Lung-3.................................................................................................................................54

Figure 6.1: Log Loss v/s probability for a single positive instance................................61

Figure 6.2: Root Folder................................................................................................................................62

Figure 6.3: Results with each Iteration...............................................................................................63

Figure 6.4: System Usage Stats and iterations result..............................................................64

Figure 6.5: XGBOOST Output...................................................................................................................65

Figure 6.6: Comparison Curve of CNN and XGBOOST...........................................................66

INDEX OF EQUATIONS

Equation        Caption                                                    Page

Equation 4.1    Linear Combination of features                             13
Equation 4.2    Objective Function of supervised Learning                  13
Equation 4.3    Mean Squared Error                                         13
Equation 4.4    Logistic Loss Error                                        13
Equation 4.5    Output corresponding to the input parameters               15
Equation 4.6    Output corresponding to the input parameters and biases
Equation 4.7    Sigmoid Function                                           16
Equation 4.8    Training Loss                                              17
Equation 4.9    Objective function for XGBOOST                             28
Equation 4.10   Additive Training of XGBOOST                               28
Equation 4.11   Objective function for XGBOOST for MSE                     29
Equation 4.12   Objective function for XGBOOST in terms of constants
Equation 4.13   Final Objective function of XGBOOST                        29
Equation 4.14   General Tree function                                      29
Equation 4.15   Complexity function in XGBOOST                             30
Equation 6.1    Log Loss Error                                             61

TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION ........................................................................................ 1

CHAPTER 2: PROJECT REQUIREMENTS ..................................................................... 4


2.1 System Requirements ................................................................................................ 4
2.1.1 Operating System: ............................................................................................. 4
2.1.2 Graphics Processing Unit: ................................................................................. 4
2.2 Software Requirements.............................................................................................. 4
2.2.1 Python 2.7+: ...................................................................................................... 4
2.2.2 TensorFlow:....................................................................................................... 4
2.2.3 Anaconda:.......................................................................................................... 5
2.2.4 Python Libraries used: ....................................................................................... 5

CHAPTER 3: PROJECT APPROACH ............................................................................... 7


3.1 Topic Research and Selection:................................................................................... 7
3.2 Research:.................................................................................................................... 7
3.3 Dataset Collection:..................................................................................................... 7
3.4 Data Preparation: ....................................................................................................... 8
3.5 Training of the Algorithms: ....................................................................................... 8

CHAPTER 4: THEORETICAL BACKGROUND ............................................................. 9


4.1 Image Classification .................................................................................................. 9
4.1.1 Motivation ......................................................................................................... 9
4.1.2 Example............................................................................................................. 9
4.1.3 Challenges ....................................................................................................... 10
4.1.4 Data Driven Approach..................................................................................... 11
4.1.5 The image classification pipeline .................................................................... 12
4.2 Elements of Supervised Learning ............................................................................ 12
4.2.1 Model and Parameters ..................................................................................... 12
4.2.2 Objective Function: Training Loss + Regularization ...................................... 13
4.2.3 Neural Networks.............................................................................................. 14
4.2.3.1 PERCEPTRON...................................................................................... 14
4.2.3.2 ACTIVATION FUNCTIONS ............................................................... 17
4.2.4 Convolutional Neural Networks (CNNs) ........................................................ 20
4.2.4.1 Architecture Overview........................................................................... 20
4.2.4.2 Layers used to build convolutional neural networks ............................. 22
4.2.4.3 FULLY-CONNECTED LAYER........................................................... 28
4.2.5 MODELLING OF XGBOOST ....................................................................... 29
4.2.5.1 Tree Ensemble ....................................................................................... 29

4.2.5.2 Tree Boosting (Training of Tree)...........................................................29
4.2.5.3 Additive Training............................................................................................29

CHAPTER 5: IMPLEMENTATION DETAILS........................................................................33


5.1 Preprocessing........................................................................................................................33
5.1.1 Loading the files.........................................................................................................34
5.1.2 Resampling....................................................................................................................38
5.1.3 Edge detection and other convolutional filters:.....................................42
5.1.4 3D plotting the scan.................................................................................................45
5.1.5 Lung segmentation using water sheds........................................................46
5.1.6 3D Lung segmentation...........................................................................................51
5.1.7 Normalization...............................................................................................................55
5.1.8 Zero centering..............................................................................................................55
5.2 Running the algorithms....................................................................................................56
5.2.1 CNN Parameters.........................................................................................................56
5.2.1.1 CONVOLUTIONAL LAYERS COUNT = 2:.........................................56
5.2.1.2 NODES PER HIDDEN LAYER COUNT = 64:...................................57
5.2.1.3 STRIDE SIZE = [1,2,2,2,1]:.........................................................................57
5.2.1.4 THE NUMBER OF HIDDEN LAYERS=4:............................................57
5.2.2 XGBoost Parameters:.............................................................................................58
5.2.2.1 N_ESTIMATORS = 1500:............................................................................58
5.2.2.2 MIN_CHILD_WEIGHT = 9:.........................................................................58
5.2.2.3 MAX_DEPTH = 10:.........................................................................................58
5.2.2.4 LEARNING_RATE = 0.05:..........................................................................59
5.2.2.5 COLSAMPLE_BYTREE = 0.80:..............................................................59
5.2.2.6 SEED = 42:..........................................................................................................59
5.2.2.7 NTHREAD = 8...................................................................................................59
5.2.2.8 SUBSAMPLE = 0.80:.....................................................................................60
5.2.2.9 VERBOSE = TRUE.........................................................................................60
5.2.2.10 EARLY_STOPPING_ROUNDS = 50:.................................................60

CHAPTER 6: EVALUATION AND RESULT..........................................................................61


6.1.1 Neural Network............................................................................................................62
6.1.2 Convolutional Neural Network...........................................................................63
6.1.3 XGBoost...........................................................................................................................65
6.1.4 Comparison...................................................................................................................66

CHAPTER 7: CONCLUSION AND FUTURE........................................................................68


CHAPTER 1: INTRODUCTION
The Proportional-Integral-Derivative (PID) controller has proved to be the most popular

controller of this century for its remarkable effectiveness, ease of implementation and vast

applicability, though tuning the PID controller has been found to be hard over the years. PID

controllers have been tuned using various methods, which include Genetic Algorithms,

Evolutionary Programming and Particle Swarm Optimization.

All these tuning methods are carried out manually and are difficult as well as time-consuming. To

use a PID controller efficiently, the optimal tuning of its parameters has become a significant

research area and has been taken up by many control graduates. Optimization problems have been

resolved with the aid of numerous soft computing techniques, including fuzzy logic, artificial

neural networks and meta-heuristic methods, which are an alternative to the traditional

approaches. Many algorithms, such as those of Evolutionary Programming, took a long time to

show the desired results.

Genetic Algorithms (GAs) are a random global search method that replicates the natural evolution

process; they were developed in the United States in the 1970s at the University of Michigan.

The Genetic Algorithm uses two major operators, crossover of the population and mutation,

to get as close to the optimal result as possible.

The Genetic Algorithm has no knowledge of the correct solution and depends entirely on

responses from the environment and on evolution operators such as crossover and mutation to

arrive at the best proposed solution. By starting from many independent points and searching

them in parallel, the algorithm avoids getting trapped at local minima and converging to

sub-optimal solutions. For very complex problems, however, it may still converge to a local

minimum. The time consumed by this optimization algorithm is also high, since it involves so

many parameters. The genetic algorithm is also sensitive to the initial population used, whereas

we want a wide diversity of feasible solutions.

The biggest limitation of the Genetic Algorithm is that it cannot guarantee an optimal solution.

Solution quality also degrades as the problem size grows, and the convergence rate of GA is slow.

It can, however, generate good-quality solutions for almost any problem and function type.

The quality of the results depends highly on:

 The population size


 The genetic operators (crossover, selection, mutation) and whether they are well-suited

to the problem being solved.


 The probabilities of crossover and mutation.
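The interplay of these factors can be seen in a minimal GA sketch. This is an illustrative implementation only: the function names, the tournament-selection and arithmetic-crossover choices, and the toy sphere fitness function are our own, not the GA variant developed later in this report.

```python
import random

def genetic_algorithm(fitness, bounds, pop_size=30, generations=100,
                      crossover_prob=0.9, mutation_prob=0.1):
    """Minimise `fitness` over a real-valued vector constrained to `bounds`."""
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # Tournament selection: the fitter of two random individuals survives
            a, b = random.sample(pop, 2)
            return a if fitness(a) < fitness(b) else b
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if random.random() < crossover_prob:
                # Arithmetic crossover: a random convex blend of the parents
                w = random.random()
                child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
            else:
                child = p1[:]
            for i, (lo, hi) in enumerate(bounds):
                if random.random() < mutation_prob:   # uniform mutation
                    child[i] = random.uniform(lo, hi)
            children.append(child)
        pop = children
    return min(pop, key=fitness)

# Toy problem: minimise the sphere function, whose optimum is the origin.
best = genetic_algorithm(lambda x: sum(v * v for v in x), [(-5, 5)] * 2)
```

Raising `mutation_prob` widens exploration at the cost of slower convergence; shrinking `pop_size` speeds up each generation but narrows the diversity the text warns about.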

ARTIFICIAL BEE COLONY (ABC) METHOD

The natural behaviour of bees and their collective activities in their hives have fascinated

researchers all over the world for decades. Recently, researchers have been focusing on swarm

intelligence and on developing swarm optimization methods, extending their knowledge of animal

societies, especially insect colonies. Ants, termites, bees, fireflies and wasps are the most

important social insects inspiring efficient problem-solving algorithms.

In 2005, Karaboga presented the Artificial Bee Colony (ABC) algorithm to optimize numeric

benchmark functions. It was then extended by Karaboga and Basturk and shown to outperform

other recognized heuristic methods such as GA [18], as well as DE, PSO and ACO. In addition, it

has been successfully applied to constrained optimization problems and to neural networks.

ABC, like other nature-based algorithms, models the honey-bee lifecycle, though not

precisely. In this model, the honey bees are categorized as employed, onlooker and scout bees. An

employed bee is a forager associated with a certain food source which she is currently exploiting.

She remembers the character of the food source and, after returning to the hive, shares it

with the other bees waiting there via a peculiar communication called the waggle dance. An onlooker

bee is an unemployed bee at the hive which tries to find a new food source using the information

provided by the employed bees. A scout, ignoring the others' information, searches around the

hive randomly. In nature, the recruitment of unemployed bees happens in a nearly similar way. In

addition, when the quality of a food source falls below a certain level, it is abandoned, making

the bees explore for new food sources. In the ABC algorithm, the solutions are modelled as food

sources and their corresponding objective/fitness function values as the quality (nectar

amount) of the food sources.
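The employed, onlooker and scout phases described above can be sketched as follows. This is a minimal illustrative implementation; the function names, the nectar formula 1/(1 + f) for minimization, and all parameter values are our own choices, not Karaboga's reference code.

```python
import random

def abc_optimize(f, bounds, n_sources=10, limit=20, cycles=200):
    """Minimise f; food sources are candidate solutions, nectar = 1/(1 + f)."""
    dim = len(bounds)
    rand_source = lambda: [random.uniform(lo, hi) for lo, hi in bounds]
    sources = [rand_source() for _ in range(n_sources)]
    trials = [0] * n_sources          # how long each source has failed to improve

    def neighbour(i):
        # Perturb one dimension of source i relative to a random source
        k, j = random.randrange(n_sources), random.randrange(dim)
        cand = sources[i][:]
        cand[j] += random.uniform(-1, 1) * (cand[j] - sources[k][j])
        lo, hi = bounds[j]
        cand[j] = min(max(cand[j], lo), hi)
        return cand

    def greedy(i):
        # Keep the neighbour only if it has more nectar (lower f)
        cand = neighbour(i)
        if f(cand) < f(sources[i]):
            sources[i], trials[i] = cand, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_sources):                      # employed-bee phase
            greedy(i)
        fit = [1.0 / (1.0 + f(s)) for s in sources]
        for _ in range(n_sources):                      # onlooker phase
            greedy(random.choices(range(n_sources), weights=fit)[0])
        for i in range(n_sources):                      # scout phase
            if trials[i] > limit:                       # abandon a stale source
                sources[i], trials[i] = rand_source(), 0
    return min(sources, key=f)
```

The `limit` parameter is what implements abandonment: a source that fails to improve `limit` times in a row is replaced by a scout's random search, which is the algorithm's main exploration mechanism.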

Although the exploration of solutions in the ABC algorithm is good, its exploitation of the food

sources already found is poor, and it can fall into a local optimum because of premature

convergence. As a result, a more optimized and improved technique was needed.

As we know, many practical control problems faced by control engineers are of higher order with time delay.
The meta-heuristic algorithms discussed above generally work well for lower-order problems, but as the order of the
problem increases these algorithms converge to local optima and many times do not even give a satisfactory
result.
The Genetic Algorithm, though very robust, has a high convergence rate for lower-order problems; for higher-order
practical problems the convergence rate drops and it offers a local optimum as the solution.
The Artificial Bee Colony method is flexible, easy to implement and explores the solution space well,
but its exploitation of the food sources found is poor, and it falls into local optima as a result of premature convergence.

The advantages of the genetically mutated bee colony optimization technique are:


1- Simplicity, flexibility and robustness
2- Ability to overcome the local-optimum problem faced by GA
3- Easy convergence for complex problems
4- Ability to handle multidimensional cost functions
5- Works even with larger population sizes
6- Converges to the optimal solution within a few iterations
7- Can be applied to problems from various practical domains
8- Less dependent on user input variables
9- Applicable to higher-order practical systems like the three-tank system
10- Reliably approaches the global optimum
CHAPTER 2: Controllers and Their Types

For the past few years, control systems have assumed an increasingly important role in the
development and advancement of modern civilization and technology. Practically every
aspect of our day-to-day activities is affected by some type of control system. Automatic
control systems are found in abundance in all sectors of industry, such as quality control of
manufactured products, automatic assembly lines, machine-tool control, space technology
and weapon systems, computer control, transportation systems, power systems, robotics
and many others. They are essential in such industrial operations as controlling pressure,
temperature, humidity, and flow in the process industries.

Recent applications of modern control theory include such non-engineering systems as


biological, biomedical, inventory-control, economic and socioeconomic systems.

The basic ingredients of a control system can be described by:


 Objectives of control.
 Control system components.
 Results or output.

Fig. 2.1

Automatic Controllers: -
An automatic controller compares the actual value of the plant output with the
reference command, determines the difference, and produces a control signal that
will reduce this difference to a negligible value. The manner in which the
automatic controller produces such a control signal is called the control action.
An industrial control system comprises an automatic controller, an actuator, a
plant, and a sensor (measuring element). The controller detects the actuating
error signal, which is usually at a very low power level, and amplifies it to a
much higher level. The output of the automatic controller is fed to an actuator, such
as a hydraulic motor, an electric motor, or a pneumatic motor or valve (or any
other source of energy). The actuator is a power device that produces the input to the
plant according to the control signal so that the output signal will track the
reference input signal.

The sensor or measuring element is a device that converts the output variable
into another suitable variable, such as a displacement, pressure or voltage, that
can be used to compare the output with the reference input command. This element
lies in the feedback path of the closed-loop system. The set point of the controller must be
converted to a reference input with the same units as the feedback signal from the
sensor element.

Classification of Industrial controllers: -

Industrial controllers may be classified according to their control action as:


 Two-position or on-off controllers
 Proportional controllers
 Integral controllers
 Proportional-plus-integral controllers
 Proportional-plus-derivative controllers
 Proportional-plus-integral-plus-derivative controllers

The type of controller to use must be decided based on the nature of the plant and the
operating conditions, including such considerations as safety, cost, availability, reliability,
accuracy, weight and size.
Two-position or on-off controllers: -

In a two-position control system, the actuating element has only two fixed positions,
which are, in many simple cases, simply on and off. Owing to its simplicity and
inexpensiveness, this type of control is very widely used in both industrial and domestic
control systems.

Let the output signal from the controller be u(t) and the actuating error signal be
e(t). Then mathematically,

u(t) = U1, for e(t) > 0

= U2, for e(t) < 0

where U1 and U2 are constants, and the minimum value of U2 is usually either
zero or -U1.
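The two-position law above is simple enough to code directly. Below is a sketch with a thermostat-style toy simulation; the plant model and all numeric values are our own illustrative choices, not from this report.

```python
def on_off(error, U1=1.0, U2=0.0):
    """Two-position law: u(t) = U1 for e(t) > 0, U2 otherwise."""
    return U1 if error > 0 else U2

# Toy thermostat: heater power U1 warms the room, which also loses
# heat toward a 10-degree ambient. The temperature oscillates around
# the set point instead of settling, which is characteristic of on-off control.
setpoint, temp = 20.0, 15.0
for _ in range(100):
    u = on_off(setpoint - temp)
    temp += 2.0 * u - 0.1 * (temp - 10.0)   # heating minus ambient loss
```

In practice a small hysteresis band (differential gap) is added around the set point so the actuator does not chatter at high frequency near zero error.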

1.1 Proportional Control :-


A proportional control system is a type of linear feedback control system. Proportional
control is how most drivers control the speed of a car. If the car is at the target speed and the
speed increases slightly, the power is reduced slightly, in proportion to the error (the
actual versus target speed), so that the car slows down gradually and reaches the target
point with very little, if any, "overshoot"; the result is much smoother control than on-off
control [5].

In the proportional control algorithm, the controller output is proportional to the error
signal, which is the difference between the set point and the process variable. In other
words, the output of a proportional controller is the multiplication product of the error
signal and the proportional gain. This can be mathematically expressed as

Pout = Kp e(t)

Where

Pout: Output of the proportional controller

Kp: Proportional gain

e(t): Instantaneous process error at time 't'. e(t) = SP − PV


SP: Set point

PV: Process variable

With an increase in Kp:

 Response speed of the system increases.

 Overshoot of the closed-loop system increases.

 Steady-state error decreases.

However, with too high a Kp value the closed-loop system becomes unstable.
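The steady-state-error behaviour can be reproduced on a toy plant. The sketch below is ours, not the system studied in this report: it applies Pout = Kp e(t) to a first-order plant dy/dt = -y + u, whose steady state works out to y = Kp·r/(1 + Kp), so the offset shrinks as Kp grows but never vanishes.

```python
def simulate_p_control(Kp, setpoint=1.0, steps=200, dt=0.05):
    """Proportional control of the toy first-order plant dy/dt = -y + u."""
    y = 0.0
    for _ in range(steps):
        u = Kp * (setpoint - y)   # Pout = Kp * e(t)
        y += dt * (-y + u)        # forward-Euler step of the plant
    return y

# Steady state is Kp/(1 + Kp) for a unit set point:
# Kp = 1 settles near 0.5, Kp = 10 near 0.91; neither reaches 1 exactly.
```

With a much larger Kp the explicit Euler step itself goes unstable (here around Kp ≈ 39 for dt = 0.05), a discrete-time echo of the instability warned about above.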

1.2 Integral Control: -

In a proportional control of a plant whose transfer function doesn’t possess an integrator


1/s, there is a steady-state error, or offset, in the response to a step input. Such an offset
can be eliminated if integral controller is included in the system.

In the integral control of a plant, the control signal (the output signal from the controller)
at any instant is proportional to the area under the actuating error signal curve up to that
instant. But while removing the steady-state error, it may lead to an oscillatory response
of slowly decreasing amplitude or even increasing amplitude, both of which are usually
undesirable [5].

1.3 Proportional-plus-integral controllers: -

In control engineering a PI Controller (proportional-integral controller) is a feedback


controller which drives the plant to be controlled by a weighted sum of the error
(difference between the output and desired set point) and the integral of that value. It is a
special case of the PID controller in which the derivative (D) part of the error is not used.
The PI controller is mathematically denoted as:

Gc (s) = Kp + Ki / s
Fig.1.2 (courtesy-[5])

Integral control action added to the proportional controller converts the original system
into a higher-order one. Hence the control system may become unstable for a large value of Kp,
since the roots of the characteristic equation may have positive real parts. In this control,
the proportional action tends to stabilize the system, while the integral action
tends to eliminate or reduce the steady-state error in response to various inputs. As the value
of Ti is increased:
 Overshoot tends to be smaller.
 Speed of the response tends to be slower.

1.4 Proportional-plus-derivative controllers: -


Proportional-Derivative or PD control combines proportional control and derivative
control in parallel. Derivative action acts on the derivative or rate of change of the control
error. This provides a fast response, as opposed to the integral action, but cannot
accommodate constant errors (i.e. the derivative of a constant, nonzero error is 0).
Derivatives have a phase of +90 degrees leading to an anticipatory or predictive response.
However, derivative control will produce large control signals in response to high
frequency control errors such as set point changes (step command) and measurement
noise [5].
In order to use derivative control, the transfer functions must be proper. This often
requires a pole to be added to the controller.

Gpd(s) = Kp + Kds or

= Kp(1+Tds)

With the increase of Td

 Overshoot tends to be smaller

 Slower rise time but similar settling time.


1.5 Proportional-plus-integral-plus-derivative controllers: -

The PID controller was first placed on the market in 1939 and has remained the most widely
used controller in process control until today. An investigation performed in 1989 in Japan
indicated that more than 90% of the controllers used in process industries are PID controllers
and advanced versions of the PID controller. PI controllers are fairly common, since
derivative action is sensitive to measurement noise.

“PID control” is the method of feedback control that uses the PID controller as the main tool.
The basic structure of conventional feedback control systems is shown in Figure below, using
a block diagram representation. In this figure, the process is the object to be controlled. The
purpose of control is to make the process variable y follow the set-point value r. To achieve
this purpose, the manipulated variable u is changed at the command of the controller. As an
example of processes, consider a heating tank in which some liquid is heated to a desired
temperature by burning fuel gas. The process variable y is the temperature of the liquid, and
the manipulated variable u is the flow of the fuel gas. The “disturbance” is any factor, other
than the manipulated variable, that influences the process variable. Figure below assumes that
only one disturbance is added to the manipulated variable. In some applications, however, a
major disturbance enters the process in a different way, or plural disturbances need to be
considered. The error e is defined by e = r – y. The compensator C(s) is the computational
rule that determines the manipulated variable u based on its input data, which is the error e in
the case of Figure. The last thing to notice about the Figure is that the process variable y is
assumed to be measured by the detector, which is not shown explicitly here, with sufficient
accuracy instantaneously that the input to the controller can be regarded as being exactly
equal to y.
Fig. 1.3(courtesy-[5])

When used in this manner, the three elements of PID produces outputs with the
following nature:

 P element: proportional to the error at the instant t; this is the “present” error.

 I element: proportional to the integral of the error up to the instant t, which can
be interpreted as the accumulation of the “past” error.

 D element: proportional to the derivative of the error at the instant t, which can be
interpreted as the prediction of the “future” error.

Thus, the PID controller can be understood as a controller that takes the present, the past, and
the future of the error into consideration. The transfer function G c(s) of the PID controller is :

Gc (s) = Kp (1 + 1/(Ti s) + Td s)

= Kp + Ki / s + Kd s

where Ki = Kp / Ti and Kd = Kp Td.

1.6 Application: -
In the early history of automatic process control, the PID controller was implemented as a
mechanical device. These mechanical controllers used a lever, spring and mass, and
were often energized by compressed air. Such pneumatic controllers were once the
industry standard [5].
Electronic analog controllers can be made from a solid-state or tube amplifier, capacitor
and a resistance. Electronic analog PID control loops were often found within more
complex electronic systems, for example, the head positioning of a disk drive, the power
conditioning of a power supply, or even the movement-detection circuit of a modern
seismometer. Nowadays, electronic controllers have largely been replaced by digital
controllers implemented with microcontrollers or FPGAs.

Most modern PID controllers in industry are implemented in programmable logic


controllers (PLCs) or as a panel-mounted digital controller. Software implementations
have the advantages that they are relatively cheap and are flexible with respect to the
implementation of the PID algorithm [5].

Fig.1.4 Closed-loop step response.


Chapter 3 TUNING OF PID CONTROLLER

3.1 Basic Introduction: -


“Tuning” is the engineering work to adjust the parameters of the controller so that the
control system exhibits desired property. Currently, more than half of the controllers
used in industry are PID controllers [5]. In the past, many of these controllers were
analog; however, many of today's controllers use digital signals and computers. When
a mathematical model of a system is available, the parameters of the controller can be
explicitly determined. However, when a mathematical model is unavailable, the
parameters must be determined experimentally. Controller tuning is the process of
determining the controller parameters which produce the desired output. Controller
tuning allows for optimization of a process and minimizes the error between the
variable of the process and its set point [5].
Types of controller tuning methods include the trial and error method, and process
reaction curve methods. The most common classical controller tuning methods are the
Ziegler-Nichols and Cohen-Coon methods. These methods are often used when the
mathematical model of the system is not available. The Ziegler-Nichols method can
be used for both closed and open loop systems, while Cohen-Coon is typically used
for open loop systems. A closed-loop control system is a system which uses feedback
control. In an open-loop system, the output is not compared to the input [5].
The equation below shows the PID controller:

u(t) = Kp [ e(t) + (1/Ti) ∫₀ᵗ e(t′) dt′ + Td (de(t)/dt) ] + b

Where,

u is the control signal.

e is the difference between the current value and the set point.

Kp is the gain of the proportional controller.

Ti is the parameter that scales the integral controller.

Td is the parameter that scales the derivative controller.

t is the time taken for error measurement.

b is the set point value of the signal, also known as bias or offset.
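The control law above maps directly onto a discrete-time implementation. The sketch below uses rectangular integration and a backward difference, which are standard discretization choices rather than anything prescribed by the text; the first-order plant in the usage example is an illustrative assumption:

```python
class PIDController:
    """Discrete form of u(t) = Kp*[e + (1/Ti)*integral(e) + Td*de/dt] + b."""
    def __init__(self, Kp, Ti, Td, bias=0.0, dt=0.01):
        self.Kp, self.Ti, self.Td, self.bias, self.dt = Kp, Ti, Td, bias, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        self.integral += error * self.dt                  # accumulate the integral of e
        derivative = (error - self.prev_error) / self.dt  # backward-difference de/dt
        self.prev_error = error
        return self.Kp * (error + self.integral / self.Ti
                          + self.Td * derivative) + self.bias

# Closed loop on an assumed first-order plant dy/dt = (u - y)/tau
pid, y, tau, dt = PIDController(Kp=2.0, Ti=1.0, Td=0.1), 0.0, 1.0, 0.01
for _ in range(3000):
    u = pid.update(1.0 - y)
    y += dt * (u - y) / tau
```

With the integral term present, the loop settles with zero steady-state error, illustrating the role of the "past" term discussed earlier.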
3.2 Ziegler-Nichols Rules for tuning PID Controller:-

It has been observed that step responses of many processes to which PID controllers are applied
have monotonically increasing characteristics as shown in Figures a and b, so most traditional
design methods for PID controllers have been developed implicitly assuming this property.
However, there exist some processes that exhibit oscillatory responses to step inputs.
Two tuning methods were proposed by Ziegler and Nichols in 1942 and have been widely
utilized either in the original form or in modified forms. One of them, referred to as the Ziegler–
Nichols ultimate sensitivity method, determines the parameters as given in Table 1 using the
data Kcr and Tcr obtained from the ultimate sensitivity test. The other, referred to as the Ziegler–
Nichols step response method, assumes an FOPDT model and determines the parameters
of the PID controller as given in Table 2 using the parameters R and L of the FOPDT model, which are
determined from the step response test.

Type of controller      Kp         Ti           Td

P                       0.5Kcr     ∞            0

PI                      0.45Kcr    0.833Tcr     0

PID                     0.6Kcr     0.5Tcr       0.125Tcr

Fig.3.1 Ziegler-Nichols ultimate sensitivity test [17].

Type of controller      Kp         Ti           Td

P                       1/RL       ∞            0

PI                      0.9/RL     L/0.3        0

PID                     1.2/RL     2L           0.5L

Fig.3.2 Ziegler-Nichols step response method (RL ≠ 0) [17].
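Both tables can be transcribed into a small helper. The sketch below simply encodes the tabulated rules, with float('inf') standing in for the Ti entry of a pure P controller:

```python
def zn_ultimate(Kcr, Tcr, kind="PID"):
    """Ziegler-Nichols ultimate sensitivity rules (Fig. 3.1)."""
    rules = {
        "P":   (0.5 * Kcr,  float("inf"), 0.0),
        "PI":  (0.45 * Kcr, 0.833 * Tcr,  0.0),
        "PID": (0.6 * Kcr,  0.5 * Tcr,    0.125 * Tcr),
    }
    return rules[kind]   # (Kp, Ti, Td)

def zn_step_response(R, L, kind="PID"):
    """Ziegler-Nichols step response rules (Fig. 3.2)."""
    rules = {
        "P":   (1.0 / (R * L), float("inf"), 0.0),
        "PI":  (0.9 / (R * L), L / 0.3,      0.0),
        "PID": (1.2 / (R * L), 2.0 * L,      0.5 * L),
    }
    return rules[kind]

Kp, Ti, Td = zn_ultimate(Kcr=10.0, Tcr=2.0)   # -> (6.0, 1.0, 0.25)
```

Given Kcr and Tcr from the ultimate sensitivity test (or R and L from the step response test), these functions return ready-to-use PID settings.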


Frequency-domain stability analysis tells us that the above way of applying the Ziegler–Nichols
step response method to processes with self-regulation tends to set the parameters on the safe
side, in the sense that the actual gain and phase margins become larger than the values expected
in the case of integrating processes.
These methods, which determine the PID parameters using empirical formulae, as well as several other
tuning methods developed on the same principle, are often referred to as "classical" tuning
methods. Some of the other classical tuning methods are the Chien–Hrones–Reswick formula,
the Cohen–Coon formula, refined Ziegler–Nichols tuning, and the Wang–Juang–Chan formula.

Disadvantage :-

The classical tuning methods explained above have the following features:
• The process is assumed, implicitly (in the case of the Ziegler–Nichols ultimate sensitivity
method) or explicitly (in the case of the Ziegler–Nichols step response method), to be
modelled by a simple transfer function.
• The optimal values of the PID parameters are given by formulae of the process parameters that
are determined directly and uniquely from experimental data.
The first feature is a weakness of these classical methods, in the sense that the applicable
processes are limited, or in other words that the claimed “optimal” values are not necessarily, and
are sometimes fairly far from, the true optimal in practical situations where the transfer function
is nothing but an approximation of the real process characteristics. Specifically, the problem is
serious when the pure delay L of the process is very short or very long, where “very short” and
“very long” roughly means outside the range 0.05≤L/T≤1.0 [17]. It can be interpreted as a
weakness in the sense that there is no room to improve the results by making use of more
detailed information about the process which is obtainable from theoretical study and accurate
measurement.
Many attempts have been made to make up for these weaknesses of the classical methods. Many
theoretical considerations have been used to develop sophisticated methods that use, as the basis
of tuning, the shape of the frequency response of the return ratio, poles (and zeros) of the closed-
loop transfer function, time-domain performance indices such as ISE, or frequency-domain
performance indices.

3.3 Cohen-Coon Method (C-C) for tuning of PID Controller:-


There are several ways to determine what values to use for the proportional,
integral, and derivative parameters of the controller, and the Cohen-Coon method is
one of them. By looking at the system's response to manual step changes without
the controller operating, initial values for the PID parameters are determined and then
tuned manually.

In the Cohen-Coon method, the system's response to a step change is modeled as a first-order
response plus dead time. From this response, three parameters are found: K, τ and td.
K is the steady-state output divided by the input step change, τ is the
effective time constant of the first-order response, and td is the dead time.

It is observed that the response of most processes to a step change in input takes a
sigmoidal shape.
Fig: Process Reaction Curve for Cohen Coon Method

Such a sigmoidal shape can be adequately approximated by the response of a first-order process with dead time:

G(s) ≈ K e^(−td s) / (τs + 1)                (IV.70)

From the approximate response it is easy to estimate the parameters. The controllers are designed
as given in Table IV.5.
Table IV.5: Controller settings using the Cohen-Coon design method
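Since Table IV.5 is not reproduced here, the sketch below uses the commonly cited Cohen-Coon PID formulae in terms of K, τ and the dead time td; the exact coefficients are an assumption of this sketch and should be checked against the table:

```python
def cohen_coon_pid(K, tau, td):
    """Commonly cited Cohen-Coon settings for a PID controller,
    given FOPDT parameters K (gain), tau (time constant), td (dead time).
    The coefficients are the usual textbook values, assumed here."""
    r = td / tau
    Kc = (1.0 / K) * (1.0 / r) * (4.0 / 3.0 + r / 4.0)
    Ti = td * (32.0 + 6.0 * r) / (13.0 + 8.0 * r)
    Td = td * 4.0 / (11.0 + 2.0 * r)
    return Kc, Ti, Td
```

Given the three FOPDT parameters estimated from the process reaction curve, the helper returns initial PID settings that can then be refined manually, as the text describes.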

Soft Computing techniques for tuning of PID Controller:-

 Artificial Neural Network

The adaptive nature of Artificial Neural Network (ANN) controllers has made them a major area of interest among
researchers in widespread fields [7-13], mainly because ANN controllers can efficiently learn an unknown or
continuously varying environment and act accordingly. Industrial automation applications prefer PID
(Proportional Integral Derivative) controllers because of their simple structure, robustness, etc.
A Neural Network tuned PID (NNPID) has two inputs, one output and three layers: an input
layer, a hidden layer and an output layer. The input layer has two neurons and the output layer has one, and these
neurons are P-neurons. The hidden layer has three neurons: a P-neuron (H1), an I-neuron (H2) and a D-
neuron (H3). The NNPID is shown in Fig. 2. When suitable connective weights are
chosen, an NNPID becomes a conventional PID controller.

A well-known continuous PID controller is described by

u(t) = KP e(t) + KI ∫ e(t) dt + KD de(t)/dt

where u is the controller output, KP is the proportional gain, KI is the integral gain, KD is the derivative gain,
and e is the error between the set point and the process output. For digital control with sampling period ts,
we can write

u(k) = KP e(k) + KI ts Σ e(j) + KD [e(k) − e(k−1)] / ts

The figure shows the block diagram of the approach followed


w1hj= +1, w2hj= -1, w1ho =KP, w2ho =KI, w3ho =KD
then,
H1i=w1h1I1+ w2h1I2
H2i=w1h2I1+ w2h2I2
H3i=w1h3I1+ w2h3I2
H1i, H2i and H3i are input part of hidden layer nodes. The output of the hidden layer nodes are:
H1o= Terr
H2o=∫ Terr dt
H3o=d Terr /dt then,
O1i= w1hoH1o + w2hoH2o + w3hoH3o

Then the plant with NNPID and feedback is formed, as shown in the figure.

Back-propagation algorithm: In the present control system, the aim of the NNPID algorithm is to tune the
PID parameters in such a way that the mean square error (MSE) between the set point and the process output
is minimized. The weights of the NNPID are changed by steepest descent in an on-line training process; the
details of the weight adjustments used are as given in reference [5]. The training of the neural network has
been done by varying the PID parameters and taking the samples online. The NNPID was trained with a total
of 50 sets of PID parameters, each having 360 data points.
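The analytic weight-update rules are given in reference [5]; as a stand-in illustration of the same idea (steepest descent on the three output-layer weights, i.e. the PID gains, to minimize the MSE), the sketch below uses a numerical gradient over a simulated first-order plant. The plant model, learning rate and starting gains are all assumptions:

```python
def closed_loop_mse(gains, setpoint=1.0, tau=1.0, dt=0.01, steps=500):
    """MSE of a PID loop around an assumed first-order plant dy/dt = (u - y)/tau."""
    Kp, Ki, Kd = gains
    y, integral, prev_e, sse = 0.0, 0.0, 0.0, 0.0
    for _ in range(steps):
        e = setpoint - y
        integral += e * dt
        u = Kp * e + Ki * integral + Kd * (e - prev_e) / dt
        prev_e = e
        y += dt * (u - y) / tau
        sse += e * e
    return sse / steps

def steepest_descent(gains, lr=0.5, iters=30, h=1e-4):
    """Numerical steepest descent on (Kp, Ki, Kd); stands in for the
    analytic NNPID weight updates of reference [5]."""
    gains = list(gains)
    for _ in range(iters):
        base = closed_loop_mse(gains)
        grad = []
        for i in range(3):
            bumped = list(gains)
            bumped[i] += h
            grad.append((closed_loop_mse(bumped) - base) / h)
        gains = [g - lr * dg for g, dg in zip(gains, grad)]
    return gains

initial = [1.0, 0.5, 0.05]
tuned = steepest_descent(initial)
```

Each iteration moves the gains a small step against the MSE gradient, which is exactly the role the NNPID's on-line training plays for its connective weights.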

 Fuzzy Logic

 Metaheuristics
Chapter 4 GENETIC ALGORITHM

Genetic Algorithms are a class of computational models inspired by evolution. These algorithms
encode a potential solution to a particular problem on a chromosome-like data
structure and apply recombination operators to these structures so as to preserve crucial
information. Genetic Algorithms are often seen as function optimizers, although the range of
problems to which GAs have been applied is quite wide.

The implementation of a genetic algorithm starts with a population of (typically random)
chromosomes. One then evaluates these structures and allocates reproductive opportunities in
such a way that the chromosomes which represent a better solution to the target problem are
given more chances to reproduce than those chromosomes which give poorer solutions.
The 'goodness' of a solution is characterized with respect to the current population.

Evolutionary Cycle

Chromosome: a set of genes; a chromosome contains the solution in the form of genes.

Gene: a part of a chromosome; a gene contains a part of the solution.

Population: the number of individuals present with the same length of chromosome.

Fitness: the value assigned to an individual based on how far or close the individual is from the
solution; the greater the fitness value, the better the solution it contains.

Fitness Function: a function that assigns a fitness value to an individual; it is problem specific.

Crossover: taking two fit individuals and intermingling their chromosomes to create two
new individuals.

Mutation: changing a random gene in an individual.

Selection: selecting individuals for creating the next generation.

6.2 Working of Genetic Algorithm

6.2.1 Encoding: In order to use a GA to solve a problem, the variables (x1, x2, ..., xn) are first encoded
into strings. Binary-coded strings of 1s and 0s are mostly used. The length of the string is
usually determined according to the desired solution accuracy. For example, if four bits are used
to code each variable in a two-variable optimization problem, the strings (0000 0000) and (1111
1111) would represent the points (x1^l, x2^l)^T and (x1^u, x2^u)^T respectively, because the substrings
(0000) and (1111) have the minimum and maximum decoded values; any other eight-bit string can be found
to represent a point in the search space according to a mapping rule.
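The mapping rule mentioned above can be written out explicitly; a minimal sketch of linear decoding (the variable bounds are illustrative):

```python
def decode(bits, lower, upper):
    """Map a binary string linearly onto a real value in [lower, upper]."""
    value = int(bits, 2)               # decoded integer of the substring
    max_value = 2 ** len(bits) - 1     # e.g. 15 for a four-bit substring
    return lower + (upper - lower) * value / max_value

# (0000) and (1111) decode to the variable's lower and upper bounds
assert decode("0000", 0.0, 6.0) == 0.0
assert decode("1111", 0.0, 6.0) == 6.0
mid = decode("1000", 0.0, 6.0)   # 8/15 of the way through the range
```

Any intermediate bit pattern lands proportionally between the bounds, which is exactly the mapping rule the text refers to.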

6.2.2 GA Operators:

The operation of GAs begins with a population of random strings representing the design or
decision variables. The population is then operated on by three main operators, namely reproduction,
crossover and mutation, to create a new population of points. GAs can be viewed as trying to
maximize the fitness function by evaluating several solution vectors. The purpose of these
operators is to create new solution vectors by selection, combination or alteration of the current
solution vectors that have shown to be good temporary solutions. The new population is further
evaluated and tested for termination. If the termination criterion is not met, the population is
iteratively operated on by the above three operators and evaluated. This procedure is continued until
the termination criterion is met. One cycle of these operations and the subsequent
evaluation procedure is known as a generation in GA terminology. The operators are described in the
following steps.

6.2.3 Reproduction: Reproduction (or selection) is an operator that makes more copies of better
strings in a new population. Reproduction is usually the first operator applied on a population.
Reproduction selects good strings in a population and forms a mating pool. This is one of the
reasons for the reproduction operation to be sometimes known as the selection operator. Thus, in
the reproduction operation, the process of natural selection causes those individuals that encode
successful structures to produce copies more frequently. To sustain the generation of a new
population, the reproduction of the individuals in the current population is necessary, and for better
individuals, these should come from the fittest individuals of the previous population. There exist a
number of reproduction operators in the GA literature, but the essential idea in all of them is that the
above-average strings are picked from the current population and their multiple copies are
inserted in the mating pool in a probabilistic manner.
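The probabilistic copying into the mating pool is commonly implemented as roulette-wheel selection; a minimal sketch:

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0.0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick <= running:
            return individual
    return population[-1]   # guard against floating-point round-off

random.seed(0)
pool = [roulette_wheel_select(["a", "b", "c"], [1.0, 1.0, 8.0]) for _ in range(1000)]
# "c" holds 80% of the total fitness, so it should dominate the mating pool
```

Above-average individuals accumulate multiple copies in the pool, while weak ones are rarely (but not never) copied, exactly as described in the text.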

6.2.4 Crossover

A crossover operator is used to recombine two strings to get a better string. In the crossover
operation, the recombination process creates different individuals in the successive generations by
combining material from two individuals of the previous generation. In reproduction, good
strings in a population are probabilistically assigned a larger number of copies and a mating
pool is formed. It is important to note that no new strings are formed in the reproduction phase.
In the crossover operator, new strings are created by exchanging information among strings of
the mating pool.

The two strings participating in the crossover operation are known as parent strings and the
resulting strings are known as children strings. It is intuitive from this construction that good
sub-strings from the parent strings can be combined to form a better child string, if an appropriate
site is chosen. With a random site, the children strings produced may or may not have a
combination of good sub-strings from the parent strings, depending on whether or not the crossing
site falls in the appropriate place. But this is not a matter of serious concern, because if good
strings are created by crossover, there will be more copies of them in the next mating pool.
It is clear from this discussion that the effect of crossover may be
detrimental or beneficial. Thus, in order to preserve some of the good strings that are already
present in the mating pool, not all strings in the mating pool are used in crossover. When a
crossover probability, defined here as Pc, is used, only 100·Pc per cent of the strings in the
population are used in the crossover operation and 100(1−Pc) per cent of the population remains
as it is in the current population. The crossover operator is mainly responsible for the search for new
strings, even though the mutation operator is also used for this purpose, sparingly.

String 1 |011|01100|        String 1 |011|11001|

String 2 |110|11001|        String 2 |110|01100|

Before crossover            After crossover

Fig:1
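Single-point crossover as illustrated above can be sketched directly on bit strings (here the crossover site is position 3, matching the parents shown):

```python
def single_point_crossover(parent1, parent2, site):
    """Swap the tails of two equal-length strings after the crossover site."""
    child1 = parent1[:site] + parent2[site:]
    child2 = parent2[:site] + parent1[site:]
    return child1, child2

c1, c2 = single_point_crossover("01101100", "11011001", 3)
# c1 == "01111001", c2 == "11001100"
```

Each child keeps the head of one parent and inherits the tail of the other, so good sub-strings on either side of the site can be recombined.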

6.2.5 Mutation:

Mutation adds new information in a random way to the genetic search process and ultimately
helps to avoid getting trapped at local optima. It is an operator that introduces diversity into the
population whenever the population tends to become homogeneous due to repeated use of the
reproduction and crossover operators. Mutation may cause the chromosomes of individuals to be
different from those of their parent individuals.

Mutation, in a way, is the process of randomly disturbing genetic information. It operates at the
bit level; when the bits are being copied from the current string to the new string, there is a
probability that each bit may become mutated. This probability, usually quite a small value, is
called the mutation probability Pm. A coin-toss mechanism is employed: if a random number
between zero and one is less than the mutation probability, then the bit is inverted, so that a zero
becomes a one and a one becomes a zero. This helps in introducing a bit of diversity into the population
by scattering the occasional points. This random scattering may lead to a better optimum, or
even modify a part of the genetic code that will be beneficial in later operations. On the other hand,
it might produce a weak individual that will never be selected for further operations.

The need for mutation is to create a point in the neighbourhood of the current point, thereby
achieving a local search around the current solution. Mutation is also used to maintain
diversity in the population. For example, the following offspring strings may be considered:

Original Offspring 1: 1 0 1 0 1 1 1 1 0 0 0

Original Offspring 2: 1 1 1 0 1 1 0 0 1 1 1

Mutated Offspring 1:  1 1 1 0 1 1 1 1 0 0 0

Mutated Offspring 2:  1 1 1 0 1 1 0 0 1 0 1
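The coin-toss mutation mechanism described above can be sketched as:

```python
import random

def mutate(bits, pm, rng=random):
    """Flip each bit independently with mutation probability pm (one coin toss per bit)."""
    return "".join(("1" if b == "0" else "0") if rng.random() < pm else b for b in bits)

random.seed(42)
child = mutate("10101111000", pm=0.05)   # occasional bits flip, most stay intact
```

With a small Pm the string is mostly preserved, while Pm = 0 leaves it untouched and Pm = 1 inverts every bit.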


CHAPTER 4: Artificial Bee Colony
Swarm Intelligence employs the collective behaviour of animal societies to design algorithms. Agents interact
locally with each other and with the environment, and follow certain rules so as to arrive at the optimum solution.
Eric Bonabeau defined Swarm Intelligence as
"any attempt to design algorithms or distributed problem-solving devices inspired by the collective behaviour of
social insect colonies and other animal societies".

A swarm is a configuration of tens of thousands of individuals that have chosen, of their own will, to converge on a
common goal.

Swarm of bees

Two fundamental concepts are necessary to obtain swarm-intelligent behaviour:
• Self-organization: can be defined as a set of dynamic individuals that work in unison to achieve a common
goal under a set of rules. The rules ensure that the interactions are executed on the basis of purely local
information, without any relation to the global pattern.

• Division of Labor: in swarm behaviour, various tasks are performed simultaneously by specialized individuals,
which is referred to as the division of labor. It enables the swarm to respond to changed conditions in the
search space.
Artificial Bee Colony –
ABC has been developed based on the behaviour of real bees in finding nectar and sharing the
information about food sources with the bees in the hive.
Three essential components of this process:
• Food Sources: The value of a food source depends upon its proximity to the hive, the concentration of its
energy and the ease of extracting this energy.

• Employed Foragers: They are associated with a particular food source which they are currently exploiting,
or are "employed" at. They carry with them information about this particular source, its distance and
direction from the nest, and the profitability of the source, and share this information with a certain probability.
• Unemployed Foragers: They are continually on the lookout for a food source to exploit. There are two types of
unemployed foragers: scouts, searching the environment surrounding the nest for new food sources, and
onlookers, waiting in the nest and establishing a food source through the information shared by employed
foragers.

Key Agents in ABC:


The Employed Bee:
It stays on a food source and keeps the neighborhood of the source in its memory.

The Onlooker Bee:

It gets the information about food sources from the employed bees in the hive and selects one of the food
sources to gather nectar from.

The Scout:
It is responsible for finding new food sources, i.e. new nectar sources.

Exchange of Information among bees:


It is the most important occurrence in the formation of collective knowledge. The dancing area is the most important
part of the hive with respect to exchanging information; here, communication among bees about the quality of food
sources takes place. This dance is called the waggle dance. The employed foragers share the information
they obtain with a probability proportional to the profitability of the food source (the higher, the better), and
the sharing of this information through waggle dancing takes a longer duration for richer sources. The onlookers
present on the dance floor watch the dances and decide to employ themselves at the most profitable source. We can
thus see that there is a greater probability of onlookers choosing the more profitable sources, since more
information is accumulated about them.

Procedure of ABC:
1. Initialize (move) the scouts.

2. Deploy the onlookers.

3. Move the scouts only if the counters of the employed bees hit the limit.

4. Update the newly acquired memory.

5. Check the termination condition.

Explanation
 Each cycle of search consists of three steps:
1. Moving the employed and onlooker bees onto the food sources
2. Calculating their nectar amounts
3. Determining the scout bees and directing them onto possible food sources.

 A food source position is a possible optimized solution.


 The amount of nectar of a food source corresponds to the quality of the solution. It holds a directly
proportional relationship.

 Using a probability-based selection process, onlookers are placed on the food sources.

Probability of selecting a nectar source:

P = F / SUM(F)

P: the probability of selecting the employed bee
F: the fitness value

 With an increase in the nectar of a food source, the probability with which that food source is preferred by
onlookers also increases.

 The scouts are characterized by low search costs and a low average in food source quality. One bee is
selected as the scout bee.

 The selection is controlled by a control parameter called "limit".

 If a solution representing a food source is not improved by a predetermined number of trials, then that
food source is abandoned and the employed bee is converted to a scout.

Control Parameters of ABC Algorithm


1. Swarm size / Bees Population

2. Limit

3. number of onlookers: 50% of the swarm

4. number of employed bees: 50% of the swarm

5. number of scouts: 1
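The procedure and control parameters above can be assembled into a compact sketch. The objective (the sphere function) and all parameter values below are illustrative assumptions:

```python
import random

def abc_minimize(f, dim=2, bounds=(-5.0, 5.0), n_food=10, limit=20, cycles=200):
    """Minimal Artificial Bee Colony: employed phase, onlooker phase, scout phase."""
    lo, hi = bounds
    foods = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_food)]
    trials = [0] * n_food

    def fitness(x):
        return 1.0 / (1.0 + f(x))            # higher fitness = richer food source

    def neighbour(i):
        k = random.choice([j for j in range(n_food) if j != i])
        d = random.randrange(dim)
        x = list(foods[i])
        x[d] += random.uniform(-1.0, 1.0) * (foods[i][d] - foods[k][d])
        x[d] = min(hi, max(lo, x[d]))
        return x

    def try_improve(i):
        cand = neighbour(i)
        if fitness(cand) > fitness(foods[i]):
            foods[i], trials[i] = cand, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_food):               # employed bees visit their sources
            try_improve(i)
        fits = [fitness(x) for x in foods]
        total = sum(fits)
        for _ in range(n_food):               # onlookers pick sources with P = F/SUM(F)
            r, acc, chosen = random.uniform(0.0, total), 0.0, n_food - 1
            for j, ft in enumerate(fits):
                acc += ft
                if r <= acc:
                    chosen = j
                    break
            try_improve(chosen)
        for i in range(n_food):               # exhausted sources are abandoned to a scout
            if trials[i] > limit:
                foods[i] = [random.uniform(lo, hi) for _ in range(dim)]
                trials[i] = 0
    return min(foods, key=f)

random.seed(7)
best = abc_minimize(lambda x: sum(v * v for v in x))   # sphere function
```

Each cycle runs the three steps listed earlier (employed and onlooker moves, nectar evaluation, scouting), and the "limit" counter triggers abandonment of a food source, exactly as the control parameters describe.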
CHAPTER 5: Genetically Mutated Bee Colony

Figure 4.1: Image Classification of cat

The task performed by image classification here is to predict a single category
(or a distribution over categories) for a given input image. Images are represented as
3-dimensional arrays of integers ranging from 0 to 255, in the Width x Height x 3 format,
where the 3 represents the three colour channels R, G and B.

4.1.3 Challenges
The detection of a visual concept (e.g. a cat) is comparatively trivial for a
human to perform, but it is worth considering the difficulties faced
from the viewpoint of a Computer Vision algorithm, which sees only
the raw representation of pictures as a 3-D array of brightness values.
An (in-exhaustive) list of obstacles is given below:

Perspective variation. The orientation of any single instance of an object can be
varied in different ways with respect to the camera.
Scale modification. Visual classes often show change in their size
(size in the real world).
Occlusion. The objects of interest may be occluded; occasionally,
only a small chunk of an object may be visible.
Illumination conditions. The effects of illumination are drastic at the
picture-element (pixel) level.
Background clutter. The objects of interest may blend into their
surroundings, making their identification difficult.
Intra-class variation. The classes of interest can often be relatively
expansive (e.g., a chair): there are many disparate objects of the class, each with
its own appearance, but essentially the same.
A sound image classification model must be invariant to the cross product of all these
variations, while simultaneously retaining sensitivity to the inter-class differences.

Figure 4.2: Image Classification on the basis of different factors

4.1.4 Data Driven Approach


Our main task is to write an algorithm that can classify different specimen images
into distinct categories. Therefore, instead of trying to explain and code what each one of
the categories of interest looks like directly, we provide the system with diverse and huge
data of each type and then develop self-learning algorithms that look at these examples
and learn the visual appearance of each category. Since this method involves the
accumulation of a training dataset of labeled images, it is called the data-driven method.

4.1.5 The image classification pipeline


The complete image classification pipeline can be formalized as follows:

Input: The input provided to the system comprises N images, each of
them labeled with one of K distinct categories. This data is our training set.
Learning: The task here is to utilize the training set to learn what each class
looks like. This step is termed training the classifier, or learning the model.
Evaluation: Lastly, we assess the quality of the classifier by having it predict
labels for a distinct and fresh set of photos. A comparison of the true labels of
the data set to the ones predicted by the classifier is performed.

4.2 Elements of Supervised Learning


Neural Networks, CNNs and XGBoost are employed for supervised learning problems.
Here, we utilize the multi-featured training data xi to predict the target output yi.

4.2.1 Model and Parameters


The model in supervised learning refers to the mathematical structure by which the

prediction ŷi is made from the input xi. For example, a common choice is a

linear model, where the prediction is given as

ŷi = Σj θj·xij    [4.1]

a linear combination of weighted input features. The predicted value can be

interpreted differently depending on the task, i.e., regression or

classification. The parameters θ are undetermined and must be learned from the data.

4.2.2 Objective Function: Training Loss + Regularization


Different interpretations of yi give us different problems, some of them being

ranking, regression, classification, etc. Our objective is to determine the best

parameters from the given training dataset. For this, we define an objective function

to measure the performance of the model under a given set of parameters.

An important characteristic of objective functions is that they consist of two parts:

training loss and regularization.

Obj(θ) = L(θ) + Ω(θ)    [4.2]

where L is the training loss function, and Ω is the regularization term.

The training loss measures how predictive the model is when

applied to the training data. A commonly used training

loss is the mean squared error:

L(θ) = Σi (yi − ŷi)²    [4.3]

The logistic loss for logistic regression is another commonly used loss:

L(θ) = Σi [ yi·ln(1 + e^(−ŷi)) + (1 − yi)·ln(1 + e^(ŷi)) ]    [4.4]
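Both losses are straightforward to compute. The snippet below is a small illustrative sketch of eqs. [4.3] and [4.4] on toy labels and predictions:

```python
import numpy as np

def mse_loss(y, y_hat):
    # L(theta) = sum_i (y_i - yhat_i)^2, as in eq. [4.3]
    return np.sum((y - y_hat) ** 2)

def logistic_loss(y, y_hat):
    # L(theta) = sum_i [ y_i*ln(1+e^-yhat_i) + (1-y_i)*ln(1+e^yhat_i) ], eq. [4.4]
    return np.sum(y * np.log1p(np.exp(-y_hat)) + (1 - y) * np.log1p(np.exp(y_hat)))

y = np.array([1.0, 0.0, 1.0])      # toy true labels
y_hat = np.array([2.0, -1.5, 0.5])  # toy predictions
```

`np.log1p` is used for numerical stability when the exponential is small.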

The regularization term controls the complexity of the model,

which helps us avoid overfitting.

Figure 4.3

The answer is marked in red. The general principle is that we want

to obtain both a simple and a predictive model.

4.2.3 Neural Networks


4.2.3.1 PERCEPTRON

A perceptron takes several binary inputs and produces a single binary output.

Figure 4.4: Perceptron Input and Output

From the above figure, we can see that x1, x2, x3 are binary inputs,

and after passing through the perceptron we get an output (0 or 1) based on

a calculation involving the weights assigned to each of the inputs.

Altering the weights and the threshold gives us the freedom to design

different decision models, making parameter changes in

order to produce a desired output.

output = 0 if Σj wj·xj ≤ threshold, 1 if Σj wj·xj > threshold    [4.5]

From the above equation, we can see that the output depends on the

threshold. The condition involving the threshold is cumbersome. So,

in order to overcome this drawback, two notational changes are made to simplify it.

The first is to write the weighted sum in dot product form, w·x ≡ Σj wj·xj, where w and x are vectors

whose components are the weights and the inputs, respectively.

The next change is to eliminate the threshold by moving it to the other side of

the inequality, and replace it by the perceptron bias, b ≡ −threshold. Now,

after making these two changes, the equation takes the following form:

output = 0 if w·x + b ≤ 0, 1 if w·x + b > 0    [4.6]

Assigning the weights, we arrive at the mathematical model of a neuron as follows:

Figure 4.5: Representation of a Perceptron

Thus, a complex network of perceptrons can give us subtle

decisions which could help us in our classification problem.
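A single perceptron as in eq. [4.6] can be sketched in a few lines. The weights below, which implement a two-input AND gate, are a standard textbook choice rather than anything from the project:

```python
import numpy as np

def perceptron(x, w, b):
    # output = 0 if w.x + b <= 0, else 1 (eq. [4.6])
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([1.0, 1.0])
b = -1.5                      # bias = -threshold

# The four input combinations of a 2-input AND gate
outputs = [perceptron(np.array(x), w, b) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```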

Figure 4.6: Network of Perceptrons

In this structure, the first column of perceptrons, called the first layer of

perceptrons, makes three simple decisions by weighing the input

evidence. A perceptron in the second layer makes decisions at a more

complex and abstract level than the perceptrons in the first layer. Similarly,

a multi-layered network of perceptrons can engage in still more complex decision making.

The steps involved here are: each neuron performs a dot product of the input

with its assigned weights, then the bias is added. Lastly, a

non-linearity (or activation function) is applied, in this case the sigmoid:

σ(x) = 1/(1 + e^(−x))    [4.7]

4.2.3.2 ACTIVATION FUNCTIONS

An activation function (or non-linearity) takes a single number

and performs a certain fixed mathematical operation on it. The different

activation functions we come across in practice are:

Figure 4.7: Sigmoid Function

Sigmoid non-linearity squashes real numbers to range between [0, 1]

4.2.3.2.1 SIGMOID

The mathematical form of the sigmoid non-linearity is given by eq. [4.7].

As mentioned previously, it takes a real-valued number and squashes it into the

range between 0 and 1. In particular, large negative numbers become approximately 0

and large positive numbers become approximately 1. Frequently interpreted as the firing rate of a neuron,

it ranges from not firing at all (0) to fully-saturated firing at a maximum

frequency (1). The sigmoid non-linearity has two major drawbacks:

Sigmoids saturate and kill gradients. A very undesirable property of the sigmoid

neuron is that when its activation saturates at either tail (0 or 1), the gradient in

these regions is almost zero. This matters at the time of backpropagation: if the value

of the local gradient is negligible, it will effectively kill the gradient, and almost

no signal will flow through the neuron to its weights and, recursively, to

its data. Moreover, a cautious approach should be

employed when initializing the weights of sigmoid neurons, to avoid saturation.

Sigmoid outputs are not zero-centered. This issue is less severe and has relatively

mild consequences compared to the saturated-activation problem. Neurons in

subsequent layers of a neural network receive data that is not zero-centered.

This can introduce detrimental zig-zag dynamics in the gradient updates for the

weights. However, once these gradients are added up across a batch of data, the final

weight updates can have variable signs, somewhat mitigating this drawback.

4.2.3.2.2 RELU

The Rectified Linear Unit computes the function f(x) = max(0, x).

In other words, the activation is simply thresholded at zero. The

different advantages and disadvantages of using ReLU are:

(+) It was observed to greatly accelerate the convergence

of stochastic gradient descent compared to the sigmoid/tanh functions.

(+) ReLU can be executed by simply thresholding a matrix of activations at

zero, whereas sigmoid neurons involve expensive

operations (exponentials etc.)

(-) Contrastingly, ReLU units can be delicate while training and can effectively

"die": a large gradient flowing through a ReLU neuron can cause the weights to

update in such a way that the neuron never activates again.

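The two activation functions discussed above can be written out directly; the following is a generic sketch:

```python
import numpy as np

def sigmoid(x):
    # squashes any real number into (0, 1); eq. [4.7]
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Rectified Linear Unit: activation thresholded at zero
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 3.0])
```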
4.2.4 Convolutional Neural Networks (CNNs)
Convolutional Neural Networks are made up of neurons that have learnable

weights and biases. Each neuron receives some inputs and performs a dot product with its

weights, usually following it with a non-linearity.

The whole network still expresses a single differentiable score function, with

the raw image pixels on one end and class scores at the other end. However, CNNs

are somewhat different from normal neural networks. They make the explicit assumption

that the inputs are images, which allows certain

properties to be encoded into the architecture. This results in a reduction of the number of

parameters in the network and a more efficient forward function.

4.2.4.1 ARCHITECTURE OVERVIEW

Regular neural nets consist of hidden layers of neurons, each of which is

fully connected to all the neurons in the previous layer. Different neurons in a

particular layer are completely independent and the weights are unshared. This

full connectivity is wasteful at scale: for an image of size 100*200*3,

each neuron would already require 60,000 weights. This is still

admissible, but for a larger picture, say 500*200*3, the

increase in weights compared to the previous example is significant. This

is really wasteful and leads to significant problems like overfitting.

Figure 4.8: Representation of Perceptron including hidden layer

Figure 4.9: A regular 3-layer Neural Network

Each input stimulus is converted into the result by

activations of neurons arranged in 3D in each layer. In the above example, the input

image fed into the 3D CNN forms the initial volume, whose dimensions are the

height and width of the image, with a depth of 3 (the Red, Green and Blue channels).

4.2.4.2 LAYERS USED TO BUILD CONVOLUTIONAL NEURAL NETWORKS

A simple convolutional neural network architecture consists of three types of layers:

Convolutional Layer

Pooling Layer

Fully-Connected Layer

The overall transformation from the original image to the final class scores passes through many such

layers. The parameters of the network are trained using gradient descent so that the class

scores it computes are consistent with the labels in the training set for each image.

4.2.4.2.1 CONVOLUTIONAL LAYER

The Convolutional layer is the core building block of a convolutional neural network.

Its parameters consist of a set of adjustable, learnable filters.

Each filter slides across the input volume and maps

its response at every spatial position into a 2D activation map. So, overall, the

convolutional layer consists of several filters; each filter

produces a different 2D activation map, and all of these are stacked together along the

depth dimension to produce the output volume. As in the human brain's neural network

that it resembles, the neurons have a particular connectivity and arrangement. So we now

discuss the connectivity and parameter sharing scheme of the neurons.

4.2.4.2.1.1 LOCAL CONNECTIVITY.

In the case of high-dimensional inputs such as large images, we connect

each neuron only to a local region of the input volume, unlike in normal

neural networks, where full connectivity is wasteful.

The spatial extent of this connectivity is called the receptive field

of the neuron. The depth of this connectivity always equals the actual

depth of the input volume. Note the asymmetry between the spatial dimensions (height and

width) and the depth in terms of connectivity: the connectivity is local along the

spatial dimensions, but always full along the depth.

Figure 4.10: Local connectivity representation of Perceptron

So the neurons are, overall, the same as in normal

neural networks; the only difference is in the connectivity. Here neurons are

connected locally along the spatial dimensions.

Figure 4.11: Representation of Perceptron body

4.2.4.2.1.2 SPATIAL ARRANGEMENT

Connectivity explains how the different neurons in a layer are connected

with the input layer or any other layer. But the number of neurons in the

hidden layer and their arrangement have not been defined yet. Three

hyperparameters control the arrangement: depth, stride and zero-padding.

Depth: The depth of the output volume corresponds to the number of

filters used in the layer. Each neuron is assigned some particular

region of the input, and it gives a response when that region activates it.

Stride: The stride is the number of pixels by which a filter moves between

one mapped region and the next as it slides across the input raw image.

Zero-padding: Sometimes we want to keep the dimensions of the output raw

image equal to those of the input raw image, to keep the maximum detail of the input after

convolution. To do that we pad the boundary of the input with zeros.

Let:

W be the size of the input volume,

F the receptive field size of the convolutional layer neurons,

S the stride, and

P the amount of zero-padding.

The formula to calculate the number of neurons that fit along a spatial dimension

is given by: No. of neurons = (W − F + 2P)/S + 1

A graphical example:
Figure 4.12: Spatial Arrangement of Neurons
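The formula above is straightforward to evaluate. The helper below is an illustrative sketch (the 227/11/4 example is the well-known first-layer setting of AlexNet, used here only as a check):

```python
# Output size of a convolutional layer along one spatial dimension,
# using the formula above: (W - F + 2P)/S + 1.

def conv_output_size(W, F, S, P):
    n, rem = divmod(W - F + 2 * P, S)
    if rem != 0:
        raise ValueError("filter does not tile the input evenly")
    return n + 1

# Example: 7x7 input, 3x3 filter, stride 1, padding 0 -> 5 neurons
size_stride1 = conv_output_size(7, 3, 1, 0)
# Same input with stride 2 -> 3 neurons
size_stride2 = conv_output_size(7, 3, 2, 0)
```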

4.2.4.2.1.3 PARAMETER SHARING

Parameter sharing is a scheme used in convolutional neural networks, unlike in normal

neural networks, to reduce the number of parameters. It relies on one reasonable

assumption: if one set of parameters (a filter) is useful at a particular position

(x, y), then the same set of parameters should also be useful at another position (x1, y1).

4.2.4.2.2 POOLING LAYER

After each convolutional layer there is commonly a pooling layer, which results in a

reduction of the number of parameters and the spatial size. Overall, it helps control overfitting.

4.2.4.2.3 GENERAL POOLING

In addition to max pooling, the pooling units can also

perform other functions, such as average pooling or L2-norm pooling. Average

pooling was often used historically, but it has recently fallen out of favour compared

to the max pooling operation, which has been shown to work better in practice.

Figure 4.13: General Pooling

Figure 4.14: Illustration of Pooling

Independently, in each depth slice of the input volume, the pooling

layer downsamples the volume spatially.
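A minimal 2x2 max-pooling with stride 2 over one depth slice, which halves each spatial dimension, can be sketched as:

```python
import numpy as np

def max_pool_2x2(x):
    # Split the slice into non-overlapping 2x2 blocks and take each block's max.
    h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even dimensions"
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 0],
              [1, 1, 3, 4]])
pooled = max_pool_2x2(x)
```

In a real network the same operation is applied to every depth slice of the volume.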

4.2.4.2.4 BACKPROPAGATION

The backward pass for a max(x, y) operation has a simple

interpretation: it only routes the gradient to the input that had the

highest value in the forward pass. Hence, during the forward pass

of a pooling layer it is common to keep track of the index of the max

activation (sometimes called the switch) so that gradient routing is efficient during backpropagation.
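The gradient routing described above can be illustrated for a single max over a vector; the function names here are illustrative:

```python
import numpy as np

def max_forward(x):
    # Remember the "switch": the index of the winning input.
    idx = int(np.argmax(x))
    return x[idx], idx

def max_backward(grad_out, idx, n):
    # Route the incoming gradient entirely to the input that was the max.
    grad_in = np.zeros(n)
    grad_in[idx] = grad_out
    return grad_in

x = np.array([0.3, 2.0, -1.0])
value, switch = max_forward(x)
grad = max_backward(5.0, switch, len(x))
```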

4.2.4.2.5 GETTING RID OF POOLING

Many people dislike the pooling operation and think that we can get

away without it. Some propose discarding the pooling layer in favour of an architecture

that consists only of repeated convolutional layers; to reduce the size of the

representation, they suggest using a larger stride in the CONV

layer from time to time. Discarding pooling layers has also been found to be important

in training good generative models, such as variational autoencoders

or generative adversarial networks. It appears likely that future

architectures will feature very few to no pooling layers.

4.2.4.2.6 NORMALIZATION LAYER

Numerous types of normalization layers have been proposed for use in

convolutional neural network architectures, occasionally with the purpose of

implementing inhibition schemes observed in the biological brain. All the same, these

layers have fallen out of favour because, in practice, their contribution has been shown to be minimal.

4.2.4.3 FULLY-CONNECTED LAYER

Neurons in a fully-connected layer have full connections to each and every

activation in the former layer, as seen in regular neural networks. Their activations

can hence be computed with a matrix multiplication followed by a bias offset.

4.2.4.3.1 CONVERTING FC LAYERS TO CONV LAYERS

It is worth mentioning that the only difference between Fully Connected and

Convolutional layers is that the neurons in the Convolutional layer are connected

only to a local region of the input. The functional form of both Fully Connected and

Convolutional layers, i.e., taking the dot product of the input with the contents of the filters, is the

same. Consequently, it is possible to convert between FC and CONV layers.

4.2.5 MODELLING OF XGBOOST
4.2.5.1 TREE ENSEMBLE

The tree ensemble model consists of a set of classification and regression trees (CART).

The prediction scores of each individual tree are added together to obtain the

final score. A crucial fact is that the trees complement each other.

Mathematically, we can write our model in the following form:

ŷi = Σ(k=1..K) fk(xi),  fk ∈ F    [4.8]

where K is the number of trees, f is a function in the functional space F, and F is


the set of all possible CARTs. Therefore our objective to optimize can be written as

Obj(θ) = Σi l(yi, ŷi) + Σk Ω(fk)    [4.9]

4.2.5.2 TREE BOOSTING (TRAINING OF TREE)

We proceed as for every supervised learning model: define an

objective function and optimize it. The objective function, which as always

comprises training loss and regularization, is the one given in eq. [4.9].

4.2.5.3 ADDITIVE TRAINING

This is much harder than a conventional optimization problem, where we

can simply take the gradient and go. It is not easy to train all the trees at once.

Instead, we use an additive strategy: fix what we have learned so far,

and add one new tree at a time. We write the prediction value

at step t as ŷi(t), so we have

ŷi(0) = 0

ŷi(1) = ŷi(0) + f1(xi)

ŷi(t) = ŷi(t−1) + ft(xi)    [4.10]

It remains to ask: which tree do we want at each step? A natural

choice is to add the one that optimizes our objective:

Obj(t) = Σi l(yi, ŷi(t−1) + ft(xi)) + Ω(ft) + constant

If we consider applying MSE as our loss function, this converts to

the following form:

Obj(t) = Σi [2(ŷi(t−1) − yi)·ft(xi) + ft(xi)²] + Ω(ft) + constant    [4.11]

In the above form there is a first-order term (generally called the

residual) and a quadratic term. For other losses of interest (for

instance, logistic loss), it is not so easy to obtain such a nice

form. So in the general case, we take the Taylor expansion of

the loss function up to the second order:

Obj(t) = Σi [l(yi, ŷi(t−1)) + gi·ft(xi) + (1/2)·hi·ft(xi)²] + Ω(ft) + constant    [4.11]

where gi and hi are defined as

gi = ∂ŷi(t−1) l(yi, ŷi(t−1)),  hi = ∂²ŷi(t−1) l(yi, ŷi(t−1))

After we remove all the constants, the specific objective at step t takes the form:

Σi [gi·ft(xi) + (1/2)·hi·ft(xi)²] + Ω(ft)    [4.12]

This becomes our optimization goal for the new tree. A significant advantage of this definition

is that it depends only on gi and hi. This is how XGBoost can support

customized loss functions. We can optimize every loss function,

including logistic regression and weighted logistic regression,

using exactly the same solver that accepts gi and hi as input.
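For example, for the logistic loss, gi and hi take the closed forms pi − yi and pi·(1 − pi), with pi the sigmoid of the current prediction. The sketch below (function names assumed, not XGBoost's internal API) shows the pair of statistics a custom loss must supply to the solver:

```python
import numpy as np

def logistic_grad_hess(y_true, y_pred_raw):
    # p_i = sigmoid(yhat_i): probability implied by the current raw prediction
    p = 1.0 / (1.0 + np.exp(-y_pred_raw))
    g = p - y_true        # first derivative of the logistic loss w.r.t. yhat_i
    h = p * (1.0 - p)     # second derivative of the logistic loss w.r.t. yhat_i
    return g, h

y = np.array([1.0, 0.0])
margin = np.array([0.0, 0.0])   # prediction at step 0 is zero
g, h = logistic_grad_hess(y, margin)
```

Any loss that supplies such a (g, h) pair can be optimized by the same machinery.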

4.2.5.3.1.1 MODEL COMPLEXITY


We have introduced the training step, but there is one significant matter remaining: the

regularization term. We need to define the complexity of the tree, Ω(f).

To do so, we first refine the definition of the tree f(x) as

ft(x) = w_q(x),  w ∈ R^T, q : R^d → {1, 2, ..., T}    [4.13]

Here w is the vector

of scores on the leaves, q is a function assigning each data point to the

corresponding leaf, and T is the number of leaves. In XGBoost, we define the

complexity as

Ω(f) = γ·T + (1/2)·λ·Σ(j=1..T) wj²    [4.14]

There could be more

than one way to define the complexity, but this particular one

works well in practice. The regularization is one part that

most tree software packages treat less carefully, or simply ignore.

This is because the conventional treatment of tree learning only

emphasized improving impurity, while the complexity control was

left to heuristics. By defining it formally, we get a

better idea of what we are learning, and it works well in practice.

***

CHAPTER 5:

Before putting the algorithms to work on our large dataset, we need various

pre-processing techniques to feed the data properly to the networks for maximum efficiency.

5.1 Preprocessing
Working with these files can be a challenge, especially given their

heterogeneous nature. Some preprocessing is required before CNNs can be applied

to the dataset. What follows is a comprehensive overview of useful steps to take before the data reaches

our Convolutional Neural Networks or other ML methods. We follow these stages:

Figure 5.1: Flow Diagram of Classification

Before we start, we need to import some packages to determine the available patients.

5.1.1 Loading the files


Dicom is the de-facto file standard in medical imaging. These files

contain a lot of metadata (such as the pixel size, so how long one pixel is

in every dimension in the real world).

This pixel size/coarseness of the scan differs from scan to scan (e.g. the

distance between slices may differ), which can hurt performance of CNN approaches.

We can deal with this by isomorphic resampling, which we will do later.

We write a code to load a scan, which consists of multiple slices, which we simply

save in a Python list. Every folder in the dataset is one scan (so one patient). One

metadata field is missing, the pixel size in the Z direction, which is the slice

thickness. Fortunately, we can infer this, and we add this to the metadata.

The unit of measurement in CT scans is the Hounsfield Unit (HU), which is a

measure of radio density. CT scanners are carefully calibrated to accurately measure this.

Figure 5.2: Hounsfield Unit of different matter

By default, however, the returned values are not in this unit. Let's fix this.

Some scanners have cylindrical scanning bounds, but the output image is

square. The pixels that fall outside of these bounds get the fixed value -2000. The

first step is setting these values to 0, which currently corresponds to air. Next,

let's go back to HU units, by multiplying with the rescale slope and adding the

intercept (which are conveniently stored in the metadata of the scans).
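The conversion can be sketched as follows on a fake slice. In the real pipeline the slope and intercept come from the DICOM metadata (the `RescaleSlope` and `RescaleIntercept` fields); the values used here are assumptions for illustration:

```python
import numpy as np

def to_hounsfield(pixels, slope=1.0, intercept=-1024.0):
    pixels = pixels.astype(np.int16)
    pixels[pixels == -2000] = 0            # out-of-bounds cylinder pixels -> air
    return (slope * pixels + intercept).astype(np.int16)

# Fake 2x2 slice: an out-of-bounds pixel, air, water-level, and dense bone
raw = np.array([[-2000, 0], [1024, 3024]])
hu = to_hounsfield(raw)
```

After conversion, air sits near -1024 HU, water near 0 HU, matching the Hounsfield table above.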

Figure 5.3: Frequency vs. Hounsfield Unit

Figure 5.4: Lung Slice

Looking at the table from Wikipedia and this histogram, we can clearly see

which pixels are air and which are tissue. These images are used for lung

segmentation. Now, we can begin to iterate through the patients and gather their

respective data. We're certainly going to need to do some preprocessing of this data.

We iterate through each patient, grab their label, and get the full path to that

specific patient (inside that path are ~200 scans, which we also iterate over).

Do note here that the actual scan, when loaded by dicom, is clearly not just some

sort of array of values; instead it has attributes. A few of these attributes are

arrays, but not all of them. We sort by the actual image position in the scan. Later,

we could actually put these together to get a full 3D rendering of the scan.

One immediate thing to note here is those rows and columns 512 x 512.

This means, our 3D rendering is a 195 x 512 x 512 right now.

We already know that we're going to absolutely need to resize this

data. Being 512 x 512, we already expect all this data to be the same size,

but let's see what we have from other patients too.

We just went ahead and grabbed the pixel array attribute, which is what we

assumed to be the scan slice itself (we will confirm this soon), but immediately we

are surprised by the non-uniformity of the number of slices. This isn't quite ideal and will cause

a problem later. All of our images are the same size, but the slice counts aren't. In terms

of a 3D rendering, the scans actually are not the same size.

We've got to actually figure out a way to solve that uniformity problem, but also

these images are just way too big for a convolutional neural network to handle without

some serious computing power. Thus, we already know out of the gate that we're going to

need to down-sample this data quite a bit, and somehow make the depth uniform.

5.1.2 Resampling
A scan may have a pixel spacing of [2.5, 0.5, 0.5], which means that the distance

between slices is 2.5 millimeters. For a different scan this may be [1.5, 0.725, 0.725]; this

variation can be problematic for automatic analysis (e.g. using Convolutional Neural Networks).

A common method of dealing with this is resampling the full dataset to a

certain isotropic resolution. If we choose to resample everything to

1mm x 1mm x 1mm voxels, we can use 3D Convolutional Neural Networks without

worrying about learning zoom/slice-thickness invariance.

Whilst this may seem like a very simple step, it has quite some edge

cases due to rounding. Also, it takes quite a while.
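A simplified, nearest-neighbour version of this resampling step can be sketched as follows (the actual pipeline would use proper interpolation, e.g. scipy.ndimage.zoom; this numpy-only version is only illustrative):

```python
import numpy as np

def resample_nearest(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    spacing = np.asarray(spacing, dtype=float)
    # Target shape implied by the spacing change (this is where rounding
    # edge cases come from in the real implementation).
    new_shape = np.round(np.array(volume.shape) * spacing / np.asarray(new_spacing)).astype(int)
    # For each target voxel, pick the nearest source voxel along each axis.
    idx = [np.minimum(np.arange(n) * volume.shape[a] // n, volume.shape[a] - 1)
           for a, n in enumerate(new_shape)]
    return volume[np.ix_(idx[0], idx[1], idx[2])]

scan = np.random.rand(8, 64, 64)                  # 8 slices, spacing [2.5, 0.5, 0.5]
iso = resample_nearest(scan, spacing=(2.5, 0.5, 0.5))
```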

Figure 5.5: Z-Axis Sweep

Figure 5.6: X-Axis Sweep

Figure 5.7: Y-Axis Sweep

It is easy to see the 3 different perspectives. Another neat thing is that once

we got the scans we managed to visualize them using very basic python open source

tools - basically numpy and matplotlib. No need for fancy medical imaging tools.

We have no medical expertise whatsoever so we can just stare at them for

a bit and maybe read more on human anatomy.


5.1.3 Edge detection and other convolutional filters:
Since we're interested in detecting whether a patient will be diagnosed with

lung cancer, we can try to see if we can detect pulmonary nodules using something

like edge detection. This can be done using a sobel filter (aka hand crafted one-filter

CNN). Let's also take a look at the distribution of the pixels values in an image first:

Figure 5.8: Distribution of Pixels in Image

Interesting - the distribution seems to be roughly bimodal with a

bunch of pixels set at - 2000 - probably for missing values.

Figure 5.9 Edge Detection 1

The Sobel filter does find the edges but the image is very low intensity. One

thing we can do is to simply threshold the image to see the segmentation better:

Figure 5.10 Edge Detection 2

Figure 5.11 Edge Detection 3

Interesting results, however the issue here is that the filter will also detect the

blood vessels in the lung. So, some sort of 3-D surface detection that differentiates

between spheres and tubes would be more suitable for this situation.
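The "hand-crafted one-filter CNN" idea can be sketched with plain numpy on a toy image, where a bright square stands in for a nodule-like structure; this is illustrative, not the project's exact code:

```python
import numpy as np

# Sobel kernels for horizontal and vertical gradients
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def conv2d(img, k):
    # Valid-mode 3x3 convolution (really cross-correlation, as in CNNs)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0                      # a bright square as a stand-in object

magnitude = np.hypot(conv2d(img, KX), conv2d(img, KY))
edges = magnitude > 2.0                    # threshold to see the segmentation better
```

The gradient magnitude is zero inside the flat square and large only along its border, which is exactly the behaviour exploited above.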

5.1.4 3D plotting the scan
For visualization it is useful to be able to show a 3D image of the scan.

Unfortunately, the packages available in the Docker image are very limited in this sense, so

we will use marching cubes to create an approximate mesh for our 3D object, and plot

this with matplotlib. Quite slow and ugly, but it was the best option available.

Our plot function takes a threshold argument which we can use to plot certain

structures, such as all tissue or only the bones. 400 is a good threshold for

showing the bones only (from the Hounsfield unit table above).

Figure 5.12: 3D Plot of Lung

5.1.5 Lung segmentation using water sheds
Most suggested Lung Segmentation methods mainly involve thresholding the lung

tissue based on its Hounsfield value and using morphological dilation to include nodules

in border regions. These methods have the severe drawback of also including lots of

tissue that is neither lung, nor a region of interest. We coded an algorithm based on the

one presented in R Shojaii et al [8] with some modifications, that we present here.

The resulting CT Images are in HU and have the same (not necessarily

equidistant) scale as the original scans.

Figure 5.13: Original Input Slice

In order to use marker based watershed segmentation, we need to identify two

markers. An internal marker, that is definitely lung tissue and an external marker, that is

definitely outside of our ROI. We're starting by creating the internal marker by

thresholding the Image and removing all regions but the biggest one. The external marker

is created by morphological dilation of the internal marker with 2 different iterations and

subtracting the results. A watershed marker is created superimposing the 2 markers with

different grayscale values.

Figure 5.14: Internal Marked Slice

Figure 5.15: External Marked Slice

Figure 5.16: Watershed Marked Slice

Now we apply the marker-based watershed algorithm to find the

precise border of the lung, located in the black strip of the watershed

marker shown above. To run the algorithm, we also need the Sobel

gradient image of our original scan, which is calculated first.

In order not to miss nodules located next to the border regions, a Black Top

Hat operation is performed to re-include those areas and areas surrounding the lung.

This is the main advantage of this method over the thresholding-based methods: only areas that

need re-inclusion get dilated; everywhere else the lung border stays precise.

Figure 5.17: Sobel Gradient

Figure 5.18: Watershed Image

Figure 5.19: Outline After Re-Inclusion

Figure 5.20: Lung filter after Closing

Figure 5.21: Segmented Lung

The resulting images of this code are still in the original dimensions of the

CT Scan and in Hounsfield Units with the filtered areas being assigned -2000.

This method of lung segmentation preserves the original lung border

very precisely while re-including possible nodule candidates in border

regions. The main downside is the much longer processing time per patient.

5.1.6 3D Lung segmentation


In order to reduce the problem space, we can segment the lungs

(and usually some tissue around it). It involves quite a few smart steps. It

consists of a series of applications of region growing and morphological

operations. In this case, we will use only connected component analysis.

The steps:

Threshold the image (-320 HU is a good threshold, but it doesn't

matter much for this approach).

Do connected components analysis, determine the label of the air around the person, and fill

this with 1s in the binary image.

For every axial slice in the scan, determine the largest solid connected

component, and set the others to 0. This fills in the structures in the lungs in the mask.

Keep only the largest air pocket (the human body has other pockets of air).
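The connected-component step at the heart of these steps can be illustrated in 2D with a plain breadth-first search; this is a didactic sketch (a real pipeline would use a library routine such as scipy.ndimage.label), and it assumes the mask contains at least one foreground pixel:

```python
import numpy as np
from collections import deque

def largest_component(mask):
    """Keep only the biggest 4-connected blob of 1s in a binary 2D mask."""
    labels = np.zeros_like(mask, dtype=int)
    sizes = {}
    current = 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue
        current += 1
        queue = deque([(sy, sx)])
        labels[sy, sx] = current
        size = 0
        while queue:
            y, x = queue.popleft()
            size += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    queue.append((ny, nx))
        sizes[current] = size
    best = max(sizes, key=sizes.get)
    return labels == best

mask = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 1],
                 [0, 0, 0, 1]])
biggest = largest_component(mask)   # keeps the 3-pixel blob, drops the 2-pixel one
```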

Figure 5.22: 3D Segmented Lung

But there's one thing we can fix: it is probably a good idea to include structures

within the lung (as the nodules are solid); we do not want only the air in the lungs.

Figure 5.23: 3D Lung-2

We also visualized the difference between the two.

Figure 5.24:3D Lung-3 (Difference)

When we want to use this mask, we first apply a dilation

morphological operation on it (i.e. with a circular kernel). This expands the

mask in all directions. The air + structures in the lung alone will not contain

all nodules, in particular it will miss those that are stuck to the side of the

lung, where they often appear. So we expand the mask a little.

This segmentation may fail for some edge cases. It relies on the fact that the

air outside the patient is not connected to the air in the lungs. If the patient has a

tracheostomy, this will not be the case, we do not know whether this is present in the

dataset. Also, for particularly noisy images (for instance due to a pacemaker in the image

below) this method may fail; instead, the second largest air pocket in the body

will be segmented. We can recognize this by checking the fraction of the image that the

mask corresponds to, which will be very small in this case. We can then first apply a

morphological closing operation with a kernel a few mm in size to close these holes,

after which it should work (or more simply, we do not use the mask for this image).

5.1.7 Normalization
Our values currently range from -1024 to around 2000. Anything

above 400 is not interesting to us, as these are simply bones with different

radio density. Thus, we normalize between -1000 and 400.

5.1.8 Zero centering


As a final preprocessing step, it is advisable to zero-center our data so

that the mean value is 0. To do this we simply subtract the mean pixel value

from all pixels. We found this to be around 0.25 in the LUNA16 dataset.

Warning: do not zero-center with the mean per image, as is done in some kernels. CT

scanners are calibrated to return accurate HU measurements, so there is no such

thing as an image with lower contrast or brightness, as in normal pictures.
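Both steps can be sketched together; MIN_BOUND, MAX_BOUND and PIXEL_MEAN follow the values stated above (the dataset-wide mean of 0.25, not a per-image mean):

```python
import numpy as np

MIN_BOUND, MAX_BOUND = -1000.0, 400.0   # the normalization window chosen above
PIXEL_MEAN = 0.25                        # dataset-wide mean reported for LUNA16

def normalize(hu):
    # Map [-1000, 400] HU linearly to [0, 1], clipping everything outside.
    x = (hu - MIN_BOUND) / (MAX_BOUND - MIN_BOUND)
    return np.clip(x, 0.0, 1.0)

def zero_center(x):
    # Subtract the dataset-wide mean so the data is centered around 0.
    return x - PIXEL_MEAN

hu = np.array([-1024.0, -1000.0, 0.0, 400.0, 2000.0])
out = zero_center(normalize(hu))
```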

5.2 Running the algorithms
With these steps our images are ready for consumption by our Neural Network,

CNN, XGBoost and/or other ML methods. We can do all these steps offline (one time

and save the result), and we let it run overnight as it took a long time.

Now, the data we have is actually 3D data, not 2D data that's covered in

most tutorials and papers.

Our convolutional window/padding/strides need to change. Now, to

have a bigger window, our processing penalty increases significantly as

we increase in size, obviously much more than with 2D windows.

Now we're set to train the network. When running locally, we make sure our

training data is NOT the sample images, it should be the stage1 images. Our

training file should be ~700mb with ~1400 total labeled samples.

5.2.1 CNN Parameters


5.2.1.1 CONVOLUTIONAL LAYERS COUNT = 2:

Additional convolutional layers make training more expensive (moderately so, because each

convolutional layer reduces the number of input features reaching the fully

connected layers). After around 2 or 3 layers, however, the accuracy gain becomes

quite small, and so we need to accept a trade-off between generalization accuracy and

training time. That said, all image recognition tasks are different, and so

the simplest technique is to keep incrementing the number of convolutional

layers one at a time until we get an acceptable final result.

5.2.1.2 NODES PER HIDDEN LAYER COUNT = 64:

There is no formula to determine the number of nodes, because it differs for each task to be executed. As a rough guide we generally keep the number of nodes in each layer at 2/3 the size of the previous layer, making the first dense layer 2/3 the size of the final feature maps. This, however, is only an approximate guide and depends chiefly on the dataset. Another commonly employed option is to start with an excessive number of nodes and then remove the unneeded ones through a method called pruning.
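The 2/3 rule of thumb can be illustrated in a few lines (the starting size of 96 feature maps is an assumed example for illustration, not our network's actual value):

```python
# Sketch of the 2/3 rule-of-thumb described above; only a starting
# heuristic, not a substitute for tuning on the dataset.
def next_layer_size(prev):
    return max(1, round(prev * 2 / 3))

sizes = [96]  # assumed count of final feature maps feeding the dense layers
for _ in range(2):
    sizes.append(next_layer_size(sizes[-1]))
print(sizes)  # [96, 64, 43]
```

Applied once to 96 feature maps, the rule suggests 64 nodes, which matches the value chosen for this network.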

5.2.1.3 STRIDE SIZE = [1,2,2,2,1]:

Here we essentially specify the pace with which we slide the filter. When the stride is 1, we move the filters one pixel at a time. When the stride is 2 (or, rarely, 3 or more, although this is infrequent in practice), the filters jump 2 pixels at a time. This produces spatially smaller output volumes.
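The stride vector [1,2,2,2,1] appears to follow TensorFlow's conv3d convention of [batch, depth, height, width, channels]: stride 2 on the three spatial axes, stride 1 on the batch and channel axes, which are never strided. The shrinking effect can be sketched with the standard output-size formula (the 50-voxel input and padding of 1 are illustrative assumptions):

```python
# How stride shrinks one spatial axis of the output volume.
def conv_out_size(n, k, stride, pad):
    """Output length along one axis: input n, kernel k, given stride and padding."""
    return (n - k + 2 * pad) // stride + 1

# A 50-voxel axis with a 3-wide kernel, padding 1: stride 2 halves it.
print(conv_out_size(50, 3, 2, 1))  # 25
print(conv_out_size(50, 3, 1, 1))  # 50 (stride 1 preserves the size)
```

With stride 2 on all three spatial axes, each convolution reduces the volume to roughly one eighth of its input, which keeps the 3D processing cost manageable.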

5.2.1.4 THE NUMBER OF HIDDEN LAYERS = 4:

The number of hidden layers required hinges on the intrinsic complexity of our dataset; this can be understood by considering what each layer accomplishes:

With 0 hidden layers the network can model only a linear function. This is poor for almost all image recognition tasks.

One hidden layer allows the network to model an arbitrarily complex function. This is good enough for many image recognition tasks.

In theory, two hidden layers offer little benefit over a single layer; in practice, however, some tasks may benefit from the additional layer. This should be handled with caution, because a second layer can cause over-fitting. Employing more than two hidden layers is almost never beneficial, being advantageous only for particularly complex tasks or when a very large quantity of training data is available.

5.2.2 XGBoost Parameters:


5.2.2.1 N_ESTIMATORS = 1500:

This is the number of sequential trees to be modeled by the boosting procedure. GBM is reasonably robust when working with a high number of trees; nevertheless, it may still overfit sometimes, so this needs to be tuned for a particular learning rate.

5.2.2.2 MIN_CHILD_WEIGHT = 9:

This defines the minimum sum of instance weights required in a child node; higher values make the algorithm more conservative and help control over-fitting.

5.2.2.3 MAX_DEPTH = 10:

This parameter controls over-fitting: the deeper the tree, the more the model learns relations specific to a particular sample.

5.2.2.4 LEARNING_RATE = 0.05:

The learning rate determines the impact of each tree on the final result. GBM works by beginning with an initial estimate, which is then updated using the output of each tree; the learning parameter controls the magnitude of these changes. Lower values are generally preferred, because they make the model robust to the specific characteristics of individual trees and thus generalize better. However, lower values are computationally expensive, since they require a greater number of trees to model all the relations.

5.2.2.5 COLSAMPLE_BYTREE = 0.80:

This is the ratio of columns (the feature sub-sample) used while constructing each tree.

5.2.2.6 SEED = 42:

The random folds for cross validation are generated with this seed, which also makes runs reproducible.

5.2.2.7 NTHREAD = 8

With multi-core processing available, this sets the number of parallel threads used to construct the trees.

5.2.2.8 SUBSAMPLE = 0.80:

The fraction of the total observations to be chosen (by random sampling) for each tree. Values slightly less than 1 reduce variance and make the model more robust.

5.2.2.9 VERBOSE = TRUE

This controls the output printed as the model fits the cross-validation sets. The different values it can take are:

0: no output generated (default)

1: output generated for trees at certain intervals

>1: output generated for every tree

5.2.2.10 EARLY_STOPPING_ROUNDS = 50:

This is used to avoid overfitting when training complex machine learning models. It monitors the performance of the model on a separate test dataset, and the training procedure is curbed when performance on that dataset shows no improvement for a user-specified number of training iterations.
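The tuned values from Sections 5.2.2.1–5.2.2.10 can be gathered into a single parameter set. The sketch below only collects them; the commented-out model construction assumes the xgboost package's scikit-learn wrapper is installed:

```python
# Tuned XGBoost parameter values as described in Sections 5.2.2.1-5.2.2.8.
xgb_params = {
    "n_estimators": 1500,       # sequential trees in the ensemble
    "min_child_weight": 9,
    "max_depth": 10,            # controls over-fitting
    "learning_rate": 0.05,      # shrinkage applied to each tree's contribution
    "colsample_bytree": 0.80,   # column sub-sample per tree
    "subsample": 0.80,          # row sub-sample per tree
    "seed": 42,                 # reproducible cross-validation folds
    "nthread": 8,               # parallel threads
}

# With xgboost available, the classifier would be built and fitted roughly as
# (verbose and early stopping correspond to Sections 5.2.2.9-5.2.2.10):
#
#   from xgboost import XGBClassifier
#   clf = XGBClassifier(**xgb_params)
#   clf.fit(X_train, y_train, eval_set=[(X_val, y_val)],
#           early_stopping_rounds=50, verbose=True)

print(len(xgb_params))
```

This is a sketch of the configuration, not the exact training script; variable names such as X_train and X_val are placeholders.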

CHAPTER 6: RESULTS

We have used Log-loss, a classification metric incorporating the idea of probabilistic confidence, as the principal evaluation metric. It is essentially the cross entropy between the true labels and the predicted distribution: the entropy of the true distribution plus the extra unpredictability incurred when one assumes a distribution different from the true one. We thus maximized the accuracy of our classifier by minimizing the cross entropy. Here

LogLoss = -(1/n) · Σ_{i=1..n} [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ]        [6.1]

· n is the number of patients in the test dataset

· ŷ_i is the predicted probability that patient i's image is cancerous
· y_i is 1 if cancer is diagnosed for patient i, else 0
· log() is the natural logarithm
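Equation 6.1 can be computed in a few lines of plain Python (a sketch; library implementations such as sklearn.metrics.log_loss behave the same way, including clipping predictions away from 0 and 1 to avoid log(0)):

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Mean cross entropy between true labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# A confident correct prediction costs little; a confident wrong one costs a lot.
print(round(log_loss([1, 0], [0.9, 0.1]), 4))  # 0.1054
```

This is why the metric rewards well-calibrated probabilities rather than just correct hard labels.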

Figure 6.1: Log Loss v/s probability for a single positive instance

6.1.1 Neural Network
25 iterations were run for the preprocessed data (muchdata-50-50-20.npy) in the root folder as:

Figure 6.2: Root Folder

Iterations appear as below, while processing:

Figure 6.3: Results with each Iteration

A mean Log-Loss of 0.88 is achieved using this algorithm.

6.1.2 Convolutional Neural Network


Iterations for the convolutional neural network run as follows:

Figure 6.4: System Usage Stats and iterations result

Thus, we see a curve similar to the Neural Network's, but with lower Log-Loss, as expected. We achieve a mean Log-Loss error of 0.63 on the test dataset.

6.1.3 XGBoost
Using a grid search for optimum value of parameters, and printing values after

each cross validation, we obtain the following result:

Figure 6.5: XGBOOST Output

We see a high-variance curve with lower Log-Loss on average. We achieve a mean Log-Loss error of 0.57 on the test dataset.

6.1.4 Comparison
Comparing the errors for the three algorithms, we have:

Figure 6.6: Comparison Curve of CNN and XGBOOST

This shows the Log-loss error comparison for the first 25 iterations
of the said algorithms. We thus have the mean Log-loss errors as:

Table 6.1: Log loss Error of Neural Network, CNN and XGBOOST

S.no. Algorithm Log-loss error

1. Neural Network 0.88

2. Convolutional Neural Network 0.63

3. XGBoost 0.57

CHAPTER 7: CONCLUSION

In this thesis, we have presented a new approach for lung disease detection from low-dose CT scan data, which can help radiologists reduce the lung cancer mortality rate. To date, CNNs have been considered the de-facto standard for image recognition problems. Our results showcase XGBoost as a promising alternative, bettering the output of the CNN by a Log-Loss margin of 0.06 on the aforementioned data. Further improvement of these algorithms will thus provide ways to make sizeable progress in the field of curing lung diseases.

Our method (taking 3D segmented, holistic images as input) differs significantly from earlier image patch-based algorithms and 2D convolutions, and brings to light a more realistic and practical clinical solution.

The clinical meaning of the feature values learned by the algorithms is a direction we plan to research further. Furthermore, other lung diseases could be recognized, making the identification of multiple diseases from a single slice additionally interesting.

[1] K. Murphy, B. van Ginneken, A. M. R. Schilham, B. J. de Hoop, H. A. Gietema, and M. Prokop, "A large-scale evaluation of automatic pulmonary nodule detection in chest CT using local image features and k-nearest-neighbour classification", Medical Image Analysis, vol. 13, pp. 757-770, 2009.

[2] A. A. A. Setio, C. Jacobs, J. Gelderblom, and B. van Ginneken, "Automatic detection of large pulmonary solid nodules in thoracic CT images", Medical Physics, vol. 42, no. 10, pp. 5642-5653, 2015.

[3] E. M. van Rikxoort, B. de Hoop, M. A. Viergever, M. Prokop, and B. van Ginneken,


"Automatic lung segmentation from thoracic computed tomography scans using a hybrid

approach with error detection", Medical Physics, vol. 36, no. 10, pp. 2934-2947, 2009.

[4] C. Jacobs, E. M. van Rikxoort, T. Twellmann, E. T. Scholten, P. A. de Jong, J. M. Kuhnigk, M. Oudkerk, H. J. de Koning, M. Prokop, C. Schaefer-Prokop, and B. van Ginneken, "Automatic detection of subsolid pulmonary nodules in thoracic computed tomography images", Medical Image Analysis, vol. 18, no. 2, pp. 374-384, 2014.

[5] Friedman, Jerome H. Greedy function approximation: A gradient boosting


machine. Ann. Statist. 29 (2001), no. 5, 1189--1232. doi:10.1214/aos/1013203451.

[6] Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang,
Tianjun Xiao, Bing Xu, Chiyuan Zhang, Zheng Zhang (2015): MXNet: A
Flexible and Efficient Machine Learning Library for Heterogeneous
Distributed Systems, arXiv:1512.01274 [cs.DC]

[7] Mingchen Gao, Ulas Bagci, Le Lu, Aaron Wu, Mario Buty, Hoo-Chang Shin,
Holger Roth, Georgios Z. Papadakis, Adrien Depeursinge, Ronald M.
Summers, Ziyue Xu & Daniel J. Mollura (2016): Holistic classification of CT
attenuation patterns for interstitial lung diseases via deep convolutional
neural networks, Computer Methods in Biomechanics and Biomedical
Engineering: Imaging & Visualization, DOI: 10.1080/21681163.2015.1124249

[8] Arnaud Arindra Adiyoso Setio, Alberto Traverso, Thomas de Bel, Moira S.N.
Berens, Cas van den Bogaard, Piergiorgio Cerello, Hao Chen, Qi Dou, Maria Evelina
Fantacci, Bram Geurts, Robbert van der Gugten, Pheng Ann Heng, Bart Jansen,
Michael M.J. de Kaste, Valentin Kotov, Jack Yu-Hung Lin, Jeroen T.M.C. Manders,
Alexander Sónora-Mengana, Juan Carlos García-Naranjo, Mathias Prokop, Marco
Saletta, Cornelia M Schaefer-Prokop, Ernst T. Scholten, Luuk Scholten, Miranda M.
Snoeren, Ernesto Lopez Torres, Jef Vandemeulebroucke, Nicole Walasek, Guido C.A.
Zuidhof, Bram van Ginneken, Colin Jacobs (2017): Validation, comparison, and
combination of algorithms for automatic detection of pulmonary nodules in computed
tomography images: the LUNA16 challenge, arXiv:1612.08012

[9] Cireşan, D.C., Giusti, A., Gambardella, L.M. and Schmidhuber, J., 2013.
Mitosis detection in breast cancer histology images with deep neural
networks. In International Conference on Medical Image Computing and
Computer-assisted Intervention (pp. 411-418). Springer Berlin Heidelberg.

[10] Prasoon, A., Petersen, K., Igel, C., Lauze, F., Dam, E. and Nielsen, M., 2013,

September. Deep feature learning for knee cartilage segmentation using a triplanar

convolutional neural network. In International conference on medical image computing

and computer-assisted intervention (pp. 246-253). Springer Berlin Heidelberg.

[11] Sharif Razavian, A., Azizpour, H., Sullivan, J. and Carlsson, S., 2014. CNN features off-

the-shelf: an astounding baseline for recognition. In Proceedings of the IEEE Conference

on Computer Vision and Pattern Recognition Workshops (pp. 806-813).

[12] Christodoulidis, S., Anthimopoulos, M., Ebner, L., Christe, A. and Mougiakakou,

S., 2017. Multisource Transfer Learning With Convolutional Neural Networks for Lung

Pattern Analysis. IEEE journal of biomedical and health informatics, 21(1), pp.76-84.

[13] Gao, M., Xu, Z., Lu, L., Wu, A., Nogues, I., Summers, R.M. and Mollura,
D.J., 2016, April. Segmentation label propagation using deep convolutional
neural networks and dense conditional random field. In Biomedical Imaging
(ISBI), 2016 IEEE 13th International Symposium on (pp. 1265-1268). IEEE.

[14] Pan, Y., Huang, W., Lin, Z., Zhu, W., Zhou, J., Wong, J. and Ding, Z., 2015,
August. Brain tumor grading based on neural networks and convolutional
neural networks. In Engineering in Medicine and Biology Society (EMBC), 2015
37th Annual International Conference of the IEEE (pp. 699-702). IEEE.

[15] Ciompi, F., de Hoop, B., van Riel, S.J., Chung, K., Scholten, E.T.,
Oudkerk, M., de Jong, P.A., Prokop, M. and van Ginneken, B., 2015.
Automatic classification of pulmonary peri-fissural nodules in computed
tomography using an ensemble of 2D views and a convolutional neural
network out-of-the-box. Medical image analysis, 26(1), pp.195-202.

[16] Anthimopoulos, M., Christodoulidis, S., Christe, A. and Mougiakakou, S., 2014,
August. Classification of interstitial lung disease patterns using local DCT features
and random forest. In Engineering in Medicine and Biology Society (EMBC), 2014 36th
Annual International Conference of the IEEE (pp. 6040-6043). IEEE.

[17] Li, Q., Cai, W., Wang, X., Zhou, Y., Feng, D.D. and Chen, M., 2014,
December. Medical image classification with convolutional neural network.
In Control Automation Robotics & Vision (ICARCV), 2014 13th International
Conference on (pp. 844-848). IEEE.

[18] Nogues, I., Yao, J., Mollura, D. and Summers, R.M., Deep Convolutional
Neural Networks for Computer-Aided Detection: CNN Architectures,
Dataset Characteristics and Transfer Learning.

[19] Samala, R.K., Chan, H.P., Hadjiiski, L., Helvie, M.A., Wei, J. and Cha, K., 2016.

Mass detection in digital breast tomosynthesis: Deep convolutional neural network

with transfer learning from mammography. Medical Physics, 43(12), pp.6654-6666.

[20] Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E. and Greenspan, H., 2015,

April. Chest pathology detection using deep learning with non-medical


training. In Biomedical Imaging (ISBI), 2015 IEEE 12th International
Symposium on (pp. 294-297). IEEE.

[21] Fotin, S.V., Yin, Y., Haldankar, H., Hoffmeister, J.W. and Periaswamy, S., 2016,
March. Detection of soft tissue densities from digital breast tomosynthesis:
Comparison of conventional and deep learning approaches. In SPIE Medical
Imaging (pp. 97850X-97850X). International Society for Optics and Photonics.
[22] Kooi, T., Gubern-Merida, A., Mordang, J.J., Mann, R., Pijnappel, R.,
Schuur, K., den Heeten, A. and Karssemeijer, N., 2016, June. A comparison
between a deep convolutional neural network and radiologists for
classifying regions of interest in mammography. In International Workshop
on Digital Mammography (pp. 51-56). Springer International Publishing.

[23] Cha, K.H., Hadjiiski, L.M., Samala, R.K., Chan, H.P., Cohan, R.H., Caoili, E.M.,
Paramagul, C., Alva, A. and Weizer, A.Z., 2016. Bladder cancer segmentation in CT for
treatment response assessment: Application of deep-learning convolution neural

network A pilot study. Tomography: a journal for imaging research, 2(4), p.421.

[24] Shen, W., Zhou, M., Yang, F., Yang, C. and Tian, J., 2015, June. Multi-
scale convolutional neural networks for lung nodule classification. In
International Conference on Information Processing in Medical Imaging
(pp. 588-599). Springer International Publishing.


Students involved in the project team include:

1. Amit Kumar Singh, 424/IC/13, amitsinghrajput14@gmail.com

2. Rohan Challana, 526/IC/13, rohan.challana.10@gmail.com

3. Utkarsh Jain, 557/IC/13, utkarshj.ic@nsit.net.in

