ACKNOWLEDGEMENTS
We would like to express our deepest gratitude to our advisers Dr. J.R.P.
Gupta and Prof. A.N. Jha for giving us the opportunity to work under their able
supervision. We are indebted towards the faculty of NSIT, whose guidance and
teaching for the past four years have helped us in understanding the theory and its
practical applications. We are also grateful to our friends and families, who always
supported us unconditionally through thick and thin. Their
belief in our capabilities always pushed us to give our best in this project. Without
these invaluable contributions, this project could not have been completed.
DECLARATION
This is to certify that the work presented in the project entitled “PID controller tuning
using …” is our own and has not been submitted to any other university in any form for
the award of any other degree.
CERTIFICATE
This is to certify that the project entitled “PID controller tuning using …” was carried
out by the project team under our supervision.
PLAGIARISM REPORT
ABSTRACT
LIST OF TABLES
Table 6.1: Log loss error of Neural Network, CNN and XGBoost
LIST OF FIGURES
Figure 5.7: Z-Axis Sweep
Figure 6.1: Log Loss v/s probability for a single positive instance
Figure 6.5: XGBoost Output
INDEX OF EQUATIONS
Equation 4.6: Output corresponding to the input parameters and biases
Equation 4.14: General tree function
Equation 4.15: Complexity function in XGBoost
Equation 6.1: Log loss error
TABLE OF CONTENTS
4.2.5.2 Tree Boosting (Training of Tree)
4.2.5.3 Additive Training
The PID controller has remained the most popular controller of this century for its remarkable
effectiveness, ease of implementation and vast applicability, though its tuning has been found
to be hard over the years. The tuning of PID controllers has been done using various methods,
which include Genetic Algorithms among others. All these tuning methods are performed manually
and are difficult as well as time consuming. To use the PID controller efficiently, the optimal
tuning of its parameters has become a significant research area and is taken up by various
control graduates. Optimization problems have been resolved with the aid of numerous soft
computing techniques, which include fuzzy logic, artificial neural networks and metaheuristic
methods, alternatives to the traditional approaches. Many algorithms, like those of
Evolutionary Programming, have also been applied.
Genetic Algorithms (GAs) are a random global search method that replicates the evolution
process; they were developed in the United States in the 1970s at the University of Michigan.
The Genetic Algorithm uses two major principles, crossover of the population and mutation.
The Genetic Algorithm has no knowledge of the correct solution and depends entirely on
responses from the environment and on evolution operators such as crossover and mutation to
arrive at the best proposed solution. By starting with many independent points and searching
them in parallel, the algorithm avoids getting trapped at local minima and converging to
sub-optimal solutions. But for very complex problems, it may still converge to a local
minimum. The time consumed by this optimization algorithm is also high, since it involves so
many parameters. Also, the genetic algorithm is sensitive to the initial population used,
whereas we want a wide diversity of feasible solutions.
The biggest limitation of the Genetic Algorithm is that it cannot guarantee an optimal
solution. Solution quality also degrades as the problem size grows, and the convergence rate
of GA is slow. However, it can generate good-quality solutions for any given problem and
function type.
The natural behaviour of bees and their collective activities in their hives have fascinated
researchers all over the world for decades. Recently, researchers working on Swarm
Intelligence and developing swarm optimization methods have extended their knowledge of
animal societies, especially insect colonies. Ants, termites, bees, fireflies and wasps are
the most important social insects inspiring efficient problem-solving algorithms.
In 2005, Karaboga presented the Artificial Bee Colony (ABC) algorithm to optimize numeric
benchmark functions. It was then extended by Karaboga and Basturk and shown to outperform
other recognized heuristic methods such as GA [18] as well as DE, PSO and ACO. In addition,
it requires relatively few control parameters.
ABC, like other nature-based algorithms, models the honey bee lifecycle, though not
precisely. In this model, the honey bees are categorized as employed, onlooker and scout. An
employed bee is a forager associated with a certain food source which she is currently
exploiting. She remembers the characteristics of the food source and, after returning to the
hive, shares them with the other bees waiting there via a peculiar communication called the
waggle dance. An onlooker bee is an unemployed bee at the hive which tries to find a new food
source using the information provided by the employed bees. A scout, ignoring the others'
information, searches around the hive randomly. In nature, the recruitment of unemployed bees
happens in a nearly similar way. In addition, when the quality of a food source is below a
certain level, it is abandoned to make the bees explore for new food sources. In the ABC
algorithm, the solutions are modelled as food sources and their corresponding objective/fitness
function values as the quality (nectar amount) of those sources.
Although the exploration ability of the ABC algorithm is good, its exploitation of the food
sources it has found is poor, and it can fall into local optima because of premature
convergence. As a result, a more optimized and improved technique was needed.
Many practical control problems faced by control engineers are of higher order with time delay.
The metaheuristic algorithms discussed above generally work well for lower-order problems, but as
the order of the problem increases they converge to local maxima and often fail to give a
satisfactory result.
The Genetic Algorithm, though very robust, has a high convergence rate only for lower-order
problems; for higher-order practical problems the convergence rate drops and it returns a local
maximum as the optimal solution.
The Artificial Bee Colony method is flexible, easy to implement and good at exploring the
solution space, but its exploitation of found food sources is poor and it falls into local
optima as a result of premature convergence.
For the past few years, control systems have assumed an increasingly important role in the
development and advancement of modern civilization and technology. Practically every
aspect of our day-to-day activities is affected by some type of control system. Automatic
control systems are found in virtually all sectors of industry, such as quality control of
manufactured products, automatic assembly lines, machine-tool control, space technology
and weapon systems, computer control, transportation systems, power systems, robotics
and many others. They are essential in such industrial operations as controlling pressure,
temperature, humidity, and flow in the process industries.
Fig. 2.1
Automatic Controllers:-
An automatic controller compares the actual value of the plant output with the
reference input, determines the difference, and produces a control signal that
will reduce this difference to a negligible value. The manner in which the
automatic controller produces such a control signal is called the control action.
An industrial control system comprises an automatic controller, an actuator, a
plant, and a sensor (measuring element). The controller detects the actuating
error signal, which is usually at a very low power level, and amplifies it to a
very high level. The output of the automatic controller is fed to an actuator, such
as a hydraulic motor, an electric motor or a pneumatic motor or valve (or any
other source of energy). The actuator is a power device that produces the input to the
plant according to the control signal, so that the output signal will approach the
reference input signal.
The sensor or measuring element is a device that converts the output variable
into another suitable variable, such as a displacement, pressure or voltage, that
can be used to compare the output to the reference input command. This element
is in the feedback path of the closed-loop system. The set point of the controller must be
converted to a reference input with the same units as the feedback signal from the
sensor element.
The type of controller to use must be decided based on the nature of the plant and the
operating conditions, including such considerations as safety, cost, availability, reliability,
accuracy, weight and size.
Two-position or on-off controllers: -
In a two-position control system, the actuating element has only two fixed positions,
which are, in many simple cases, simply on and off. Due to its simplicity and
inexpensiveness, it is very widely used in both industrial and domestic control
systems.
Let the output signal from the controller be u(t) and the actuating error signal be
e(t). Then, mathematically,
u(t) = U1, for e(t) > 0
u(t) = U2, for e(t) < 0
where U1 and U2 are constants, and the minimum value of U2 is usually either
zero or -U1.
In the proportional control algorithm, the controller output is proportional to the error
signal, which is the difference between the set point and the process variable. In other
words, the output of a proportional controller is the multiplication product of the error
signal and the proportional gain. This can be mathematically expressed as
Pout = Kp e(t)
where Kp is the proportional gain and e(t) is the error signal.
With an increase in Kp:
· Response speed of the system increases.
In the integral control of a plant, the control signal (the output signal from the controller)
at any instant is proportional to the area under the actuating error signal curve up to that
instant. While removing the steady-state error, it may lead to an oscillatory response of
slowly decreasing or even increasing amplitude, both of which are usually undesirable
[5].
Gc(s) = Kp + Ki / s
Fig.1.2 (courtesy-[5])
Integral control action added to the proportional controller converts the original system
into a higher-order one. Hence the control system may become unstable for a large value of Kp,
since the roots of the characteristic equation may have positive real parts. In this control,
the proportional control action tends to stabilize the system, while the integral control action
tends to eliminate or reduce the steady-state error in response to various inputs. As the value
of Ti is increased:
· Overshoot tends to be smaller.
· Speed of the response tends to be slower.
Gpd(s) = Kp + Kd s
       = Kp (1 + Td s)
The PID controller was first placed on the market in 1939 and has remained the most widely
used controller in process control until today. An investigation performed in 1989 in Japan
indicated that more than 90% of the controllers used in process industries are PID controllers
and advanced versions of the PID controller. PI controllers are fairly common, since
derivative action is sensitive to measurement noise.
“PID control” is the method of feedback control that uses the PID controller as the main tool.
The basic structure of conventional feedback control systems is shown in Figure below, using
a block diagram representation. In this figure, the process is the object to be controlled. The
purpose of control is to make the process variable y follow the set-point value r. To achieve
this purpose, the manipulated variable u is changed at the command of the controller. As an
example of processes, consider a heating tank in which some liquid is heated to a desired
temperature by burning fuel gas. The process variable y is the temperature of the liquid, and
the manipulated variable u is the flow of the fuel gas. The “disturbance” is any factor, other
than the manipulated variable, that influences the process variable. Figure below assumes that
only one disturbance is added to the manipulated variable. In some applications, however, a
major disturbance enters the process in a different way, or plural disturbances need to be
considered. The error e is defined by e = r – y. The compensator C(s) is the computational
rule that determines the manipulated variable u based on its input data, which is the error e in
the case of the Figure. The last thing to notice about the Figure is that the process variable y is
assumed to be measured by the detector (not shown explicitly here) with sufficient
accuracy and speed that the input to the controller can be regarded as exactly
equal to y.
Fig. 1.3(courtesy-[5])
When used in this manner, the three elements of the PID controller produce outputs with the
following nature:
P element: proportional to the error at the instant t, this is the “present” error.
I element: proportional to the integral of the error up to the instant t, which can
be interpreted as the accumulation of the “past” error.
D element: proportional to the derivative of the error at the instant t, which can be
interpreted as the prediction of the “future” error.
Thus, the PID controller can be understood as a controller that takes the present, the past, and
the future of the error into consideration. The transfer function Gc(s) of the PID controller is:
Gc(s) = Kp (1 + 1/(Ti s) + Td s)
      = Kp + Ki / s + Kd s
1.6 Application: -
In the early history of automatic process control the PID controller was implemented as a
mechanical device. These mechanical controllers used a lever, spring and a mass and
were often energized by compressed air. These pneumatic controllers were once the
industry standard [5].
Electronic analog controllers can be made from a solid-state or tube amplifier, capacitor
and a resistance. Electronic analog PID control loops were often found within more
complex electronic systems, for example, the head positioning of a disk drive, the power
conditioning of a power supply, or even the movement-detection circuit of a modern
seismometer. Nowadays, electronic controllers have largely been replaced by digital
controllers implemented with microcontrollers or FPGAs.
u(t) = Kp [ e(t) + (1/Ti) ∫0t e(t') dt' + Td de(t)/dt ] + b
Where,
e is the difference between the current value and the set point.
b is the set point value of the signal, also known as bias or offset.
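As a concrete illustration of this control law, the following is a minimal discrete-time sketch in Python (the gains, sampling period and toy plant are hypothetical values chosen for the example, not parameters used elsewhere in this work):

# Minimal discrete PID sketch (illustrative only; gains and sampling
# period are hypothetical, not values used elsewhere in this work).
class PID:
    def __init__(self, kp, ti, td, dt, bias=0.0):
        self.kp, self.ti, self.td, self.dt, self.bias = kp, ti, td, dt, bias
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        e = setpoint - measurement                    # error e(t) = r - y
        self.integral += e * self.dt                  # approximates the integral term
        derivative = (e - self.prev_error) / self.dt  # approximates de/dt
        self.prev_error = e
        return self.kp * (e + self.integral / self.ti
                          + self.td * derivative) + self.bias

# Example: drive a first-order plant y' = (u - y)/tau toward r = 1.
pid = PID(kp=2.0, ti=1.0, td=0.1, dt=0.01)
y, tau = 0.0, 0.5
for _ in range(1000):
    u = pid.update(1.0, y)
    y += (u - y) / tau * 0.01
print(round(y, 3))  # settles near 1.0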
3.2 Ziegler-Nichols Rules for tuning PID Controller:-
It has been observed that step responses of many processes to which PID controllers are applied
have monotonically increasing characteristics as shown in Figures a and b, so most traditional
design methods for PID controllers have been developed implicitly assuming this property.
However, there exist some processes that exhibit oscillatory responses to step inputs.
Two tuning methods were proposed by Ziegler and Nichols in 1942 and have been widely
utilized either in the original form or in modified forms. One of them, referred to as the
Ziegler–Nichols ultimate sensitivity method, is to determine the parameters as given in Table 1
using the data Kcr and Tcr obtained from the ultimate sensitivity test. The other, referred to
as the Ziegler–Nichols step response method, is to assume the FOPDT model and to determine the
parameters of the PID controller as given in Table 2 using the parameters R and L of the FOPDT
model, which are determined from the step response test.
Table 1: Ziegler–Nichols ultimate sensitivity method
Type of controller    Kp          Ti           Td
P                     0.5 Kcr     —            0
PI                    0.45 Kcr    0.833 Tcr    0

Table 2: Ziegler–Nichols step response method
Type of controller    Kp          Ti           Td
P                     1/RL        —            0
PI                    0.9/RL      L/0.3        0
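The PI rows of these tables translate directly into code; a small sketch (the helper names are our own, not from any library):

# Ziegler–Nichols PI settings (sketch based on the two tables above).
def zn_ultimate_pi(k_cr, t_cr):
    """PI gains from the ultimate sensitivity test data Kcr, Tcr."""
    return {"Kp": 0.45 * k_cr, "Ti": 0.833 * t_cr, "Td": 0.0}

def zn_step_pi(r, l):
    """PI gains from the step response (FOPDT) parameters R and L."""
    return {"Kp": 0.9 / (r * l), "Ti": l / 0.3, "Td": 0.0}

print(zn_ultimate_pi(k_cr=4.0, t_cr=2.0))  # hypothetical test data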
Disadvantage :-
The classical tuning methods explained above have the following features:
• The process is assumed, implicitly (in the case of the Ziegler–Nichols ultimate sensitivity
method) or explicitly (in the case of the Ziegler–Nichols step response method), to be
modelled by the simple transfer function.
• The optimal values of the PID parameters are given by formulae of the process parameters that
are determined directly and uniquely from experimental data.
The first feature is a weakness of these classical methods, in the sense that the applicable
processes are limited, or in other words that the claimed “optimal” values are not necessarily, and
are sometimes fairly far from, the true optimal in practical situations where the transfer function
is nothing but an approximation of the real process characteristics. Specifically, the problem is
serious when the pure delay L of the process is very short or very long, where “very short” and
“very long” roughly means outside the range 0.05≤L/T≤1.0 [17]. It can be interpreted as a
weakness in the sense that there is no room to improve the results by making use of more
detailed information about the process which is obtainable from theoretical study and accurate
measurement.
Many attempts have been made to make up for these weaknesses of the classical methods. Many
theoretical considerations have been used to develop sophisticated methods that use, as the basis
of tuning, the shape of the frequency response of the return ratio, poles (and zeros) of the closed-
loop transfer function, time-domain performance indices such as ISE, or frequency-domain
performance indices.
It is observed that the response of most of the processes under step change in input yields a
sigmoidal shape
Fig: Process Reaction Curve for Cohen Coon Method
Such a sigmoidal shape can be adequately approximated by the response of a first-order process with dead time:
G(s) ≈ K e^(−td s) / (τ s + 1)    (IV.70)
From the approximate response it is easy to estimate the parameters. The controllers are designed
as given in Table IV.5.
Table IV.5: Controller settings using Cohen-Coon design method
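The entries of Table IV.5 did not survive extraction here; as an assumption, the sketch below uses the commonly cited Cohen–Coon PI formulas for a first-order-plus-dead-time process:

# Cohen–Coon PI settings (assumed standard formulas; the original
# Table IV.5 entries were lost, so treat these as illustrative).
def cohen_coon_pi(k, tau, td):
    """PI gains from FOPDT parameters: gain k, time constant tau, dead time td."""
    r = td / tau
    kp = (1.0 / k) * (1.0 / r) * (0.9 + r / 12.0)
    ti = td * (30.0 + 3.0 * r) / (9.0 + 20.0 * r)
    return {"Kp": kp, "Ti": ti}

print(cohen_coon_pi(k=1.0, tau=5.0, td=1.0))  # hypothetical plant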
The adaptive nature of Artificial Neural Network (ANN) controllers has made them a major area of
interest among researchers in widespread fields [7-13], mainly because ANN controllers can
efficiently learn an unknown or continuously varying environment and act accordingly. Industrial
automation applications prefer PID (Proportional Integral Derivative) controllers because of
their simple structure, robustness, etc.
A Neural Network tuned PID (NNPID) has two inputs, one output and three layers: an input
layer, a hidden layer and an output layer. The input layer has two neurons and the output layer
has one, and these neurons are P-neurons. The hidden layer has three neurons: a P-neuron (H1),
an I-neuron (H2) and a D-neuron (H3). The NNPID is shown in Fig. 2. When suitable connective
weights are chosen, an NNPID becomes a conventional PID controller.
where u is the controller output, KP is the proportional gain, KI is the integral gain, KD is
the derivative gain, and e is the error between the set point and the process output. For
digital control with sampling period ts, we can write
u(k) = KP e(k) + KI ts Σ e(i) + KD [e(k) − e(k−1)] / ts
Back-propagation algorithms: In the present control system, the aim of the NNPID algorithm is
to tune the PID parameters in such a way that the mean square error (MSE),
MSE = (1/N) Σ e(k)²,
is minimized. The weights of the NNPID are changed by steepest descent in an online training
process. The details of the weight adjustments used are as given in reference [5]. The training
of the neural network was done by varying the PID parameters and taking samples online. The
NNPID was trained with a total of 50 sets of PID parameters, each having 360 data points.
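A minimal sketch of one such online steepest-descent gain update follows (a simplification of our own, assuming the sign of the plant sensitivity dy/du is positive; the exact adjustment rules are those of reference [5]):

# Sketch: online steepest-descent tuning of PID gains on the squared error.
# Assumes the plant sensitivity dy/du has positive sign (a common
# simplification); the exact update rules follow reference [5].
def update_gains(kp, ki, kd, e, p_term, i_term, d_term, lr=1e-3):
    # E = 0.5*e^2 and u = kp*p_term + ki*i_term + kd*d_term, so
    # dE/dk = -e*(dy/du)*(du/dk); a gradient step with sign(dy/du)=+1:
    kp += lr * e * p_term
    ki += lr * e * i_term
    kd += lr * e * d_term
    return kp, ki, kd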
Fuzzy Logic
Metaheuristics
CHAPTER 4: GENETIC ALGORITHM
Genetic Algorithms are often seen as function optimizers, although the range of problems to
which they have been applied is quite broad. In a GA, candidate solutions are encoded as
chromosomes. One then evaluates these structures and allocates reproductive opportunities in
such a way that those chromosomes which portray a better solution to the target problem are
given more chances to reproduce than those chromosomes which give poorer solutions.
Evolutionary Cycle
Population: the number of individuals present with the same length of chromosome.
Fitness: the value assigned to an individual based on how far or close the individual is from
the solution.
Fitness function: a function that assigns a fitness value to an individual; it is problem specific.
Crossover: taking two fit individuals and intermingling their chromosomes to create two
new individuals.
6.2.1 Encoding: In order to use a GA to solve a problem, the variables (x1, x2, ..., xn) are
first encoded as strings. Binary-coded strings consisting of 1s and 0s are mostly used. The
length of the string is usually determined according to the desired solution accuracy. For
example, if four bits are used to code each variable in a two-variable optimization problem,
the strings (0000 0000) and (1111 1111) represent the points (x1L, x2L)T and (x1U, x2U)T
respectively, because the substrings (0000) and (1111) have the minimum and maximum decoded
values; any other eight-bit string decodes to a point in between.
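A small sketch of this decoding step (the two-variable, four-bit setup mirrors the example above; the variable bounds are hypothetical):

# Decode a binary GA string into real variables (sketch; bounds are
# hypothetical, the 4-bit two-variable setup mirrors the example above).
def decode(bits, bounds, n_bits=4):
    values = []
    for i, (lo, hi) in enumerate(bounds):
        sub = bits[i * n_bits:(i + 1) * n_bits]
        k = int(sub, 2)                      # decoded integer, 0 .. 2^n - 1
        values.append(lo + (hi - lo) * k / (2 ** n_bits - 1))
    return values

print(decode("00000000", [(0.0, 1.0), (0.0, 1.0)]))  # -> [0.0, 0.0] = (x1L, x2L)
print(decode("11111111", [(0.0, 1.0), (0.0, 1.0)]))  # -> [1.0, 1.0] = (x1U, x2U)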
6.2.2 GA Operators:
The operation of GAs begins with a population of random strings representing design or
decision variables. The population is then operated on by three main operators, namely
reproduction, crossover and mutation, to create a new population of points. GAs can be viewed
as trying to maximize the fitness function by evaluating several solution vectors. The purpose
of these operators is to create new solution vectors by selection, combination or alteration of
the current solution vectors that have shown to be good temporary solutions. The new population
is further evaluated and tested for termination. If the termination criterion is not met, the
population is iteratively operated on by the above three operators and evaluated. This procedure
is continued until the termination criterion is met. One cycle of these operations and the
subsequent evaluation procedure is known as a generation in GA terminology. The operators are
described in the following steps.
6.2.3 Reproduction: Reproduction (or selection) is an operator that makes more copies of better
strings in a new population. Reproduction is usually the first operator applied on a population.
It selects good strings in the population and forms a mating pool; this is one of the
reasons the reproduction operator is sometimes known as the selection operator. Thus, in the
reproduction operation, the process of natural selection causes those individuals that encode
successful structures to produce copies more frequently. To sustain the generation of a new
population, the reproduction of the individuals in the current population is necessary, and for
better individuals, these should come from the fittest individuals of the previous population.
There exist a number of reproduction operators in the GA literature, but the essential idea in
all of them is that above-average strings are picked from the current population and their
multiple copies are inserted in the mating pool in a probabilistic manner.
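One common realization of this operator is roulette-wheel (fitness-proportionate) selection; a minimal sketch (our own illustration, assuming non-negative fitness values, not necessarily the variant used in the original work):

# Roulette-wheel (fitness-proportionate) selection sketch for the
# reproduction step described above; assumes non-negative fitnesses.
import random

def select(population, fitnesses):
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for individual, f in zip(population, fitnesses):
        acc += f
        if acc >= r:
            return individual
    return population[-1]  # guard against floating-point round-off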
6.2.4 Crossover
A crossover operator is used to recombine two strings to get a better string, by combining
material from two individuals of the previous generation. In reproduction, good strings in a
population are probabilistically assigned a larger number of copies and a mating pool is
formed. It is important to note that no new strings are formed in the reproduction phase. In
the crossover operator, new strings are created by exchanging information among strings of the
mating pool. The two strings participating in the crossover operation are known as parent
strings and the resulting strings are known as children strings. It is intuitive from this
construction that good sub-strings from parent strings can be combined to form a better child
string, if an appropriate site is chosen. With a random site, the children strings produced may
or may not have a combination of good sub-strings from parent strings, depending on whether or
not the crossing site falls in the appropriate place. But this is not a matter of serious
concern, because if good strings are created by crossover, there will be more copies of them in
the next mating pool generated by the reproduction operator. It is clear from this discussion
that the effect of crossover may be detrimental or beneficial. Thus, in order to preserve some
of the good strings that are already present in the mating pool, not all strings in the mating
pool are used in crossover. When a crossover probability, defined here as Pc, is used, only
100 Pc per cent of the strings in the population are used in the crossover operation and
100 (1 − Pc) per cent of the population remains as it is in the current population. A crossover
operator is mainly responsible for the search for new strings, even though the mutation
operator is also used for this purpose sparingly.
Fig:1
6.2.5 Mutation:
Mutation adds new information in a random way to the genetic search process and ultimately
helps to avoid getting trapped at local optima. It is an operator that introduces diversity in
the population whenever the population tends to become homogeneous due to repeated use of the
reproduction and crossover operators. Mutation may cause the chromosomes of individuals to
differ from those of their parent individuals.
Mutation, in a way, is the process of randomly disturbing genetic information. It operates at
the bit level: when the bits are being copied from the current string to the new string, there
is a probability that each bit may become mutated. This probability is usually quite small and
is called the mutation probability Pm. A coin-toss mechanism is employed: if a random number
between zero and one is less than the mutation probability, then the bit is inverted, so that a
zero becomes a one and a one becomes a zero. This helps introduce a bit of diversity to the
population by scattering the occasional points. This random scattering may result in a better
optimum, or even modify a part of the genetic code that will be beneficial in later operations;
on the other hand, it might produce a weak individual that will never be selected for further
operations.
The need for mutation is to create a point in the neighbourhood of the current point, thereby
achieving a local search around the current solution. Mutation is also used to maintain
diversity in the population.
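A compact sketch of single-point crossover and bit-flip mutation as described above (the Pc and Pm values are typical illustrative choices, not tuned for any particular problem):

# Single-point crossover and bit-flip mutation (sketch; Pc and Pm are
# typical illustrative values).
import random

def crossover(a, b, pc=0.8):
    if random.random() < pc:
        site = random.randint(1, len(a) - 1)   # random crossing site
        return a[:site] + b[site:], b[:site] + a[site:]
    return a, b                                 # pass parents through unchanged

def mutate(bits, pm=0.01):
    # Flip each bit with probability pm (one "coin toss" per copied bit).
    return "".join(b if random.random() >= pm else "10"[int(b)] for b in bits)

child1, child2 = crossover("00000000", "11111111")
print(mutate(child1), mutate(child2))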
A swarm is a configuration of tens of thousands of individuals that have chosen, of their own
will, to converge on a common goal.
Swarm of bees
Two fundamental concepts are necessary to obtain swarm-intelligent behavior:
• Self-organization: a set of dynamic individuals that work together to achieve a common goal
under a set of rules. The rules ensure that the interactions are executed on the basis of purely
local information, without any relation to the global pattern.
• Division of labor: in swarm behavior, various tasks are performed simultaneously by
specialized individuals, which is referred to as the division of labor. It enables the swarm to
respond to changed conditions in the search space specified for it.
Artificial Bee Colony –
ABC has been developed based on the behaviors of real bees on finding nectar and sharing the
information of food sources to the bees in the hive.
Three essential components of this process:
• Food sources: The value of a food source depends upon its proximity to the hive, the
concentration of its energy and the ease of extracting this energy.
• Employed foragers: They are associated with a particular food source which they are currently
exploiting, or are "employed" at. They carry with them information about this particular
source, its distance and direction from the nest, and the profitability of the source, and
share this information with a certain probability.
• Unemployed foragers: They are continually on the lookout for a food source to exploit. There
are two types of unemployed foragers: scouts, searching the environment surrounding the nest
for new food sources, and onlookers, waiting in the nest and establishing a food source through
the information shared by employed foragers.
The Scout:
It is responsible for finding new food (nectar) sources.
Procedures of ABC:
1. Initialize the scouts (move the scouts).
2. Move the employed and onlooker bees onto the food sources and determine their nectar amounts.
3. Move the scouts only if the counters of the employed bees hit the limit.
Explanation
Each cycle of search consists of three steps:
1. Moving the employed and onlooker bees onto the food sources
2. Calculating their nectar amounts
3. Determining the scout bees and directing them onto possible food sources.
Using a probability-based selection process, onlookers are placed on the food sources. As the
nectar amount of a food source increases, the probability with which that food source is
preferred by onlookers increases too.
The scouts are characterized by low search costs and a low average food source quality. One
bee is selected as the scout bee. If a solution representing a food source is not improved by a
predetermined number of trials, then that food source is abandoned and its employed bee is
converted to a scout.
The main control parameters of ABC are the limit (the number of trials after which a food
source is abandoned) and the number of scouts, here taken as 1.
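A high-level sketch of the ABC search cycle follows (a simplified illustration of our own that folds the onlooker probability selection into a single greedy pass; the dimensions, bounds and parameter values are hypothetical):

# High-level ABC cycle sketch for minimizing f (illustrative only).
import random

def abc_minimize(f, dim=2, n_sources=10, limit=20, cycles=100, lo=-5.0, hi=5.0):
    sources = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_sources)]
    trials = [0] * n_sources

    def neighbor(x):  # perturb one random dimension toward a random partner
        j = random.randrange(dim)
        partner = random.choice(sources)
        y = x[:]
        y[j] += random.uniform(-1, 1) * (x[j] - partner[j])
        return y

    for _ in range(cycles):
        # Employed + onlooker phases (greedy replacement on improvement)
        for i in range(n_sources):
            cand = neighbor(sources[i])
            if f(cand) < f(sources[i]):
                sources[i], trials[i] = cand, 0
            else:
                trials[i] += 1
        # Scout phase: abandon food sources whose counters hit the limit
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = [random.uniform(lo, hi) for _ in range(dim)]
                trials[i] = 0
    return min(sources, key=f)

print(abc_minimize(lambda x: sum(v * v for v in x)))  # converges near [0, 0]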
CHAPTER 5: Genetically Mutated Bee Colony
The task of image classification is to predict a single label (or a distribution over
categories) for a given input image. Images are 3-dimensional arrays of integers ranging from
0 to 255, in Width x Height x 3 format, where the 3 represents the three color channels (RGB).
4.1.3 Challenges
The detection of a visual concept (e.g. a cat) is comparatively trivial for a human to
perform, but it is worth considering the difficulties faced by an image classification
algorithm. A (non-exhaustive) list of obstacles follows:
Viewpoint variation. A single instance of an object can be oriented in many ways with respect
to the camera.
Scale variation. Visual classes often exhibit variation in their size (size in the real world,
not only in terms of their extent in the image).
Occlusion. The objects of interest can be occluded; occasionally, only a small chunk of an
object may be visible.
Illumination conditions. The effects of illumination are drastic at the pixel level.
Background clutter. The objects of interest may blend into their surroundings, making their
identification difficult.
Intra-class variation. The classes of interest can often be relatively expansive (e.g., a
chair); there are many disparate such objects, each with its own appearance, but essentially of
the same class.
A sound image classification model must be invariant to the cross product of all these
variations, while simultaneously retaining sensitivity to the differences between distinct
categories. Therefore, instead of trying to explain and code what each one of the categories of
interest looks like directly, we provide the system with diverse and abundant
data of each category and then develop self-learning algorithms that look at these examples
and learn the visual appearance of each category. Since this method involves accumulating a
training dataset of labelled images, it is known as a data-driven approach.
Here, we utilize the multi-featured training dataset xi to predict the target output yi. The
model refers to the mathematical structure that carries out the prediction of yi when given xi.
For example, a common representation is a linear model, where the prediction is a linear
combination of weighted input attributes:
ŷi = Σj θj xij    [4.1]
Different interpretations can be made of the predicted values depending on the task, and the
parameters θ must be learned from the given training dataset. For this, we define an objective
function,
An important aspect of objective functions is that they comprise two parts: training loss,
which measures the fit of the model when employed on the training data, and regularization:
Obj(θ) = L(θ) + Ω(θ)    [4.2]
A commonly used training loss is the mean squared error:
L(θ) = Σi (yi − ŷi)²    [4.3]
Logistic loss, used for logistic regression, is another form of loss:
L(θ) = Σi [ yi ln(1 + e^(−ŷi)) + (1 − yi) ln(1 + e^(ŷi)) ]    [4.4]
Overfitting is minimized by controlling the complexity of the model through the
regularization term.
Figure 4.3
The answer is marked in red: what we need is a model that is both simple and predictive. The
tradeoff between the two is also referred to as the bias-variance tradeoff.
A perceptron takes several binary inputs and produces a single binary output.
Figure 4.4: Perceptron Input and Output
From the above figure, we can see that x1, x2, x3 are binary inputs, and the weights w1, w2, w3
express the importance of the respective inputs to the output:
output = 0 if Σj wj xj ≤ threshold, and 1 if Σj wj xj > threshold    [4.5]
From the above equation, we can see that the output is dependent on the weighted sum of the
inputs relative to the threshold. This notation is cumbersome; in order to overcome this
drawback, two notational variations are made to simplify it. The first one is to write the
weighted sum in dot product form w·x, where w and x are vectors whose components are the
weights and inputs, respectively.
The next variation is to eliminate the threshold by moving it to the other side of the
inequality and replacing it with the perceptron's bias, b = −threshold. After making these two
changes, the equation takes the following form:
output = 0 if w·x + b ≤ 0, and 1 if w·x + b > 0    [4.6]
Assigning weights in this way leads to the mathematical model of a neuron:
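A one-line sketch of this neuron model, per equation [4.6] (the weights and bias are hypothetical illustration values):

# Perceptron forward pass per equation [4.6] (weights/bias illustrative).
def perceptron(x, w, b):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Example: a 3-input perceptron implementing a simple weighted vote.
print(perceptron([1, 0, 1], w=[0.5, 0.5, 0.5], b=-0.6))  # -> 1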
Figure 4.6: Network of perceptrons
In this network, the first column of perceptrons makes three simple decisions by weighing the
input evidence. The perceptrons in the second layer make decisions at a more complex and more
abstract level than the perceptrons present in the first layer. Similarly, each further layer
can make even more sophisticated decisions.
The steps involved here are: the neuron performs a dot product of the input with its assigned
weights, then the bias is added, and lastly the non-linearity (or activation function), in this
case the sigmoid, is applied:
Figure 4.7: Sigmoid Function
4.2.3.2.1 SIGMOID
Sigmoids saturate and kill gradients. The gradient in the saturated regions is approximately
0, and this is a very unwanted property of the sigmoid neuron, because during backpropagation
this local gradient is multiplied into the gradient flowing from above. Hence, if the value of
the local gradient is negligible, it will diminish the gradient so that almost no signal will
pass through the neuron to its weights and, recursively, to its data. Moreover, a cautious
approach should be taken when initializing the weights of sigmoid neurons to prevent
saturation.
Sigmoid outputs are not zero-centered. This issue is less severe and has relatively
easygoing consequences compared to the saturated activation problem. Neurons in subsequent
layers of a neural network receive data that is not zero-centered, which introduces detrimental
zig-zag dynamics in the gradient updates for the weights. However, once these gradients are
added up across a batch of data, the final update for the weights can have variable signs,
somewhat mitigating the issue.
4.2.3.2.1.1 RELU
(-) Contrastingly, ReLU units can be fragile during training and can effectively "die": a large
gradient flowing through a ReLU neuron may update the weights such that the neuron never
activates again.
4.2.4 Convolutional Neural Networks (CNNs)
Convolutional Neural Networks are made up of neurons that have learnable weights and biases.
Each neuron receives some inputs and performs a dot product with its weights, mostly followed
by a non-linearity. The whole network still expresses a single differentiable score function,
with the raw image pixels on one end and class scores at the other. However, CNNs are somewhat
different from normal neural networks: they make the explicit assumption that the inputs are,
most of the time, images, which allows reducing the number of parameters in the network and
increasing the efficiency of the forward function.
Regular neural nets consist of hidden layers of neurons which are fully connected with the
neurons in the previous layer. Different neurons in a particular layer are completely
independent and the weights are unshared. This is admissible for small inputs, but it does not
scale to pictures of large size, such as 500 x 200 x 3, where a single fully connected neuron
would already need 300,000 weights.
Figure 4.8: Representation of Perceptron including hidden layer
A CNN instead arranges its neurons in three dimensions, so that there is a 3-D volume of
activations in each layer. In the above example the input image at the start is fed into the
3-D CNN, which takes its dimensions as width and height, while the depth is 3 (the Red, Green
and Blue channels).
4.2.4.2 LAYERS USED TO BUILD CONVOLUTIONAL NEURAL NETWORKS
Convolutional Layer
Pooling Layer
Fully-Connected Layer
The overall transformation of the original image to the final scores includes many hidden
layers. The parameters of the network are calculated using gradient descent so as to keep the
output class scores consistent with the labels in the training set for each image.
The convolutional layer has parameters that consist of a set of adjustable or learnable
filters. Each filter slides across the input volume and produces a 2-D activation map; each
filter produces a different 2-D map, and all of these are stacked together to give the output
volume. Although the convolutional neural network resembles the neural network of the human
brain, its neurons have a different connectivity and arrangement, which we now discuss.
4.2.4.2.1.1 LOCAL CONNECTIVITY
In a CNN each neuron is connected only to a local region of the input volume, unlike in a
normal neural network. The spatial extent of the connectivity of a neuron with its input volume
is called the receptive field of the neuron. The connections are local in the spatial
dimensions (height and width), but always extend along the full depth of the input volume.
Figure 4.11: Representation of Perceptron body
A hidden layer of such neurons is connected with the input layer or any other layer. How many
neurons the hidden layer contains and how they are arranged is not fixed in advance; the number
of neurons that look at the same region of the input is defined as the depth of the layer. Each
neuron is assigned some particular region of the input volume.
Stride: A filter in a layer, after mapping a particular region, moves on to the next region,
skipping some pixels of the input image before mapping again. The number of pixels skipped at
each step of this movement of the filter is known as the stride.
Zero-padding: Sometimes we want to keep the dimensions of the output equal to those of the
input image, to preserve the maximum detail of the input after convolution; this is done by
padding the border of the input with zeros.
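For reference, the standard relation between these hyperparameters (a well-known formula, stated here for completeness): for input width W, filter size F, zero-padding P and stride S, the output width is (W − F + 2P)/S + 1. For example, a 7x7 input with a 3x3 filter, stride 1 and padding 1 gives (7 − 3 + 2)/1 + 1 = 7, preserving the input size.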
4.2.4.2.2 POOLING LAYER
Average pooling was often used historically but has of late fallen out of favour compared with
max pooling, which works better in practice.
Figure 4.14: Illustration of Pooling
4.2.4.2.4 BACKPROPAGATION
The backward pass of the max operation has a simple interpretation: it simply routes the
gradient to the input that delivered the maximum value in the forward pass. Hence, during the
forward pass of a pooling layer it is usual to keep track of the index of the max activation so
that gradient routing is efficient during backpropagation.
Many people dislike the pooling operation and believe that we can get away without it. Some
propose to discard the pooling layer in favour of architectures that consist only of repeated
convolutional layers, reducing the size of the representation by using a larger stride in a
convolutional layer from time to time.
Normalization layers implementing inhibition schemes observed in the biological brain have
also been proposed. All the same, these layers have since fallen out of favor, since in
practice their contribution has been shown to be minimal.
Neurons in a fully connected layer have full connections to each and every activation in the
previous layer. It is worth mentioning that the only difference between fully connected and
convolutional layers is that the neurons in the convolutional layer are connected only to a
local region of the input. The computation performed by both fully connected and convolutional
layers, i.e. taking the dot product of the input with the contents of the filters, is the same.
Consequently, it is possible to convert between FC and CONV layers.
4.2.5 MODELLING OF XGBOOST
4.2.5.1 TREE ENSEMBLE
The tree ensemble model consists of a set of classification and regression trees (CART). The
prediction scores of each individual tree are added together to acquire the final score. A
crucial fact is that the two trees try to complement each other.
ŷi = Σk fk(xi), fk ∈ F    [4.8]
where K is the number of trees and each fk is a function in the functional space F, the set of
all possible CARTs. As usual, we write down an objective function and optimize it:
Obj = Σi l(yi, ŷi) + Σk Ω(fk)    [4.9]
This is a great deal more difficult than a conventional optimization problem, where we can take
the gradient and descend. It is not easy to train all the trees at once.
Instead, we use an additive strategy: fix what we have learned, and add one new tree at a time.
Writing the prediction value at step t as ŷi(t), we have:
ŷi(0) = 0
ŷi(1) = ŷi(0) + f1(xi)
ŷi(t) = ŷi(t−1) + ft(xi)    [4.10]
It remains to ask which tree we want at each step. A natural choice is to add the one that
optimizes our objective:
Obj(t) = Σi l(yi, ŷi(t−1) + ft(xi)) + Ω(ft) + constant    [4.11]
If we take MSE as the loss, the objective contains a first-order term (generally called the
residual) and a quadratic term. For other losses of interest (for example, logistic loss), it
is not so easy to get such a nice form, so in the general case we take the Taylor expansion of
the loss function up to second order:
Obj(t) = Σi [ l(yi, ŷi(t−1)) + gi ft(xi) + (1/2) hi ft²(xi) ] + Ω(ft) + constant    [4.12]
where gi and hi are the first and second derivatives of l(yi, ŷi(t−1)) with respect to ŷi(t−1).
After we remove all the constants, the specific objective at step t takes the form:
Obj(t) = Σi [ gi ft(xi) + (1/2) hi ft²(xi) ] + Ω(ft)    [4.13]
This is our optimization goal for the new tree. A significant advantage of this definition is
that it depends only upon gi and hi. This is how XGBoost can support custom loss functions:
every loss, including logistic regression and ranking, can be optimized using exactly the same
solver that takes gi and hi as input.
We refine the definition of the tree as
ft(x) = w_q(x), w ∈ R^T    [4.14]
Here w is the vector of scores on the leaves, q is a function assigning each data point to the
corresponding leaf, and T is the number of leaves. We then define the complexity as
Ω(f) = γT + (1/2) λ Σj wj²    [4.15]
There is more than one way to delineate the complexity, but this particular one works well in
practice. The regularization is one portion most tree software packages address less carefully,
or just dismiss.
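As an illustration of this gi/hi interface, a sketch using the custom-objective hook of the xgboost Python package (the squared-error gradients below are our own toy example, not the loss used in this project):

# Sketch: custom objective via (g_i, h_i) in the xgboost Python API.
# The toy squared-error gradients below are our own illustration.
import numpy as np
import xgboost as xgb

def squared_error_obj(preds, dtrain):
    labels = dtrain.get_label()
    grad = preds - labels          # g_i: first derivative of 0.5*(y_hat - y)^2
    hess = np.ones_like(preds)     # h_i: second derivative
    return grad, hess

X = np.random.rand(100, 4)
y = X @ np.array([1.0, 2.0, 0.5, -1.0])
booster = xgb.train({"max_depth": 3}, xgb.DMatrix(X, label=y),
                    num_boost_round=10, obj=squared_error_obj)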
***
CHAPTER 5:
Before putting the algorithms to work for our humongous data, we need various
pre-processing techniques to feed data properly to the networks for maximum efficiency.
5.1 Preprocessing
Working with these files can be a challenge, especially given their heterogeneous nature, and
some preprocessing is required depending on the dataset. A comprehensive overview of useful
steps to take before the data hits the CNN or another ML method follows.
Before we start, we need to import some packages to determine the available patients.
The DICOM files contain a lot of metadata (such as the pixel size, i.e. how long one pixel is
in every dimension in the real world).
This pixel size/coarseness of the scan differs from scan to scan (e.g. the
distance between slices may differ), which can hurt performance of CNN approaches.
We write a code to load a scan, which consists of multiple slices, which we simply
save in a Python list. Every folder in the dataset is one scan (so one patient). One
metadata field is missing, the pixel size in the Z direction, which is the slice
thickness. Fortunately, we can infer this, and we add this to the metadata.
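A sketch of this loading step using the pydicom package (the folder layout is assumed; SliceThickness and ImagePositionPatient are standard DICOM attributes):

# Load one scan (a folder of DICOM slices) and infer slice thickness.
import os
import pydicom

def load_scan(path):
    slices = [pydicom.dcmread(os.path.join(path, f)) for f in os.listdir(path)]
    slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))  # sort along z
    # Infer the missing z pixel size (slice thickness) from slice positions.
    thickness = abs(float(slices[0].ImagePositionPatient[2])
                    - float(slices[1].ImagePositionPatient[2]))
    for s in slices:
        s.SliceThickness = thickness
    return slices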
The unit of measurement in CT scans is the Hounsfield Unit (HU), which is a measure of
radiodensity. CT scanners are carefully calibrated to accurately measure this.
Figure 5.2: Hounsfield Unit of different matter
By default, however, the returned values are not in this unit. Let's fix this.
Some scanners have cylindrical scanning bounds, but the output image is
square. The pixels that fall outside of these bounds get the fixed value -2000. The
first step is setting these values to 0, which currently corresponds to air. Next,
let's go back to HU units, by multiplying with the rescale slope and adding the intercept,
both of which are stored in the metadata of the scans.
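A sketch of this conversion (following the steps just described; RescaleSlope and RescaleIntercept are standard DICOM fields):

# Convert raw DICOM pixel data to Hounsfield Units.
import numpy as np

def get_pixels_hu(slices):
    image = np.stack([s.pixel_array for s in slices]).astype(np.int16)
    image[image == -2000] = 0              # out-of-bounds pixels -> air
    for i, s in enumerate(slices):
        slope = float(s.RescaleSlope)
        intercept = float(s.RescaleIntercept)
        image[i] = (slope * image[i].astype(np.float64)).astype(np.int16)
        image[i] += np.int16(intercept)
    return image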
Figure 5.3: Frequency vs. Hounsfield Unit
Looking at the table from Wikipedia and this histogram, we can clearly see
which pixels are air and which are tissue. These images are used for lung
segmentation. Now, we can begin to iterate through the patients and gather their
respective data. We're certainly going to need to do some preprocessing of this data.
We iterate through each patient, grab their label, and get the full path to that specific
patient (inside that path are ~200 scans, which we also iterate over).
Do note here that the actual scan, when loaded by dicom, is clearly not just an array of
values; instead it has attributes. A few of the attributes here are arrays, but not all of
them. We sort by the actual image position in the scan; later, we could actually put these
together to get a full 3D rendering of the scan.
One immediate thing to note here is the rows and columns: 512 x 512. Being 512 x 512, we
already expect all this data to be the same size, at least in width and height.
We just went ahead and grabbed the pixel array attribute, which is what we
assumed to be the scan slice itself (we will confirm this soon), but immediately we
are surprised by this non-uniformity of slices. This isn't quite ideal and will cause
a problem later: all of our images are the same size, but the slices aren't.
We've got to actually figure out a way to solve that uniformity problem, but also
these images are just way too big for a convolutional neural network to handle without
some serious computing power. Thus, we already know out of the gate that we're going to
need to down-sample this data quite a bit, and somehow make the depth uniform.
5.1.2 Resampling
A scan may have a pixel spacing of [2.5, 0.5, 0.5], which means that the distance
between slices is 2.5 millimeters. For a different scan this may be [1.5, 0.725, 0.725], this
can be problematic for automatic analysis (e.g. using Convolutional Neural Networks)
Whilst this may seem like a very simple step, it has quite some edge cases.
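A sketch of this resampling step using scipy (a target spacing of 1x1x1 mm is assumed, as is common; the rounding correction handles one of the edge cases mentioned):

# Resample a scan to isotropic 1x1x1 mm spacing.
import numpy as np
import scipy.ndimage

def resample(image, spacing, new_spacing=(1.0, 1.0, 1.0)):
    spacing = np.array(spacing, dtype=np.float64)
    resize_factor = spacing / np.array(new_spacing)
    new_shape = np.round(image.shape * resize_factor)
    real_factor = new_shape / image.shape          # rounding-corrected factor
    new_spacing = spacing / real_factor
    image = scipy.ndimage.zoom(image, real_factor, mode="nearest")
    return image, new_spacing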
Figure 5.5: Z-Axis Sweep
Figure 5.6: X-Axis Sweep
Figure 5.7: Y-Axis Sweep
It is easy to see the 3 different perspectives. Another neat thing is that once
we got the scans we managed to visualize them using very basic python open source
tools - basically numpy and matplotlib. No need for fancy medical imaging tools.
Since we want to detect lung cancer, we can try to see if we can detect pulmonary nodules using
something like edge detection. This can be done using a Sobel filter (aka a hand-crafted
one-filter CNN). Let's also take a look at the distribution of the pixel values in an image first:
Figure 5.9 Edge Detection 1
The Sobel filter does find the edges but the image is very low intensity. One
thing we can do is to simply threshold the image to see the segmentation better:
Figure 5.11 Edge Detection 3
Interesting results; however, the issue here is that the filter will also detect the
blood vessels in the lung. So, some sort of 3-D surface detection that differentiates
between spheres and tubes would be more suitable for this situation.
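A sketch of the Sobel-plus-threshold experiment described above (the threshold value is illustrative):

# Sobel edge detection on one CT slice, then a simple threshold to make
# the segmentation more visible (threshold value is illustrative).
import numpy as np
from scipy import ndimage

def sobel_edges(slice_2d, threshold=2000):
    dx = ndimage.sobel(slice_2d.astype(np.float64), axis=0)
    dy = ndimage.sobel(slice_2d.astype(np.float64), axis=1)
    magnitude = np.hypot(dx, dy)                 # gradient magnitude
    return magnitude, magnitude > threshold      # raw edges and binary mask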
5.1.4 3D plotting the scan
For visualization it is useful to be able to show a 3D image of the scan.
Unfortunately, the packages available in the Docker image are very limited in this sense, so
we will use marching cubes to create an approximate mesh for our 3D object, and plot
this with matplotlib. Quite slow and ugly, but it was the best option available.
Our plot function takes a threshold argument which we can use to plot certain
structures, such as all tissue or only the bones. 400 is a good threshold for
showing the bones only (from the Hounsfield unit table above).
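A sketch of such a plot function using skimage's marching cubes and matplotlib (mirroring the description above; the default threshold of 400 shows bone):

# 3D plot of the scan via marching cubes (threshold=400 shows bone).
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
from skimage import measure

def plot_3d(image, threshold=400):
    verts, faces, _, _ = measure.marching_cubes(image, level=threshold)
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111, projection="3d")
    mesh = Poly3DCollection(verts[faces], alpha=0.1)
    ax.add_collection3d(mesh)
    ax.set_xlim(0, image.shape[0])
    ax.set_ylim(0, image.shape[1])
    ax.set_zlim(0, image.shape[2])
    plt.show()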
5.1.5 Lung segmentation using water sheds
Most suggested Lung Segmentation methods mainly involve thresholding the lung
tissue based on its Hounsfield value and using morphological dilation to include nodules
in border regions. These methods have the severe drawback of also including lots of
tissue that is neither lung, nor a region of interest. We coded an algorithm based on the
one presented in R Shojaii et al [8] with some modifications, that we present here.
The resulting CT images are in HU and have the same (not necessarily isotropic) dimensions as
the input. The method relies on two markers: an internal marker, which is definitely lung
tissue, and an external marker, which is definitely outside of our ROI. We start by creating
the internal marker by
thresholding the Image and removing all regions but the biggest one. The external marker
is created by morphological dilation of the internal marker with 2 different iterations and
subtracting the results. A watershed marker is created by superimposing the two markers, with
different grey values marking the internal, external and undefined regions.
Figure 5.16: Watershed Marked Slice
The watershed algorithm then finds the precise border of the lung, located in the black strip
of the watershed marker shown above. To run the algorithm, we also need the Sobel gradient of
the image.
In order not to miss nodules located next to the border regions, a black top hat operation is
performed to re-include those areas and the areas surrounding the lung. This is the main
advantage of this method over the threshold-based methods: only areas that need re-inclusion
get dilated; everywhere else, the lung border stays precise.
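A simplified sketch of these marker and watershed steps (our own condensed illustration; the threshold and dilation iteration counts are assumptions, and the full method per R Shojaii et al [8] has more stages):

# Watershed-based lung segmentation sketch (simplified illustration).
import numpy as np
from scipy import ndimage
from skimage import filters, measure, segmentation

def lung_markers(slice_hu):
    internal = slice_hu < -400                   # rough lung threshold (assumed)
    labels = measure.label(internal)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                                 # ignore background
    internal = labels == sizes.argmax()          # keep the largest region
    # External marker: ring between a small and a large dilation (assumed counts)
    external = (ndimage.binary_dilation(internal, iterations=55)
                ^ ndimage.binary_dilation(internal, iterations=10))
    return internal, external

def watershed_lung(slice_hu):
    internal, external = lung_markers(slice_hu)
    markers = np.zeros(slice_hu.shape, dtype=np.int32)
    markers[internal] = 2
    markers[external] = 1
    gradient = filters.sobel(slice_hu.astype(np.float64))  # Sobel gradient
    ws = segmentation.watershed(gradient, markers)
    return ws == 2                               # lung mask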
Figure 5.17: Sobel Gradient
Figure 5.19: Outline After Re-Inclusion
Figure 5.21: Segmented Lung
The resulting images of this code are still in the original dimensions of the
CT Scan and in Hounsfield Units with the filtered areas being assigned -2000.
The resulting masks are precise and include nodules located in the border regions. The main
downside is the much longer processing time per patient.
A simpler alternative method segments the lung (and usually some tissue around it). It involves
quite a few smart steps.
The steps:
For every axial slice in the scan, determine the largest solid connected
component, and set others to 0. This fills the structures in the lungs in the mask.
Keep only the largest air pocket (the human body has other pockets of air).
But there's one thing we can fix: it is probably a good idea to include structures
within the lung (as the nodules are solid); we do not want only air in the lungs.
Figure 5.24: 3D Lung-3 (Difference)
For this, we also dilate the mask in all directions. The air + structures in the lung alone
will not contain all nodules; in particular, it will miss those that are stuck to the side of
the lung, where they often appear.
This segmentation may fail for some edge cases. It relies on the fact that the
air outside the patient is not connected to the air in the lungs. If the patient has a
tracheostomy, this will not be the case, we do not know whether this is present in the
dataset. Also, for particularly noisy images (for instance due to a pacemaker in the image
below), this method may fail; instead, the second largest air pocket in the body
will be segmented. We can recognize this by checking the fraction of the image that the
mask corresponds to, which will be very small for this case. We can then first apply a
morphological closing operation with a kernel a few mm in size to close these holes,
after which it should work (or more simply, we do not use the mask for this image).
5.1.7 Normalization
Our values currently range from -1024 to around 2000. Anything above 400 is not interesting to
us, as these are simply bones with different radiodensity. A commonly used set of thresholds is
-1000 and 400, between which we normalize the data. As a final step, we zero-center the data so
that the mean value is 0. To do this we simply subtract the mean pixel value from all pixels;
we found this to be around 0.25 in the LUNA16 dataset.
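A sketch of these two steps, using the bounds and mean stated above:

# Normalization and zero-centering sketch (bounds and mean from the text).
import numpy as np

MIN_BOUND, MAX_BOUND, PIXEL_MEAN = -1000.0, 400.0, 0.25

def normalize(image):
    image = (image - MIN_BOUND) / (MAX_BOUND - MIN_BOUND)
    return np.clip(image, 0.0, 1.0)

def zero_center(image):
    return image - PIXEL_MEAN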
5.2 Running the algorithms
With these steps our images are ready for consumption by our Neural Network,
CNN, XGBoost and/or other ML methods. We can do all these steps offline (one time
and save the result), and we let it run overnight as it took a long time.
Now, the data we have is actually 3D data, not the 2D data that's covered in most tutorials,
so the network has to be adapted accordingly.
Now we're set to train the network. When running locally, we make sure our
training data is NOT the sample images; it should be the stage1 images. Our layer counts were
chosen as follows.
Each additional convolutional layer is costlier (moderately, because each convolutional layer
reduces the number of input features reaching the fully connected layers), although after
around 2 or 3 layers the accuracy gain becomes quite small, so we need to accept a tradeoff
between generalization accuracy and training time. That said, all image recognition tasks are
different, and the right depth must be found for the task at hand.
5.2.1.2 NODES PER HIDDEN LAYER COUNT = 64:
This differs for each task that needs to be executed. As a rough guide, we broadly keep the
number of nodes at 2/3 the size of the former layer, holding the first layer at 2/3 the size of
the final feature maps. This nevertheless is merely an approximate guide and depends chiefly on
the dataset.
Here, we essentially assign the pace with which we slide the filter. When the stride is 1, we
move the filters one pixel at a time; when the stride is 2, the filters jump two pixels at a
time.
The number of hidden layers called for hinges upon the intrinsic complexity of our dataset;
this can be understood by considering what each layer accomplishes. Zero hidden layers can only
represent linear functions, which is too poor for just about all image recognition tasks.
A single hidden layer allows the network to model an arbitrarily complex continuous function;
adding many more layers is almost never beneficial, being advantageous only for particularly
complex tasks.
5.2.2.2 MIN_CHILD_WEIGHT = 9:
This defines the minimum sum of instance weights (hessian) needed in a child; because it is a
sum of weights, it can be a float instead of an integer.
This parameter controls over-fitting: the deeper the tree, the more complex the relationships
the model can learn, and the more likely it is to overfit.
The learning rate determines the impact of each tree on the final result.
This is the ratio of columns (features) sub-sampled while constructing each tree.
5.2.2.7 NTHREAD = 8
5.2.2.8 SUBSAMPLE = 0.80:
Performance on the test dataset does not show any improvement even when this fraction is
varied.
CHAPTER 6:
We evaluated our models using the log loss error, which is the cross entropy between the true
labels and the predicted distribution. It is the extra unpredictability when one assumes a
different distribution than the true distribution, added to the entropy of the true
distribution. We thus maximized the accuracy of our models by minimizing the log loss:
LogLoss = −(1/N) Σi [ yi log(pi) + (1 − yi) log(1 − pi) ]    [6.1]
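A small sketch of this metric (the eps guard against log(0) is our own addition):

# Log loss per equation [6.1]; eps guards against log(0).
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(log_loss([1, 0, 1], [0.9, 0.2, 0.7]))  # small value for good predictions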
Figure 6.1: Log Loss v/s probability for a single positive instance
6.1.1 Neural Network
25 Iterations were run for the Preprocessed data (muchdata-50-50-
Figure 6.3: Results with each Iteration
Figure 6.4: System Usage Stats and iterations result
Thus, we see a similar curve to the Neural Network, but with a lower log loss.
6.1.3 XGBoost
Using a grid search for the optimum values of the parameters, and printing values after each
iteration, we see a high-variance curve with a lower log loss on average. We achieve a mean
log loss of 0.57.
6.1.4 Comparison
Comparing the errors for the three algorithms, we have:
This shows the Log-loss error comparison for the first 25 iterations
of the said algorithms. We thus have the mean Log-loss errors as:
Table 6.1: Log loss error of Neural Network, CNN and XGBoost
3. XGBoost    0.57
CHAPTER 7:
In this project we studied lung cancer detection from low dose CT scan data, which can help
radiologists reduce the lung cancer mortality rate. Till date, CNN was considered the de-facto
standard for image classification; here we found XGBoost bettering the output of the CNN by a
log-loss margin of 0.06 for the aforementioned data. Thus, further scope of improvement in
these algorithms will provide ways to reduce this error further.
The clinical meaning of the learned feature values from the algorithms is a direction in which
we plan to pursue further research. Furthermore, other lung diseases could be recognized with
similar pipelines.
[1] K. Murphy, B. van Ginneken, A. M. R. Schilham, B. J. de Hoop, H. A. Gietema, and
M. Prokop, "A large-scale evaluation of automatic pulmonary nodule detection in chest
CT using local image features and k-nearest-neighbour classification," Medical Image
Analysis, vol. 13, pp. 757-770, 2009.
[2] "Automatic lung segmentation from thoracic computed tomography scans using a hybrid
approach with error detection," Medical Physics, vol. 36, no. 10, pp. 2934-2947, 2009.
384, 2014
[6] Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang,
Tianjun Xiao, Bing Xu, Chiyuan Zhang, Zheng Zhang (2015): MXNet: A
Flexible and Efficient Machine Learning Library for Heterogeneous
Distributed Systems, arXiv:1512.01274 [cs.DC]
[7] Mingchen Gao, Ulas Bagci, Le Lu, Aaron Wu, Mario Buty, Hoo-Chang Shin,
Holger Roth, Georgios Z. Papadakis, Adrien Depeursinge, Ronald M.
Summers, Ziyue Xu & Daniel J. Mollura (2016): Holistic classification of CT
attenuation patterns for interstitial lung diseases via deep convolutional
neural networks, Computer Methods in Biomechanics and Biomedical
Engineering: Imaging & Visualization, DOI: 10.1080/21681163.2015.1124249
[8] Arnaud Arindra Adiyoso Setio, Alberto Traverso, Thomas de Bel, Moira S.N.
Berens, Cas van den Bogaard, Piergiorgio Cerello, Hao Chen, Qi Dou, Maria Evelina
Fantacci, Bram Geurts, Robbert van der Gugten, Pheng Ann Heng, Bart Jansen,
Michael M.J. de Kaste, Valentin Kotov, Jack Yu-Hung Lin, Jeroen T.M.C. Manders,
Alexander Sónora-Mengana, Juan Carlos García-Naranjo, Mathias Prokop, Marco
Saletta, Cornelia M Schaefer-Prokop, Ernst T. Scholten, Luuk Scholten, Miranda M.
Snoeren, Ernesto Lopez Torres, Jef Vandemeulebroucke, Nicole Walasek, Guido C.A.
Zuidhof, Bram van Ginneken, Colin Jacobs (2017): Validation, comparison, and
combination of algorithms for automatic detection of pulmonary nodules in computed
tomography images: the LUNA16 challenge, arXiv:1612.08012
[9] Cireşan, D.C., Giusti, A., Gambardella, L.M. and Schmidhuber, J., 2013.
Mitosis detection in breast cancer histology images with deep neural
networks. In International Conference on Medical Image Computing and
Computer-assisted Intervention (pp. 411-418). Springer Berlin Heidelberg.
[10] Prasoon, A., Petersen, K., Igel, C., Lauze, F., Dam, E. and Nielsen, M., 2013,
September. Deep feature learning for knee cartilage segmentation using a triplanar
convolutional neural network. In International Conference on Medical Image Computing
and Computer-assisted Intervention (pp. 246-253). Springer Berlin Heidelberg.
[11] Sharif Razavian, A., Azizpour, H., Sullivan, J. and Carlsson, S., 2014. CNN features
off-the-shelf: an astounding baseline for recognition. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition Workshops (pp. 806-813).
[12] Christodoulidis, S., Anthimopoulos, M., Ebner, L., Christe, A. and Mougiakakou,
S., 2017. Multisource Transfer Learning With Convolutional Neural Networks for Lung
Pattern Analysis. IEEE journal of biomedical and health informatics, 21(1), pp.76-84.
[13] Gao, M., Xu, Z., Lu, L., Wu, A., Nogues, I., Summers, R.M. and Mollura,
D.J., 2016, April. Segmentation label propagation using deep convolutional
neural networks and dense conditional random field. In Biomedical Imaging
(ISBI), 2016 IEEE 13th International Symposium on (pp. 1265-1268). IEEE.
[14] Pan, Y., Huang, W., Lin, Z., Zhu, W., Zhou, J., Wong, J. and Ding, Z., 2015,
August. Brain tumor grading based on neural networks and convolutional
neural networks. In Engineering in Medicine and Biology Society (EMBC), 2015
37th Annual International Conference of the IEEE (pp. 699-702). IEEE.
[15] Ciompi, F., de Hoop, B., van Riel, S.J., Chung, K., Scholten, E.T.,
Oudkerk, M., de Jong, P.A., Prokop, M. and van Ginneken, B., 2015.
Automatic classification of pulmonary peri-fissural nodules in computed
tomography using an ensemble of 2D views and a convolutional neural
network out-of-the-box. Medical image analysis, 26(1), pp.195-202.
[16] Anthimopoulos, M., Christodoulidis, S., Christe, A. and Mougiakakou, S., 2014,
August. Classification of interstitial lung disease patterns using local DCT features
and random forest. In Engineering in Medicine and Biology Society (EMBC), 2014 36th
Annual International Conference of the IEEE (pp. 6040-6043). IEEE.
[17] Li, Q., Cai, W., Wang, X., Zhou, Y., Feng, D.D. and Chen, M., 2014,
December. Medical image classification with convolutional neural network.
In Control Automation Robotics & Vision (ICARCV), 2014 13th International
Conference on (pp. 844-848). IEEE.
[18] Nogues, I., Yao, J., Mollura, D. and Summers, R.M., Deep Convolutional
Neural Networks for Computer-Aided Detection: CNN Architectures,
Dataset Characteristics and Transfer Learning.
[19] Samala, R.K., Chan, H.P., Hadjiiski, L., Helvie, M.A., Wei, J. and Cha, K., 2016.
[20] Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E. and Greenspan, H., 2015,
[21] Fotin, S.V., Yin, Y., Haldankar, H., Hoffmeister, J.W. and Periaswamy, S., 2016,
March. Detection of soft tissue densities from digital breast tomosynthesis:
Comparison of conventional and deep learning approaches. In SPIE Medical
Imaging (pp. 97850X-97850X). International Society for Optics and Photonics.
[22] Kooi, T., Gubern-Merida, A., Mordang, J.J., Mann, R., Pijnappel, R.,
Schuur, K., den Heeten, A. and Karssemeijer, N., 2016, June. A comparison
between a deep convolutional neural network and radiologists for
classifying regions of interest in mammography. In International Workshop
on Digital Mammography (pp. 51-56). Springer International Publishing.
[23] Cha, K.H., Hadjiiski, L.M., Samala, R.K., Chan, H.P., Cohan, R.H., Caoili, E.M.,
Paramagul, C., Alva, A. and Weizer, A.Z., 2016. Bladder cancer segmentation in CT for
treatment response assessment: Application of deep-learning convolution neural
network A pilot study. Tomography: a journal for imaging research, 2(4), p.421.
[24] Shen, W., Zhou, M., Yang, F., Yang, C. and Tian, J., 2015, June. Multi-
scale convolutional neural networks for lung nodule classification. In
International Conference on Information Processing in Medical Imaging
(pp. 588-599). Springer International Publishing.
Students involved in the project team include: