ACKNOWLEDGEMENTS
We would like to express our deepest gratitude to our advisers Dr. J.R.P.
Gupta and Prof. A.N. Jha for giving us the opportunity to work under their able
supervision. We are indebted towards the faculty of NSIT, whose guidance and
teaching for the past four years have helped us in understanding the theory and its
practical applications. We are also grateful to our friends and families, who always
supported us unconditionally through thick and thin. Their
belief in our capabilities always pushed us to give our best in this project. Without
these invaluable contributions, this project could not have been completed.
DECLARATION
This is to certify that the work presented in the project entitled “PID controller tuning
using …” is our own and has not been submitted to any other university in any form for
the award of any other degree.
CERTIFICATE
This is to certify that the project entitled “PID controller tuning using …” was carried
out by the project team under our supervision.
PLAGIARISM REPORT
ABSTRACT
LIST OF TABLES
Table 6.1: Log loss error of Neural Network, CNN and XGBoost
LIST OF FIGURES
Figure 5.7: Z-Axis Sweep
Figure 6.1: Log Loss v/s probability for a single positive instance
Figure 6.5: XGBoost Output
INDEX OF EQUATIONS
Equation 4.6: Output corresponding to the input parameters and biases
Equation 4.14: General tree function
Equation 4.15: Complexity function in XGBoost
Equation 6.1: Log loss error
TABLE OF CONTENTS
4.2.5.2 Tree Boosting (Training of Tree)
4.2.5.3 Additive Training
The PID controller has remained the most popular controller of this century for its remarkable
effectiveness, ease of implementation and vast applicability, though its tuning has been found
to be hard over the years. The tuning of PID controllers has been done using various methods,
which include Genetic Algorithms among others. All these tuning methods are performed manually
and are difficult as well as time consuming. To use the PID controller efficiently, the optimal
tuning of its parameters has become a significant research area and is taken up by various
control graduates. Optimization problems have been resolved with the aid of numerous soft
computing techniques, which include fuzzy logic, artificial neural networks and metaheuristic
methods, alternatives to the traditional approaches. Many algorithms, like those of
Evolutionary Programming, have also been applied.
Genetic Algorithms (GAs) are a random global search method that replicates the evolution
process; they were developed in the United States in the 1970s at the University of Michigan.
The Genetic Algorithm uses two major principles, crossover of the population and mutation.
The Genetic Algorithm has no knowledge of the correct solution and depends entirely on
responses from the environment and on evolution operators such as crossover and mutation to
arrive at the best proposed solution. By starting with many independent points and searching
them in parallel, the algorithm avoids getting trapped at local minima and converging to
sub-optimal solutions. But for very complex problems, it may still converge to a local
minimum. The time consumed by this optimization algorithm is also high, since it involves so
many parameters. Also, the genetic algorithm is sensitive to the initial population used,
whereas we want a wide diversity of feasible solutions.
The biggest limitation of the Genetic Algorithm is that it cannot guarantee an optimal
solution. Solution quality also degrades as the problem size grows, and the convergence rate
of GA is slow. However, it can generate good-quality solutions for any given problem and
function type.
The natural behaviour of bees and their collective activities in their hives have fascinated
researchers all over the world for decades. Recently, researchers working on Swarm
Intelligence and developing swarm optimization methods have extended their knowledge of
animal societies, especially insect colonies. Ants, termites, bees, fireflies and wasps are
the most important social insects inspiring efficient problem-solving algorithms.
In 2005, Karaboga presented the Artificial Bee Colony (ABC) algorithm to optimize numeric
benchmark functions. It was then extended by Karaboga and Basturk and shown to outperform
other recognized heuristic methods such as GA [18] as well as DE, PSO and ACO. In addition,
it requires relatively few control parameters.
ABC, like other nature-based algorithms, models the honey bee lifecycle, though not
precisely. In this model, the honey bees are categorized as employed, onlooker and scout. An
employed bee is a forager associated with a certain food source which she is currently
exploiting. She remembers the characteristics of the food source and, after returning to the
hive, shares them with the other bees waiting there via a peculiar communication called the
waggle dance. An onlooker bee is an unemployed bee at the hive which tries to find a new food
source using the information provided by the employed bees. A scout, ignoring the others'
information, searches around the hive randomly. In nature, the recruitment of unemployed bees
happens in a nearly similar way. In addition, when the quality of a food source is below a
certain level, it is abandoned to make the bees explore for new food sources. In the ABC
algorithm, the solutions are modelled as food sources and their corresponding objective/fitness
function values as the quality (nectar amount) of those sources.
Although the exploration ability of the ABC algorithm is good, its exploitation of the food
sources it has found is poor, and it can fall into local optima because of premature
convergence. As a result, a more optimized and improved technique was needed.
Many practical control problems faced by control engineers are of higher order with time delay.
The metaheuristic algorithms discussed above generally work well for lower-order problems, but as
the order of the problem increases they converge to local maxima and often fail to give a
satisfactory result.
The Genetic Algorithm, though very robust, has a high convergence rate only for lower-order
problems; for higher-order practical problems the convergence rate drops and it returns a local
maximum as the optimal solution.
The Artificial Bee Colony method is flexible, easy to implement and good at exploring the
solution space, but its exploitation of found food sources is poor and it falls into local
optima as a result of premature convergence.
For the past few years, control systems have assumed an increasingly important role in the
development and advancement of modern civilization and technology. Practically every
aspect of our day-to-day activities is affected by some type of control system. Automatic
control systems are found in virtually all sectors of industry, such as quality control of
manufactured products, automatic assembly lines, machine-tool control, space technology
and weapon systems, computer control, transportation systems, power systems, robotics
and many others. They are essential in such industrial operations as controlling pressure,
temperature, humidity, and flow in the process industries.
Fig. 2.1
Automatic Controllers:-
An automatic controller compares the actual value of the plant output with the
reference input, determines the difference, and produces a control signal that
will reduce this difference to a negligible value. The manner in which the
automatic controller produces such a control signal is called the control action.
An industrial control system comprises an automatic controller, an actuator, a
plant, and a sensor (measuring element). The controller detects the actuating
error signal, which is usually at a very low power level, and amplifies it to a
very high level. The output of the automatic controller is fed to an actuator, such
as a hydraulic motor, an electric motor or a pneumatic motor or valve (or any
other source of energy). The actuator is a power device that produces the input to the
plant according to the control signal, so that the output signal will approach the
reference input signal.
The sensor or measuring element is a device that converts the output variable
into another suitable variable, such as a displacement, pressure or voltage, that
can be used to compare the output to the reference input command. This element
is in the feedback path of the closed-loop system. The set point of the controller must be
converted to a reference input with the same units as the feedback signal from the
sensor element.
The type of controller to use must be decided based on the nature of the plant and the
operating conditions, including such considerations as safety, cost, availability, reliability,
accuracy, weight and size.
Two-position or on-off controllers: -
In a two-position control system, the actuating element has only two fixed positions,
which are, in many simple cases, simply on and off. Due to its simplicity and
inexpensiveness, it is very widely used in both industrial and domestic control
systems.
Let the output signal from the controller be u(t) and the actuating error signal be
e(t). Then, mathematically,
u(t) = U1, for e(t) > 0
u(t) = U2, for e(t) < 0
where U1 and U2 are constants, and the minimum value of U2 is usually either
zero or -U1.
In the proportional control algorithm, the controller output is proportional to the error
signal, which is the difference between the set point and the process variable. In other
words, the output of a proportional controller is the multiplication product of the error
signal and the proportional gain. This can be mathematically expressed as
Pout = Kp e(t)
where Kp is the proportional gain and e(t) is the error signal.
With an increase in Kp:
· Response speed of the system increases.
In the integral control of a plant, the control signal (the output signal from the controller)
at any instant is proportional to the area under the actuating error signal curve up to that
instant. While removing the steady-state error, it may lead to an oscillatory response of
slowly decreasing or even increasing amplitude, both of which are usually undesirable
[5].
Gc(s) = Kp + Ki / s
Fig.1.2 (courtesy-[5])
Integral control action added to the proportional controller converts the original system
into a higher-order one. Hence the control system may become unstable for a large value of Kp,
since the roots of the characteristic equation may have positive real parts. In this control,
the proportional control action tends to stabilize the system, while the integral control action
tends to eliminate or reduce the steady-state error in response to various inputs. As the value
of Ti is increased:
· Overshoot tends to be smaller.
· Speed of the response tends to be slower.
Gpd(s) = Kp + Kd s
       = Kp (1 + Td s)
The PID controller was first placed on the market in 1939 and has remained the most widely
used controller in process control until today. An investigation performed in 1989 in Japan
indicated that more than 90% of the controllers used in process industries are PID controllers
and advanced versions of the PID controller. PI controllers are fairly common, since
derivative action is sensitive to measurement noise.
“PID control” is the method of feedback control that uses the PID controller as the main tool.
The basic structure of conventional feedback control systems is shown in Figure below, using
a block diagram representation. In this figure, the process is the object to be controlled. The
purpose of control is to make the process variable y follow the set-point value r. To achieve
this purpose, the manipulated variable u is changed at the command of the controller. As an
example of processes, consider a heating tank in which some liquid is heated to a desired
temperature by burning fuel gas. The process variable y is the temperature of the liquid, and
the manipulated variable u is the flow of the fuel gas. The “disturbance” is any factor, other
than the manipulated variable, that influences the process variable. Figure below assumes that
only one disturbance is added to the manipulated variable. In some applications, however, a
major disturbance enters the process in a different way, or plural disturbances need to be
considered. The error e is defined by e = r – y. The compensator C(s) is the computational
rule that determines the manipulated variable u based on its input data, which is the error e in
the case of the Figure. The last thing to notice about the Figure is that the process variable y is
assumed to be measured by the detector (not shown explicitly here) with sufficient
accuracy and speed that the input to the controller can be regarded as exactly
equal to y.
Fig. 1.3(courtesy-[5])
When used in this manner, the three elements of the PID controller produce outputs with the
following nature:
P element: proportional to the error at the instant t, this is the “present” error.
I element: proportional to the integral of the error up to the instant t, which can
be interpreted as the accumulation of the “past” error.
D element: proportional to the derivative of the error at the instant t, which can be
interpreted as the prediction of the “future” error.
Thus, the PID controller can be understood as a controller that takes the present, the past, and
the future of the error into consideration. The transfer function Gc(s) of the PID controller is:
Gc(s) = Kp (1 + 1/(Ti s) + Td s)
      = Kp + Ki / s + Kd s
1.6 Application: -
In the early history of automatic process control the PID controller was implemented as a
mechanical device. These mechanical controllers used a lever, spring and a mass and
were often energized by compressed air. These pneumatic controllers were once the
industry standard [5].
Electronic analog controllers can be made from a solid-state or tube amplifier, capacitor
and a resistance. Electronic analog PID control loops were often found within more
complex electronic systems, for example, the head positioning of a disk drive, the power
conditioning of a power supply, or even the movement-detection circuit of a modern
seismometer. Nowadays, electronic controllers have largely been replaced by digital
controllers implemented with microcontrollers or FPGAs.
u(t) = Kp [ e(t) + (1/Ti) ∫0t e(t') dt' + Td de(t)/dt ] + b
Where,
e is the difference between the current value and the set point.
b is the set point value of the signal, also known as bias or offset.
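As a concrete illustration of this control law, the following is a minimal discrete-time sketch in Python (the gains, sampling period and toy plant are hypothetical values chosen for the example, not parameters used elsewhere in this work):

# Minimal discrete PID sketch (illustrative only; gains and sampling
# period are hypothetical, not values used elsewhere in this work).
class PID:
    def __init__(self, kp, ti, td, dt, bias=0.0):
        self.kp, self.ti, self.td, self.dt, self.bias = kp, ti, td, dt, bias
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        e = setpoint - measurement                    # error e(t) = r - y
        self.integral += e * self.dt                  # approximates the integral term
        derivative = (e - self.prev_error) / self.dt  # approximates de/dt
        self.prev_error = e
        return self.kp * (e + self.integral / self.ti
                          + self.td * derivative) + self.bias

# Example: drive a first-order plant y' = (u - y)/tau toward r = 1.
pid = PID(kp=2.0, ti=1.0, td=0.1, dt=0.01)
y, tau = 0.0, 0.5
for _ in range(1000):
    u = pid.update(1.0, y)
    y += (u - y) / tau * 0.01
print(round(y, 3))  # settles near 1.0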
3.2 Ziegler-Nichols Rules for tuning PID Controller:-
It has been observed that step responses of many processes to which PID controllers are applied
have monotonically increasing characteristics as shown in Figures a and b, so most traditional
design methods for PID controllers have been developed implicitly assuming this property.
However, there exist some processes that exhibit oscillatory responses to step inputs.
Two tuning methods were proposed by Ziegler and Nichols in 1942 and have been widely
utilized either in the original form or in modified forms. One of them, referred to as the
Ziegler–Nichols ultimate sensitivity method, is to determine the parameters as given in Table 1
using the data Kcr and Tcr obtained from the ultimate sensitivity test. The other, referred to
as the Ziegler–Nichols step response method, is to assume the FOPDT model and to determine the
parameters of the PID controller as given in Table 2 using the parameters R and L of the FOPDT
model, which are determined from the step response test.
Table 1: Ziegler–Nichols ultimate sensitivity method
Type of controller    Kp          Ti           Td
P                     0.5 Kcr     —            0
PI                    0.45 Kcr    0.833 Tcr    0

Table 2: Ziegler–Nichols step response method
Type of controller    Kp          Ti           Td
P                     1/RL        —            0
PI                    0.9/RL      L/0.3        0
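The PI rows of these tables translate directly into code; a small sketch (the helper names are our own, not from any library):

# Ziegler–Nichols PI settings (sketch based on the two tables above).
def zn_ultimate_pi(k_cr, t_cr):
    """PI gains from the ultimate sensitivity test data Kcr, Tcr."""
    return {"Kp": 0.45 * k_cr, "Ti": 0.833 * t_cr, "Td": 0.0}

def zn_step_pi(r, l):
    """PI gains from the step response (FOPDT) parameters R and L."""
    return {"Kp": 0.9 / (r * l), "Ti": l / 0.3, "Td": 0.0}

print(zn_ultimate_pi(k_cr=4.0, t_cr=2.0))  # hypothetical test data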
Disadvantage :-
The classical tuning methods explained above have the following features:
• The process is assumed, implicitly (in the case of the Ziegler–Nichols ultimate sensitivity
method) or explicitly (in the case of the Ziegler–Nichols step response method), to be
modelled by the simple transfer function.
• The optimal values of the PID parameters are given by formulae of the process parameters that
are determined directly and uniquely from experimental data.
The first feature is a weakness of these classical methods, in the sense that the applicable
processes are limited, or in other words that the claimed “optimal” values are not necessarily, and
are sometimes fairly far from, the true optimal in practical situations where the transfer function
is nothing but an approximation of the real process characteristics. Specifically, the problem is
serious when the pure delay L of the process is very short or very long, where “very short” and
“very long” roughly means outside the range 0.05≤L/T≤1.0 [17]. It can be interpreted as a
weakness in the sense that there is no room to improve the results by making use of more
detailed information about the process which is obtainable from theoretical study and accurate
measurement.
Many attempts have been made to make up for these weaknesses of the classical methods. Many
theoretical considerations have been used to develop sophisticated methods that use, as the basis
of tuning, the shape of the frequency response of the return ratio, poles (and zeros) of the closed-
loop transfer function, time-domain performance indices such as ISE, or frequency-domain
performance indices.
It is observed that the response of most of the processes under step change in input yields a
sigmoidal shape
Fig: Process Reaction Curve for Cohen Coon Method
Such a sigmoidal shape can be adequately approximated by the response of a first-order process with dead time:
G(s) ≈ K e^(−td s) / (τ s + 1)    (IV.70)
From the approximate response it is easy to estimate the parameters. The controllers are designed
as given in Table IV.5.
Table IV.5: Controller settings using Cohen-Coon design method
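The entries of Table IV.5 did not survive extraction here; as an assumption, the sketch below uses the commonly cited Cohen–Coon PI formulas for a first-order-plus-dead-time process:

# Cohen–Coon PI settings (assumed standard formulas; the original
# Table IV.5 entries were lost, so treat these as illustrative).
def cohen_coon_pi(k, tau, td):
    """PI gains from FOPDT parameters: gain k, time constant tau, dead time td."""
    r = td / tau
    kp = (1.0 / k) * (1.0 / r) * (0.9 + r / 12.0)
    ti = td * (30.0 + 3.0 * r) / (9.0 + 20.0 * r)
    return {"Kp": kp, "Ti": ti}

print(cohen_coon_pi(k=1.0, tau=5.0, td=1.0))  # hypothetical plant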
The adaptive nature of Artificial Neural Network (ANN) controllers has made them a major area of
interest among researchers in widespread fields [7-13], mainly because ANN controllers can
efficiently learn an unknown or continuously varying environment and act accordingly. Industrial
automation applications prefer PID (Proportional Integral Derivative) controllers because of
their simple structure, robustness, etc.
A Neural Network tuned PID (NNPID) has two inputs, one output and three layers: an input
layer, a hidden layer and an output layer. The input layer has two neurons and the output layer
has one, and these neurons are P-neurons. The hidden layer has three neurons: a P-neuron (H1),
an I-neuron (H2) and a D-neuron (H3). The NNPID is shown in Fig. 2. When suitable connective
weights are chosen, an NNPID becomes a conventional PID controller.
where u is the controller output, KP is the proportional gain, KI is the integral gain, KD is
the derivative gain, and e is the error between the set point and the process output. For
digital control with sampling period ts, we can write
u(k) = KP e(k) + KI ts Σ e(i) + KD [e(k) − e(k−1)] / ts
Back-propagation algorithms: In the present control system, the aim of the NNPID algorithm is
to tune the PID parameters in such a way that the mean square error (MSE),
MSE = (1/N) Σ e(k)²,
is minimized. The weights of the NNPID are changed by steepest descent in an online training
process. The details of the weight adjustments used are as given in reference [5]. The training
of the neural network was done by varying the PID parameters and taking samples online. The
NNPID was trained with a total of 50 sets of PID parameters, each having 360 data points.
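A minimal sketch of one such online steepest-descent gain update follows (a simplification of our own, assuming the sign of the plant sensitivity dy/du is positive; the exact adjustment rules are those of reference [5]):

# Sketch: online steepest-descent tuning of PID gains on the squared error.
# Assumes the plant sensitivity dy/du has positive sign (a common
# simplification); the exact update rules follow reference [5].
def update_gains(kp, ki, kd, e, p_term, i_term, d_term, lr=1e-3):
    # E = 0.5*e^2 and u = kp*p_term + ki*i_term + kd*d_term, so
    # dE/dk = -e*(dy/du)*(du/dk); a gradient step with sign(dy/du)=+1:
    kp += lr * e * p_term
    ki += lr * e * i_term
    kd += lr * e * d_term
    return kp, ki, kd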
Fuzzy Logic
Metaheuristics
CHAPTER 4: GENETIC ALGORITHM
Genetic Algorithms are often seen as function optimizers, although the range of problems to
which they have been applied is quite broad. In a GA, candidate solutions are encoded as
chromosomes. One then evaluates these structures and allocates reproductive opportunities in
such a way that those chromosomes which portray a better solution to the target problem are
given more chances to reproduce than those chromosomes which give poorer solutions.
Evolutionary Cycle
Population: the number of individuals present with the same length of chromosome.
Fitness: the value assigned to an individual based on how far or close the individual is from
the solution.
Fitness function: a function that assigns a fitness value to an individual; it is problem specific.
Crossover: taking two fit individuals and intermingling their chromosomes to create two
new individuals.
6.2.1 Encoding: In order to use a GA to solve a problem, the variables (x1, x2, ..., xn) are
first encoded as strings. Binary-coded strings consisting of 1s and 0s are mostly used. The
length of the string is usually determined according to the desired solution accuracy. For
example, if four bits are used to code each variable in a two-variable optimization problem,
the strings (0000 0000) and (1111 1111) represent the points (x1L, x2L)T and (x1U, x2U)T
respectively, because the substrings (0000) and (1111) have the minimum and maximum decoded
values; any other eight-bit string decodes to a point in between.
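A small sketch of this decoding step (the two-variable, four-bit setup mirrors the example above; the variable bounds are hypothetical):

# Decode a binary GA string into real variables (sketch; bounds are
# hypothetical, the 4-bit two-variable setup mirrors the example above).
def decode(bits, bounds, n_bits=4):
    values = []
    for i, (lo, hi) in enumerate(bounds):
        sub = bits[i * n_bits:(i + 1) * n_bits]
        k = int(sub, 2)                      # decoded integer, 0 .. 2^n - 1
        values.append(lo + (hi - lo) * k / (2 ** n_bits - 1))
    return values

print(decode("00000000", [(0.0, 1.0), (0.0, 1.0)]))  # -> [0.0, 0.0] = (x1L, x2L)
print(decode("11111111", [(0.0, 1.0), (0.0, 1.0)]))  # -> [1.0, 1.0] = (x1U, x2U)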
6.2.2 GA Operators:
The operation of GAs begins with a population of random strings representing design or
decision variables. The population is then operated on by three main operators, namely
reproduction, crossover and mutation, to create a new population of points. GAs can be viewed
as trying to maximize the fitness function by evaluating several solution vectors. The purpose
of these operators is to create new solution vectors by selection, combination or alteration of
the current solution vectors that have shown to be good temporary solutions. The new population
is further evaluated and tested for termination. If the termination criterion is not met, the
population is iteratively operated on by the above three operators and evaluated. This procedure
is continued until the termination criterion is met. One cycle of these operations and the
subsequent evaluation procedure is known as a generation in GA terminology. The operators are
described in the following steps.
6.2.3 Reproduction: Reproduction (or selection) is an operator that makes more copies of better
strings in a new population. Reproduction is usually the first operator applied on a population.
It selects good strings in the population and forms a mating pool; this is one of the
reasons the reproduction operator is sometimes known as the selection operator. Thus, in the
reproduction operation, the process of natural selection causes those individuals that encode
successful structures to produce copies more frequently. To sustain the generation of a new
population, the reproduction of the individuals in the current population is necessary, and for
better individuals, these should come from the fittest individuals of the previous population.
There exist a number of reproduction operators in the GA literature, but the essential idea in
all of them is that above-average strings are picked from the current population and their
multiple copies are inserted in the mating pool in a probabilistic manner.
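One common realization of this operator is roulette-wheel (fitness-proportionate) selection; a minimal sketch (our own illustration, assuming non-negative fitness values, not necessarily the variant used in the original work):

# Roulette-wheel (fitness-proportionate) selection sketch for the
# reproduction step described above; assumes non-negative fitnesses.
import random

def select(population, fitnesses):
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for individual, f in zip(population, fitnesses):
        acc += f
        if acc >= r:
            return individual
    return population[-1]  # guard against floating-point round-off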
6.2.4 Crossover
A crossover operator is used to recombine two strings to get a better string, by combining
material from two individuals of the previous generation. In reproduction, good strings in a
population are probabilistically assigned a larger number of copies and a mating pool is
formed. It is important to note that no new strings are formed in the reproduction phase. In
the crossover operator, new strings are created by exchanging information among strings of the
mating pool. The two strings participating in the crossover operation are known as parent
strings and the resulting strings are known as children strings. It is intuitive from this
construction that good sub-strings from parent strings can be combined to form a better child
string, if an appropriate site is chosen. With a random site, the children strings produced may
or may not have a combination of good sub-strings from parent strings, depending on whether or
not the crossing site falls in the appropriate place. But this is not a matter of serious
concern, because if good strings are created by crossover, there will be more copies of them in
the next mating pool generated by the reproduction operator. It is clear from this discussion
that the effect of crossover may be detrimental or beneficial. Thus, in order to preserve some
of the good strings that are already present in the mating pool, not all strings in the mating
pool are used in crossover. When a crossover probability, defined here as Pc, is used, only
100 Pc per cent of the strings in the population are used in the crossover operation and
100 (1 − Pc) per cent of the population remains as it is in the current population. A crossover
operator is mainly responsible for the search for new strings, even though the mutation
operator is also used for this purpose sparingly.
Fig:1
6.2.5 Mutation:
Mutation adds new information in a random way to the genetic search process and ultimately
helps to avoid getting trapped at local optima. It is an operator that introduces diversity in
the population whenever the population tends to become homogeneous due to repeated use of the
reproduction and crossover operators. Mutation may cause the chromosomes of individuals to
differ from those of their parent individuals.
Mutation, in a way, is the process of randomly disturbing genetic information. It operates at
the bit level: when the bits are being copied from the current string to the new string, there
is a probability that each bit may become mutated. This probability is usually quite small and
is called the mutation probability Pm. A coin-toss mechanism is employed: if a random number
between zero and one is less than the mutation probability, then the bit is inverted, so that a
zero becomes a one and a one becomes a zero. This helps introduce a bit of diversity to the
population by scattering the occasional points. This random scattering may result in a better
optimum, or even modify a part of the genetic code that will be beneficial in later operations;
on the other hand, it might produce a weak individual that will never be selected for further
operations.
The need for mutation is to create a point in the neighbourhood of the current point, thereby
achieving a local search around the current solution. Mutation is also used to maintain
diversity in the population.
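A compact sketch of single-point crossover and bit-flip mutation as described above (the Pc and Pm values are typical illustrative choices, not tuned for any particular problem):

# Single-point crossover and bit-flip mutation (sketch; Pc and Pm are
# typical illustrative values).
import random

def crossover(a, b, pc=0.8):
    if random.random() < pc:
        site = random.randint(1, len(a) - 1)   # random crossing site
        return a[:site] + b[site:], b[:site] + a[site:]
    return a, b                                 # pass parents through unchanged

def mutate(bits, pm=0.01):
    # Flip each bit with probability pm (one "coin toss" per copied bit).
    return "".join(b if random.random() >= pm else "10"[int(b)] for b in bits)

child1, child2 = crossover("00000000", "11111111")
print(mutate(child1), mutate(child2))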
A swarm is a configuration of tens of thousands of individuals that have chosen, of their own
will, to converge on a common goal.
Swarm of bees
Two fundamental concepts are necessary to obtain swarm-intelligent behavior:
• Self-organization: a set of dynamic individuals that work together to achieve a common goal
under a set of rules. The rules ensure that the interactions are executed on the basis of purely
local information, without any relation to the global pattern.
• Division of labor: in swarm behavior, various tasks are performed simultaneously by
specialized individuals, which is referred to as the division of labor. It enables the swarm to
respond to changed conditions in the search space specified for it.
Artificial Bee Colony –
ABC has been developed based on the behaviors of real bees on finding nectar and sharing the
information of food sources to the bees in the hive.
Three essential components of this process:
• Food sources: The value of a food source depends upon its proximity to the hive, the
concentration of its energy and the ease of extracting this energy.
• Employed foragers: They are associated with a particular food source which they are currently
exploiting, or are "employed" at. They carry with them information about this particular
source, its distance and direction from the nest, and the profitability of the source, and
share this information with a certain probability.
• Unemployed foragers: They are continually on the lookout for a food source to exploit. There
are two types of unemployed foragers: scouts, searching the environment surrounding the nest
for new food sources, and onlookers, waiting in the nest and establishing a food source through
the information shared by employed foragers.
The Scout:
It is responsible for finding new food (nectar) sources.
Procedures of ABC:
1. Initialize the scouts (move the scouts).
2. Move the employed and onlooker bees onto the food sources and determine their nectar amounts.
3. Move the scouts only if the counters of the employed bees hit the limit.
Explanation
Each cycle of search consists of three steps:
1. Moving the employed and onlooker bees onto the food sources
2. Calculating their nectar amounts
3. Determining the scout bees and directing them onto possible food sources.
Using a probability-based selection process, onlookers are placed on the food sources. As the
nectar amount of a food source increases, the probability with which that food source is
preferred by onlookers increases too.
The scouts are characterized by low search costs and a low average food source quality. One
bee is selected as the scout bee. If a solution representing a food source is not improved by a
predetermined number of trials, then that food source is abandoned and its employed bee is
converted to a scout.
The main control parameters of ABC are the limit (the number of trials after which a food
source is abandoned) and the number of scouts, here taken as 1.
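A high-level sketch of the ABC search cycle follows (a simplified illustration of our own that folds the onlooker probability selection into a single greedy pass; the dimensions, bounds and parameter values are hypothetical):

# High-level ABC cycle sketch for minimizing f (illustrative only).
import random

def abc_minimize(f, dim=2, n_sources=10, limit=20, cycles=100, lo=-5.0, hi=5.0):
    sources = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_sources)]
    trials = [0] * n_sources

    def neighbor(x):  # perturb one random dimension toward a random partner
        j = random.randrange(dim)
        partner = random.choice(sources)
        y = x[:]
        y[j] += random.uniform(-1, 1) * (x[j] - partner[j])
        return y

    for _ in range(cycles):
        # Employed + onlooker phases (greedy replacement on improvement)
        for i in range(n_sources):
            cand = neighbor(sources[i])
            if f(cand) < f(sources[i]):
                sources[i], trials[i] = cand, 0
            else:
                trials[i] += 1
        # Scout phase: abandon food sources whose counters hit the limit
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = [random.uniform(lo, hi) for _ in range(dim)]
                trials[i] = 0
    return min(sources, key=f)

print(abc_minimize(lambda x: sum(v * v for v in x)))  # converges near [0, 0]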
CHAPTER 5: Genetically Mutated Bee Colony
The task of image classification is to predict a single label (or a distribution over
categories) for a given input image. Images are 3-dimensional arrays of integers ranging from
0 to 255, in Width x Height x 3 format, where the 3 represents the three color channels (RGB).
4.1.3 Challenges
The detection of a visual concept (e.g. a cat) is comparatively trivial for a human to
perform, but it is worth considering the difficulties faced by an image classification
algorithm. A (non-exhaustive) list of obstacles follows:
Viewpoint variation. A single instance of an object can be oriented in many ways with respect
to the camera.
Scale variation. Visual classes often exhibit variation in their size (size in the real world,
not only in terms of their extent in the image).
Occlusion. The objects of interest can be occluded; occasionally, only a small chunk of an
object may be visible.
Illumination conditions. The effects of illumination are drastic at the pixel level.
Background clutter. The objects of interest may blend into their surroundings, making their
identification difficult.
Intra-class variation. The classes of interest can often be relatively expansive (e.g., a
chair); there are many disparate such objects, each with its own appearance, but essentially of
the same class.
A sound image classification model must be invariant to the cross product of all these
variations, while simultaneously retaining sensitivity to the differences between distinct
categories. Therefore, instead of trying to explain and code what each one of the categories of
interest looks like directly, we provide the system with diverse and abundant
data of each category and then develop self-learning algorithms that look at these examples
and learn the visual appearance of each category. Since this method involves accumulating a
training dataset of labelled images, it is known as a data-driven approach.
Here, we utilize the multi-featured training dataset xi to predict the target output yi. The
model refers to the mathematical structure that carries out the prediction of yi when given xi.
For example, a common representation is a linear model, where the prediction is a linear
combination of weighted input attributes:
ŷi = Σj θj xij    [4.1]
Different interpretations can be made of the predicted values depending on the task, and the
parameters θ must be learned from the given training dataset. For this, we define an objective
function,
An important aspect of objective functions is that they comprise two parts: training loss,
which measures the fit of the model when employed on the training data, and regularization:
Obj(θ) = L(θ) + Ω(θ)    [4.2]
A commonly used training loss is the mean squared error:
L(θ) = Σi (yi − ŷi)²    [4.3]
Logistic loss, used for logistic regression, is another form of loss:
L(θ) = Σi [ yi ln(1 + e^(−ŷi)) + (1 − yi) ln(1 + e^(ŷi)) ]    [4.4]
Overfitting is minimized by controlling the complexity of the model through the
regularization term.
Figure 4.3
The answer is marked in red: what we need is a model that is both simple and predictive. The
tradeoff between the two is also referred to as the bias-variance tradeoff.
A perceptron takes several binary inputs and produces a single binary output.
Figure 4.4: Perceptron Input and Output
From the above figure, we can see that x1, x2, x3 are binary inputs, and the weights w1, w2, w3
express the importance of the respective inputs to the output:
output = 0 if Σj wj xj ≤ threshold, and 1 if Σj wj xj > threshold    [4.5]
From the above equation, we can see that the output is dependent on the weighted sum of the
inputs relative to the threshold. This notation is cumbersome; in order to overcome this
drawback, two notational variations are made to simplify it. The first one is to write the
weighted sum in dot product form w·x, where w and x are vectors whose components are the
weights and inputs, respectively.
The next variation is to eliminate the threshold by moving it to the other side of the
inequality and replacing it with the perceptron's bias, b = −threshold. After making these two
changes, the equation takes the following form:
output = 0 if w·x + b ≤ 0, and 1 if w·x + b > 0    [4.6]
Assigning weights in this way leads to the mathematical model of a neuron:
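A one-line sketch of this neuron model, per equation [4.6] (the weights and bias are hypothetical illustration values):

# Perceptron forward pass per equation [4.6] (weights/bias illustrative).
def perceptron(x, w, b):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Example: a 3-input perceptron implementing a simple weighted vote.
print(perceptron([1, 0, 1], w=[0.5, 0.5, 0.5], b=-0.6))  # -> 1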
Figure 4.6: Network of perceptrons
In this network, the first column of perceptrons makes three simple decisions by weighing the
input evidence. The perceptrons in the second layer make decisions at a more complex and more
abstract level than the perceptrons present in the first layer. Similarly, each further layer
can make even more sophisticated decisions.
The steps involved here are: the neuron performs a dot product of the input with its assigned
weights, then the bias is added, and lastly the non-linearity (or activation function), in this
case the sigmoid, is applied:
Figure 4.7: Sigmoid Function
4.2.3.2.1 SIGMOID
Sigmoids saturate and kill gradients. The gradient in the saturated regions is approximately
0, and this is a very unwanted property of the sigmoid neuron, because during backpropagation
this local gradient is multiplied into the gradient flowing from above. Hence, if the value of
the local gradient is negligible, it will diminish the gradient so that almost no signal will
pass through the neuron to its weights and, recursively, to its data. Moreover, a cautious
approach should be taken when initializing the weights of sigmoid neurons to prevent
saturation.
Sigmoid outputs are not zero-centered. This issue is less severe and has relatively
easygoing consequences compared to the saturated activation problem. Neurons in subsequent
layers of a neural network receive data that is not zero-centered, which introduces detrimental
zig-zag dynamics in the gradient updates for the weights. However, once these gradients are
added up across a batch of data, the final update for the weights can have variable signs,
somewhat mitigating the issue.
4.2.3.2.1.1 RELU
(-) Contrastingly, ReLU units can be fragile during training and can effectively "die": a large
gradient flowing through a ReLU neuron may update the weights such that the neuron never
activates again.
4.2.4 Convolutional Neural Networks (CNNs)
Convolutional Neural Networks are made up of neurons that have learnable weights and biases.
Each neuron receives some inputs and performs a dot product with its weights, mostly followed
by a non-linearity. The whole network still expresses a single differentiable score function,
with the raw image pixels on one end and class scores at the other. However, CNNs are somewhat
different from normal neural networks: they make the explicit assumption that the inputs are,
most of the time, images, which allows reducing the number of parameters in the network and
increasing the efficiency of the forward function.
Regular neural nets consist of hidden layers of neurons which are fully connected with the
neurons in the previous layer. Different neurons in a particular layer are completely
independent and the weights are unshared. This is admissible for small inputs, but it does not
scale to pictures of large size, such as 500 x 200 x 3, where a single fully connected neuron
would already need 300,000 weights.
Figure 4.8: Representation of Perceptron including hidden layer
A CNN instead arranges its neurons in three dimensions, so that there is a 3-D volume of
activations in each layer. In the above example the input image at the start is fed into the
3-D CNN, which takes its dimensions as width and height, while the depth is 3 (the Red, Green
and Blue channels).
4.2.4.2 LAYERS USED TO BUILD CONVOLUTIONAL NEURAL NETWORKS
Convolutional Layer
Pooling Layer
Fully-Connected Layer
The overall transformation of the original image to the final scores includes many hidden
layers. The parameters of the network are calculated using gradient descent so as to keep the
output class scores consistent with the labels in the training set for each image.
The convolutional layer has parameters that consist of a set of adjustable or learnable
filters. Each filter slides across the input volume and produces a 2-D activation map; each
filter produces a different 2-D map, and all of these are stacked together to give the output
volume. Although the convolutional neural network resembles the neural network of the human
brain, its neurons have a different connectivity and arrangement, which we now discuss.
4.2.4.2.1.1 LOCAL CONNECTIVITY
In a CNN each neuron is connected only to a local region of the input volume, unlike in a
normal neural network. The spatial extent of the connectivity of a neuron with its input volume
is called the receptive field of the neuron. The connections are local in the spatial
dimensions (height and width), but always extend along the full depth of the input volume.
Figure 4.11: Representation of Perceptron body
A hidden layer of such neurons is connected with the input layer or any other layer. How many
neurons the hidden layer contains and how they are arranged is not fixed in advance; the number
of neurons that look at the same region of the input is defined as the depth of the layer. Each
neuron is assigned some particular region of the input volume.
Stride: A filter in a layer, after mapping a particular region, moves on to the next region,
skipping some pixels of the input image before mapping again. The number of pixels skipped at
each step of this movement of the filter is known as the stride.
Zero-padding: Sometimes we want to keep the dimensions of the output equal to those of the
input image, to preserve the maximum detail of the input after convolution; this is done by
padding the border of the input with zeros.
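For reference, the standard relation between these hyperparameters (a well-known formula, stated here for completeness): for input width W, filter size F, zero-padding P and stride S, the output width is (W − F + 2P)/S + 1. For example, a 7x7 input with a 3x3 filter, stride 1 and padding 1 gives (7 − 3 + 2)/1 + 1 = 7, preserving the input size.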
4.2.4.2.2 POOLING LAYER
Average pooling was often used historically but has of late fallen out of favour compared with
max pooling, which works better in practice.
Figure 4.14: Illustration of Pooling
4.2.4.2.4 BACKPROPAGATION
The backward pass of the max operation has a simple interpretation: it simply routes the
gradient to the input that delivered the maximum value in the forward pass. Hence, during the
forward pass of a pooling layer it is usual to keep track of the index of the max activation so
that gradient routing is efficient during backpropagation.
Many people dislike the pooling operation and believe that we can get away without it. Some
propose to discard the pooling layer in favour of architectures that consist only of repeated
convolutional layers, reducing the size of the representation by using a larger stride in a
convolutional layer from time to time.
Normalization layers implementing inhibition schemes observed in the biological brain have
also been proposed. All the same, these layers have since fallen out of favor, since in
practice their contribution has been shown to be minimal.
Neurons in a fully connected layer have full connections to each and every activation in the
previous layer. It is worth mentioning that the only difference between fully connected and
convolutional layers is that the neurons in the convolutional layer are connected only to a
local region of the input. The computation performed by both fully connected and convolutional
layers, i.e. taking the dot product of the input with the contents of the filters, is the same.
Consequently, it is possible to convert between FC and CONV layers.
4.2.5 MODELLING OF XGBOOST
4.2.5.1 TREE ENSEMBLE
The tree ensemble model consists of a set of classification and regression trees (CART). The
prediction scores of each individual tree are added together to acquire the final score. A
crucial fact is that the two trees try to complement each other.
ŷi = Σk fk(xi), fk ∈ F    [4.8]
where K is the number of trees and each fk is a function in the functional space F, the set of
all possible CARTs. As usual, we write down an objective function and optimize it:
Obj = Σi l(yi, ŷi) + Σk Ω(fk)    [4.9]
This is a great deal more difficult than a conventional optimization problem, where we can take
the gradient and descend. It is not easy to train all the trees at once.
Instead, we use an additive strategy: fix what we have learned, and add one new tree at a time.
Writing the prediction value at step t as ŷi(t), we have:
ŷi(0) = 0
ŷi(1) = ŷi(0) + f1(xi)
ŷi(t) = ŷi(t−1) + ft(xi)    [4.10]
It remains to ask which tree we want at each step. A natural choice is to add the one that
optimizes our objective:
Obj(t) = Σi l(yi, ŷi(t−1) + ft(xi)) + Ω(ft) + constant    [4.11]
If we take MSE as the loss, the objective contains a first-order term (generally called the
residual) and a quadratic term. For other losses of interest (for example, logistic loss), it
is not so easy to get such a nice form, so in the general case we take the Taylor expansion of
the loss function up to second order:
Obj(t) = Σi [ l(yi, ŷi(t−1)) + gi ft(xi) + (1/2) hi ft²(xi) ] + Ω(ft) + constant    [4.12]
where gi and hi are the first and second derivatives of l(yi, ŷi(t−1)) with respect to ŷi(t−1).
After we remove all the constants, the specific objective at step t takes the form:
Obj(t) = Σi [ gi ft(xi) + (1/2) hi ft²(xi) ] + Ω(ft)    [4.13]
This is our optimization goal for the new tree. A significant advantage of this definition is
that it depends only upon gi and hi. This is how XGBoost can support custom loss functions:
every loss, including logistic regression and ranking, can be optimized using exactly the same
solver that takes gi and hi as input.
We refine the definition of the tree as
ft(x) = w_q(x), w ∈ R^T    [4.14]
Here w is the vector of scores on the leaves, q is a function assigning each data point to the
corresponding leaf, and T is the number of leaves. We then define the complexity as
Ω(f) = γT + (1/2) λ Σj wj²    [4.15]
There is more than one way to delineate the complexity, but this particular one works well in
practice. The regularization is one portion most tree software packages address less carefully,
or just dismiss.
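As an illustration of this gi/hi interface, a sketch using the custom-objective hook of the xgboost Python package (the squared-error gradients below are our own toy example, not the loss used in this project):

# Sketch: custom objective via (g_i, h_i) in the xgboost Python API.
# The toy squared-error gradients below are our own illustration.
import numpy as np
import xgboost as xgb

def squared_error_obj(preds, dtrain):
    labels = dtrain.get_label()
    grad = preds - labels          # g_i: first derivative of 0.5*(y_hat - y)^2
    hess = np.ones_like(preds)     # h_i: second derivative
    return grad, hess

X = np.random.rand(100, 4)
y = X @ np.array([1.0, 2.0, 0.5, -1.0])
booster = xgb.train({"max_depth": 3}, xgb.DMatrix(X, label=y),
                    num_boost_round=10, obj=squared_error_obj)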
***
CHAPTER 5:
Before putting the algorithms to work for our humongous data, we need various
pre-processing techniques to feed data properly to the networks for maximum efficiency.
5.1 Preprocessing
Working with these files can be a challenge, especially given their heterogeneous nature, and
some preprocessing is required depending on the dataset. A comprehensive overview of useful
steps to take before the data hits the CNN or another ML method follows.
Before we start, we need to import some packages to determine the available patients.
The DICOM files contain a lot of metadata (such as the pixel size, i.e. how long one pixel is
in every dimension in the real world).
This pixel size/coarseness of the scan differs from scan to scan (e.g. the
distance between slices may differ), which can hurt performance of CNN approaches.
We write a code to load a scan, which consists of multiple slices, which we simply
save in a Python list. Every folder in the dataset is one scan (so one patient). One
metadata field is missing, the pixel size in the Z direction, which is the slice
thickness. Fortunately, we can infer this, and we add this to the metadata.
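A sketch of this loading step using the pydicom package (the folder layout is assumed; SliceThickness and ImagePositionPatient are standard DICOM attributes):

# Load one scan (a folder of DICOM slices) and infer slice thickness.
import os
import pydicom

def load_scan(path):
    slices = [pydicom.dcmread(os.path.join(path, f)) for f in os.listdir(path)]
    slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))  # sort along z
    # Infer the missing z pixel size (slice thickness) from slice positions.
    thickness = abs(float(slices[0].ImagePositionPatient[2])
                    - float(slices[1].ImagePositionPatient[2]))
    for s in slices:
        s.SliceThickness = thickness
    return slices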
The unit of measurement in CT scans is the Hounsfield Unit (HU), which is a measure of
radiodensity. CT scanners are carefully calibrated to accurately measure this.
Figure 5.2: Hounsfield Unit of different matter
By default, however, the returned values are not in this unit. Let's fix this.
Some scanners have cylindrical scanning bounds, but the output image is
square. The pixels that fall outside of these bounds get the fixed value -2000. The
first step is setting these values to 0, which currently corresponds to air. Next,
let's go back to HU units, by multiplying with the rescale slope and adding the intercept,
both of which are stored in the metadata of the scans.
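A sketch of this conversion (following the steps just described; RescaleSlope and RescaleIntercept are standard DICOM fields):

# Convert raw DICOM pixel data to Hounsfield Units.
import numpy as np

def get_pixels_hu(slices):
    image = np.stack([s.pixel_array for s in slices]).astype(np.int16)
    image[image == -2000] = 0              # out-of-bounds pixels -> air
    for i, s in enumerate(slices):
        slope = float(s.RescaleSlope)
        intercept = float(s.RescaleIntercept)
        image[i] = (slope * image[i].astype(np.float64)).astype(np.int16)
        image[i] += np.int16(intercept)
    return image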
Figure 5.3: Frequency vs. Hounsfield Unit
Looking at the table from Wikipedia and this histogram, we can clearly see
which pixels are air and which are tissue. These images are used for lung
segmentation. Now, we can begin to iterate through the patients and gather their
respective data. We're certainly going to need to do some preprocessing of this data.
We iterate through each patient, grab their label, and get the full path to that specific
patient (inside that path are ~200 scans, which we also iterate over).
Do note here that the actual scan, when loaded by dicom, is clearly not just an array of
values; instead it has attributes. A few of the attributes here are arrays, but not all of
them. We sort by the actual image position in the scan; later, we could actually put these
together to get a full 3D rendering of the scan.
One immediate thing to note here is the rows and columns: 512 x 512. Being 512 x 512, we
already expect all this data to be the same size, at least in width and height.
We just went ahead and grabbed the pixel array attribute, which is what we
assumed to be the scan slice itself (we will confirm this soon), but immediately we
are surprised by this non-uniformity of slices. This isn't quite ideal and will cause
a problem later: all of our images are the same size, but the slices aren't.
We've got to actually figure out a way to solve that uniformity problem, but also
these images are just way too big for a convolutional neural network to handle without
some serious computing power. Thus, we already know out of the gate that we're going to
need to down-sample this data quite a bit, and somehow make the depth uniform.
5.1.2 Resampling
A scan may have a pixel spacing of [2.5, 0.5, 0.5], which means that the distance
between slices is 2.5 millimeters. For a different scan this may be [1.5, 0.725, 0.725], this
can be problematic for automatic analysis (e.g. using Convolutional Neural Networks)
Whilst this may seem like a very simple step, it has quite some edge cases.
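A sketch of this resampling step using scipy (a target spacing of 1x1x1 mm is assumed, as is common; the rounding correction handles one of the edge cases mentioned):

# Resample a scan to isotropic 1x1x1 mm spacing.
import numpy as np
import scipy.ndimage

def resample(image, spacing, new_spacing=(1.0, 1.0, 1.0)):
    spacing = np.array(spacing, dtype=np.float64)
    resize_factor = spacing / np.array(new_spacing)
    new_shape = np.round(image.shape * resize_factor)
    real_factor = new_shape / image.shape          # rounding-corrected factor
    new_spacing = spacing / real_factor
    image = scipy.ndimage.zoom(image, real_factor, mode="nearest")
    return image, new_spacing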
Figure 5.5: Z-Axis Sweep
Figure 5.6: X-Axis Sweep
Figure 5.7: Y-Axis Sweep
It is easy to see the 3 different perspectives. Another neat thing is that once
we got the scans we managed to visualize them using very basic python open source
tools - basically numpy and matplotlib. No need for fancy medical imaging tools.
Since we want to detect lung cancer, we can try to see if we can detect pulmonary nodules using
something like edge detection. This can be done using a Sobel filter (aka a hand-crafted
one-filter CNN). Let's also take a look at the distribution of the pixel values in an image first:
Figure 5.9 Edge Detection 1
The Sobel filter does find the edges but the image is very low intensity. One
thing we can do is to simply threshold the image to see the segmentation better:
Figure 5.11 Edge Detection 3
Interesting results; however, the issue here is that the filter will also detect the
blood vessels in the lung. So, some sort of 3-D surface detection that differentiates
between spheres and tubes would be more suitable for this situation.
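A sketch of the Sobel-plus-threshold experiment described above (the threshold value is illustrative):

# Sobel edge detection on one CT slice, then a simple threshold to make
# the segmentation more visible (threshold value is illustrative).
import numpy as np
from scipy import ndimage

def sobel_edges(slice_2d, threshold=2000):
    dx = ndimage.sobel(slice_2d.astype(np.float64), axis=0)
    dy = ndimage.sobel(slice_2d.astype(np.float64), axis=1)
    magnitude = np.hypot(dx, dy)                 # gradient magnitude
    return magnitude, magnitude > threshold      # raw edges and binary mask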
5.1.4 3D plotting the scan
For visualization it is useful to be able to show a 3D image of the scan.
Unfortunately, the packages available in the Docker image are very limited in this sense, so
we will use marching cubes to create an approximate mesh for our 3D object, and plot
this with matplotlib. Quite slow and ugly, but it was the best option available.
Our plot function takes a threshold argument which we can use to plot certain
structures, such as all tissue or only the bones. 400 is a good threshold for
showing the bones only (from the Hounsfield unit table above).
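A sketch of such a plot function using skimage's marching cubes and matplotlib (mirroring the description above; the default threshold of 400 shows bone):

# 3D plot of the scan via marching cubes (threshold=400 shows bone).
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
from skimage import measure

def plot_3d(image, threshold=400):
    verts, faces, _, _ = measure.marching_cubes(image, level=threshold)
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111, projection="3d")
    mesh = Poly3DCollection(verts[faces], alpha=0.1)
    ax.add_collection3d(mesh)
    ax.set_xlim(0, image.shape[0])
    ax.set_ylim(0, image.shape[1])
    ax.set_zlim(0, image.shape[2])
    plt.show()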
5.1.5 Lung segmentation using water sheds
Most suggested Lung Segmentation methods mainly involve thresholding the lung
tissue based on its Hounsfield value and using morphological dilation to include nodules
in border regions. These methods have the severe drawback of also including lots of
tissue that is neither lung, nor a region of interest. We coded an algorithm based on the
one presented in R Shojaii et al [8] with some modifications, that we present here.
The resulting CT images are in HU and have the same (not necessarily isotropic) dimensions as
the input. The method relies on two markers: an internal marker, which is definitely lung
tissue, and an external marker, which is definitely outside of our ROI. We start by creating
the internal marker by
thresholding the Image and removing all regions but the biggest one. The external marker
is created by morphological dilation of the internal marker with 2 different iterations and
subtracting the results. A watershed marker is created by superimposing the two markers, with
different grey values marking the internal, external and undefined regions.
Figure 5.16: Watershed Marked Slice
The watershed algorithm then finds the precise border of the lung, located in the black strip
of the watershed marker shown above. To run the algorithm, we also need the Sobel gradient of
the image.
In order not to miss nodules located next to the border regions, a black top hat operation is
performed to re-include those areas and the areas surrounding the lung. This is the main
advantage of this method over the threshold-based methods: only areas that need re-inclusion
get dilated; everywhere else, the lung border stays precise.
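A simplified sketch of these marker and watershed steps (our own condensed illustration; the threshold and dilation iteration counts are assumptions, and the full method per R Shojaii et al [8] has more stages):

# Watershed-based lung segmentation sketch (simplified illustration).
import numpy as np
from scipy import ndimage
from skimage import filters, measure, segmentation

def lung_markers(slice_hu):
    internal = slice_hu < -400                   # rough lung threshold (assumed)
    labels = measure.label(internal)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                                 # ignore background
    internal = labels == sizes.argmax()          # keep the largest region
    # External marker: ring between a small and a large dilation (assumed counts)
    external = (ndimage.binary_dilation(internal, iterations=55)
                ^ ndimage.binary_dilation(internal, iterations=10))
    return internal, external

def watershed_lung(slice_hu):
    internal, external = lung_markers(slice_hu)
    markers = np.zeros(slice_hu.shape, dtype=np.int32)
    markers[internal] = 2
    markers[external] = 1
    gradient = filters.sobel(slice_hu.astype(np.float64))  # Sobel gradient
    ws = segmentation.watershed(gradient, markers)
    return ws == 2                               # lung mask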
Figure 5.17: Sobel Gradient
Figure 5.19: Outline After Re-Inclusion
Figure 5.21: Segmented Lung
The resulting images of this code are still in the original dimensions of the
CT Scan and in Hounsfield Units with the filtered areas being assigned -2000.
The resulting masks are precise and include nodules located in the border regions. The main
downside is the much longer processing time per patient.
A simpler alternative method segments the lung (and usually some tissue around it). It involves
quite a few smart steps.
The steps:
For every axial slice in the scan, determine the largest solid connected
component, and set others to 0. This fills the structures in the lungs in the mask.
Keep only the largest air pocket (the human body has other pockets of air).
But there's one thing we can fix: it is probably a good idea to include structures
within the lung (as the nodules are solid); we do not want only air in the lungs.
Figure 5.24: 3D Lung-3 (Difference)
For this, we also dilate the mask in all directions. The air + structures in the lung alone
will not contain all nodules; in particular, it will miss those that are stuck to the side of
the lung, where they often appear.
This segmentation may fail for some edge cases. It relies on the fact that the
air outside the patient is not connected to the air in the lungs. If the patient has a
tracheostomy, this will not be the case, we do not know whether this is present in the
dataset. Also, for particularly noisy images (for instance due to a pacemaker in the image
below), this method may fail; instead, the second largest air pocket in the body
will be segmented. We can recognize this by checking the fraction of the image that the
mask corresponds to, which will be very small for this case. We can then first apply a
morphological closing operation with a kernel a few mm in size to close these holes,
after which it should work (or more simply, we do not use the mask for this image).
5.1.7 Normalization
Our values currently range from -1024 to around 2000. Anything above 400 is not interesting to
us, as these are simply bones with different radiodensity. A commonly used set of thresholds is
-1000 and 400, between which we normalize the data. As a final step, we zero-center the data so
that the mean value is 0. To do this we simply subtract the mean pixel value from all pixels;
we found this to be around 0.25 in the LUNA16 dataset.
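A sketch of these two steps, using the bounds and mean stated above:

# Normalization and zero-centering sketch (bounds and mean from the text).
import numpy as np

MIN_BOUND, MAX_BOUND, PIXEL_MEAN = -1000.0, 400.0, 0.25

def normalize(image):
    image = (image - MIN_BOUND) / (MAX_BOUND - MIN_BOUND)
    return np.clip(image, 0.0, 1.0)

def zero_center(image):
    return image - PIXEL_MEAN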
5.2 Running the algorithms
With these steps our images are ready for consumption by our Neural Network,
CNN, XGBoost and/or other ML methods. We can do all these steps offline (one time
and save the result), and we let it run overnight as it took a long time.
Now, the data we have is actually 3D data, not the 2D data that's covered in most tutorials,
so the network has to be adapted accordingly.
Now we're set to train the network. When running locally, we make sure our
training data is NOT the sample images; it should be the stage1 images. Our layer counts were
chosen as follows.
Each additional convolutional layer is costlier (moderately, because each convolutional layer
reduces the number of input features reaching the fully connected layers), although after
around 2 or 3 layers the accuracy gain becomes quite small, so we need to accept a tradeoff
between generalization accuracy and training time. That said, all image recognition tasks are
different, and the right depth must be found for the task at hand.
5.2.1.2 NODES PER HIDDEN LAYER COUNT = 64:
This differs for each task that needs to be executed. As a rough guide, we broadly keep the
number of nodes at 2/3 the size of the former layer, holding the first layer at 2/3 the size of
the final feature maps. This nevertheless is merely an approximate guide and depends chiefly on
the dataset.
Here, we essentially assign the pace with which we slide the filter. When the stride is 1, we
move the filters one pixel at a time; when the stride is 2, the filters jump two pixels at a
time.
The number of hidden layers called for hinges upon the intrinsic complexity of our dataset;
this can be understood by considering what each layer accomplishes. Zero hidden layers can only
represent linear functions, which is too poor for just about all image recognition tasks.
A single hidden layer allows the network to model an arbitrarily complex continuous function;
adding many more layers is almost never beneficial, being advantageous only for particularly
complex tasks.
5.2.2.2 MIN_CHILD_WEIGHT = 9:
This defines the minimum sum of instance weights (hessian) needed in a child; because it is a
sum of weights, it can be a float instead of an integer.
This parameter controls over-fitting: the deeper the tree, the more complex the relationships
the model can learn, and the more likely it is to overfit.
The learning rate determines the impact of each tree on the final result.
This is the ratio of columns (features) sub-sampled while constructing each tree.
5.2.2.7 NTHREAD = 8
5.2.2.8 SUBSAMPLE = 0.80:
Performance on the test dataset does not show any improvement even when this fraction is
varied.
CHAPTER 6:
We evaluated our models using the log loss error, which is the cross entropy between the true
labels and the predicted distribution. It is the extra unpredictability when one assumes a
different distribution than the true distribution, added to the entropy of the true
distribution. We thus maximized the accuracy of our models by minimizing the log loss:
LogLoss = −(1/N) Σi [ yi log(pi) + (1 − yi) log(1 − pi) ]    [6.1]
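A small sketch of this metric (the eps guard against log(0) is our own addition):

# Log loss per equation [6.1]; eps guards against log(0).
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(log_loss([1, 0, 1], [0.9, 0.2, 0.7]))  # small value for good predictions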
Figure 6.1: Log Loss v/s probability for a single positive instance
6.1.1 Neural Network
25 Iterations were run for the Preprocessed data (muchdata-50-50-
Figure 6.3: Results with each Iteration
Figure 6.4: System Usage Stats and iterations result
Thus, we see a similar curve to the Neural Network, but with a lower log loss.
6.1.3 XGBoost
Using a grid search for the optimum values of the parameters, and printing values after each
iteration, we see a high-variance curve with a lower log loss on average. We achieve a mean
log loss of 0.57.
6.1.4 Comparison
Comparing the errors for the three algorithms, we have:
This shows the Log-loss error comparison for the first 25 iterations
of the said algorithms. We thus have the mean Log-loss errors as:
Table 6.1: Log loss error of Neural Network, CNN and XGBoost
3. XGBoost    0.57
CHAPTER 7:
In this project we studied lung cancer detection from low dose CT scan data, which can help
radiologists reduce the lung cancer mortality rate. Till date, CNN was considered the de-facto
standard for image classification; here we found XGBoost bettering the output of the CNN by a
log-loss margin of 0.06 for the aforementioned data. Thus, further scope of improvement in
these algorithms will provide ways to reduce this error further.
The clinical meaning of the learned feature values from the algorithms is a direction in which
we plan to pursue further research. Furthermore, other lung diseases could be recognized with
similar pipelines.
[1] K. Murphy, B. van Ginneken, A. M. R. Schilham, B. J. de Hoop, H. A. Gietema, and
M. Prokop, "A large-scale evaluation of automatic pulmonary nodule detection in chest
CT using local image features and k-nearest-neighbour classification," Medical Image
Analysis, vol. 13, pp. 757-770, 2009.
[2] "Automatic lung segmentation from thoracic computed tomography scans using a hybrid
approach with error detection," Medical Physics, vol. 36, no. 10, pp. 2934-2947, 2009.
384, 2014
[6] Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang,
Tianjun Xiao, Bing Xu, Chiyuan Zhang, Zheng Zhang (2015): MXNet: A
Flexible and Efficient Machine Learning Library for Heterogeneous
Distributed Systems, arXiv:1512.01274 [cs.DC]
[7] Mingchen Gao, Ulas Bagci, Le Lu, Aaron Wu, Mario Buty, Hoo-Chang Shin,
Holger Roth, Georgios Z. Papadakis, Adrien Depeursinge, Ronald M.
Summers, Ziyue Xu & Daniel J. Mollura (2016): Holistic classification of CT
attenuation patterns for interstitial lung diseases via deep convolutional
neural networks, Computer Methods in Biomechanics and Biomedical
Engineering: Imaging & Visualization, DOI: 10.1080/21681163.2015.1124249
[8] Arnaud Arindra Adiyoso Setio, Alberto Traverso, Thomas de Bel, Moira S.N.
Berens, Cas van den Bogaard, Piergiorgio Cerello, Hao Chen, Qi Dou, Maria Evelina
Fantacci, Bram Geurts, Robbert van der Gugten, Pheng Ann Heng, Bart Jansen,
Michael M.J. de Kaste, Valentin Kotov, Jack Yu-Hung Lin, Jeroen T.M.C. Manders,
Alexander Sónora-Mengana, Juan Carlos García-Naranjo, Mathias Prokop, Marco
Saletta, Cornelia M Schaefer-Prokop, Ernst T. Scholten, Luuk Scholten, Miranda M.
Snoeren, Ernesto Lopez Torres, Jef Vandemeulebroucke, Nicole Walasek, Guido C.A.
Zuidhof, Bram van Ginneken, Colin Jacobs (2017): Validation, comparison, and
combination of algorithms for automatic detection of pulmonary nodules in computed
tomography images: the LUNA16 challenge, arXiv:1612.08012
[9] Cireşan, D.C., Giusti, A., Gambardella, L.M. and Schmidhuber, J., 2013.
Mitosis detection in breast cancer histology images with deep neural
networks. In International Conference on Medical Image Computing and
Computer-assisted Intervention (pp. 411-418). Springer Berlin Heidelberg.
[10] Prasoon, A., Petersen, K., Igel, C., Lauze, F., Dam, E. and Nielsen, M., 2013,
September. Deep feature learning for knee cartilage segmentation using a triplanar
convolutional neural network. In International Conference on Medical Image Computing
and Computer-assisted Intervention (pp. 246-253). Springer Berlin Heidelberg.
[11] Sharif Razavian, A., Azizpour, H., Sullivan, J. and Carlsson, S., 2014. CNN features
off-the-shelf: an astounding baseline for recognition. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition Workshops (pp. 806-813).
[12] Christodoulidis, S., Anthimopoulos, M., Ebner, L., Christe, A. and Mougiakakou,
S., 2017. Multisource Transfer Learning With Convolutional Neural Networks for Lung
Pattern Analysis. IEEE journal of biomedical and health informatics, 21(1), pp.76-84.
[13] Gao, M., Xu, Z., Lu, L., Wu, A., Nogues, I., Summers, R.M. and Mollura,
D.J., 2016, April. Segmentation label propagation using deep convolutional
neural networks and dense conditional random field. In Biomedical Imaging
(ISBI), 2016 IEEE 13th International Symposium on (pp. 1265-1268). IEEE.
[14] Pan, Y., Huang, W., Lin, Z., Zhu, W., Zhou, J., Wong, J. and Ding, Z., 2015,
August. Brain tumor grading based on neural networks and convolutional
neural networks. In Engineering in Medicine and Biology Society (EMBC), 2015
37th Annual International Conference of the IEEE (pp. 699-702). IEEE.
[15] Ciompi, F., de Hoop, B., van Riel, S.J., Chung, K., Scholten, E.T.,
Oudkerk, M., de Jong, P.A., Prokop, M. and van Ginneken, B., 2015.
Automatic classification of pulmonary peri-fissural nodules in computed
tomography using an ensemble of 2D views and a convolutional neural
network out-of-the-box. Medical image analysis, 26(1), pp.195-202.
[16] Anthimopoulos, M., Christodoulidis, S., Christe, A. and Mougiakakou, S., 2014,
August. Classification of interstitial lung disease patterns using local DCT features
and random forest. In Engineering in Medicine and Biology Society (EMBC), 2014 36th
Annual International Conference of the IEEE (pp. 6040-6043). IEEE.
[17] Li, Q., Cai, W., Wang, X., Zhou, Y., Feng, D.D. and Chen, M., 2014,
December. Medical image classification with convolutional neural network.
In Control Automation Robotics & Vision (ICARCV), 2014 13th International
Conference on (pp. 844-848). IEEE.
[18] Nogues, I., Yao, J., Mollura, D. and Summers, R.M., Deep Convolutional
Neural Networks for Computer-Aided Detection: CNN Architectures,
Dataset Characteristics and Transfer Learning.
[19] Samala, R.K., Chan, H.P., Hadjiiski, L., Helvie, M.A., Wei, J. and Cha, K., 2016.
[20] Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E. and Greenspan, H., 2015,
[21] Fotin, S.V., Yin, Y., Haldankar, H., Hoffmeister, J.W. and Periaswamy, S., 2016,
March. Detection of soft tissue densities from digital breast tomosynthesis:
Comparison of conventional and deep learning approaches. In SPIE Medical
Imaging (pp. 97850X-97850X). International Society for Optics and Photonics.
[22] Kooi, T., Gubern-Merida, A., Mordang, J.J., Mann, R., Pijnappel, R.,
Schuur, K., den Heeten, A. and Karssemeijer, N., 2016, June. A comparison
between a deep convolutional neural network and radiologists for
classifying regions of interest in mammography. In International Workshop
on Digital Mammography (pp. 51-56). Springer International Publishing.
[23] Cha, K.H., Hadjiiski, L.M., Samala, R.K., Chan, H.P., Cohan, R.H., Caoili, E.M.,
Paramagul, C., Alva, A. and Weizer, A.Z., 2016. Bladder cancer segmentation in CT for
treatment response assessment: Application of deep-learning convolution neural
network A pilot study. Tomography: a journal for imaging research, 2(4), p.421.
[24] Shen, W., Zhou, M., Yang, F., Yang, C. and Tian, J., 2015, June. Multi-
scale convolutional neural networks for lung nodule classification. In
International Conference on Information Processing in Medical Imaging
(pp. 588-599). Springer International Publishing.
Students involved in the project team include: