PROJECT REPORT
Machine Learning and Deep Neural Networks

Course Code: EE-264
Students: Arman Ahmed Ansari (EE-053), M. Haziq Saleem (EE-027), Ashar Mujeeb (EE-047)
Class: S.E
Section: A
I. INTRODUCTION
Writing the weighted sum compactly as

w · x ≡ Σ_j w_j x_j,

instead of a fixed threshold value we use perceptron-specific biases, where a perceptron's bias is the negative of its threshold. If w is the weight vector, x is the input vector, and b is the bias, then the perceptron outputs 1 when w · x + b > 0 and 0 otherwise.
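As a minimal sketch of this rule (the function name and example values are our own, not from the report's code):

    # Perceptron with bias b (the negative of its threshold):
    # output 1 if w . x + b > 0, else 0.
    def perceptron(w, x, b):
        s = sum(wj * xj for wj, xj in zip(w, x)) + b
        return 1 if s > 0 else 0

    # Example: a perceptron computing logical AND of two inputs.
    w, b = [1.0, 1.0], -1.5
    print(perceptron(w, [1, 1], b))  # prints 1
    print(perceptron(w, [1, 0], b))  # prints 0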
A. Sigmoid Neurons
Binary output is limiting in nature, so to avoid this restriction we no longer employ the perceptron model; sigmoid neurons are employed instead. The shape of the sigmoid function is similar to the plotted shape of a step function, but it is smoother.
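As a sketch, assuming the standard logistic function σ(z) = 1/(1 + e^(−z)) (the names here are illustrative):

    import math

    def sigmoid(z):
        # A smoothed step function: near 0 for very negative z,
        # near 1 for very positive z.
        return 1.0 / (1.0 + math.exp(-z))

    def sigmoid_neuron(w, x, b):
        # Same weighted sum as the perceptron, but the output now
        # varies smoothly between 0 and 1.
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        return sigmoid(z)

    print(sigmoid_neuron([1.0, 1.0], [1, 1], -1.5))  # about 0.62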
IV. ARCHITECTURE OF A NEURAL NETWORK
In order to represent the working of an artificial neural network, the following architecture is taken into consideration. In a feedforward network, the output of one layer is used as input to the next, and feedback loops are not allowed, so a neuron's output never circles back to affect its own input. There exist certain models in which such feedback loops are allowed; these models are called "recurrent neural networks". In these models a neuron fires for a limited duration before becoming inactive. That firing can activate other neurons, which fire for a limited time, and those in turn cause the next neurons to fire. Loops therefore do not create problems in this model, as a neuron's output can only influence its input at some later time, not instantly.
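As a minimal sketch of the feedforward case (where no loops are allowed), using NumPy, sigmoid activations, and arbitrary example layer sizes:

    import numpy as np

    rng = np.random.default_rng(0)
    sizes = [784, 30, 10]  # example sizes: input, hidden, output

    # One weight matrix and one bias vector per layer transition.
    weights = [rng.standard_normal((m, n))
               for n, m in zip(sizes[:-1], sizes[1:])]
    biases = [rng.standard_normal((m, 1)) for m in sizes[1:]]

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def feedforward(a):
        # Information only moves forward: each layer's output is
        # the next layer's input, and nothing feeds back.
        for W, b in zip(weights, biases):
            a = sigmoid(W @ a + b)
        return a

    x = rng.standard_normal((784, 1))  # a dummy input column
    print(feedforward(x).shape)        # (10, 1)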
A. Stochastic Gradient Descent
Since efficiency in an algorithm of this sort is paramount next to functionality, a technique known as stochastic gradient descent was devised to increase efficiency. It achieves this by computing the gradient only for a small sample of randomly chosen training inputs and averaging over that sample to estimate the true gradient ∇C.
These randomly chosen training inputs are called a mini-batch, and they aid us in approximating the actual gradient. When all of the training inputs have been used up in this way, one epoch (or cycle) of training is complete, and a new epoch begins for further refinement.
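A sketch of one epoch under these definitions; grad_C stands in for whatever routine (e.g., backpropagation, below) returns the per-example gradient, and eta is the learning rate — both names are ours, not the report's:

    import random

    def sgd_epoch(theta, training_data, grad_C, eta=0.1, batch_size=10):
        # One epoch: shuffle the training inputs, split them into
        # mini-batches, and step along each averaged gradient.
        data = list(training_data)
        random.shuffle(data)
        for k in range(0, len(data), batch_size):
            batch = data[k:k + batch_size]
            # Average per-example gradients to estimate the true
            # gradient of the cost C over this mini-batch.
            grad = [0.0] * len(theta)
            for example in batch:
                g = grad_C(theta, example)
                grad = [gi + gj / len(batch) for gi, gj in zip(grad, g)]
            theta = [t - eta * g for t, g in zip(theta, grad)]
        return theta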
B. Backpropagation
The backpropagation algorithm is essential to the working of our neural network; it is crucial because it computes the gradient of the cost function, which is then used by the gradient descent algorithm. The core concept of the algorithm is an expression for the partial derivative of the cost function with respect to any given weight or bias; this tells us the rate of change of the cost due to any change in the weights and biases. [5]
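As a sketch for a network with one hidden layer, sigmoid activations, and quadratic cost C = ½‖a − y‖² — a generic textbook formulation, not the exact code behind this report:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop(x, y, W1, b1, W2, b2):
        # Forward pass, keeping intermediate activations.
        a1 = sigmoid(W1 @ x + b1)
        a2 = sigmoid(W2 @ a1 + b2)
        # Backward pass: delta holds dC/dz for each layer, using
        # sigmoid'(z) = a * (1 - a) and the quadratic cost.
        delta2 = (a2 - y) * a2 * (1 - a2)
        delta1 = (W2.T @ delta2) * a1 * (1 - a1)
        # Partial derivatives of C w.r.t. every weight and bias.
        return delta1 @ x.T, delta1, delta2 @ a1.T, delta2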
VI. DEEP NEURAL NETWORK
A simple neural network consists of a single hidden layer between the input and output layers. However, a neural network in which multiple hidden layers are present between the input and output layers is called a "deep neural network", and the techniques used to train these networks are termed "deep learning".
One of the problems that may arise is the "vanishing gradient problem". In some deep neural networks, as we propagate backward through the hidden layers the gradient becomes smaller and smaller, so by the time we reach the earlier layers they can no longer learn quickly, because the gradient has vanished on its way to them. In contrast, the gradient can sometimes grow so large at the earlier layers that learning becomes unstable; this is termed the "exploding gradient problem".
One might argue that a shrinking gradient should be good news: apparently we are approaching an extremum and no longer need large adjustments to the weights and biases in the network. The answer is that the network was initialized with random weights and biases, so we certainly cannot have reached accurate results as quickly as the vanishing gradient makes it appear. In reality the network is not performing well; the earlier layers simply cannot learn when the gradient is so small. It appears to them that they are already accurate and need no changes, so they remain untrained. These problems need to be taken care of if we want to train a deep network.
The main problem behind vanishing or exploding gradients is that the gradient in the early layers is a product of terms coming from the later layers. When there are many later layers, this product becomes unstable. The only way to balance the learning speeds of all layers is to somehow manage those products so that they balance out. There is also evidence that the sigmoid activation function itself creates a problem, saturating the final hidden layer near zero early in training, so an activation function that does not suffer from this saturation problem is advised instead.
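A small numeric sketch of why the early-layer gradient is such a product: with sigmoid activations each backward step multiplies by a factor of roughly σ′(z)·w, and since σ′(z) ≤ 0.25 a chain of such factors shrinks rapidly (the depth and random values here are arbitrary):

    import numpy as np

    def sigmoid_prime(z):
        s = 1.0 / (1.0 + np.exp(-z))
        return s * (1 - s)

    rng = np.random.default_rng(1)
    factor = 1.0
    for layer in range(10):            # ten layers deep
        w = rng.standard_normal()      # a typical weight
        z = rng.standard_normal()      # a typical pre-activation
        factor *= abs(w) * sigmoid_prime(z)
        print(f"layer {layer}: gradient factor ~ {factor:.2e}")
    # The factor collapses toward zero (vanishing gradient); with
    # large weights it can instead blow up (exploding gradient).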
[2] "Perceptron," Wikipedia, https://en.wikipedia.org/wiki/Perceptron (accessed 16 Feb. 2018).
[3] "Expert system," Wikipedia, https://en.wikipedia.org/wiki/Expert_system (accessed 17 Feb. 2018).