
Suppose you have a roommate, and he is the perfect roommate. Why is he the perfect roommate? Because he cooks every day, and he cooks three types of food: apple pie, burger and chicken.

But he has a rule for what he cooks. He first looks outside at the weather, which can be sunny or rainy. If it's sunny he cooks apple pie, because he's happy, and if it's rainy he cooks a burger.

This scenario can be easily modelled by a very simple neural network that
has an input and an output. So, in this case if the input is a sunny day
then the output is an apple pie or if the input is a rainy day then the
output is a burger.
Let's do some math and for the math we're going to introduce vectors.
We will represent the food and the weather by some vectors.
We have three possible foods, so we represent them with vectors of length
3 and we have two types of weather conditions, so we represent them with
vectors of length 2.

So, if the neural network receives a [1 0] vector corresponding to a sunny day then it returns a [1 0 0] vector, which is an apple pie, and if it receives a [0 1] vector corresponding to a rainy day then it returns a [0 1 0] vector, which is a burger.
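
As a small illustrative sketch (Python with NumPy; the variable names are my own, not from the original), these one-hot vectors could be written as:

    import numpy as np

    # One-hot food vectors (length 3)
    apple_pie = np.array([1, 0, 0])
    burger    = np.array([0, 1, 0])
    chicken   = np.array([0, 0, 1])

    # One-hot weather vectors (length 2)
    sunny = np.array([1, 0])
    rainy = np.array([0, 1])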

This neural network is actually just a very simple matrix, and the matrix works like this:
1. If we multiply the vector [1 0] corresponding to a sunny day by this matrix, we get the vector [1 0 0] corresponding to an apple pie.
2. If we multiply the vector [0 1] corresponding to a rainy day by this matrix, we get the vector [0 1 0] corresponding to a burger.

So, this neural network is just a linear map that sends the sunny day to an apple pie and the rainy day to a burger.
We are not used to seeing neural networks as a matrix but as a bunch of nodes with arrows. In the figure shown below, the matrix on the left is turned into the arrows on the right. The dark arrows are labelled 1, for the 1s in the matrix, and the light arrows are labelled 0.

The above neural network works as follows:


· If we take the vector [1 0] corresponding to a sunny day and pass it into the first layer of nodes as the input, then the value in each node gets multiplied by the weight of each of its edges and added up in the node on the right, so we obtain the vector [1 0 0]. This is equivalent to the matrix multiplication we saw just above, and that's why neural networks really represent a bunch of matrix multiplications. So, here we get the vector [1 0 0] as output, which is the vector corresponding to an apple pie (a small code sketch of this multiplication is given after this list).
· Similarly, if we input the vector corresponding to a rainy day, i.e., [0 1], we will get the vector [0 1 0], which corresponds to a burger.
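
To make the matrix multiplication concrete, here is a minimal NumPy sketch, assuming a 3 by 2 matrix multiplying a column weather vector (the shape and names are my assumptions, consistent with the description above):

    import numpy as np

    # Weather-to-food matrix: the first column is the sunny case, the second the rainy case
    W = np.array([[1, 0],   # apple pie row
                  [0, 1],   # burger row
                  [0, 0]])  # chicken row (never produced by this rule)

    sunny = np.array([1, 0])
    rainy = np.array([0, 1])

    print(W @ sunny)  # [1 0 0] -> apple pie
    print(W @ rainy)  # [0 1 0] -> burger
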
Till now we saw a simple neural network, but now let's go to a slightly
more complicated problem. Let's say we still have the perfect roommate
who still cooks every day but now he doesn't base his cooking on the
weather. Now our perfect roommate has become very organized and very
methodical and he just cooks in sequence.

One day if he cooks an apple pie then the next day he cooks a burger and
then the next day he cooks a chicken and then the next day again apple
pie and then burger and then chicken and so on.

So, we can always tell what he's going to cook based on what he cooked
the day before. For example, if on Monday he cooks an apple pie then on
Tuesday he cooks a burger and Wednesday he cooks a chicken and on Thursday
an apple pie and on Friday a burger then on Saturday a chicken and so on
and so forth.

Now, this is not a normal neural network anymore and it’s called a
Recurrent Neural Network.

In this case there's no input for the weather, so the bottom arrow doesn't
come from anywhere but the output goes back in as input.

So, if we had an apple pie yesterday then this apple-pie comes back as
input and then the output is a burger. And today he will cook a burger
and this burger comes back as input and gives the chicken as the output,
which means tomorrow he will cook chicken and so on.
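
A minimal sketch of this cycle, assuming the sequence is implemented as a 3 by 3 "next dish" matrix multiplying a column food vector (my own formulation, not taken from the original figure):

    import numpy as np

    # "Next dish" matrix: each column sends today's food to tomorrow's food
    # (apple pie -> burger -> chicken -> apple pie)
    S = np.array([[0, 0, 1],
                  [1, 0, 0],
                  [0, 1, 0]])

    foods = ["apple pie", "burger", "chicken"]
    today = np.array([1, 0, 0])  # Monday: apple pie

    for day in ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]:
        print(day, foods[int(np.argmax(today))])
        today = S @ today  # the output feeds back in as the next day's input
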
Let's look at a more complicated RNN. Now our perfect roommate’s cooking
rule is going to be a combination of the two previous rules. He is still
very methodical and he still cooks in sequence apple-pie, burger and
chicken. But his decision of what to cook is going to depend on the
weather too.

· If it's sunny he's going to go outside and enjoy the day and he's
not going to be cooking. So, he's just going to give the same thing
as yesterday, the leftover.

· If it's raining then he stays home and he has nothing to do. So,
he cooks the next dish in the list.

So, if it's sunny we get the same thing as yesterday and if it's raining
we get the next thing in the sequence.
Here's an example: let's say on Monday we made an apple pie, and on Tuesday we check the weather and it's sunny. If it's sunny then our roommate doesn't cook anything new, so we get an apple pie.

Note: Don't be confused that the sunny weather for Tuesday is drawn under Monday; it's just for diagram purposes.

Then on Wednesday, when we check the weather, if it's rainy our roommate stays home and makes something different: a burger, the next thing on the list. And on Thursday, if it's rainy, he makes chicken, because he stayed home and cooked the next meal. If Friday is sunny then he goes outside and doesn't cook anything new, so on Friday we get chicken again, and if it's raining on Saturday he cooks the next dish, which is an apple pie. If Sunday is sunny then we have apple pie again, and so on and so forth.
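
Before turning this rule into matrices, here is a plain-Python sketch of it (the function and variable names are mine, used only for illustration); it reproduces the week described above:

    foods = ["apple pie", "burger", "chicken"]

    def next_meal(yesterday, weather):
        i = foods.index(yesterday)
        if weather == "sunny":
            return foods[i]            # sunny: leftovers, same as yesterday
        return foods[(i + 1) % 3]      # rainy: the next dish in the cycle

    meal = "apple pie"                 # Monday
    week = [("Tue", "sunny"), ("Wed", "rainy"), ("Thu", "rainy"),
            ("Fri", "sunny"), ("Sat", "rainy"), ("Sun", "sunny")]
    for day, weather in week:
        meal = next_meal(meal, weather)
        print(day, weather, "->", meal)  # Tue apple pie, Wed burger, ..., Sun apple pie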

This Recurrent Neural Network looks as shown in the figure given below. There's an input coming from underneath, which is the weather, and an output, the food, which comes back in as input. So, check this out:

· If yesterday's food was apple pie and today the weather is rainy,

· Then these two things feed into a neural network and the output is a burger, because on a rainy day our roommate cooks the next food, which is a burger.

Again, let's have a look at the vectors. Recall that the apple pie is the vector [1 0 0], the burger is [0 1 0] and the chicken is [0 0 1], and that there are just two weather conditions: sunny weather is the vector [1 0] and rainy weather is the vector [0 1].
Now, we will see that this neural network is not just a simple matrix. Since it has more layers, it is a bunch of matrices and some maps.

Here are two matrices: one is called the food matrix and the other one is the weather matrix. We add their outputs and then merge them with a non-linear operation.

We will see all these steps one by one. So, let's start with the food matrix, and this is how the food matrix works:

· The food matrix is a 6 by 3 matrix, but we have kind of artificially cut it into two 3 by 3 matrices, so it looks like a concatenation of them: the top three rows are the identity matrix and the bottom three rows give the food for the next day.
· If we multiply the food matrix by the vector representing an apple pie then we get a concatenation of two vectors, where the top vector is the same food that came in as input, in this case the apple pie, and the bottom vector is the food for the next day.

Let's look at another example: if we multiply the food matrix by the burger vector [0 1 0] then we get a vector which is a concatenation of the vector corresponding to the burger, which is the food for the same day, and the vector corresponding to the chicken, which is the food for the next day.
And finally, if we multiply the food matrix by the vector corresponding to a chicken then we get the concatenation of the vector corresponding to the chicken, which is the food for the same day, and the apple pie, which is the food for the next day.

So, what this food matrix does is take the vector for today's food and return the vector for today's food and the vector for tomorrow's food, concatenated.
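
Here is a sketch of that food matrix in NumPy, assuming the 6 by 3 shape and the column-vector convention described above (the names are my own):

    import numpy as np

    I = np.eye(3, dtype=int)             # top block: copies today's food
    S = np.array([[0, 0, 1],             # bottom block: sends today's food to the next dish
                  [1, 0, 0],
                  [0, 1, 0]])
    food_matrix = np.vstack([I, S])      # 6 x 3

    apple_pie = np.array([1, 0, 0])
    print(food_matrix @ apple_pie)       # [1 0 0 0 1 0]: apple pie today, burger tomorrow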

Now let's have a look at the weather matrix. The weather matrix is also a concatenation of two matrices: it is a 6 by 2 matrix, where the top three rows have their 1s in the first (sunny) column and the bottom three rows have their 1s in the second (rainy) column.
· If we multiply the weather matrix by the vector corresponding to a sunny day then we get three 1's on the top, for the same day, and three 0's on the bottom, for the next day.

· If we multiply the weather matrix by the vector corresponding to a rainy day then we get 0's on the top, for the same day, and 1's on the bottom, for the next day.

So, this weather matrix is kind of telling us whether we should cook today's food or tomorrow's food, based on the weather input.
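
A matching sketch of the weather matrix, again assuming a 6 by 2 shape and column vectors (my own formulation):

    import numpy as np

    # Sunny column lights up the "same day" half, rainy column the "next day" half
    weather_matrix = np.array([[1, 0],
                               [1, 0],
                               [1, 0],
                               [0, 1],
                               [0, 1],
                               [0, 1]])

    sunny = np.array([1, 0])
    rainy = np.array([0, 1])
    print(weather_matrix @ sunny)  # [1 1 1 0 0 0]
    print(weather_matrix @ rainy)  # [0 0 0 1 1 1]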

Now, the magic happens when we add these two, and we get a clear signal of what we should cook the next day based on:
· What the food for today is,
· What the food for tomorrow is, and
· Whether we should cook the food for today or the food for tomorrow.
This will be much clearer with an example. Let's say yesterday we cooked an apple pie and today the weather is rainy. So, here is the result of the apple-pie vector multiplied by the food matrix:

We get the food for today and the food for tomorrow.

Given below is the result of multiplying the rainy-day vector by the weather matrix:

And if we add the results of the two matrix multiplications above, we will get this vector:

Notice that the largest entry in the final output vector is a 2, and this 2 hints at a burger, because the vector for a burger is [0 1 0]. So, we're going to try to extract that 2, and to extract it we use the merge map.
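
In the sketch notation used earlier (the shapes and names are my assumptions), the addition looks like this:

    import numpy as np

    food_matrix = np.vstack([np.eye(3, dtype=int),
                             np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])])
    weather_matrix = np.vstack([np.tile([1, 0], (3, 1)), np.tile([0, 1], (3, 1))])

    apple_pie = np.array([1, 0, 0])
    rainy = np.array([0, 1])

    combined = food_matrix @ apple_pie + weather_matrix @ rainy
    print(combined)  # [1 0 0 1 2 1] -- the 2 sits in the burger slot of the "next day" half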

One very special thing to notice is that the first part tells us what the food for today is and what the food for tomorrow is, and the second part kind of tells us which one to pick: should we go for the same day or for the next day. These two pieces together form the decision of what to cook the next day.
To wrap this up and put these two together into the burger vector we need
the merge map. The merge map uses a nonlinear function that takes the
vector and turns the largest entry into a 1 and all the other entries
into a 0.

NOTE:
If you have experience with the neural networks then you can think
of this non-linear map as one hot encoding or as a combination of
some linear map and a sigmoid.

In the next step, the merge map takes this vector, splits it into the top three entries and the bottom three entries, and just adds them to get the result, which is the vector [0 1 0] corresponding to a burger. And that's the answer, because if we take an apple pie and a rainy day then we get a burger.

The matrix equivalent of this merge map is shown below:


If we multiply this matrix equivalent of the merge map by a concatenation of two vectors, we get their sum.
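
Here is a sketch of that merge map, assuming the non-linearity is a largest-entry-to-one-hot step followed by multiplication with a [I I] matrix (my own phrasing of the description above):

    import numpy as np

    def merge(v):
        # Non-linear step: keep only the largest entry, as a 1 (one-hot over the 6 slots)
        hot = np.zeros_like(v)
        hot[np.argmax(v)] = 1
        # Linear step: add the "same day" half to the "next day" half
        merge_matrix = np.hstack([np.eye(3, dtype=int), np.eye(3, dtype=int)])  # 3 x 6
        return merge_matrix @ hot

    print(merge(np.array([1, 0, 0, 1, 2, 1])))  # [0 1 0] -> burger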

Instead of matrices, if we want to see this Neural Network with all its
nodes and edges then it will look as shown below:

Note: Only edges labelled as 1s are drawn and the edges labelled as 0 are
not drawn.

This Neural Network works as follows:


1. We pass vectors for the apple-pie and rainy day as the input.
2. In the next step we perform the matrix multiplications. The values of the input vectors are passed along the edges, where they get multiplied by the weight of each edge. To get the value of a node in the first hidden layer, we add up all the values passed on to that node.

This step is equivalent to multiplying our input vectors by the food matrix and the weather matrix.

3. In this step we add together the results of the food matrix multiplication and the weather matrix multiplication, obtained in the previous step.
4. In this step, we apply the Non-Linear function that converts the
highest value in the vector into 1 and everything else into 0.

5. Our last matrix takes the vector and adds the top three entries to the bottom three entries, which gives the result: the vector corresponding to a burger.

And that's how the neural network looks.


Now, the word recurrent is important here. We should notice that:
· The first three nodes in the input layer correspond to the food
coming in, and
· The three nodes in the output layer correspond to the food coming
out.

What really happens is that the food coming out feeds back into the neural
network as input. And this is how the neural network looks:

The output vector of length three goes back into the neural network as
the input. And that's why it's called Recurrent Neural Networks.
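
Putting all the pieces together, here is a sketch of one recurrent step and a loop that feeds the output back in as input, under the same assumed shapes and names as in the earlier sketches:

    import numpy as np

    foods = ["apple pie", "burger", "chicken"]
    food_matrix = np.vstack([np.eye(3, dtype=int),
                             np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])])
    weather_matrix = np.vstack([np.tile([1, 0], (3, 1)), np.tile([0, 1], (3, 1))])
    merge_matrix = np.hstack([np.eye(3, dtype=int), np.eye(3, dtype=int)])

    def rnn_step(prev_food, weather):
        combined = food_matrix @ prev_food + weather_matrix @ weather
        hot = np.zeros_like(combined)
        hot[np.argmax(combined)] = 1     # non-linearity: largest entry becomes 1
        return merge_matrix @ hot        # collapse the two halves into one food vector

    food = np.array([1, 0, 0])           # Monday: apple pie
    week = [("Tue", [1, 0]), ("Wed", [0, 1]), ("Thu", [0, 1]),
            ("Fri", [1, 0]), ("Sat", [0, 1]), ("Sun", [1, 0])]
    for day, weather in week:
        food = rnn_step(food, np.array(weather))  # the output is fed back in as input
        print(day, foods[int(np.argmax(food))])   # Tue apple pie, Wed burger, ..., Sun apple pie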

RNNs are super useful for many things; in particular, they're very useful when our data is sequential. So, whenever our data forms a sequence and the next data point depends a lot on the previous ones, an RNN is a great choice to use.
