
Forward Pass - Demonstration

In the previous segment, you saw how the output of the next layer is calculated, given the inputs from the previous layer. In this segment, you will learn about the flow of data through different layers in a step-by-step fashion using an example in which we intend to calculate the price of a house, given its size and the number of rooms in it. You may want to use pen and paper to do the calculations yourself for better understanding.

We saw how the cumulative input is computed for each node and how an activation function is applied to this cumulative input to obtain the output of each node in the first layer. Now that we have the output from the first layer, let’s watch the next video to see the flow of this data through the second layer.

To reiterate, the problem statement is to predict the price of houses, given the size of the houses and the number of rooms available. 

| Number of Rooms | House Size (sq. ft.) | Price ($) |
|---|---|---|
| 3 | 1,340 | 313,000 |
| 5 | 3,650 | 2,384,000 |
| 3 | 1,930 | 342,000 |
| 3 | 2,000 | 420,000 |
| 4 | 1,940 | 550,000 |
| 2 | 880 | 490,000 |

In this case, we first scale the inputs and the output for these six observations using the formula (observation − mean) / standard deviation. This gives us the table below (a short code sketch of this scaling step follows the table).

| Std. Number of Rooms | Std. House Size | Std. Price |
|---|---|---|
| -0.32 | -0.66 | -0.54 |
| 1.61 | 1.80 | 2.03 |
| -0.32 | -0.03 | -0.51 |
| -0.32 | 0.05 | -0.41 |
| 0.65 | -0.02 | -0.25 |
| -1.29 | -1.15 | -0.32 |
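If you want to reproduce the standardised values yourself, here is a minimal sketch of this scaling step in NumPy. The use of the sample standard deviation (ddof=1) is an assumption made for illustration, but it appears to reproduce the values in the table above.

```python
import numpy as np

# Raw observations: number of rooms, house size (sq. ft.), price ($)
rooms = np.array([3, 5, 3, 3, 4, 2], dtype=float)
size  = np.array([1340, 3650, 1930, 2000, 1940, 880], dtype=float)
price = np.array([313000, 2384000, 342000, 420000, 550000, 490000], dtype=float)

def standardise(col):
    """Scale a column to zero mean and unit (sample) standard deviation."""
    return (col - col.mean()) / col.std(ddof=1)

print(np.round(standardise(rooms), 2))  # [-0.32  1.61 -0.32 -0.32  0.65 -1.29]
print(np.round(standardise(size), 2))   # [-0.66  1.8  -0.03  0.05 -0.02 -1.15]
print(np.round(standardise(price), 2))  # [-0.54  2.03 -0.51 -0.41 -0.25 -0.32]
```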

As you saw in the video, we want to build a neural network that will predict the price of a house, given two input attributes: the number of rooms and the house size. Let’s start with the structure of the neural network that we will consider for this case. We have an input layer with two input nodes, x1 and x2, one hidden layer with two nodes and a sigmoid activation function, and finally an output layer with a linear activation function (since this is a regression problem), as shown below.

Now, to understand how the data moves forward through the network to enable it to make predictions, we will initialise the weights and biases with random values. We recommend that you keep a pen and paper handy to practise the computations that follow. The idea is that as the network gets trained, the weights and biases will be updated based on the data so that the predicted output eventually becomes the same as, or at least close to, the actual output.

Let’s start by initialising the weights and biases to the following values:

Layer 1:

$$W^1 = \begin{bmatrix} w^1_{11} & w^1_{12} \\ w^1_{21} & w^1_{22} \end{bmatrix} = \begin{bmatrix} 0.2 & 0.15 \\ 0.5 & 0.6 \end{bmatrix}, \qquad b^1 = \begin{bmatrix} b^1_1 \\ b^1_2 \end{bmatrix} = \begin{bmatrix} 0.1 \\ 0.25 \end{bmatrix}$$

Layer 2:

$$W^2 = \begin{bmatrix} w^2_{11} & w^2_{12} \end{bmatrix} = \begin{bmatrix} 0.3 & 0.2 \end{bmatrix}, \qquad b^2 = \begin{bmatrix} b^2_1 \end{bmatrix} = \begin{bmatrix} 0.4 \end{bmatrix}$$

Remember, the superscript denotes the layer to which a quantity belongs, and the subscript denotes the node in that layer; for a weight, the first subscript is the node in the current layer and the second is the node in the previous layer it connects to.
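If you are following along in code, one possible way to hold these initial parameters is as NumPy arrays; the variable names W1, b1, W2 and b2 below are illustrative choices, not part of the original material.

```python
import numpy as np

# Layer 1: 2 hidden nodes, each receiving 2 inputs
W1 = np.array([[0.20, 0.15],
               [0.50, 0.60]])   # row j holds the weights of hidden node j
b1 = np.array([0.10, 0.25])     # one bias per hidden node

# Layer 2 (output layer): 1 node receiving 2 hidden activations
W2 = np.array([[0.30, 0.20]])
b2 = np.array([0.40])
```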

To showcase the step-by-step computation of the output, let’s take the first example as the input vector: 
 

$$X_1 = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -0.32 \\ -0.66 \end{bmatrix}$$

Let’s compute the output from the first node in layer 1.

Computing the cumulative input for the first node of the hidden layer:

$$z^1_1 = w^1_{11}x_1 + w^1_{12}x_2 + b^1_1 = 0.2 \times (-0.32) + 0.15 \times (-0.66) + 0.1 = -0.063$$

Applying the sigmoid activation function to obtain the output from the first node:

$$h^1_1 = \sigma(z^1_1) = \frac{1}{1 + e^{-z^1_1}} = \frac{1}{1 + e^{-(-0.063)}} = 0.484$$

Next, let’s compute the output from the second node in layer 1 by following a similar process.

Computing the cumulative input for the second node of the hidden layer:

$$z^1_2 = w^1_{21}x_1 + w^1_{22}x_2 + b^1_2 = 0.5 \times (-0.32) + 0.6 \times (-0.66) + 0.25 = -0.306$$

Applying the sigmoid activation function, we get the output from the second node:

$$h^1_2 = \sigma(z^1_2) = \frac{1}{1 + e^{-z^1_2}} = \frac{1}{1 + e^{-(-0.306)}} = 0.424$$
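The two node-wise calculations above can be checked with a few lines of Python; the function name sigma is just an illustrative choice for the sigmoid activation.

```python
import numpy as np

def sigma(z):
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

x1, x2 = -0.32, -0.66

# First hidden node
z11 = 0.2 * x1 + 0.15 * x2 + 0.1    # -0.063
h11 = sigma(z11)                    # ~0.484

# Second hidden node
z12 = 0.5 * x1 + 0.6 * x2 + 0.25    # -0.306
h12 = sigma(z12)                    # ~0.424

print(round(z11, 3), round(h11, 3))  # -0.063 0.484
print(round(z12, 3), round(h12, 3))  # -0.306 0.424
```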

Each of these individual operations can be done together using matrix multiplication. 
We have the input vector X1, the weight matrix W1 and the bias vector b1 with the following values:

$$X_1 = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -0.32 \\ -0.66 \end{bmatrix}$$

$$W^1 = \begin{bmatrix} w^1_{11} & w^1_{12} \\ w^1_{21} & w^1_{22} \end{bmatrix} = \begin{bmatrix} 0.2 & 0.15 \\ 0.5 & 0.6 \end{bmatrix}$$

$$b^1 = \begin{bmatrix} b^1_1 \\ b^1_2 \end{bmatrix} = \begin{bmatrix} 0.1 \\ 0.25 \end{bmatrix}$$

We know that:

$$h^1 = \begin{bmatrix} h^1_1 \\ h^1_2 \end{bmatrix} = \sigma(W^1 \cdot X_1 + b^1)$$

$$h^1 = \sigma\left(\begin{bmatrix} w^1_{11}x_1 + w^1_{12}x_2 + b^1_1 \\ w^1_{21}x_1 + w^1_{22}x_2 + b^1_2 \end{bmatrix}\right)$$

$$h^1 = \sigma\left(\begin{bmatrix} 0.2 \times (-0.32) + 0.15 \times (-0.66) + 0.1 \\ 0.5 \times (-0.32) + 0.6 \times (-0.66) + 0.25 \end{bmatrix}\right) = \sigma\left(\begin{bmatrix} -0.063 \\ -0.306 \end{bmatrix}\right)$$

$$h^1 = \begin{bmatrix} 0.484 \\ 0.424 \end{bmatrix}$$
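The same layer-1 result can be reproduced in a single step with a matrix-vector product, for example:

```python
import numpy as np

def sigma(z):
    """Sigmoid activation applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[0.20, 0.15],
               [0.50, 0.60]])
b1 = np.array([0.10, 0.25])
X1 = np.array([-0.32, -0.66])

h1 = sigma(W1 @ X1 + b1)   # W1.X1 + b1 = [-0.063, -0.306]
print(np.round(h1, 3))     # [0.484 0.424]
```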

Now that we have the outputs for the two neurons in the hidden layer, we can calculate the final output.

Moving on to the output layer with the linear activation function, we first compute the cumulative input to the neuron:

$$z^2_1 = w^2_{11}h^1_1 + w^2_{12}h^1_2 + b^2_1 = 0.3 \times 0.484 + 0.2 \times 0.424 + 0.4 = 0.63$$

Since this is a regression problem, we use a linear activation function at the output, i.e., the cumulative input is passed on as the output without any modification. Hence, the output is the same as the cumulative input:

$$h^2_1 = z^2_1 = 0.63$$

This value of 0.63 is the prediction that the neural network makes in the first forward pass.
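The arithmetic for the output node can again be verified directly, for example:

```python
# Outputs of the two hidden nodes computed earlier
h11, h12 = 0.484, 0.424

# Linear activation: the output equals the cumulative input
z21 = 0.3 * h11 + 0.2 * h12 + 0.4
print(round(z21, 2))   # 0.63
```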

The matrix multiplication method will give us the same output as shown below:

$$h^2 = W^2 h^1 + b^2 = \begin{bmatrix} w^2_{11} & w^2_{12} \end{bmatrix}\begin{bmatrix} h^1_1 \\ h^1_2 \end{bmatrix} + b^2$$

$$h^2 = \begin{bmatrix} 0.3 & 0.2 \end{bmatrix}\begin{bmatrix} 0.484 \\ 0.424 \end{bmatrix} + 0.4$$

$$h^2 = \begin{bmatrix} 0.63 \end{bmatrix}$$
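Putting the two layers together, a minimal end-to-end sketch of this forward pass (assuming the NumPy setup shown earlier) looks as follows:

```python
import numpy as np

def sigma(z):
    """Sigmoid activation applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(x, W1, b1, W2, b2):
    """Forward pass: sigmoid hidden layer followed by a linear output layer."""
    h1 = sigma(W1 @ x + b1)   # hidden-layer activations
    h2 = W2 @ h1 + b2         # linear output (regression)
    return h2

W1 = np.array([[0.20, 0.15],
               [0.50, 0.60]])
b1 = np.array([0.10, 0.25])
W2 = np.array([[0.30, 0.20]])
b2 = np.array([0.40])

x = np.array([-0.32, -0.66])
print(np.round(forward_pass(x, W1, b1, W2, b2), 2))   # [0.63]
```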

Hence, performing the forward pass through the neural network with the input [-0.32, -0.66] gives us the output 0.63. The prediction is very different from the actual value of -0.54, but this is to be expected because we initialised the neural network with random weights and biases. As we train the neural network, we will update these parameters over multiple iterations and get better predictions. In the upcoming session, we will cover this process in depth.

This was a demonstration of how information flows forward in a neural network from the input to the output, i.e., the forward pass to make a prediction. 

In the next segment, we will introduce a concise algorithm that can be used for any feedforward neural network. 
