Feedforward Algorithm

Having understood how information flows in the network for a regression problem, let’s write the pseudocode for a feedforward pass through the network for a single data point $x_i$.

The pseudocode for a feedforward pass is given below:

We initialise the variable $h^0$ as the input: $h^0 = x_i$
We loop through each of the layers, computing the corresponding output $h^l$ for each layer.
For $l$ in $[1, 2, \dots, L]$: $h^l = \sigma(W^l \cdot h^{l-1} + b^l)$
We compute the prediction $p$ by applying a function $f$ to the output of the last layer, $h^L$, as shown here: $p = f(h^L)$
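
The steps above can be sketched in Python with NumPy. This is only an illustrative sketch: the function names `sigmoid` and `feedforward`, the choice of the sigmoid as $\sigma$, the identity as the default $f$, the layer sizes and the random weights are all assumptions made for the example and do not come from the course material.

```python
import numpy as np

def sigmoid(z):
    # One possible choice for the activation function sigma (an assumption).
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, weights, biases, f=lambda h: h):
    """Feedforward pass for a single data point x.

    weights[l-1] and biases[l-1] hold W^l and b^l for l = 1..L.
    f is the function applied to h^L; identity by default (assumed).
    """
    h = x                                  # h^0 = x_i
    for W, b in zip(weights, biases):      # for l in 1..L
        h = sigmoid(W @ h + b)             # h^l = sigma(W^l . h^(l-1) + b^l)
    return f(h)                            # p = f(h^L)

# Hypothetical network: 3 inputs, one hidden layer of 4 neurons, 1 output.
rng = np.random.default_rng(0)
layer_sizes = [3, 4, 1]
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=n_out) for n_out in layer_sizes[1:]]

x_i = np.array([0.5, -1.2, 2.0])
p = feedforward(x_i, weights, biases)
print(p.shape)  # one value from the single output node
```

Passing a different `f` changes only the last step, which is where the regression and classification versions of the algorithm differ.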

There are some important things to notice here. In both the regression and classification problems, the same algorithm is used until the last step. In the final step of the classification problem, $p$ is the probability vector, which gives the probability of the data point belonging to each of the possible classes or categories. In the regression problem, $p$ represents the predicted output, which we will normally refer to as $h^L$.

Let’s discuss the classification problem. We use the softmax output, which we defined in an earlier session. It gives us the probability vector $p_i$ of an input belonging to each of the $c$ output classes:

$$p_i = \begin{bmatrix} p_{i1} \\ \vdots \\ p_{ic} \end{bmatrix}$$

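As per our understanding of the softmax function, we know that

$$p_{ij} = \frac{e^{w_j \cdot h^L}}{\sum_{t=1}^{c} e^{w_t \cdot h^L}}$$

where $w_j$ is the $j$-th row of the output layer's weight matrix, $j = 1, 2, \dots, c$ and $c$ is the number of classes.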

Note that dividing each $e^{w_j \cdot h^L}$ by the sum $\sum_{t=1}^{c} e^{w_t \cdot h^L}$ is often called normalising the vector $p_i$.
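
As a small illustration of the normalising step, the sketch below applies the softmax to a vector of scores. The function name `softmax` and the score values are hypothetical. Subtracting the maximum score before exponentiating is a common numerical-stability trick that is not part of the text above; it leaves the result unchanged because the same factor cancels in the numerator and the denominator.

```python
import numpy as np

def softmax(scores):
    # scores[j] plays the role of w_j . h^L for class j.
    shifted = scores - np.max(scores)   # stability trick (an addition, see note above)
    exps = np.exp(shifted)              # unnormalised values e^(w_j . h^L), up to a common factor
    return exps / np.sum(exps)          # normalise so that the entries sum to 1

# Hypothetical scores for c = 3 classes.
scores = np.array([2.0, 1.0, 0.1])
p_i = softmax(scores)
print(p_i, p_i.sum())
```

The largest score always receives the largest probability, and the entries of `p_i` sum to 1.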

Hence, the complete feedforward algorithm for the classification problem becomes:

$h^0 = x_i$
For $l$ in $[1, 2, \dots, L]$: $h^l = \sigma(W^l \cdot h^{l-1} + b^l)$
$p_i = e^{W_o \cdot h^L}$
$p_i = \text{normalise}(p_i)$
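
The four steps of the classification algorithm can be sketched as follows. The function name `classify_feedforward`, the sigmoid activation, the layer sizes and the random weights are assumptions chosen only to make the example runnable.

```python
import numpy as np

def sigmoid(z):
    # Assumed choice for the hidden-layer activation sigma.
    return 1.0 / (1.0 + np.exp(-z))

def classify_feedforward(x, weights, biases, W_o):
    h = x                                  # step 1: h^0 = x_i
    for W, b in zip(weights, biases):      # step 2: loop over layers 1..L
        h = sigmoid(W @ h + b)
    p = np.exp(W_o @ h)                    # step 3: p_i = e^(W_o . h^L)
    return p / np.sum(p)                   # step 4: p_i = normalise(p_i)

# Hypothetical network: 3 inputs, hidden layers of 5 and 4 neurons, 3 classes.
rng = np.random.default_rng(42)
weights = [rng.normal(size=(5, 3)), rng.normal(size=(4, 5))]
biases = [rng.normal(size=5), rng.normal(size=4)]
W_o = rng.normal(size=(3, 4))              # output layer weights, one row per class

x_i = np.array([0.2, -0.7, 1.5])
p_i = classify_feedforward(x_i, weights, biases, W_o)
print(p_i, p_i.sum())                      # three probabilities that sum to 1
```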

The classification feedforward algorithm has been used extensively in industries such as finance, healthcare and travel. In the finance industry, one application of this algorithm is credit card companies categorising customer applications for credit cards as ‘Good’, ‘Bad’ or ‘Needing further analysis’. For this, credit card companies consider factors such as annual salary, any outstanding debts and age. These can be the features in the input vector that is fed into a neural network, which then predicts which category the customer belongs to.

For the regression problem, we can skip the third and fourth steps, i.e., computing the probability and normalising the ‘predicted output vector’ $p$. In a regression problem, the output is $h^L$, i.e., the value we obtain from the single output node, and we usually compare the output obtained from the ANN directly with the ground truth. We do not need to perform any further operations on the predicted output to get probabilities.

Note that $W_o$ (the weights of the output layer) can also be written as $W^{L+1}$.

Comprehension-Based Questions

Let’s try to implement the same algorithm for a classification problem and answer a few questions. Given below is the representation of an ANN.

We have the last weight matrix $W^3$ as $W_o$. The output layer classifies the input into one of these three labels: 1, 2 or 3. The first neuron outputs the probability for label 1, the second neuron outputs the probability for label 2 and the third neuron outputs the probability for label 3.
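
To illustrate how the three output neurons map to the labels, the sketch below picks the label with the highest probability. The probability values are hypothetical and are not taken from the network shown above. Choosing the most probable label is one common way to turn the probability vector into a single prediction.

```python
import numpy as np

# Hypothetical output of the three output neurons for one input.
# Index 0 -> label 1, index 1 -> label 2, index 2 -> label 3.
p_i = np.array([0.1, 0.7, 0.2])

# NumPy indices start at 0, so add 1 to get the label.
predicted_label = int(np.argmax(p_i)) + 1
print(predicted_label)  # 2, since the second neuron has the highest probability
```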

Now, answer the questions given below.

The primary goal in machine learning is to get the predicted output as close to the ground truth output as possible. We have seen the feedforward algorithm and learnt how to compute each element in an ANN. Now, we want to train the neural network so that the predicted output is as close as possible to the actual output. To do this, in the next segment, we will discuss the loss function, which quantifies the difference between the predicted output and the actual output.
