
Flow of Information Between Layers

In the previous session, you learnt about the structure, topology, and hyperparameters of neural networks, along with some simplifying assumptions. In this segment, you will understand how information flows from one layer to the next in a neural network.
In artificial neural networks, the output from one layer is used as input to the next layer. Such networks are called feedforward neural networks. This means that there are no loops in the network, i.e., information is always fed forward, never fed backward. Let’s start by understanding the feedforward mechanism between the two layers. For simplicity, in the next video, the professor will use the input and the first layer to demonstrate how information flows between any two layers.

In the video above, you learnt how information flows from one layer to another. In the next video, let’s consider a subset of a network with two layers and hear from Gunnvant as he explains how feedforward propagation is done.

As seen in the video, an image of a subset of the neural network is shown below:

As you learnt in the previous session, the weight matrix between layer 0 (the input layer) and layer 1 (the first hidden layer) is denoted by W. The product of the matrix W and the input vector $x_i$, plus the bias vector $b$, i.e., $W \cdot x_i + b$, acts as the cumulative input $z$ to layer 1. The activation function is applied to this cumulative input $z$ to compute the output $h$ of layer 1.

Let’s take the example above and perform the matrix multiplication to arrive at a vectorised way of computing the output of layer 1 from the inputs of layer 0.

Here, the following input is given:

$$x_i = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

The dimensions of the input are (3,1).
There are two neurons in the first hidden layer. Hence, the cumulative input $z^1$ will be given as:

$$z^1 = \begin{bmatrix} z^1_1 \\ z^1_2 \end{bmatrix}$$

Also, the weight matrix will be of dimension 2×3 (one row per neuron in layer 1, one column per input) and is represented as follows:

$$W^1 = \begin{bmatrix} w^1_{11} & w^1_{12} & w^1_{13} \\ w^1_{21} & w^1_{22} & w^1_{23} \end{bmatrix}$$

NOTE: The notation of a neuron’s weight in a particular layer is $w^l_{ij}$, where the superscript $l$ is the layer number, the first subscript $i$ is the neuron of layer $l$ receiving the input, and the second subscript $j$ is the neuron (or input) of layer $l-1$ sending it.

And, the bias vector can be represented as follows: 

$$b^1 = \begin{bmatrix} b^1_1 \\ b^1_2 \end{bmatrix}$$

The expression for obtaining $z^1_1$ is given below.

$$z^1_1 = w^1_{11} x_1 + w^1_{12} x_2 + w^1_{13} x_3 + b^1_1$$

Here, $z^1_1$ is obtained by taking the dot product of the input vector with the first row of the weight matrix and adding the bias $b^1_1$. The same goes for obtaining the value of $z^1_2$. Hence, we get:

$$z^1_2 = w^1_{21} x_1 + w^1_{22} x_2 + w^1_{23} x_3 + b^1_2$$

The two equations can be written as a matrix multiplication as given below:
$$\begin{bmatrix} z^1_1 \\ z^1_2 \end{bmatrix} = \begin{bmatrix} w^1_{11} & w^1_{12} & w^1_{13} \\ w^1_{21} & w^1_{22} & w^1_{23} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} b^1_1 \\ b^1_2 \end{bmatrix} = \begin{bmatrix} w^1_{11} x_1 + w^1_{12} x_2 + w^1_{13} x_3 + b^1_1 \\ w^1_{21} x_1 + w^1_{22} x_2 + w^1_{23} x_3 + b^1_2 \end{bmatrix}$$
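
To make this concrete, here is a minimal NumPy sketch that computes $z^1$ both neuron by neuron and with a single matrix multiplication, confirming that the two give the same result. The numeric values of the weights, biases and inputs below are purely illustrative and are not taken from the lecture.

```python
import numpy as np

# Illustrative values; in practice, weights and biases are learnt during training.
x = np.array([0.5, -1.0, 2.0])        # input vector x_i, shape (3,)
W1 = np.array([[0.1, 0.2, 0.3],       # weight matrix W^1, shape (2, 3)
               [0.4, 0.5, 0.6]])
b1 = np.array([0.01, 0.02])           # bias vector b^1, shape (2,)

# Neuron by neuron: dot product of each weight row with the input, plus the bias.
z1_first = np.dot(W1[0], x) + b1[0]   # z^1_1
z1_second = np.dot(W1[1], x) + b1[1]  # z^1_2

# Vectorised: one matrix multiplication produces the whole cumulative-input vector.
z1 = W1 @ x + b1                      # shape (2,)

print(z1_first, z1_second)            # same two numbers as below
print(z1)                             # [z^1_1, z^1_2]
```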

The next step is to apply the activation function to the $z^1$ vector to obtain the output $h^1$. As mentioned in the video, the activation function is applied to each element of the vector. Thus, the final output $h^1$ of layer 1 is:

$$h^1 = \begin{bmatrix} h^1_1 \\ h^1_2 \end{bmatrix} = \sigma(W^1 \cdot x_i + b^1) = \begin{bmatrix} \sigma(w^1_{11} x_1 + w^1_{12} x_2 + w^1_{13} x_3 + b^1_1) \\ \sigma(w^1_{21} x_1 + w^1_{22} x_2 + w^1_{23} x_3 + b^1_2) \end{bmatrix}$$


As Gunnvant mentioned, σ is a vector function, i.e., it is applied element-wise to a vector.

This completes the forward propagation of a single data point through one layer of the network.
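
The sketch below (again with illustrative numbers, not values from the lecture) extends the previous one by applying the sigmoid activation element-wise to $z^1$ to produce the layer-1 output $h^1$.

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid activation: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Reusing the illustrative values from the previous sketch.
x  = np.array([0.5, -1.0, 2.0])       # input x_i, shape (3,)
W1 = np.array([[0.1, 0.2, 0.3],
               [0.4, 0.5, 0.6]])       # W^1, shape (2, 3)
b1 = np.array([0.01, 0.02])           # b^1, shape (2,)

z1 = W1 @ x + b1                      # cumulative input to layer 1
h1 = sigmoid(z1)                      # activation applied element-wise
print(h1)                             # output of layer 1, shape (2,)
```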


To summarise, the steps involved in computing the output of layer l are as follows (a short code sketch follows the list):

  • Multiply each row of the weight matrix with the output from the previous layer to obtain the weighted sum of inputs from the previous layer.
  • Convert the weighted sum into the cumulative input by adding the bias vector.
  • Apply the activation function σ(x) to the cumulative input to obtain the output vector h. 
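
The three steps above can be wrapped into a single reusable function, as in the minimal sketch below. The helper name layer_forward is hypothetical (not a library API), and the weights shown are random, illustrative values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(W, b, h_prev, activation=sigmoid):
    """Forward-propagate one layer (hypothetical helper, not a library API).

    W      : weight matrix of shape (neurons in layer l, neurons in layer l-1)
    b      : bias vector of shape (neurons in layer l,)
    h_prev : output of layer l-1 (or the input x for the first layer)
    """
    z = W @ h_prev + b        # steps 1 and 2: weighted sum plus bias
    return activation(z)      # step 3: element-wise activation

# Example: the 3-input, 2-neuron layer discussed above, with random weights.
x  = np.array([0.5, -1.0, 2.0])
W1 = np.random.randn(2, 3) * 0.1
b1 = np.zeros(2)
h1 = layer_forward(W1, b1, x)
print(h1.shape)               # (2,)
```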

With this premise, let’s study feedforward in a small neural network in the next segment.
