You now know what sequences are. In this section, you’ll learn how a normal feedforward network is modified to work with sequences.
Let’s quickly recall the feedforward equations of a normal neural network:
$z_l = W_l h_{l-1} + b_l$
$h_l = f_l(z_l)$
where:
$W_l$: weight matrix at layer $l$
$b_l$: bias at layer $l$
$z_l$: input into layer $l$
$f_l$: activation function at layer $l$
$h_l$: output, or activations, from layer $l$
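As a quick illustration, here is a minimal NumPy sketch of one feedforward layer computing these two equations. The layer sizes, the random values and the choice of sigmoid as $f_l$ are illustrative assumptions, not part of the equations above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: layer l-1 has 4 units, layer l has 3 units.
rng = np.random.default_rng(0)
W_l = rng.standard_normal((3, 4))     # W_l: weight matrix at layer l
b_l = rng.standard_normal((3, 1))     # b_l: bias at layer l
h_prev = rng.standard_normal((4, 1))  # h_{l-1}: activations from layer l-1

z_l = W_l @ h_prev + b_l  # z_l = W_l h_{l-1} + b_l
h_l = sigmoid(z_l)        # h_l = f_l(z_l), with f_l chosen as sigmoid here
print(h_l.shape)          # (3, 1)
```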
In this session, we'll use a slightly different notation for the activations: rather than using $h$ for the outputs (or activations) of a layer, we will use $a$. Thus, the feedforward equations become:
$z_l = W_l a_{l-1} + b_l$
$a_l = f_l(z_l)$
Let's now understand what makes this specialised neural network 'recurrent'.
The main difference between normal neural nets and RNNs is that RNNs have two 'dimensions': time $t$ (i.e. along the sequence length) and depth $l$ (the usual layers). The notation itself changes from $a^l$ to $a^l_t$. In fact, in RNNs it is somewhat incomplete to say 'the output of layer $l$'; we rather say 'the output at layer $l$ and time $t$'.
One way to think about RNNs is that the network changes its state with time (as it sees new words in a sentence, new frames in a video, etc.). For example, we say that the state $a^l_t$ changes to $a^l_{t+1}$ as the network sees the next element in the sequence (a word, an image, etc.).
Thus, the output of layer $l$ at time $t+1$, $a^l_{t+1}$, depends on two things:
- The output of the previous layer at the same time step, $a^{l-1}_{t+1}$ (this is the depth dimension).
- Its own previous state, $a^l_t$ (this is the time dimension).
In other words, $a^l_{t+1}$ is a function of $a^{l-1}_{t+1}$ and $a^l_t$:
$a^l_{t+1} = g(a^{l-1}_{t+1}, a^l_t)$
We say that there is a recurrent relationship between $a^l_{t+1}$ and its previous state $a^l_t$, and hence the name Recurrent Neural Networks.
These notations and ideas will become clearer as we go forward. Let's now look at the feedforward equations of an RNN in the following lecture.
To summarise, the output of layer $l$ at time $t+1$, $a^l_{t+1}$, depends on 1) the output of the previous layer at the same time step, $a^{l-1}_{t+1}$, and 2) its own previous state, $a^l_t$:
$a^l_{t+1} = g(a^{l-1}_{t+1}, a^l_t)$
The feedforward equations are simply an extension of those of vanilla neural nets; the only difference is that there is now a weight associated with both $a^{l-1}_{t+1}$ and $a^l_t$:
$a^l_{t+1} = \sigma(W_F \cdot a^{l-1}_{t+1} + W_R \cdot a^l_t + b^l)$
The $W_F$'s are called the feedforward weights and the $W_R$'s are called the recurrent weights.
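To make the recurrence concrete, here is a minimal NumPy sketch of a single RNN layer unrolled over a few time steps. The layer sizes, the sequence length, the random inputs and the initial zero state are illustrative assumptions; only the update rule itself comes from the equation above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: the previous layer (l-1) has 4 units, this layer (l) has 3 units.
rng = np.random.default_rng(1)
W_F = rng.standard_normal((3, 4))  # feedforward weights, applied to a^{l-1}_{t+1}
W_R = rng.standard_normal((3, 3))  # recurrent weights, applied to a^l_t
b_l = rng.standard_normal((3, 1))  # bias at layer l

T = 5  # sequence length (illustrative)
a_below = [rng.standard_normal((4, 1)) for _ in range(T)]  # a^{l-1}_t for each time step
a_l = np.zeros((3, 1))  # initial state a^l_0

for t in range(T):
    # a^l_{t+1} = sigma(W_F . a^{l-1}_{t+1} + W_R . a^l_t + b^l)
    a_l = sigmoid(W_F @ a_below[t] + W_R @ a_l + b_l)

print(a_l.ravel())  # state of layer l after processing the whole sequence
```

Note how the same $W_F$, $W_R$ and $b^l$ are reused at every time step: the depth dimension enters through `a_below[t]`, while the time dimension enters through the previous value of `a_l`.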