Now that you have studied the structure of an LSTM cell, it will be easier to follow the LSTM feedforward equations. It is helpful to first recall the feedforward equation of a standard RNN:
$$z^l_t = W^l[a^{l-1}_t, a^l_{t-1}] + b^l$$
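To see concretely what the concatenated weight matrix $W^l$ and the concatenated activation vector $[a^{l-1}_t, a^l_{t-1}]$ mean, here is a minimal NumPy sketch (sizes and variable names are illustrative assumptions): multiplying the stacked matrix $[W_F \mid W_R]$ by the concatenated vector gives the same result as applying the feedforward and recurrent weight matrices separately.

```python
import numpy as np

n_units, n_inputs = 4, 3                        # hidden size, input size (arbitrary)
rng = np.random.default_rng(0)

W_F = rng.standard_normal((n_units, n_inputs))  # feedforward weights
W_R = rng.standard_normal((n_units, n_units))   # recurrent weights
b   = rng.standard_normal(n_units)

a_in   = rng.standard_normal(n_inputs)          # a^{l-1}_t : input from the layer below
a_prev = rng.standard_normal(n_units)           # a^l_{t-1} : this layer's previous activation

# Concatenated form: W = [W_F | W_R] applied to the stacked activation vector
W = np.hstack([W_F, W_R])
z_concat = W @ np.concatenate([a_in, a_prev]) + b

# Equivalent form with separate feedforward and recurrent matrices
z_split = W_F @ a_in + W_R @ a_prev + b

assert np.allclose(z_concat, z_split)
```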
The LSTM equations will also be written in the same fashion, that is, using concatenated weight matrices and concatenated activations. Let’s now look at the LSTM feedforward equations.
Here is a detailed architecture of an LSTM cell.
In the feedforward pass, the previous activation $h_{t-1}$ and the current input $x_t$ are first concatenated (shown by the dot operator in the figure). The concatenated vector then goes into each of the three gates. The ‘×’ symbols denote element-wise multiplication (written as $\odot$ in the equations below), while the ‘+’ symbols denote element-wise addition of two vectors/matrices. Note that the output gate is followed by another tanh applied to the cell state; this tanh is not a gate, since there are no weights involved in that operation (as shown in the figure).
The feedforward equations of an LSTM are as follows:
$$f_t = \mathrm{sigmoid}(W_f[h_{t-1}, x_t] + b_f)$$
$$i_t = \mathrm{sigmoid}(W_i[h_{t-1}, x_t] + b_i)$$
$$c'_t = \tanh(W_c[h_{t-1}, x_t] + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot c'_t$$
$$o_t = \mathrm{sigmoid}(W_o[h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(c_t)$$
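To tie these equations to code, here is a minimal NumPy sketch of a single LSTM feedforward step (the function name, variable names and sizes are illustrative assumptions, not a reference implementation). Element-wise multiplication is written with `*`, matching the ‘×’ nodes in the figure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM feedforward step, written directly from the equations above.

    Each W_* multiplies the concatenated vector [h_{t-1}, x_t].
    """
    concat = np.concatenate([h_prev, x_t])

    f_t   = sigmoid(W_f @ concat + b_f)    # forget gate
    i_t   = sigmoid(W_i @ concat + b_i)    # input (update) gate
    c_hat = np.tanh(W_c @ concat + b_c)    # candidate cell state c'_t
    c_t   = f_t * c_prev + i_t * c_hat     # new cell state
    o_t   = sigmoid(W_o @ concat + b_o)    # output gate
    h_t   = o_t * np.tanh(c_t)             # new hidden state / activation
    return h_t, c_t

# Toy usage with arbitrary sizes (hypothetical values, not from the text)
n_units, n_inputs = 4, 3
rng = np.random.default_rng(0)
W_f, W_i, W_c, W_o = (rng.standard_normal((n_units, n_units + n_inputs)) for _ in range(4))
b_f, b_i, b_c, b_o = (np.zeros(n_units) for _ in range(4))

h, c = np.zeros(n_units), np.zeros(n_units)
x = rng.standard_normal(n_inputs)
h, c = lstm_step(x, h, c, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o)
```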
In the RNN cell, you had exactly one weight matrix $W$ (a concatenation of the feedforward and the recurrent matrices). In the case of an LSTM cell, you have four weight matrices: $W_f, W_i, W_c, W_o$.
Each of these is a concatenation of the feedforward and recurrent weight matrices. Thus, you can write the weights of an LSTM as:
$$W_f = [W^F_f \mid W^R_f]$$
$$W_i = [W^F_i \mid W^R_i]$$
$$W_c = [W^F_c \mid W^R_c]$$
$$W_o = [W^F_o \mid W^R_o]$$
This means that an LSTM layer has four times as many parameters as a normal RNN layer. The increased number of parameters leads to increased computational cost. For the same reason, an LSTM is also more likely to overfit the training data than a normal RNN.
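As a quick sanity check on the ‘four times’ claim, here is a small parameter count (the layer sizes are illustrative assumptions): for a layer with $n$ units and $m$-dimensional inputs, each concatenated matrix has shape $n \times (n + m)$ and each bias has $n$ entries.

```python
def rnn_params(n, m):
    # one concatenated matrix W = [W_F | W_R] of shape (n, n + m), plus one bias
    return n * (n + m) + n

def lstm_params(n, m):
    # four concatenated matrices W_f, W_i, W_c, W_o, plus four biases
    return 4 * (n * (n + m) + n)

n, m = 128, 64                 # illustrative layer size and input size
print(rnn_params(n, m))        # 24704
print(lstm_params(n, m))       # 98816  -> exactly 4x the RNN count
```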
Having said that, most real-world sequence problems, such as speech recognition, translation and video processing, are complex enough to need LSTMs.
In the next section, you’ll look at a couple of other variants of the LSTM cell.