Now that you have studied the structure of an LSTM cell, it will be easier to follow the LSTM feedforward equations. It is helpful to first recall the feedforward equation of a standard RNN:
$$z^l_t = W^l[a^{l-1}_t, a^l_{t-1}] + b^l$$
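To see concretely what the concatenated weight matrix $W^l$ and the concatenated activation vector $[a^{l-1}_t, a^l_{t-1}]$ mean, here is a minimal NumPy sketch (sizes and variable names are illustrative assumptions): multiplying the stacked matrix $[W_F \mid W_R]$ by the concatenated vector gives the same result as applying the feedforward and recurrent weight matrices separately.

```python
import numpy as np

n_units, n_inputs = 4, 3                        # hidden size, input size (arbitrary)
rng = np.random.default_rng(0)

W_F = rng.standard_normal((n_units, n_inputs))  # feedforward weights
W_R = rng.standard_normal((n_units, n_units))   # recurrent weights
b   = rng.standard_normal(n_units)

a_in   = rng.standard_normal(n_inputs)          # a^{l-1}_t : input from the layer below
a_prev = rng.standard_normal(n_units)           # a^l_{t-1} : this layer's previous activation

# Concatenated form: W = [W_F | W_R] applied to the stacked activation vector
W = np.hstack([W_F, W_R])
z_concat = W @ np.concatenate([a_in, a_prev]) + b

# Equivalent form with separate feedforward and recurrent matrices
z_split = W_F @ a_in + W_R @ a_prev + b

assert np.allclose(z_concat, z_split)
```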
The LSTM equations will also be written in the same fashion, that is, using concatenated weight matrices and concatenated activations. Let’s now look at the LSTM feedforward equations.
Here is a detailed architecture of an LSTM cell.
In the feedforward pass, the previous activation $h_{t-1}$ and the current input $x_t$ are first concatenated (shown by the dot operator in the figure). The concatenated vector then goes into each of the three gates. The ‘×’ symbols denote element-wise multiplication (written as $\odot$ in the equations below), while the ‘+’ symbols denote element-wise addition of two vectors/matrices. Note that the output gate is followed by another tanh applied to the cell state; this tanh is not a gate, since there are no weights involved in that operation (as shown in the figure).
The feedforward equations of an LSTM are as follows:
$$f_t = \mathrm{sigmoid}(W_f[h_{t-1}, x_t] + b_f)$$
$$i_t = \mathrm{sigmoid}(W_i[h_{t-1}, x_t] + b_i)$$
$$c'_t = \tanh(W_c[h_{t-1}, x_t] + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot c'_t$$
$$o_t = \mathrm{sigmoid}(W_o[h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(c_t)$$
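To tie these equations to code, here is a minimal NumPy sketch of a single LSTM feedforward step (the function name, variable names and sizes are illustrative assumptions, not a reference implementation). Element-wise multiplication is written with `*`, matching the ‘×’ nodes in the figure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM feedforward step, written directly from the equations above.

    Each W_* multiplies the concatenated vector [h_{t-1}, x_t].
    """
    concat = np.concatenate([h_prev, x_t])

    f_t   = sigmoid(W_f @ concat + b_f)    # forget gate
    i_t   = sigmoid(W_i @ concat + b_i)    # input (update) gate
    c_hat = np.tanh(W_c @ concat + b_c)    # candidate cell state c'_t
    c_t   = f_t * c_prev + i_t * c_hat     # new cell state
    o_t   = sigmoid(W_o @ concat + b_o)    # output gate
    h_t   = o_t * np.tanh(c_t)             # new hidden state / activation
    return h_t, c_t

# Toy usage with arbitrary sizes (hypothetical values, not from the text)
n_units, n_inputs = 4, 3
rng = np.random.default_rng(0)
W_f, W_i, W_c, W_o = (rng.standard_normal((n_units, n_units + n_inputs)) for _ in range(4))
b_f, b_i, b_c, b_o = (np.zeros(n_units) for _ in range(4))

h, c = np.zeros(n_units), np.zeros(n_units)
x = rng.standard_normal(n_inputs)
h, c = lstm_step(x, h, c, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o)
```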
In the RNN cell, you had exactly one weight matrix $W$ (a concatenation of the feedforward and the recurrent matrices). In the case of an LSTM cell, you have four weight matrices: $W_f, W_i, W_c, W_o$.
Each of these is a concatenation of the feedforward and recurrent weight matrices. Thus, you can write the weights of an LSTM as:
$$W_f = [W^F_f \mid W^R_f]$$
$$W_i = [W^F_i \mid W^R_i]$$
$$W_c = [W^F_c \mid W^R_c]$$
$$W_o = [W^F_o \mid W^R_o]$$
This means that an LSTM layer has four times as many parameters as a normal RNN layer. The increased number of parameters leads to increased computational cost. For the same reason, an LSTM is also more likely to overfit the training data than a normal RNN.
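As a quick sanity check on the ‘four times’ claim, here is a small parameter count (the layer sizes are illustrative assumptions): for a layer with $n$ units and $m$-dimensional inputs, each concatenated matrix has shape $n \times (n + m)$ and each bias has $n$ entries.

```python
def rnn_params(n, m):
    # one concatenated matrix W = [W_F | W_R] of shape (n, n + m), plus one bias
    return n * (n + m) + n

def lstm_params(n, m):
    # four concatenated matrices W_f, W_i, W_c, W_o, plus four biases
    return 4 * (n * (n + m) + n)

n, m = 128, 64                 # illustrative layer size and input size
print(rnn_params(n, m))        # 24704
print(lstm_params(n, m))       # 98816  -> exactly 4x the RNN count
```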
Having said that, most real-world sequence problems, such as speech recognition, translation and video processing, are complex enough to need LSTMs.
In the next section, you’ll look at a couple of other variants of the LSTM cell.