In the previous session, we discussed the problem of vanishing and exploding gradients that RNNs face. Let’s revisit this problem briefly, and then we’ll be ready to study how LSTMs solve it.
Of the two, the vanishing gradients problem is the main concern for RNNs.
To solve the vanishing gradients problem, many attempts have been made to tweak vanilla RNNs so that the gradients don’t die out when sequences get long. The most popular and successful of these attempts is the long short-term memory network, or LSTM. LSTMs have proven so effective that they have almost entirely replaced vanilla RNNs.
The drastic improvement that LSTMs bring comes from a novel change in the structure of the neuron itself. In LSTMs, the neurons are called cells, and an LSTM cell differs from a normal neuron in several ways.
Let’s study LSTM networks in the following lecture.
One of the fundamental differences between an RNN and an LSTM is that an LSTM has an explicit memory unit, which stores information relevant for learning some task. In a standard RNN, the only way the network remembers past information is by updating its hidden state over time; it has no explicit memory in which to store information.
On the other hand, in LSTMs, the memory units retain pieces of information even when the sequences get really long. In the next section, we’ll look at another important feature of the LSTM cell – gating mechanisms.
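To make this distinction concrete, here is a minimal NumPy sketch of a single LSTM cell step (the function name, parameter layout, and dimensions are illustrative, not from the lecture). Note how the cell carries two pieces of state: the hidden state h, which an RNN also has, and the cell state c, the explicit memory that is updated by the gates discussed in the next section.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters for the
    forget (f), input (i), and output (o) gates and the candidate (g)."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b        # pre-activations, shape (4*n,)
    f = sigmoid(z[0:n])               # forget gate: what to erase from memory
    i = sigmoid(z[n:2*n])             # input gate: what to write to memory
    o = sigmoid(z[2*n:3*n])           # output gate: what to expose from memory
    g = np.tanh(z[3*n:4*n])           # candidate new memory content
    c = f * c_prev + i * g            # explicit memory (cell state)
    h = o * np.tanh(c)                # hidden state passed onward, as in an RNN
    return h, c

# Toy dimensions: input size 3, hidden size 2 (illustrative only).
rng = np.random.default_rng(0)
D, H = 3, 2
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_cell_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
print(h.shape, c.shape)  # (2,) (2,)
```

The key line is the cell-state update: c is modified additively (scaled by the forget gate) rather than being overwritten through a squashing nonlinearity at every step, which is what lets information, and gradients, survive over long sequences.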