
Structure of an LSTM Cell

In the previous segment, we discussed the three main characteristics of an LSTM cell:

  1. Presence of an explicit ‘memory’ called the cell state
  2. Gating mechanisms
  3. Constant error carousel

In this lecture, you’ll study the LSTM cell in detail.

Before that, let’s look at some standard notations used in the LSTM cell. There are a few changes from the notation you have seen so far. In the following sections, ‘h’ and ‘a’ are used interchangeably to denote the activations coming out of a layer.

Now, let’s look at the structure of an LSTM cell.

To summarise, an LSTM cell is analogous to a neuron in an RNN – each LSTM layer contains multiple LSTM cells. The cell receives the following inputs:

  • The output of the previous time step ht−1 (a vector)
  • The current input xt (a vector)
  • The previous cell state ct−1 (usually a scalar)

Note that we are referring to an LSTM cell in a certain layer ‘l’; for simplicity, we are not denoting the variables explicitly as clt, hlt, etc., but simply as ct, ht, etc.

The cell produces two outputs:

  • The current cell state ct (a scalar)
  • The current state output ht (a scalar)

Each cell in the LSTM layer produces an output ht; these outputs are then combined to form a vector and fed to the next LSTM layer.
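If you want to see these inputs and outputs concretely, the short sketch below uses PyTorch’s nn.LSTMCell; the batch and layer sizes here are illustrative assumptions, not values from the lecture. Note that a framework treats the cell state of a whole layer as a vector, with one entry per cell.

```python
import torch
import torch.nn as nn

input_size, hidden_size, batch = 10, 20, 1   # illustrative sizes

cell = nn.LSTMCell(input_size, hidden_size)  # one LSTM layer of 20 cells

x_t    = torch.randn(batch, input_size)      # current input x_t
h_prev = torch.zeros(batch, hidden_size)     # previous output h_{t-1}
c_prev = torch.zeros(batch, hidden_size)     # previous cell state c_{t-1}

# The cell takes (x_t, (h_{t-1}, c_{t-1})) and produces the two outputs (h_t, c_t)
h_t, c_t = cell(x_t, (h_prev, c_prev))
print(h_t.shape, c_t.shape)                  # torch.Size([1, 20]) torch.Size([1, 20])
```

The h_t tensor here is exactly the combined output of all the cells in the layer – the vector that gets fed to the next LSTM layer.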

Before moving ahead to study the LSTM cell in further detail, it will be useful to recall the common activation functions used in neural networks. Recollect that the sigmoid function outputs a value between 0 and 1 while the tanh outputs a value between -1 and +1. Both these functions are shown below.

The following graph shows the tanh function. Note that the values of the function lie between -1 and +1.
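As a quick sanity check, a few lines of NumPy confirm these output ranges (a minimal sketch):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(z))   # ~[0.007, 0.269, 0.5, 0.731, 0.993] -> always between 0 and 1
print(np.tanh(z))   # ~[-0.9999, -0.762, 0.0, 0.762, 0.9999] -> always between -1 and +1
```

The sigmoid’s 0-to-1 range is what makes it suitable for gates – its output acts as the fraction of information to let through – while tanh’s symmetric range suits the actual content being written to the cell state.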

Now, let’s look at how the gating mechanism of the LSTM cell works.

Note: The output gate should include a tanh activation, but the tanh is missing from the video below by mistake.

You saw the structure of the LSTM cell. You also saw the three gating mechanisms – the forget gate, the update gate and the output gate.

Let’s understand the intuition of each gate with a specific example. Let’s say you’re working on a video tagging problem where you need to tag the action that takes place in each frame of the video. Let’s look at the function of each gate in the context of this problem:

  • Forget gate: This gate controls how much information needs to be discarded from the previous cell state (ct−1) depending on the new input. In the video tagging problem, whenever a new action takes place, this gate needs to decide how much information to retain from the previous frames. If the same action is happening over and over again, very little information should be discarded. When the action changes, the forget gate ‘forgets’ a lot of information.
  • Update gate: This gate updates the previous cell state by writing a new piece of information to it. In the video tagging problem, when the action changes, this gate will update the cell state with information relevant to the new action. If the action is the same as in the previous frame, negligible information will be written to the cell state. If the scene changes drastically, the update will be drastic too.

The new cell state ct is the cumulative result of the information discarded from ct−1 by the forget gate and the new information written to it by the update gate.

  • Output gate: This gate controls how much information needs to be passed on to the next time step and the next LSTM layer, based on the current cell state. A sketch combining all three gates follows below.
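Putting the three gates together, here is a minimal NumPy sketch of a single LSTM step following the standard formulation; the weight names and toy dimensions are illustrative assumptions, not notation from the lecture. Note the tanh applied to the cell state before the output gate multiplies it, as flagged in the note above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step (standard formulation; names are illustrative)."""
    z = np.concatenate([h_prev, x_t])     # stack h_{t-1} and x_t

    f_t = sigmoid(W_f @ z + b_f)          # forget gate: fraction of c_{t-1} to keep
    i_t = sigmoid(W_i @ z + b_i)          # update gate: fraction of new info to write
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate new information
    c_t = f_t * c_prev + i_t * c_tilde    # new cell state: some old kept, some new written
    o_t = sigmoid(W_o @ z + b_o)          # output gate: fraction of state to expose
    h_t = o_t * np.tanh(c_t)              # state output (note the tanh on c_t)

    return h_t, c_t

# Toy dimensions for illustration
n_x, n_h = 4, 3
rng = np.random.default_rng(0)
W = lambda: rng.standard_normal((n_h, n_h + n_x)) * 0.1
b = lambda: np.zeros(n_h)

h_t, c_t = lstm_step(rng.standard_normal(n_x), np.zeros(n_h), np.zeros(n_h),
                     W(), W(), W(), W(), b(), b(), b(), b())
print(h_t.shape, c_t.shape)               # (3,) (3,)
```

In the video tagging example: when the same action repeats, f_t stays close to 1 and i_t close to 0, so c_t remains almost identical to c_{t−1}; when the action changes, the two gates swing the other way and the cell state is largely rewritten.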
