
Types of RNNs – I

In the previous few segments, you studied the architecture of RNNs. You saw that an input sequence is fed to the input layer and an output sequence comes out of the output layer. The interesting part is that you can change the size and type of the input and output layers for different kinds of tasks. Let’s discuss some commonly used RNN architectures:

  • Many-to-one RNN

In this architecture, the input is a sequence while the output is a single element. We have already discussed an example of this type: classifying a sentence as grammatically correct/incorrect. The figure below shows the many-to-one architecture:

Note that each element of the input sequence x_i is a numeric vector. For words in a sentence, you can use a one-hot encoded representation, word embeddings, etc. You’ll learn these techniques in the next session. Also, note that the output is produced after the last timestep T (i.e. after the RNN has seen all the inputs).
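To make this concrete, here is a minimal sketch (in Python/NumPy, with a hypothetical toy vocabulary) of how each word in a sentence could be turned into a one-hot numeric vector x_i before being fed to the RNN:

```python
import numpy as np

# Hypothetical toy vocabulary and sentence, purely for illustration
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3}
sentence = ["the", "movie", "was", "great"]

# Each timestep t gets a one-hot vector of length |vocab|
one_hot = np.zeros((len(sentence), len(vocab)))
for t, word in enumerate(sentence):
    one_hot[t, vocab[word]] = 1.0

print(one_hot.shape)  # (timesteps, vocabulary size) -> (4, 4)
```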

Some other examples of many-to-one problems are:

  • Predicting the sentiment score of a text (between -1 and 1). For example, you can train an RNN to assign sentiment scores to customer reviews. Note that this can be framed either as a regression problem (where the output is a continuous number) or as a classification problem (e.g. where the sentiment is positive/neutral/negative).
  • Classifying videos into categories. For example, say you want to classify YouTube videos into two categories: ‘contains violent content’ / ‘does not contain violent content’. The output can be a single sigmoid neuron that predicts the probability that a video is violent.
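As a rough illustration, the sketch below builds a many-to-one RNN in Keras for a binary task such as the sentiment or violence examples above. The vocabulary size, sequence length and layer sizes are hypothetical values, and it assumes the input sentences have already been converted to padded sequences of integer word indices:

```python
from tensorflow.keras import layers, models

vocab_size, max_len = 10000, 50  # hypothetical values

model = models.Sequential([
    layers.Input(shape=(max_len,)),          # a sequence of word indices
    layers.Embedding(vocab_size, 64),        # each word index -> dense vector
    layers.SimpleRNN(128),                   # keeps only the state after the last timestep T
    layers.Dense(1, activation="sigmoid"),   # single output: probability of the positive class
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

The key design choice is that the recurrent layer returns only its final state (after the last timestep T), which is then mapped to a single probability.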

You’ll use this architecture in the third session, where you’ll generate C programming code using an RNN. In the following lecture, you’ll learn how to use a many-to-one architecture. Now, let’s look at the second architecture.

  • Many-to-many RNN: Equal input and output length

You’re already familiar with this type of architecture. In this type of RNN, the input (X) and the output (Y) are both sequences of multiple entities spread over timesteps. The following image shows such an architecture.

In this architecture, the network emits an output at each timestep, so there is a one-to-one correspondence between the input and the output at every timestep. You can use this architecture for various tasks. In the third session of this module, you’ll see how it is used to build a part-of-speech (POS) tagger, where each word in the input sequence is tagged with its part of speech at every timestep.
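A minimal Keras sketch of this equal-length many-to-many setup, in the spirit of the POS-tagging example, might look as follows (the vocabulary size, number of tags and sequence length are hypothetical):

```python
from tensorflow.keras import layers, models

vocab_size, num_tags, max_len = 10000, 12, 50  # hypothetical values

model = models.Sequential([
    layers.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, 64),
    layers.SimpleRNN(128, return_sequences=True),                          # output at every timestep
    layers.TimeDistributed(layers.Dense(num_tags, activation="softmax")),  # one tag prediction per word
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

Setting return_sequences=True makes the recurrent layer emit an output at every timestep, and the TimeDistributed dense layer turns each of those outputs into a tag prediction, preserving the one-to-one correspondence between input and output.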

  • Many-to-many RNN: Unequal input and output lengths

In the previous many-to-many example of POS tagging, we had assumed that the lengths of the input and output sequences are equal. However, this is not always the case. There are many problems where the lengths of the input and output sequences are different. For example, consider the task of machine translation – the length of a Hindi sentence can be different from the corresponding English sentence.

Let’s see how such problems can be solved using RNNs.

To summarise, the encoder-decoder architecture is used in tasks where the input and output sequences are of different lengths. The architecture is shown below:

The above architecture comprises two components, an encoder and a decoder, both of which are RNNs themselves. The output of the encoder, called the encoded vector (sometimes also the ‘context vector’), captures a representation of the input sequence. The encoded vector is then fed to the decoder RNN, which produces the output sequence.

You can see that the input and output can now be of different lengths since there is no one-to-one correspondence between them anymore. This architecture gives RNNs the much-needed flexibility for real-world applications such as language translation.
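The sketch below shows one way the encoder-decoder idea could be wired up in Keras, assuming a translation-style setup trained with teacher forcing; all vocabulary and layer sizes are hypothetical:

```python
from tensorflow.keras import layers, models

src_vocab, tgt_vocab, latent_dim = 8000, 9000, 256  # hypothetical values

# Encoder: read the source sequence and keep only its final state (the encoded/context vector)
enc_inputs = layers.Input(shape=(None,))
enc_emb = layers.Embedding(src_vocab, 64)(enc_inputs)
_, enc_state = layers.GRU(latent_dim, return_state=True)(enc_emb)

# Decoder: generate the target sequence, initialised with the encoded vector
dec_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(tgt_vocab, 64)(dec_inputs)
dec_outputs = layers.GRU(latent_dim, return_sequences=True)(dec_emb, initial_state=enc_state)
dec_outputs = layers.Dense(tgt_vocab, activation="softmax")(dec_outputs)

model = models.Model([enc_inputs, dec_inputs], dec_outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

The encoder’s final state plays the role of the encoded (context) vector: it initialises the decoder, and since the two RNNs process their own sequences, the input and output lengths are free to differ.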

In the next segment, you will learn how the network is trained.
