In the previous session, we discussed various types of tasks that can be solved using RNNs, such as POS tagging, sentiment classification, machine translation, speech recognition and video classification.
Now, there is a fundamental difference between these tasks – in some tasks, the entire sequence is available to the network before it starts making predictions, while in others the network has to make predictions continuously as new inputs come in.
For example, when you want to assign a sentiment score to a piece of text (say a customer review), the network can see the entire review text before assigning it a score. On the other hand, in a task such as predicting the next word given the previous few typed words, the network does not have access to words at future time steps while making the prediction.
These two types of tasks are called offline and online sequence processing, respectively.
Now, there is a neat trick you can use with offline tasks – since the network has access to the entire sequence before making predictions, why not make it ‘look at the future elements in the sequence’ as well while training, in the hope that this will help it learn better?
This is the idea exploited by what are called bidirectional RNNs.
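To make the idea concrete, here is a minimal NumPy sketch of the core mechanism (an illustrative toy, not the exact formulation from the lectures): two independent vanilla RNNs read the same sequence, one left-to-right and one right-to-left, and their hidden states are concatenated at each time step. All sizes and weights here are arbitrary values chosen for illustration.

```python
import numpy as np

def rnn_pass(x, W_x, W_h, b):
    """Run a simple tanh RNN over a sequence x of shape (T, input_dim)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)                      # (T, hidden_dim)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4               # toy sizes
x = rng.normal(size=(T, input_dim))

# Separate weights for the forward and backward directions
shapes = [(input_dim, hidden_dim), (hidden_dim, hidden_dim), (hidden_dim,)]
params_fwd = [rng.normal(size=s) for s in shapes]
params_bwd = [rng.normal(size=s) for s in shapes]

h_forward = rnn_pass(x, *params_fwd)               # reads x[0] ... x[T-1]
h_backward = rnn_pass(x[::-1], *params_bwd)[::-1]  # reads x[T-1] ... x[0], re-aligned

# At each step t, the combined state sees both past and future context
h_bidir = np.concatenate([h_forward, h_backward], axis=-1)
print(h_bidir.shape)                             # (5, 8), i.e. (T, 2 * hidden_dim)
```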
In the following lecture, Professor Raghavan explains the two types of tasks and how bidirectional RNNs boost the performance of offline processing tasks.
Thus, there are two types of sequences:
- Online sequence: Here, you don’t have access to the entire sequence before you start processing it. The network has to make predictions as each new input arrives.
- Offline sequence: The entire sequence is available before you start processing it.
A bidirectional RNN can only be applied to offline sequences.
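As a preview of the implementation you will see in the next session, here is one common way to build a bidirectional model, sketched in Keras. The choice of the TensorFlow/Keras API, the layer sizes and the sentiment-style output are illustrative assumptions, not prescriptions from the lecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A bidirectional model for an offline task such as sentiment classification:
# the Bidirectional wrapper runs the inner recurrent layer in both directions
# and concatenates their outputs. All sizes below are placeholder choices.
model = keras.Sequential([
    layers.Embedding(input_dim=10000, output_dim=64),  # e.g. review-text tokens
    layers.Bidirectional(layers.SimpleRNN(32)),        # forward + backward RNN
    layers.Dense(1, activation="sigmoid"),             # e.g. sentiment score
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```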
Ajay Shukla explains how to use bidirectional RNNs in the following lecture.
Bidirectional RNNs will often, though not always, give you better results on offline tasks. However, they take roughly double the time to train, since the number of parameters in the network doubles (one set of weights for each direction). Therefore, you have a tradeoff between training time and performance. The decision to use a bidirectional RNN depends on the computing resources that you have and the performance you are aiming for.
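You can check the parameter doubling behind this tradeoff directly. In this assumed Keras setup (arbitrary sizes; only the ratio matters), the bidirectional model has exactly twice the parameters of its unidirectional counterpart:

```python
from tensorflow.keras import layers, Input, Model

# Compare parameter counts of a unidirectional and a bidirectional SimpleRNN
inp = Input(shape=(None, 64))                    # variable-length 64-dim sequences
uni = Model(inp, layers.SimpleRNN(32)(inp))
bi = Model(inp, layers.Bidirectional(layers.SimpleRNN(32))(inp))

print(uni.count_params())  # (64 + 32 + 1) * 32 = 3104
print(bi.count_params())   # two directions -> 2 * 3104 = 6208
```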
In the next session, you will learn to implement bidirectional RNNs in Python. After that, you’ll study LSTMs and how they help mitigate the vanishing gradient problem.