You’ve gone through all the preprocessing steps required to prepare the data, and every word now has an integer representation. However, while building the model, the word vector of each word will be used instead of its integer. To summarise, the workflow for feeding text to a neural network model is: first convert the words to integers, then use each integer to fetch the word vector that belongs to that word. During preprocessing, we only need to convert the text to integers. The second step, fetching the word vectors, is done automatically by the Embedding layer in Keras while the model trains. You will go through the code shortly.
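The two-step workflow can be sketched in plain NumPy. This is only an illustration of the idea, not how Keras implements its Embedding layer internally: the vocabulary, sentence and embedding size below are hypothetical, and the embedding matrix is random here, whereas in a real model its values are learnt during training.

```python
import numpy as np

# Hypothetical toy vocabulary: word -> integer index (0 is left free for padding).
word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# Step 1 (preprocessing): convert the words to integers.
sentence = ["the", "movie", "was", "great"]
int_seq = np.array([word_index[w] for w in sentence])

# Step 2 (what the Embedding layer does): use each integer as a row index
# into a (vocab_size x embedding_dim) weight matrix. Random values stand in
# for the weights that would be learnt during training.
vocab_size, embedding_dim = len(word_index) + 1, 3
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

word_vectors = embedding_matrix[int_seq]  # shape (4, 3): one vector per word
print(int_seq)
print(word_vectors.shape)
```

Because the lookup is just row indexing, only the integers need to be stored during preprocessing. The vectors are fetched on the fly each time a batch passes through the model.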
It’s time to create the CNN-RNN model. But before that, let’s see how a convolution operation takes place over text and how it differs from what you saw in the case of images.
A 1D CNN is almost the same as a 2D CNN. The only difference is that the convolution moves in one direction only, from left to right across the words. It is important to note that the number of rows in the filter of a 1D CNN, denoted by ‘d’, is the same as the embedding size. The filter therefore always covers a word’s entire vector, so unlike in an image, there is no second direction for it to slide in. That’s why it’s called 1D convolution.
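A single-filter 1D convolution can be sketched in NumPy as follows. It assumes the text is laid out as a d × n matrix with one column per word, so the d × w filter spans every embedding dimension and can only slide from left to right. The function name and the toy numbers are hypothetical. Keras’s Conv1D stores the same data transposed, as (timesteps, channels), but the arithmetic is the same.

```python
import numpy as np

def conv1d_single_filter(text, filt, bias=0.0, stride=1):
    """Slide a (d x w) filter left to right over a (d x n) text matrix.

    text: embedded text, one column per word (d rows = embedding size)
    filt: filter with the same number of rows d, spanning w words
    Returns a 1-D array (the row vector) with one value per window position.
    """
    d, n = text.shape
    d_f, w = filt.shape
    assert d == d_f, "filter rows must equal the embedding size"
    positions = range(0, n - w + 1, stride)
    # Each output value: element-wise product of window and filter, summed, plus bias.
    return np.array([np.sum(text[:, i:i + w] * filt) + bias for i in positions])

text = np.arange(12, dtype=float).reshape(3, 4)  # d = 3, n = 4 words
filt = np.ones((3, 2))                           # w = 2 words per window
out = conv1d_single_filter(text, filt)
print(out)  # [27. 33. 39.]
```

With n = 4 words and w = 2, the window fits in 3 positions, so the filter produces a row vector of length 3.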
The hyperparameters of a 1D CNN are:
- Filter size (w): the number of words the filter covers at each step. It’s denoted by ‘w’ because you can think of the filter as a window moving in one direction over the text.
- Padding size (p)
- Stride (s)
- Pooling size (m)
- Number of filters (k)
Convolving over the text with a single filter gives a row vector. If you use ‘k’ filters, you’ll get ‘k’ such feature vectors, stacked on top of each other row-wise into a matrix with k rows.
Note: Each filter has only one bias term. In the video above, a single filter was shown with three different bias values. Instead, the bias is one value that is added at every position of that filter’s output, e.g. (-11, -11, -11), (66, 66, 66) or (12, 12, 12).
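Here is a NumPy sketch of k filters applied to the same text. Each filter adds its own single scalar bias at every position, and the k row vectors are stacked into a k-row matrix. This mirrors Keras’s Conv1D, whose bias weight has one entry per filter. The pooling step assumes max pooling with non-overlapping windows of size m, since the text does not fix the pooling type. The function names, filters and bias values are hypothetical.

```python
import numpy as np

def conv1d_multi_filter(text, filters, biases):
    """text: (d, n); filters: (k, d, w); biases: (k,), one scalar per filter."""
    k, d, w = filters.shape
    assert text.shape[0] == d, "filter rows must equal the embedding size"
    n = text.shape[1]
    out = np.empty((k, n - w + 1))
    for f in range(k):
        for i in range(n - w + 1):
            # The same scalar biases[f] is added at every position i.
            out[f, i] = np.sum(text[:, i:i + w] * filters[f]) + biases[f]
    return out  # one row per filter, stacked row-wise

def max_pool_rows(features, m):
    # Non-overlapping max pooling of size m along each row;
    # leftover columns that don't fill a full window are dropped.
    k, length = features.shape
    trimmed = features[:, : (length // m) * m]
    return trimmed.reshape(k, length // m, m).max(axis=2)

text = np.arange(12, dtype=float).reshape(3, 4)          # d = 3, n = 4 words
filters = np.stack([np.ones((3, 2)), -np.ones((3, 2))])  # k = 2 filters, w = 2
biases = np.array([-11.0, 12.0])
features = conv1d_multi_filter(text, filters, biases)    # shape (2, 3)
pooled = max_pool_rows(features, m=2)                    # shape (2, 1)
print(features)
print(pooled)
```

The first row is shifted by -11 at every position and the second by 12, which matches the note: one bias value per filter, never one per position.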
Now that you’ve learnt how convolution takes place on text, let’s see how to create a CNN-RNN model in Keras.
This architecture was chosen through experimentation, but you can tune many of its hyperparameters to build a model with better metrics.
In the next segment, we will train the model.