Let’s now see how to preprocess the data. The preprocessing steps are almost the same as those we followed for POS tagging. The only change to be mindful of is that the target variable is no longer a sequence; it’s a single token — the next token to be predicted.
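To make the single-token target concrete, here is a minimal sketch of the windowing step. The names (`corpus_ids`, `seq_len`) and the toy token IDs are illustrative, not taken from the lesson: each input is a fixed-length window of token IDs, and the target is the one token that follows it.

```python
import numpy as np

# Toy corpus already mapped to integer token IDs (illustrative values).
corpus_ids = [12, 4, 7, 9, 3, 15, 8, 2]
seq_len = 3  # length of each input window

X, y = [], []
for i in range(len(corpus_ids) - seq_len):
    X.append(corpus_ids[i:i + seq_len])  # input: a window of seq_len tokens
    y.append(corpus_ids[i + seq_len])    # target: the single next token

X = np.array(X)  # shape (num_samples, seq_len)
y = np.array(y)  # shape (num_samples,) — one token per sample, not a sequence
```

Note that `y` is one-dimensional: unlike POS tagging, where every input token had a corresponding label, here each window maps to exactly one target token.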
Now that the data is in numeric format and has been split into X and y, let’s build and train an LSTM model.
The model architecture is ready. Let’s train it and look at the results.
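The build-and-train steps above can be sketched as follows, assuming a Keras setup. The vocabulary size, window length, layer sizes, and the random toy data are illustrative placeholders, not the lesson’s exact values; the key points are that the LSTM returns a single vector (not a sequence) and the output layer is a softmax over the vocabulary.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size, seq_len = 50, 10  # illustrative sizes

model = Sequential([
    Embedding(vocab_size, 32),               # token IDs -> dense vectors
    LSTM(64),                                # final hidden state only
    Dense(vocab_size, activation="softmax"), # distribution over next token
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy data just to demonstrate the fit call.
X = np.random.randint(0, vocab_size, size=(100, seq_len))
y = np.random.randint(0, vocab_size, size=(100,))
model.fit(X, y, epochs=1, batch_size=32, verbose=0)
```

Because the target is a single integer token ID rather than a one-hot sequence, `sparse_categorical_crossentropy` is the natural loss choice here.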
The model accuracy is approximately 69%. But accuracy is not a reliable metric for text generation; to really judge how the model performs, we need to generate text and inspect it ourselves. In the next segments, we’ll generate C code using the trained LSTM model.