So far, we have covered rule-based, unigram and bigram models for predicting IOB labels. In this segment, you’ll learn to build a machine learning model that predicts the IOB tags of words.
Just like in machine learning classification models, we can define features for the sequence labelling task. Features could include the morphology (or shape) of the word, such as whether the word is upper- or lowercase, the POS tags of the words in the neighbourhood, whether the word is present in a gazetteer (e.g. word_is_city, word_is_state), etc.
In this segment, we’ll use a Naive Bayes classifier to predict the labels of the words. We’ll take into account features such as the word itself, the POS tag of the word, the POS tag of the previous word, whether the word is the first or last word of the sentence, whether the word is in the gazetteer, etc. Let’s now go through the Python implementation for extracting the features:
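Below is a minimal sketch of such a feature extractor. The gazetteer sets and their contents are hypothetical placeholders (the notebook would load them from an actual gazetteer file), and the exact feature names may differ from the notebook’s npchunk_features():

```python
# Hypothetical gazetteer sets, used here only for illustration.
CITY_GAZETTEER = {"boston", "denver", "chicago"}
STATE_GAZETTEER = {"colorado", "texas", "georgia"}
COUNTY_GAZETTEER = {"cook", "travis", "fulton"}

def npchunk_features(sentence, i, history):
    """Return the feature dictionary for the i-th word of a POS-tagged sentence.

    `sentence` is a list of (word, pos) pairs; `history` is the list of
    IOB tags predicted so far (unused in this minimal version).
    """
    word, pos = sentence[i]
    # POS tag of the previous word; a pseudo-tag marks the sentence start
    if i == 0:
        prevpos = "<START>"
    else:
        _, prevpos = sentence[i - 1]
    return {
        "pos": pos,                                         # POS tag of the word
        "prevpos": prevpos,                                 # POS tag of the previous word
        "word": word,                                       # the current word
        "word_is_first": i == 0,                            # first word of the sentence?
        "word_is_last": i == len(sentence) - 1,             # last word of the sentence?
        "word_is_city": word.lower() in CITY_GAZETTEER,     # gazetteer lookup: city
        "word_is_state": word.lower() in STATE_GAZETTEER,   # gazetteer lookup: state
        "word_is_county": word.lower() in COUNTY_GAZETTEER  # gazetteer lookup: county
    }
```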
So, the function npchunk_features() will return the following features for each word in a dictionary:
- POS tag of the word
- POS tag of the previous word
- Current word
- Whether the word is the first or last word of the sentence
- Whether the word is in the gazetteer as ‘City’
- Whether the word is in the gazetteer as ‘State’
- Whether the word is in the gazetteer as ‘County’
Let’s look at the word features of a sample sentence:
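For instance, running the sketch above on a small POS-tagged sentence (the sentence and its tags are made up for illustration) produces one feature dictionary per word:

```python
sample = [("Show", "VB"), ("flights", "NNS"), ("to", "TO"), ("Boston", "NNP")]
for i in range(len(sample)):
    print(sample[i][0], "->", npchunk_features(sample, i, history=[]))
# e.g. Boston -> {'pos': 'NNP', 'prevpos': 'TO', 'word': 'Boston',
#                 'word_is_first': False, 'word_is_last': True,
#                 'word_is_city': True, 'word_is_state': False,
#                 'word_is_county': False}
```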
Now, each word is represented by all the above features. We’ll build a Naive Bayes classifier using these features. Note that we are using the Naive Bayes implementation of the NLTK library rather than sklearn’s.
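The training loop below is a sketch modelled on the consecutive-classifier pattern from the NLTK book (chapter 7); the class name is ours, and the notebook’s implementation may differ in detail. Each training sentence is assumed to be a list of ((word, pos), iob_tag) pairs:

```python
import nltk

class NaiveBayesChunkTagger(nltk.TaggerI):
    """Predicts IOB tags for (word, pos) pairs using NLTK's Naive Bayes."""

    def __init__(self, train_sents):
        train_set = []
        for tagged_sent in train_sents:
            untagged_sent = nltk.tag.untag(tagged_sent)  # list of (word, pos)
            history = []
            for i, (_, iob_tag) in enumerate(tagged_sent):
                featureset = npchunk_features(untagged_sent, i, history)
                train_set.append((featureset, iob_tag))
                history.append(iob_tag)
        # NLTK's own Naive Bayes implementation, not sklearn's
        self.classifier = nltk.NaiveBayesClassifier.train(train_set)

    def tag(self, sentence):
        # Tag a list of (word, pos) pairs from left to right
        history = []
        for i in range(len(sentence)):
            featureset = npchunk_features(sentence, i, history)
            history.append(self.classifier.classify(featureset))
        return list(zip(sentence, history))
```

One advantage of using NLTK’s implementation is classifier.show_most_informative_features(), which lets you inspect which features drive the predictions.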
So, the Naive Bayes classifier performed better than the rule-based, unigram and bigram chunker models. The results improved marginally when more features were added.
Before moving to the next segment, we recommend that you go through the code of the Naive Bayes classifier in the notebook.
Additional Reading
- You can also use a Naive Bayes classifier to predict POS tags for the words in a sentence. Refer to section 1.6 of the following link: https://www.nltk.org/book/ch06.html
- Read more on why Naive Bayes performs well even when the features are dependent on each other.