In this segment, you will learn to use word embeddings for classification tasks. As you work through this exercise, keep in mind that word embeddings can be used for almost any other typical ML task you would perform with ordinary numeric features, such as clustering or PCA followed by classification.
Let’s first look at how a plain bag-of-words approach performs on a text classification problem. You will later see whether word embeddings give better results.
In the following lecture, you will use Bernoulli Naive Bayes to classify the Reuters dataset. This dataset contains news articles from eight categories, and the task is to classify the category of an article. Recall that Bernoulli Naive Bayes uses the presence of a term in a document (i.e. its incidence) as the feature: 1 if the term is present, 0 if not. A short code sketch of this approach follows the note below.
Note:
At 1:59 in the video, the professor mistakenly refers to the train set as the test set and vice versa.
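Here is a minimal sketch of the bag-of-words setup described above, assuming the articles and their category labels are already loaded into Python lists; train_texts, train_labels, test_texts and test_labels are placeholder names, not the exact lecture code.

```python
# Bernoulli Naive Bayes on binary (incidence) bag-of-words features.
# train_texts/train_labels/test_texts/test_labels are assumed placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score

# binary=True records term incidence: 1 if the term occurs in the article, 0 otherwise
vectorizer = CountVectorizer(binary=True, stop_words='english')
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

model = BernoulliNB()
model.fit(X_train, train_labels)
print(accuracy_score(test_labels, model.predict(X_test)))
```

Note that the binary incidence features, rather than raw term counts, are exactly what the Bernoulli variant of Naive Bayes models.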
You saw how Bernoulli Naive Bayes performs on the dataset. Let’s now use word embeddings to make predictions with Naive Bayes. We will try two options (a short sketch follows the list):
- Use pre-trained GloVe vectors
- Train our own vectors using the corpus
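As a rough sketch of the first option (the variable names are assumptions, not the exact lecture code), you can average the pre-trained GloVe vectors of the words in an article to obtain a single document vector and then fit Gaussian Naive Bayes, since the embedding features are continuous rather than binary:

```python
# Averaging pre-trained GloVe vectors into document vectors, then Gaussian Naive Bayes.
# train_texts/train_labels are assumed placeholders from the earlier sketch.
import numpy as np
import gensim.downloader as api
from sklearn.naive_bayes import GaussianNB

glove = api.load('glove-wiki-gigaword-100')  # 100-dimensional pre-trained GloVe vectors

def doc_vector(text, dim=100):
    """Average the vectors of all in-vocabulary words; zero vector if none are found."""
    words = [w for w in text.lower().split() if w in glove]
    return np.mean([glove[w] for w in words], axis=0) if words else np.zeros(dim)

X_train_emb = np.vstack([doc_vector(t) for t in train_texts])
gnb = GaussianNB().fit(X_train_emb, train_labels)
```

Bernoulli Naive Bayes is not appropriate here because the document vectors are real-valued; the Gaussian variant is a common substitute for continuous features.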
Now, let’s make predictions using logistic regression. Later, we will also train our own word vectors and make predictions using them.
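Continuing the same sketch (with the hypothetical doc_vector helper and placeholder variables defined above), logistic regression can be fitted on the same averaged document vectors:

```python
# Logistic regression on the averaged GloVe document vectors.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X_test_emb = np.vstack([doc_vector(t) for t in test_texts])
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train_emb, train_labels)
print(accuracy_score(test_labels, logreg.predict(X_test_emb)))
```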
While working with embeddings, you have two options:
- Training your own word embeddings: This is suitable when you have a sufficiently large dataset (a million words at least) or when the task is from a unique domain (e.g. healthcare). In specific domains, pre-trained word embeddings may not be useful since they might not contain embeddings for certain domain-specific words (such as drug names). A brief sketch of training your own vectors follows this list.
- Using pre-trained embeddings: Generally speaking, you should always start with pre-trained embeddings. Trained on billions of words, they provide an important performance benchmark. You can also use pre-trained embeddings as a starting point and then use your text data to train them further. This approach is known as ‘transfer learning’, which you will study in the neural networks course.
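For the first option, a minimal sketch of training your own embeddings with gensim’s Word2Vec is shown below; tokenised_docs (a list of token lists built from your corpus) is an assumed placeholder, and the parameter values are illustrative defaults:

```python
# Training word vectors on your own corpus with gensim (gensim 4.x API).
from gensim.models import Word2Vec

model = Word2Vec(
    sentences=tokenised_docs,  # assumed: list of tokenised documents, e.g. [['oil', 'prices', ...], ...]
    vector_size=100,           # embedding dimensionality
    window=5,                  # context window size
    min_count=5,               # ignore words that appear fewer than 5 times
    workers=4,
)
vector = model.wv['market']    # vector for any word seen at least min_count times
```

The resulting model.wv object can then be used in place of the pre-trained GloVe vectors in the earlier sketches.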
Note that although we have used a classification example to demonstrate the use of word embeddings, they are applicable to a wide variety of ML tasks. For example, if you want to cluster a large number of documents (document clustering), you can create a ‘document vector’ for each document and cluster the documents using (say) k-means clustering.
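As a rough illustration of that idea (the docs variable and the choice of eight clusters are assumptions, not taken from the lecture), you can reuse the averaged document vectors and cluster them with k-means:

```python
# Document clustering: one averaged embedding per document, then k-means.
from sklearn.cluster import KMeans

doc_vectors = np.vstack([doc_vector(d) for d in docs])  # docs: assumed list of raw document strings
kmeans = KMeans(n_clusters=8, random_state=42, n_init=10)
cluster_ids = kmeans.fit_predict(doc_vectors)
```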
You can even use word embeddings in more sophisticated applications such as conversational systems (chatbots) and fact-based question answering.
Word embeddings can also be used in other creative ways. In the optional session (Social Media Opinion Mining – Semantic Processing Case Study), you will see that word vectors have been modified to contain a ‘sentiment score’ (the last element of each vector represents ‘document sentiment’), which are then used to model opinions in tweets.
With that, you have completed the section on distributional semantics. In the next few segments, you will learn about another important area of semantic processing – topic modelling.