In this session, you will look at another semantic processing method, known as topic modelling, which is used to extract topics from a corpus of text. In the next video, you will learn the basic idea behind topic modelling.
Suppose various pieces of text are being sent from different sources. These texts are unstructured and unlabelled. Topic modelling can help determine their major themes, which can then serve as labels.
A good example of this is customer review data. Reviews for a restaurant, for instance, can vary by food item, length, sentiment and so on. The job of a topic modelling algorithm is to skim through each review and identify the major themes and keywords. These can later be displayed under a ‘Review Highlight’ section, as shown in the image given below.
When customers visit the site, these labels make it easier for them to find relevant reviews. Suppose you are on a budget and want to know whether a certain dish is affordable; ‘Worth the Money’ would be the ideal label in this case. Or suppose you are interested in ordering the ‘Fiery Paneer Tikka Wrap’ but are not sure whether it will be any good. You can easily find out by clicking on the corresponding food label.
How does topic modelling work? Before understanding various algorithms, you will learn how a human would infer a topic from a given sentence.
Suppose you are given this sentence:
Croatia fought hard before succumbing to France’s deadly attack, lost the finals 2 goals to 4.
On reading the first part of the sentence, ‘Croatia fought hard before succumbing to France’s deadly attack’, one would infer that the context of the sentence is war. This inference is primarily due to the phrases ‘fought hard’ and ‘deadly attack’. However, on reading the next part, ‘lost the finals 2 goals to 4’, the context changes to sports, specifically soccer, because of the words ‘finals’ and ‘goals’.
Although soccer, or sport in general, is never explicitly mentioned in the sentence, we were able to infer that it was the latent topic.
The goal of this exercise is to impart the same kind of reasoning to a machine learning algorithm, so that it can identify the dominant themes in a sample of text. This task is commonly studied within the field of computer science known as ‘information retrieval’.
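A naive way to mimic this reasoning in code is to look for cue words clause by clause. The sketch below is only an illustration under stated assumptions, not how topic modelling algorithms actually work: the cue-word lists in `CUE_WORDS` are hypothetical, hand-picked from the phrases discussed above, and `clause_contexts` is a hypothetical helper that splits the sentence at commas and labels each clause with the context that has the most matching cue words.

```python
import re

# Hypothetical cue words, hand-picked from the phrases discussed above.
CUE_WORDS = {
    "war": {"fought", "deadly", "attack"},
    "sports": {"finals", "goals", "match", "score"},
}


def clause_contexts(sentence):
    """Label each comma-separated clause with the context whose cue words it matches most."""
    results = []
    for clause in sentence.split(","):
        # Lower-case the clause and keep only alphabetic tokens.
        tokens = set(re.findall(r"[a-z]+", clause.lower()))
        # Count how many cue words of each context appear in this clause.
        scores = {context: len(tokens & cues) for context, cues in CUE_WORDS.items()}
        best = max(scores, key=scores.get)
        # A clause with no cue words at all gets no label (None).
        results.append((clause.strip(), best if scores[best] > 0 else None))
    return results


sentence = ("Croatia fought hard before succumbing to France’s deadly attack, "
            "lost the finals 2 goals to 4.")
for clause, context in clause_contexts(sentence):
    print(f"{context!s:>6}: {clause}")
```

Counting cues over the whole sentence instead would favour ‘war’ (three cues against two), even though a human reader concludes that the sentence is about soccer. This fragility of hand-written cue lists is one motivation for algorithms that learn word groupings from the data.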
In the next video, you will develop an intuitive understanding of topic modelling.
Let’s take a look at a specific example to understand the working of topic modelling.

Suppose a topic modelling algorithm has analysed a corpus of text and clustered words into groups based on their context in the documents. As you can see, the words have been divided into major themes: Topic 1 is ‘Data Science’, Topic 2 is ‘Medicine’ and Topic 3 is ‘Culinary Arts’.
Now suppose you are given the following sentence:
‘Machine learning can aid and improve early detection of breast cancer.’
Which topics from the table given above does this sentence fall under? The words ‘machine learning’ belong to Topic 1 (Data Science), and the words ‘breast cancer’ belong to Topic 2 (Medicine). This means that the sentence contains both topics.
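The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not a topic modelling algorithm: the word lists in `TOPIC_WORDS` are hypothetical stand-ins for the table in the image, whose full contents are not reproduced here, and `topic_mixture` is a hypothetical helper that counts how many words of the sentence fall under each topic and reports each topic's share of the matches.

```python
import re
from collections import Counter

# Hypothetical word groups standing in for the table in the image.
TOPIC_WORDS = {
    "Data Science": {"machine", "learning", "data", "model", "algorithm"},
    "Medicine": {"breast", "cancer", "tumour", "diagnosis", "patient"},
    "Culinary Arts": {"recipe", "chef", "bake", "spice", "flavour"},
}


def topic_mixture(sentence: str) -> dict:
    """Return the share of matched words that falls under each topic ({} if nothing matches)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    # Count one hit for every (token, topic) pair where the token is in that topic's word group.
    counts = Counter(
        topic
        for token in tokens
        for topic, words in TOPIC_WORDS.items()
        if token in words
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # Counter returns 0 for topics with no hits, so every topic appears in the result.
    return {topic: counts[topic] / total for topic in TOPIC_WORDS}


sentence = "Machine learning can aid and improve early detection of breast cancer."
print(topic_mixture(sentence))
# {'Data Science': 0.5, 'Medicine': 0.5, 'Culinary Arts': 0.0}
```

A real topic model learns the word groups from the corpus itself and works with probabilities rather than hard matches, but the result has the same shape: a sentence or document is described as a mixture of topics.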
This segment gave you a brief intuition for topic modelling. Now, let’s dive deeper into the algorithms.