
Comprehension – Multinomial Distribution in Topic Modelling

You saw that we define topics as a distribution over terms and documents as a distribution over topics. More formally, we are representing topics and documents as multinomial distributions over terms and topics respectively.

Let’s first understand how terms in a topic are modelled using a multinomial distribution.

The multinomial distribution is a generalisation of the binomial distribution. An example experiment which generates a binomial distribution is tossing a coin N times, where each toss results in a head or a tail. If the probability of a head is p, then the probability of getting k heads in N tosses is given by:

$$P(k) = \binom{N}{k}\, p^{k} (1-p)^{N-k}$$

Here, k is the value taken by the random variable representing the number of successes (heads) in N trials. All coin tosses are considered independent of each other. The parameters of the binomial distribution are N and p.
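To make the binomial probability concrete, here is a minimal sketch that evaluates C(N, k) · p^k · (1 − p)^(N − k) for a fair coin. The function name `binomial_pmf` and the example values (10 tosses, 5 heads) are illustrative assumptions, not part of the lesson.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0
    # Number of ways to place k heads among n tosses, times the
    # probability of any one such sequence.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative values: a fair coin tossed 10 times, exactly 5 heads.
prob = binomial_pmf(5, 10, 0.5)
print(round(prob, 4))  # 0.2461
```

Summing `binomial_pmf(k, n, p)` over k = 0, …, n gives 1, which is a quick sanity check that the formula describes a valid distribution.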

In a multinomial distribution, each ‘toss’ (still independent of the other tosses) can have more than two outcomes. An example experiment which generates a multinomial distribution is rolling a k-sided die (with digits 1, 2, …, k) N times. Note that k now denotes the number of sides of the die, not the number of heads. Each roll of the die can result in one of k outcomes, with probabilities p1, p2, …, pk. In topic modelling, you can think of a ‘document’ as a k-sided die in which each side represents one ‘topic’, and the probability of that side is the probability of the topic in the document.

Since exactly one of the k numbers must appear in each roll of the die, it is evident that:

$$p_1 + p_2 + \dots + p_k = \sum_{i=1}^{k} p_i = 1$$

When you do N rolls of the die, you will get N digits from 1 to k as the outcomes. Digits ‘i’ which have a higher probability pi will appear more frequently than those with a low pi.
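The effect of the side probabilities on the observed frequencies can be checked with a small simulation. This is only a sketch: the helper `roll_die`, the probabilities (0.2, 0.5, 0.3), the number of rolls and the fixed seed are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def roll_die(probs, n_rolls, seed=0):
    """Roll a die with len(probs) sides n_rolls times; return counts per side."""
    rng = random.Random(seed)               # fixed seed so the run is repeatable
    sides = list(range(1, len(probs) + 1))  # sides are labelled 1..k
    rolls = rng.choices(sides, weights=probs, k=n_rolls)
    return Counter(rolls)

probs = [0.2, 0.5, 0.3]   # hypothetical p1, p2, p3 (they sum to 1)
n_rolls = 10_000
counts = roll_die(probs, n_rolls)

# Observed relative frequencies should be close to the probabilities.
for side in sorted(counts):
    print(side, counts[side] / n_rolls)
```

With a large number of rolls, the relative frequency of each side settles near its probability, so the side with the highest pi shows up most often.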


Let’s now see how the multinomial distribution is represented mathematically and used in the context of topic modelling.

Consider our k-sided die and say that you roll it N times. Let the random variable Xi denote the number of times the digit ‘i’ appears, i.e. X1 represents the number of times 1 appears, X2 represents the frequency of 2 and so on.


Since there are N rolls in total, the number of times each digit (each side of the die) appears should add up to N, i.e.:

$$x_1 + x_2 + \dots + x_k = N$$

The random variable (now a vector) X = (X1, X2, X3, …, Xk) is said to follow a multinomial distribution. The probability that the digit 1 appears x1 times, 2 appears x2 times, …, and k appears xk times, i.e. that X takes the values (x1, x2, …, xk), is given by:

$$P(X_1 = x_1, X_2 = x_2, \dots, X_k = x_k) = \frac{N!}{x_1!\, x_2! \cdots x_k!}\; p_1^{x_1}\, p_2^{x_2} \cdots p_k^{x_k}$$

To take an example, say you have a 3-sided die (k=3) with the letters A, B and C on its sides, which occur with probabilities (0.2, 0.5, 0.3) respectively. If you roll the die N=4 times, the probability of A, B and C appearing 1, 1 and 2 times respectively is given by:

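$$P(X_A = 1, X_B = 1, X_C = 2) = \frac{4!}{1!\,1!\,2!}\,(0.2)^{1}\,(0.5)^{1}\,(0.3)^{2} = 12 \times 0.009 = 0.108$$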
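The 3-sided die example can be verified numerically. The sketch below computes the multinomial probability N! / (x1! … xk!) · p1^x1 … pk^xk; the function name `multinomial_pmf` is an assumption made for illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing the given counts in sum(counts) independent rolls."""
    n = sum(counts)
    # Multinomial coefficient N! / (x1! * x2! * ... * xk!), kept as an integer.
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)
    # Probability of any one sequence with these counts.
    seq_prob = prod(p**x for p, x in zip(probs, counts))
    return coeff * seq_prob

# A, B, C appear 1, 1 and 2 times in N = 4 rolls.
result = multinomial_pmf([1, 1, 2], [0.2, 0.5, 0.3])
print(round(result, 3))  # 0.108
```

With k = 2 the same function reduces to the binomial case, for example `multinomial_pmf([k, N - k], [p, 1 - p])`.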

You can now see how the topic-term distribution can be modelled as a multinomial distribution* (more accurately, it is modelled as a Dirichlet distribution – explained later).

Say you have k terms representing a topic c = ‘magic’ (terms such as ‘magic’, ‘black’, ‘cards’ … etc.) – you can imagine a k-sided die with each side containing a term w and having some probability P(w|c). Also, say you want to generate N terms from this topic.

This can be done by rolling the k-sided die N times – each roll generates a term (according to the probability P(w|c)).

The random variable X is a vector whose values represent the number of times each word appears among the N terms generated from a topic c (say ‘magic’), i.e. X = (magic = 6, black = 4, …, wk = xk).
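The die-rolling view of a topic can be sketched in a few lines. Everything specific here is hypothetical: the four terms, their probabilities P(w|c), the helper `generate_terms`, N = 15 and the seed are made up for illustration and do not come from a fitted topic model.

```python
import random
from collections import Counter

# Hypothetical topic 'magic': each term w with an assumed probability P(w|c).
topic_magic = {"magic": 0.40, "black": 0.25, "cards": 0.20, "wand": 0.15}

def generate_terms(topic, n_terms, seed=0):
    """Roll the topic's k-sided die n_terms times; each roll yields one term."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    terms = list(topic)              # the k sides of the die
    weights = list(topic.values())   # P(w|c) for each side
    return rng.choices(terms, weights=weights, k=n_terms)

terms = generate_terms(topic_magic, n_terms=15)
x = Counter(terms)   # the count vector X: how often each term was generated

print(terms)
print(dict(x))
```

The counts in `x` always add up to N, and terms with a higher P(w|c) tend to receive larger counts.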

Similarly, a document can be seen as a multinomial distribution over the topics.
