
Probabilistic Latent Semantic Analysis (PLSA)

Now, let’s study the PLSA model. Recall that we briefly discussed PLSA in the previous session: it can be represented as a graphical model with three random variables, namely documents d, topics c and words w.

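Please note that the video gives the following expression:

$$P(w/d) = p(d)[p(c/d)*p(w/c)]$$

This is wrong: the left-hand side should be the joint probability of w and d, and the sum over topics is missing.

It should have been p(w, d) = p(d) × ∑_c [p(c|d) × p(w|c)]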

PLSA can be summarised as follows:

 
Say there are M documents (represented by the outer plate in the figure below), and for simplicity, assume that there are N words in each document (the inner plate). Also, let’s assume you have k topics (k is not represented in the figure).

Each document contains each topic with some probability (document-topic distribution), and each topic contains the N words with some probability (topic-term distribution). The inference task is to figure out the M x k document-topic probabilities and the k x N topic-term probabilities. In other words, you want to infer Mk + kN parameters.

The basic idea used to infer the parameters, i.e. the optimisation objective, is to maximise the joint probability p(w, d) of observing the documents and the words (since those two are the only observed variables). Notice that you are doing something very clever (and difficult) here: using the observed random variables (d, w) to infer the unobserved random variable (c).

Using the product rule of probability, you can write p(w, d) as:

p(w, d) = p(d) × p(w|d)

The term p(w|d) represents the probability of a word w being generated from a document d. But our model assumes that words are generated from topics, which in turn are generated from documents, so we can write p(w|d) as the product p(c|d) × p(w|c) summed over all k topics:

p(w|d) = ∑_c p(c|d) × p(w|c)

So, we have

p(w, d) = p(d) × ∑_c [p(c|d) × p(w|c)]
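To make the formula concrete, here is a minimal sketch in plain Python that evaluates p(w|d) and p(w, d) for a single document. The function names and all the numbers are made up for illustration (two topics, a three-word vocabulary); they are not taken from the lecture.

```python
def word_given_doc(p_c_given_d, p_w_given_c):
    """Return p(w|d) for every word w: the sum over topics c of p(c|d) * p(w|c)."""
    num_topics = len(p_c_given_d)
    num_words = len(p_w_given_c[0])
    return [
        sum(p_c_given_d[c] * p_w_given_c[c][w] for c in range(num_topics))
        for w in range(num_words)
    ]


def joint_word_doc(p_d, p_c_given_d, p_w_given_c):
    """Return p(w, d) = p(d) * p(w|d) for every word w."""
    return [p_d * p for p in word_given_doc(p_c_given_d, p_w_given_c)]


# Hypothetical toy values: one document, 2 topics, 3 words.
p_d = 0.5                          # p(d): probability of picking this document
p_c_given_d = [0.7, 0.3]           # p(c|d): document-topic distribution
p_w_given_c = [[0.5, 0.4, 0.1],    # p(w|c) for topic 0
               [0.1, 0.2, 0.7]]    # p(w|c) for topic 1

p_w_d = word_given_doc(p_c_given_d, p_w_given_c)
p_wd = joint_word_doc(p_d, p_c_given_d, p_w_given_c)
print(p_w_d)  # approximately [0.38, 0.34, 0.28]
print(p_wd)   # approximately [0.19, 0.17, 0.14]
```

Note that p(w|d) sums to 1 over the words, as any distribution over words should, because each row of p(w|c) and the vector p(c|d) are themselves distributions.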

You can estimate the Mk + kN parameters using the expectation maximisation (EM) algorithm. The EM algorithm is an optional part of this course; if you want the details, we recommend going through the optional content on the EM algorithm provided at the bottom of this page.
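For readers who want a feel for what EM does here, the following is a rough sketch of the standard PLSA updates in plain Python. It is not the course's implementation: the function names, the toy count matrix and the random initialisation are all assumptions made for illustration. The E-step computes the posterior p(c|d, w), which is proportional to p(c|d) × p(w|c), and the M-step renormalises the resulting expected counts.

```python
import math
import random


def normalise(values):
    """Scale non-negative values to sum to 1 (uniform if they sum to 0)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa_em(counts, k, iterations=50, seed=0):
    """Fit p(c|d) and p(w|c) to a document-word count matrix with EM."""
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])
    # Random positive starting points; each row is a probability distribution.
    p_c_d = [normalise([rng.random() + 0.1 for _ in range(k)])
             for _ in range(num_docs)]
    p_w_c = [normalise([rng.random() + 0.1 for _ in range(num_words)])
             for _ in range(k)]
    for _ in range(iterations):
        acc_c_d = [[0.0] * k for _ in range(num_docs)]
        acc_w_c = [[0.0] * num_words for _ in range(k)]
        for d in range(num_docs):
            for w in range(num_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior over topics for this (document, word) pair.
                posterior = normalise(
                    [p_c_d[d][c] * p_w_c[c][w] for c in range(k)])
                for c in range(k):
                    weight = counts[d][w] * posterior[c]
                    acc_c_d[d][c] += weight
                    acc_w_c[c][w] += weight
        # M-step: turn the expected counts back into probabilities.
        p_c_d = [normalise(row) for row in acc_c_d]
        p_w_c = [normalise(row) for row in acc_w_c]
    return p_c_d, p_w_c


def log_likelihood(counts, p_c_d, p_w_c):
    """Sum of n(d, w) * log p(w|d); the p(d) term is constant and omitted."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n > 0:
                mix = sum(p_c_d[d][c] * p_w_c[c][w] for c in range(len(p_w_c)))
                total += n * math.log(mix)
    return total


# Hypothetical toy corpus: 4 documents (M = 4), 4 words (N = 4), 2 topics (k = 2).
counts = [[4, 3, 0, 0],
          [5, 2, 1, 0],
          [0, 0, 3, 4],
          [0, 1, 2, 5]]
p_c_d, p_w_c = plsa_em(counts, k=2)
print(log_likelihood(counts, p_c_d, p_w_c))
```

The result is M × k document-topic probabilities and k × N topic-term probabilities, i.e. the Mk + kN parameters described above. EM only finds a local optimum, so different seeds can give different answers.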

Coming back to PLSA: in the following short lecture, Professor Srinath will discuss an interesting way to use PLSA by extending the idea of documents, topics and words to ‘users’ reading certain documents as well.

To summarise, PLSA models each document as a distribution over topics and each topic as a distribution over terms. The parameters of PLSA are the document-topic and topic-term probabilities, which are estimated using the expectation maximisation algorithm.

Drawbacks of PLSA

You can see that PLSA has a large number of parameters (Mk + kN), and that this number grows linearly with the number of documents M. Although estimating these parameters is not impossible, it is computationally very expensive.

For example, if you have 10,000 documents (say, Wikipedia articles), 20 topics, and each document has an average of 1,500 words, the number of parameters you want to estimate is 1,500 × 20 + 20 × 10,000 = 230,000.
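The arithmetic can be checked with a few lines of Python. The helper name below is hypothetical; it simply evaluates Mk + kN, and the second call shows what happens when the number of documents doubles.

```python
def plsa_parameter_count(num_docs, num_topics, num_words):
    """Number of PLSA parameters: M*k document-topic plus k*N topic-term."""
    return num_docs * num_topics + num_topics * num_words


print(plsa_parameter_count(10_000, 20, 1_500))  # 230000
print(plsa_parameter_count(20_000, 20, 1_500))  # 430000
```

Doubling M from 10,000 to 20,000 adds another 200,000 document-topic parameters, while the topic-term part stays at 30,000.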

In the next few lectures, you will study an alternate topic model called LDA and see how it solves this problem.

Please refer to the plate diagram above (PLSA plate notation) to solve the following questions.

In the next lecture, let’s study the expectation maximisation algorithm to estimate the PLSA parameters. 
