In the previous segment, you learnt how the POS tagging problem can be modelled using an HMM. You saw that the sequence of words is the sequence of observations, while the POS tags are the hidden states. Also, an HMM is represented by its initial state probabilities (i.e. the probability of transitioning into any state from the initial state), its transition probabilities and its emission probabilities.

These probabilities are usually learnt from a training corpus. For now, let’s assume that you have already learnt these probabilities. Then, the **explanation problem**, also called the **decoding problem**, is as follows: Given a sequence of words/observations and an HMM model (i.e. transition, emission and start state probabilities), find the tag/state sequence which maximises the probability of observing the observed words.
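To make the decoding objective concrete, here is a minimal sketch that scores one candidate tag sequence under an HMM. The probability tables below are assumed toy numbers for illustration, not values learnt from any corpus.

```python
# Toy HMM parameters (illustrative values only, not learnt from a corpus).
start = {"DT": 0.6, "JJ": 0.1, "NN": 0.3}                              # initial state probabilities
trans = {("DT", "JJ"): 0.4, ("DT", "NN"): 0.5, ("JJ", "NN"): 0.5}      # P(tag_i | tag_{i-1})
emit = {("DT", "the"): 0.7, ("JJ", "high"): 0.3, ("NN", "cost"): 0.2}  # P(word | tag)

def sequence_probability(words, tags):
    """Joint probability of (words, tags) under the HMM:
    start(t1) * emit(w1|t1) * product over i of trans(t_i|t_{i-1}) * emit(w_i|t_i)."""
    p = start[tags[0]] * emit.get((tags[0], words[0]), 0.0)
    for i in range(1, len(words)):
        p *= trans.get((tags[i - 1], tags[i]), 0.0) * emit.get((tags[i], words[i]), 0.0)
    return p

# The decoding problem asks for the tag sequence that maximises this score.
print(sequence_probability(["the", "high", "cost"], ["DT", "JJ", "NN"]))  # ≈ 0.00504
```

Scoring every candidate this way requires evaluating all possible tag sequences, which is why a brute-force search quickly becomes impractical.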

To summarise: solving the explanation problem by brute force is exponential in the number of tags (with a tagset of size T and a sentence of n words, there are T^n candidate tag sequences), so we need a more efficient algorithm. You already know that the Viterbi algorithm solves this problem.

Next, Prof. Srinath will explain how to visualise the HMM as a *trellis* and solve the explanation problem using the Viterbi algorithm.

Here, we have used the same phrase for tagging, ‘The high cost’, and have assumed that we have only three possible tags – DT, JJ and NN. We have also assumed some emission probabilities (P(‘the’|DT), P(‘the’|NN), P(‘high’|JJ), etc.) and transition probabilities (P(NN|JJ), P(JJ|DT), P(DT|JJ), etc.). You’ll learn how to calculate these probabilities from a tagged corpus in the next segment.

In the upcoming lecture, Prof. Srinath will demonstrate how to calculate the most probable tag sequence using the Viterbi Heuristic.

(Note: At 4:18, Prof. Srinath says P(DT|“high”) = 0.5\*0.2; it should be 0.1\*0.2.)
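The trellis computation can be sketched in code as follows. This is a standard Viterbi implementation over the three-tag example; the start, transition and emission tables are assumed illustrative values, not the exact numbers used in the lecture.

```python
# Toy HMM parameters (illustrative values, not the lecture's exact numbers).
TAGS = ("DT", "JJ", "NN")
start = {"DT": 0.6, "JJ": 0.1, "NN": 0.3}
trans = {("DT", "JJ"): 0.4, ("DT", "NN"): 0.5, ("JJ", "NN"): 0.5, ("NN", "NN"): 0.1}
emit = {("DT", "the"): 0.7, ("NN", "the"): 0.05, ("JJ", "high"): 0.3,
        ("NN", "high"): 0.1, ("NN", "cost"): 0.2}

def viterbi(words, tags, start, trans, emit):
    # V[i][t]: probability of the best tag sequence ending in tag t at position i.
    V = [{t: start[t] * emit.get((t, words[0]), 0.0) for t in tags}]
    back = [{}]  # back[i][t]: best predecessor of tag t at position i
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: V[i - 1][p] * trans.get((p, t), 0.0))
            back[i][t] = prev
            V[i][t] = V[i - 1][prev] * trans.get((prev, t), 0.0) * emit.get((t, words[i]), 0.0)
    # Backtrack from the best final tag.
    best = max(tags, key=lambda t: V[-1][t])
    path = [best]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

print(viterbi(["the", "high", "cost"], TAGS, start, trans, emit))  # ['DT', 'JJ', 'NN']
```

Because each column of the trellis only looks back at the previous column, the cost is linear in the sentence length and quadratic in the tagset size, instead of exponential.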

To summarise, given a Hidden Markov Model defined by its initial state, transition and emission probabilities, you assign a tag Tj to an observation Oi such that it maximises the likelihood:

$$P(T_j \mid O_i) = P(O_i \mid T_j) \ast P(T_j \mid T_{j-1})$$
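As a quick worked instance of this score, the snippet below picks the tag that maximises P(Oi|Tj) \* P(Tj|Tj-1) for a single word, given the previous tag. The probabilities are assumed toy values, not the lecture's exact numbers.

```python
# Assumed toy probabilities for illustration only.
emit = {("JJ", "high"): 0.3, ("NN", "high"): 0.1}  # P(word | tag)
trans = {("DT", "JJ"): 0.4, ("DT", "NN"): 0.5}     # P(tag | previous tag)

def best_tag(word, prev_tag, tags=("DT", "JJ", "NN")):
    """argmax over tags of P(word | tag) * P(tag | prev_tag)."""
    return max(tags, key=lambda t: emit.get((t, word), 0.0) * trans.get((prev_tag, t), 0.0))

# After a determiner, 'high' scores 0.3*0.4 = 0.12 as JJ versus 0.1*0.5 = 0.05 as NN.
print(best_tag("high", "DT"))  # JJ
```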

Note that the Viterbi algorithm, as demonstrated in the previous lecture, is an example of a **dynamic programming** algorithm. In general, algorithms which break a complex problem down into subproblems, solve each subproblem optimally and reuse those solutions are called dynamic programming algorithms.

Until now, we have assumed some initial state, transition and emission probabilities and used them to compute the optimal tag sequence. But how do you actually learn these probabilities? This task, called **the learning problem**, is done using a tagged corpus, such as the Penn Treebank.
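A minimal sketch of the learning step, assuming a tiny hand-tagged toy corpus (the sentences below are invented for illustration): the maximum-likelihood estimates are simply relative frequencies computed from counts over the tagged data.

```python
from collections import Counter

# Tiny invented tagged corpus: each sentence is a list of (word, tag) pairs.
tagged = [
    [("the", "DT"), ("high", "JJ"), ("cost", "NN")],
    [("the", "DT"), ("cost", "NN")],
]

emission, transition, start, tag_count = Counter(), Counter(), Counter(), Counter()
for sent in tagged:
    start[sent[0][1]] += 1                          # tag that starts the sentence
    for i, (word, tag) in enumerate(sent):
        emission[(tag, word)] += 1                  # count(tag emits word)
        tag_count[tag] += 1
        if i > 0:
            transition[(sent[i - 1][1], tag)] += 1  # count(prev tag -> tag)

# Maximum-likelihood estimates as relative frequencies:
p_start = {t: c / len(tagged) for t, c in start.items()}           # P(tag at start)
p_emit = {k: c / tag_count[k[0]] for k, c in emission.items()}     # P(word | tag)
p_trans = {k: c / tag_count[k[0]] for k, c in transition.items()}  # P(tag | prev tag)

print(p_emit[("DT", "the")], p_trans[("DT", "JJ")])  # 1.0 0.5
```

In practice, a real corpus also needs smoothing for unseen word–tag and tag–tag pairs, which simple relative frequencies assign zero probability.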

In the next segments, you’ll learn to solve the learning problem and write an algorithm in Python to train your own HMM and tag sentences using the Viterbi algorithm.

**Additional References:**

- The likelihood problem of predicting the next possible word can be solved using the forward algorithm. You can read about it in more detail at this link: https://web.stanford.edu/~jurafsky/slp3/9.pdf (refer to Section 9.3).