In this segment, you’ll learn to compute the emission and transition probabilities from a tagged corpus. This process of learning the probabilities from a tagged corpus is called training an HMM.
The following lecture will help you understand how that is done.
To summarise, the emission and transition probabilities can be learnt as follows:

Emission probability of a word w for tag t:
P(w|t) = Number of times word w is tagged as t / Number of times t appears.

Transition probability of tag t1 followed by tag t2:
P(t2|t1) = Number of times t1 is followed by tag t2 / Number of times t1 appears.
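The two formulas above can be sketched in code. The following is a minimal illustration (function names and the toy corpus format are my own, not from the lecture): it reads word/tag text, accumulates the three counts, and divides them exactly as the formulas describe.

```python
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Count tags, (word, tag) emissions, and (t1, t2) transitions
    from sentences written as space-separated word/tag tokens."""
    tag_count = defaultdict(int)         # C(t)
    emission_count = defaultdict(int)    # C(word, t)
    transition_count = defaultdict(int)  # C(t1, t2)
    for sentence in tagged_sentences:
        tags = []
        for token in sentence.split():
            word, tag = token.rsplit("/", 1)
            tag_count[tag] += 1
            emission_count[(word.lower(), tag)] += 1
            tags.append(tag)
        # consecutive tag pairs within the sentence
        for t1, t2 in zip(tags, tags[1:]):
            transition_count[(t1, t2)] += 1
    return tag_count, emission_count, transition_count

def transition_prob(t1, t2, tag_count, transition_count):
    # P(t2|t1) = C(t1 followed by t2) / C(t1)
    return transition_count[(t1, t2)] / tag_count[t1]

def emission_prob(word, tag, tag_count, emission_count):
    # P(word|tag) = C(word tagged as tag) / C(tag)
    return emission_count[(word.lower(), tag)] / tag_count[tag]
```

For example, `train_hmm(["the/DT dog/NN barks/VB"])` yields counts from which `transition_prob("DT", "NN", ...)` returns 1.0, since every DT in that toy sentence is followed by NN.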
Now, let’s practise some questions. Assume you have the following POS-tagged corpus (the text is in the form word/tag), and then answer the questions in the quiz section.
[Twitter/NN is/VB the/DT most/JJ open/JJ social/JJ media/NN platform/NN, which/WDT is/VB partly/RB why/WRB it/PRP is/VB used/VB by/IN so/RB many/JJ politicians/NN, celebrities/NN, journalists/NN, technocrats/NN, and/CC experts/NN working/VB on/IN pacy/JJ topics/NN.
As/IN we/PRP learned/VB over/IN the/DT past/JJ year/NN, openness/NN of/IN Twitter/NN was/VB exploited/VB by/IN adversarial/JJ governments/NN trying/VB to/TO influence/VB elections/NN.
Twitter/NN is/VB marketing/VB itself/PRP as/IN a/DT news/NN platform/NN.]
The table below summarises the counts of the above text:
| Item | Count | Item | Count |
| --- | --- | --- | --- |
| NN | 17 | VB | 12 |
| JJ | 7 | DT | 3 |
| Twitter appearing as NN | 3 | was appearing as VB | 1 |
| JJ followed by NN | 5 | exploited appearing as VB | 1 |
| NN followed by VB | 4 | VB followed by VB | 3 |
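To see how the table feeds the formulas, here is a small worked sketch (the variable names are my own; the counts are copied from the table). It computes one transition probability and one emission probability by straight division.

```python
# Counts taken directly from the table above
tag_counts = {"NN": 17, "VB": 12, "JJ": 7, "DT": 3}
emission_counts = {("Twitter", "NN"): 3, ("was", "VB"): 1, ("exploited", "VB"): 1}
transition_counts = {("JJ", "NN"): 5, ("NN", "VB"): 4, ("VB", "VB"): 3}

# Transition: P(NN|JJ) = C(JJ followed by NN) / C(JJ)
p_nn_given_jj = transition_counts[("JJ", "NN")] / tag_counts["JJ"]  # 5/7 ≈ 0.714

# Emission: P(Twitter|NN) = C(Twitter tagged NN) / C(NN)
p_twitter_given_nn = emission_counts[("Twitter", "NN")] / tag_counts["NN"]  # 3/17 ≈ 0.176
```

The same pattern answers any of the quiz questions: pick the pair count from the table and divide by the count of the conditioning tag.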