In the previous segment, you have learnt that the optimal weights w can be computed by maximising the log-likelihood. Once the weights w are obtained, the next task is to assign the sequence y for any given word sequence x,i.e. assigning IOB tags to each word in the sentence x.
Ashish will now explain how the assignment of labels can be done efficiently.
Correction:
Ashish has defined the third feature function fdict as fdict=[xi in citydict], thought, it should be fdict =[xi in citydict] & [yi = C]
Thus, the inference task to assign the label sequence y* to x which maximises the score of the sequence, i.e.
$$//\;\;\;Y^\ast=argmax(w.f(x,y))\\//$$
The naive way to get y* is by calculating w.f(x,y) for every possible label sequence y , and then choose the label sequence that has maximum (w.f(x,y)) value. However, there are an exponential number of possible labels (tn for a tag set of size t and a sentence of length n), and this task is computationally heavy.
Previously, Ashish had defined three feature functions with weights as follows.Also, for simplicity, assume that there are only possible labels ‘O’ for outside an entity and ‘C’ for city:
- Wtrans = 3;Ftrans : yi-1 = O ,XI is in citydict ,
- Wtrans = 2; & yi = O
- Wdict = 5; fdict = xi is in citydict & yi =C
The first feature will return 1 if the label is C, the previous label is O (i.e. it transitions from O to C), and if the word is in the city dictionary. Since the weight of the first feature function is 3, it ‘fires’ 3 if these conditions are satisfied.
Ashish will demonstrate this using an example.
[Slight correction: At 3:20, Ashish has written the weight 2 on the O-C edge, though, it should be on the O-O edge.]
You’ll learn more about dynamic programming in the next lecture. Let’s first solve some questions to understand how CRFs assign IOB tags to input sequences using weights and feature functions. The solution to these questions is provided below as a downloadable file.
Solutions to the Ouestions
The following attached file will elucidate the solution of the questions.
NOTE:
In the solution above while calculating the path value of flight_cost-> O -> O -> city_name : in the last image where it is mentioned.
over the path = flight_cost(6+4).
It should be city_name(6+4).
In the next lecture, Ashish will give you an intuition of how to identify problems that require a dynamic programming algorithm. Dynamic programming applied to sequence prediction problems such as HMMs, CRFs etc. are commonly known as the Viterbi algorithm.
(Note: At 1:09, the end state is E, so the sequence should be B-O-O-O-E and not B-O-O-O-C).
In the next segment, Ashish will demonstrate how to build and evaluate CRFs in python on the airlines reservation dataset.