You have already seen that Z(x) is the sum of scores of all possible label sequences. In this segment, let’s understand the mathematical form of the normalising constant Z(x) and some examples of feature functions.
Since the score of a tag sequence y given the word sequence x is
$$score(y|x)=\prod_{i=1}^{n}\exp\big(w\cdot f(y_i,\;x_i,\;y_{i-1},\;i)\big)=\exp\Big(\sum_{i=1}^{n} w\cdot f(y_i,\;x_i,\;y_{i-1},\;i)\Big)=\exp\big(w\cdot f(x,\;y)\big)$$
Then, if there are N possible tag sequences (N = $t^n$, where t is the number of tags and n is the length of the sequence), Z(x), the sum of the scores of all N possible sequences, is given by:
$$Z(x)=\sum_{j=1}^{N}\exp\big(w\cdot f(x,\;y_j)\big)$$
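To make the two formulas above concrete, here is a minimal brute-force sketch: it defines a toy feature function and weight vector (both hypothetical, chosen only for illustration), computes $w\cdot f(x, y)$ for one sequence, and computes Z(x) by summing $\exp(w\cdot f(x, y))$ over all $t^n$ candidate tag sequences. Brute-force enumeration is exponential in n; real CRF implementations use dynamic programming (the forward algorithm) instead.

```python
import itertools
import math

TAGS = ["N", "V"]          # t = 2 tags (assumption for illustration)
words = ["they", "run"]    # n = 2 words, so N = t^n = 4 sequences

def f(y_i, x, y_prev, i):
    # Toy feature vector: [one state feature, one transition feature]
    state = 1.0 if (x[i] == "run" and y_i == "V") else 0.0
    trans = 1.0 if (y_prev == "N" and y_i == "V") else 0.0
    return [state, trans]

w = [1.5, 0.8]             # hypothetical weights

def score(y, x):
    # w . f(x, y): the per-position dot products summed over i = 1..n
    # (the tag before position 0 is a special START symbol)
    total = 0.0
    for i in range(len(x)):
        y_prev = y[i - 1] if i > 0 else "START"
        total += sum(wj * fj for wj, fj in zip(w, f(y[i], x, y_prev, i)))
    return total

# Z(x): sum of exp(score) over every one of the t^n tag sequences
Z = sum(math.exp(score(y, words))
        for y in itertools.product(TAGS, repeat=len(words)))

# Probability of one particular sequence: exp(w . f(x, y)) / Z(x)
p = math.exp(score(("N", "V"), words)) / Z
```

By construction, dividing each sequence's exponentiated score by Z(x) yields probabilities that sum to 1 over all sequences.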
You’ll also learn that there are two types of feature functions – state features and transition features.
You saw an example of a state feature. Let’s now see an example of a transition feature, which uses the previous label:
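The two feature types can be sketched as simple indicator functions. The tag names and firing conditions below are illustrative assumptions, not the course's exact features; the point is the difference in which arguments each type inspects.

```python
def state_feature(y_i, x, y_prev, i):
    # State feature: looks only at the current word x[i] and current tag y_i.
    # Fires when the current word ends in "-ing" and is tagged VERB.
    return 1.0 if x[i].endswith("ing") and y_i == "VERB" else 0.0

def transition_feature(y_i, x, y_prev, i):
    # Transition feature: looks at the previous tag y_prev and current tag y_i.
    # Fires when a determiner is immediately followed by a noun.
    return 1.0 if y_prev == "DET" and y_i == "NOUN" else 0.0

print(state_feature("VERB", ["running"], "START", 0))        # 1.0
print(transition_feature("NOUN", ["the", "dog"], "DET", 1))  # 1.0
```

Both functions share the signature $f(y_i, x_i, y_{i-1}, i)$ from the equations above; a state feature simply ignores $y_{i-1}$, while a transition feature keys on it.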
Additional Reading
- In the next section, we will use concepts taught in logistic regression (specifically, maximum likelihood estimation). In case you are not familiar with those concepts, we highly recommend going through the optional module here (you will not be tested on any optional content).
- You can read more about the architecture of CRFs in the following document: