You have already learnt about some probabilistic models used for sequence prediction: bigram models, conventional ML models (Naive Bayes, decision trees etc.) and HMMs.
You will now study another family of models commonly used for sequence prediction tasks: Conditional Random Fields (CRFs).
CRFs are used in a wide variety of sequence labelling tasks across domains: POS tagging, speech recognition, NER, and even modelling genetic patterns in computational biology. In this section, you will learn the architecture of CRFs in the context of entity recognition, though the same ideas apply to any of the tasks mentioned above.
CRFs are commonly used as an alternative to HMMs and, in some applications, have empirically proven to be significantly more accurate than HMMs.
CRFs are discriminative probabilistic classifiers (some texts represent them as undirected graphical models). In the following lecture, you will also study the distinction between discriminative and generative classifiers.
Broadly speaking, there are two types of classifiers in ML:
- Discriminative classifiers learn the boundary between classes by modelling the conditional probability distribution P(y|x), where y is the vector of class labels and x represents the input features. Examples are Logistic Regression, SVMs etc.
- Generative classifiers model the joint probability distribution P(x,y). Examples of generative classifiers are Naive Bayes, HMMs etc.
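To make this distinction concrete, below is a minimal sketch contrasting the two families on a toy dataset. It assumes scikit-learn is installed; the data points are purely illustrative.

```python
# Contrasting a discriminative and a generative classifier on toy 2-D data.
import numpy as np
from sklearn.linear_model import LogisticRegression  # discriminative: models P(y|x) directly
from sklearn.naive_bayes import GaussianNB           # generative: models P(x, y) = P(x|y) P(y)

X = np.array([[0.1, 0.2], [0.3, 0.1], [0.9, 0.8], [0.8, 0.9]])  # illustrative features
y = np.array([0, 0, 1, 1])                                      # illustrative labels

# Logistic regression learns the boundary between the classes directly.
disc = LogisticRegression().fit(X, y)

# Naive Bayes learns the class prior P(y) and class-conditional density P(x|y),
# then applies Bayes' rule at prediction time to recover P(y|x).
gen = GaussianNB().fit(X, y)

print(disc.predict_proba([[0.5, 0.5]]))  # P(y|x) modelled directly
print(gen.predict_proba([[0.5, 0.5]]))   # P(y|x) derived via Bayes' rule
```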
To summarise, CRFs model the conditional probability P(Y|X), where Y is the output label sequence (the IOB labels here) and X is the input sequence (the words to be tagged).
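For instance, in NER with the IOB scheme, an input sequence X and its label sequence Y might look as follows (a hand-made illustrative example):

```python
# Illustrative example: an input sequence X (words) and its output sequence Y (IOB labels).
X = ["Barack", "Obama", "visited", "New",   "Delhi", "yesterday"]
Y = ["B-PER",  "I-PER", "O",       "B-LOC", "I-LOC", "O"]
# A CRF models P(Y | X): the probability of the whole label sequence given the
# whole input sentence, rather than labelling each word independently.
```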
Ashish also mentioned that rather than feeding the input sequence X as it is, we usually derive features from it and feed those features to the model. For now, just remember this point; you will shortly see what kind of ‘features’ CRFs generally use.
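As a preview, here is a minimal sketch of the kind of hand-crafted, per-position features a CRF is typically fed. The function name and the specific features chosen are illustrative assumptions, not a fixed recipe.

```python
# A sketch of hand-crafted features often fed to a CRF instead of raw words.
# The function name and the particular features below are illustrative only.
def word2features(sentence, i):
    """Derive a feature dict for the word at position i in a tokenised sentence."""
    word = sentence[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalisation often signals an entity
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],             # crude morphological cue
        "prev_word": sentence[i - 1].lower() if i > 0 else "<START>",
        "next_word": sentence[i + 1].lower() if i < len(sentence) - 1 else "<END>",
    }

sentence = ["Barack", "Obama", "visited", "New", "Delhi"]
print(word2features(sentence, 1))  # features for "Obama"
```

Note how the features for each position can look at neighbouring words; this context is part of what lets a CRF score whole label sequences rather than isolated words.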
Additional Reading
- A concise explanation of the difference between discriminative and generative models.