
Different Approaches to POS Tagging

Now that you are familiar with the commonly used POS tags, we will discuss techniques and algorithms for POS tagging. We will look at the following four main techniques used for POS tagging:

  • Lexicon-based
  • Rule-based
  • Probabilistic (or stochastic)
  • Deep learning-based

This session will cover the first three tagging approaches in detail and the basics of deep learning-based taggers. Deep learning-based models will be covered in detail in the Neural Networks course.

Prof. Srinath mentioned four approaches to POS tagging. Let’s summarize these approaches:

The lexicon-based approach uses the following simple statistical algorithm: for each word, it assigns the POS tag that most frequently occurs for that word in some training corpus. Such a tagger cannot handle unknown words (those that never appear in the training corpus) or ambiguous words (those that take different tags in different contexts). For example:

  • I went for a run/NN.
  • I run/VB in the morning.

A lexicon tagger will tag ‘run’ with its highest-frequency tag. In most contexts, ‘run’ is likely to appear as a verb, so it will be wrongly tagged as a verb in the first sentence, where it is a noun.
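The most-frequent-tag idea can be sketched in a few lines of Python. This is a minimal illustration, not a production tagger: the three-sentence corpus `TRAIN`, the function names, and the fallback tag `NN` for unknown words are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical data, for illustration only).
TRAIN = [
    [("I", "PRP"), ("run", "VB"), ("in", "IN"), ("the", "DT"), ("morning", "NN")],
    [("They", "PRP"), ("run", "VB"), ("daily", "RB")],
    [("I", "PRP"), ("went", "VBD"), ("for", "IN"), ("a", "DT"), ("run", "NN")],
]

def train_lexicon(tagged_sentences):
    """Map each lowercased word to the tag it carries most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def lexicon_tag(words, lexicon, default="NN"):
    """Tag each word by lookup; unknown words get an assumed default tag."""
    return [(w, lexicon.get(w.lower(), default)) for w in words]

lexicon = train_lexicon(TRAIN)
tagged = lexicon_tag(["I", "went", "for", "a", "run"], lexicon)
print(tagged)
# [('I', 'PRP'), ('went', 'VBD'), ('for', 'IN'), ('a', 'DT'), ('run', 'VB')]
```

In this toy corpus ‘run’ occurs twice as VB and once as NN, so the lookup returns VB even after the determiner ‘a’, which reproduces the error described above.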

However, if a rule is applied to the entire text, such as ‘replace VB with NN if the previous tag is DT’ or ‘tag all words ending with ing as VBG’, the tag can be corrected. Rule-based tagging methods use such an approach.
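As a rough sketch of how such a correction rule might be applied, the snippet below takes a sentence that already carries initial tags and rewrites VB to NN whenever the preceding tag is DT. The function name `apply_dt_rule` and the hard-coded initial tags are assumptions for illustration; only the first rule quoted above is implemented.

```python
def apply_dt_rule(tagged):
    """Replace VB with NN when the previous word's tag is DT."""
    corrected = []
    for i, (word, tag) in enumerate(tagged):
        # Look at the tag of the previous word in the input sequence.
        prev_tag = tagged[i - 1][1] if i > 0 else None
        if tag == "VB" and prev_tag == "DT":
            tag = "NN"
        corrected.append((word, tag))
    return corrected

# Initial tags as a most-frequent-tag lookup might produce them (assumed input).
initial = [("I", "PRP"), ("went", "VBD"), ("for", "IN"), ("a", "DT"), ("run", "VB")]
fixed = apply_dt_rule(initial)
print(fixed)
# [('I', 'PRP'), ('went', 'VBD'), ('for', 'IN'), ('a', 'DT'), ('run', 'NN')]
```

A sentence such as "I run in the morning" passes through unchanged, because ‘run’ there is not preceded by a determiner.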

Probabilistic taggers don’t naively assign the highest-frequency tag to each word. Instead, they look at slightly longer parts of the sequence and often use the tag(s) and the word(s) appearing before the target word to be tagged.
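To make the idea of using the previous tag concrete, here is a simplified greedy sketch. For each word it picks the tag that maximizes an estimate of P(tag | previous tag) × P(word | tag), with both probabilities estimated by counting in a toy corpus. The corpus, the names, and the greedy left-to-right decoding are all assumptions for illustration; this is not necessarily the method covered in the upcoming segments.

```python
from collections import Counter

# Toy tagged corpus (hypothetical data, for illustration only).
TRAIN = [
    [("I", "PRP"), ("run", "VB"), ("in", "IN"), ("the", "DT"), ("morning", "NN")],
    [("They", "PRP"), ("run", "VB"), ("daily", "RB")],
    [("I", "PRP"), ("went", "VBD"), ("for", "IN"), ("a", "DT"), ("run", "NN")],
]

transition = Counter()     # (previous tag, tag) -> count
context_count = Counter()  # previous tag -> number of times it precedes a tag
emission = Counter()       # (tag, word) -> count
tag_count = Counter()      # tag -> count

for sentence in TRAIN:
    prev = "<s>"  # sentence-start marker
    for word, tag in sentence:
        transition[(prev, tag)] += 1
        context_count[prev] += 1
        emission[(tag, word.lower())] += 1
        tag_count[tag] += 1
        prev = tag

TAGS = list(tag_count)

def score(prev_tag, tag, word):
    """Estimate P(tag | prev_tag) * P(word | tag) from raw counts."""
    if context_count[prev_tag] == 0:
        return 0.0
    p_transition = transition[(prev_tag, tag)] / context_count[prev_tag]
    p_emission = emission[(tag, word)] / tag_count[tag]
    return p_transition * p_emission

def greedy_bigram_tag(words):
    """Tag left to right, conditioning each choice on the previous chosen tag."""
    tagged, prev = [], "<s>"
    for word in words:
        best = max(TAGS, key=lambda t: score(prev, t, word.lower()))
        tagged.append((word, best))
        prev = best
    return tagged

noun_case = greedy_bigram_tag(["I", "went", "for", "a", "run"])
verb_case = greedy_bigram_tag(["I", "run", "in", "the", "morning"])
print(noun_case[-1])  # ('run', 'NN')
print(verb_case[1])   # ('run', 'VB')
```

Because VB never follows DT in the toy corpus, ‘run’ after ‘a’ is tagged NN, while ‘run’ after the pronoun ‘I’ is tagged VB. The sketch uses no smoothing, so a word that never appears in the corpus gets a score of zero for every tag.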

We’ll go through the details of probabilistic tagging approaches in the upcoming segments.
