Markov models are probabilistic (or stochastic) models that were developed to model sequential processes. In a Markov process, it is usually assumed that the probability of each event (or state) depends only on the previous event, not on the entire history of the process. This simplifying assumption is a special case which is known as the Markovian, one-Markov or first-order Markov assumption.
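Written out formally, using s_t to denote the state at step t (a notation introduced here for convenience), the first-order Markov assumption says:

P(s_t | s_1, s_2, ..., s_t-1) = P(s_t | s_t-1)

That is, conditioning on the entire history of the process is equivalent to conditioning on just the previous state.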
The following lecture will introduce you to Markov processes more formally.
Let’s summarise the theory of Markov processes and HMMs.
A Markov chain is used to represent a process which performs a transition from one state to another. The model assumes that the probability of transitioning to the next state depends solely on the current state. Consider the figure below:
Here, ‘a’, ‘p’, ‘i’, ‘t’, ‘e’, ‘h’ are the states and the numbers on the edges are the transition probabilities. For example, the probabilities of transitioning from the state ‘t’ to the states ‘i’, ‘a’ and ‘h’ are 0.3, 0.3 and 0.4 respectively.
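A Markov chain like this one can be stored as a nested dictionary of transition probabilities and simulated by sampling the next state from the current state's distribution. Here is a minimal Python sketch: the probabilities for state ‘t’ come from the figure, while the entries for the other states are hypothetical placeholders, since only the probabilities for ‘t’ are spelled out above.

```python
import random

# Transition probabilities as a nested dict: P(next state | current state).
# Only the row for 't' is taken from the figure; the rest are hypothetical.
transitions = {
    "t": {"i": 0.3, "a": 0.3, "h": 0.4},
    "h": {"e": 1.0},            # hypothetical placeholder
    "a": {"t": 0.5, "p": 0.5},  # hypothetical placeholder
}

def next_state(current):
    """Sample the next state given only the current one (first-order Markov)."""
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return random.choices(states, weights=weights, k=1)[0]

print(next_state("t"))  # 'i', 'a' or 'h' with probabilities 0.3, 0.3, 0.4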
The start state is a special state which represents the initial state of the process (e.g. the start of a sentence).
Markov processes are commonly used to model sequential data, such as text and speech. For example, say you want to build an application which predicts the next word in a sentence. You can represent each word in a sentence as a state. The transition probabilities (which can be learnt from a corpus, more on that later) would represent the probability that the process moves from the current word to the next word. For example, the transition probability from the state ‘San’ to ‘Francisco’ will be higher than to the state ‘Delhi’.
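As a preview of how such transition probabilities can be learnt, here is a minimal sketch assuming maximum-likelihood estimation from bigram counts, i.e. P(next | current) = count(current, next) / count(current). The tiny corpus below is made up purely for illustration.

```python
from collections import Counter, defaultdict

# A toy corpus; in practice the counts come from a large text corpus.
corpus = "san francisco is in california . san francisco is big . delhi is in india ."
tokens = corpus.split()

# Count bigrams: how often each word follows each other word.
bigram_counts = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    bigram_counts[current][nxt] += 1

def transition_prob(current, nxt):
    """Estimate P(nxt | current) = count(current, nxt) / count(current)."""
    total = sum(bigram_counts[current].values())
    return bigram_counts[current][nxt] / total if total else 0.0

print(transition_prob("san", "francisco"))  # 1.0 in this toy corpus
print(transition_prob("san", "delhi"))      # 0.0
```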
The Hidden Markov Model (HMM) is an extension of the Markov process used to model phenomena where the states are hidden (or latent) and emit observations. For example, in a speech recognition system (a speech-to-text converter), the states represent the actual text words which you want to predict, but you do not directly observe them (i.e. the states are hidden). Rather, you only observe the speech (audio) signals corresponding to each word, and you need to infer the states from the observations.
Similarly, in POS tagging, what you observe are the words in a sentence, while the POS tags themselves are hidden. Thus, you can model the POS tagging task as an HMM with the hidden states representing POS tags which emit observations, i.e. words.
The hidden states emit observations with a certain probability. Therefore, along with the transition and initial state probabilities, Hidden Markov Models also have emission probabilities which represent the probability that an observation is emitted by a particular state.
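To make the three sets of probabilities concrete, here is a minimal Python sketch of the HMM parameters for the POS-tagging example, with tags DT, JJ and NN as the hidden states and the words of “The high cost” as observations. All the numbers are hypothetical placeholders; in practice they are estimated from a tagged corpus, as discussed in a later segment.

```python
# HMM parameters for a toy POS tagger. All numbers below are hypothetical.
hidden_states = ["DT", "JJ", "NN"]        # hidden states: POS tags
observations = ["the", "high", "cost"]    # observations: (lowercased) words

initial_probs = {"DT": 0.6, "JJ": 0.2, "NN": 0.2}              # P(first tag)

transition_probs = {                                           # P(next tag | tag)
    "DT": {"DT": 0.1, "JJ": 0.4, "NN": 0.5},
    "JJ": {"DT": 0.1, "JJ": 0.2, "NN": 0.7},
    "NN": {"DT": 0.4, "JJ": 0.2, "NN": 0.4},
}

emission_probs = {                                             # P(word | tag)
    "DT": {"the": 0.7, "high": 0.0, "cost": 0.0},
    "JJ": {"the": 0.0, "high": 0.3, "cost": 0.0},
    "NN": {"the": 0.0, "high": 0.0, "cost": 0.2},
}
```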
The figure below illustrates the emission and transition probabilities for a hidden Markov process having three hidden states and four observations.
In the previous segment, you used the transition and the emission probabilities to find the most probable tag sequence for the sentence “The high cost”. The probabilities P(NN|JJ), P(JJ|DT) etc. are transition probabilities, while P(high|JJ), P(cost|NN) etc. are emission probabilities.
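Using the hypothetical parameter dictionaries from the sketch above, the score of a candidate tag sequence is simply the product of the initial, emission and transition probabilities along the sequence, e.g. P(DT) × P(the|DT) × P(JJ|DT) × P(high|JJ) × P(NN|JJ) × P(cost|NN) for the tagging DT JJ NN of “The high cost”:

```python
def sequence_score(tags, words):
    """P(tags, words) = P(tag_1) * P(word_1|tag_1)
                        * product over i of P(tag_i|tag_i-1) * P(word_i|tag_i)."""
    score = initial_probs[tags[0]] * emission_probs[tags[0]][words[0]]
    for i in range(1, len(tags)):
        score *= (transition_probs[tags[i - 1]][tags[i]]
                  * emission_probs[tags[i]][words[i]])
    return score

# With the toy numbers above: 0.6*0.7 * 0.4*0.3 * 0.7*0.2 = 0.007056
print(sequence_score(["DT", "JJ", "NN"], ["the", "high", "cost"]))
```

The most probable tag sequence is the one with the highest such score among all candidate sequences.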
You’ll learn to compute these probabilities from a tagged corpus in a later segment.