Until now, we have defined multiple CFGs. You would have noticed that some CFGs are of the form A > B where both A and B are non-terminals (e.g. Nom > Noun), some are of the form A > B where A is a non-terminal while B is a terminal (Det > the), etc.
It is often useful to use a standardised version of production rules by converting a grammar to what is called the Chomsky Normal Form (CNF). The CNF, proposed by the linguist Noam Chomsky, is a normalized version of the CFG with a standard set of rules defining how production rule must be written.
Prof. Srinath will explain the rules required to convert a CFG to the corresponding Chomsky Normal Form.
Let’s summarise the three forms of CNF rules as follows:
- A -> B C
- A -> a
- S -> ε
A, B, C are non-terminals (POS tags), a is a terminal (term), S is the start symbol of the grammar and ε i
s the null string.
The table below shows some examples for converting CFGs to the CNF:
CFG | VP -> V NP PP | VP -> V |
CNF | VP -> V (NP1) NP1 -> NP PP | VP -> V (VP1) VP1 -> ε |