
Constituency Grammars

To deal with the complexity and ambiguity of natural language, we first need to identify and define commonly observed grammatical patterns.

The first step in understanding grammar is to divide a sentence into groups of words called constituents based on their grammatical role in the sentence.

To start with, let’s take an example sentence: “[The fox] [ate] [the squirrel].”

Each bracketed group of words represents a grammatical unit, or constituent: ‘The fox’ represents a noun phrase, ‘ate’ represents a verb phrase, and ‘the squirrel’ is another noun phrase.

Over the next few lectures, you will study how constituency parsers ‘parse’ the grammatical structure of sentences. Let’s first understand the concept of constituents.

To look at constituents in a little more detail, consider the following two sentences:

  • ‘[Ram] [read] [an article on data science]’
  • ‘[Shruti] [ate] [dinner]’

Each bracketed group of words forms a constituent (or a phrase). The rationale for grouping these words into a single unit is provided by the notion of substitutability, i.e., a constituent can be replaced with another equivalent constituent while keeping the sentence syntactically valid.

For example, replacing the constituent ‘an article on data science’ (a noun phrase) with ‘dinner’ (another noun phrase) doesn’t affect the syntax of the sentence, though the resulting sentence “Ram read dinner” is semantically meaningless.
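The substitution test can be sketched in a few lines of Python. The list-of-tuples representation, the labels (including the illustrative label `VERB`), and the helper names `substitute` and `to_text` are assumptions made for this example; they are not part of the lecture.

```python
# Hypothetical representation: a sentence as a list of (label, words) constituents.
sentence = [
    ("NP", ["Ram"]),
    ("VERB", ["read"]),
    ("NP", ["an", "article", "on", "data", "science"]),
]


def substitute(constituents, index, replacement):
    """Return a new sentence with the constituent at `index` swapped out.

    The swap is allowed only when both constituents carry the same label,
    which mirrors the substitutability idea: an NP may replace an NP.
    """
    old_label, _ = constituents[index]
    new_label, _ = replacement
    if old_label != new_label:
        raise ValueError(f"cannot replace {old_label} with {new_label}")
    return constituents[:index] + [replacement] + constituents[index + 1:]


def to_text(constituents):
    """Flatten the constituents back into a plain string."""
    return " ".join(word for _, words in constituents for word in words)


swapped = substitute(sentence, 2, ("NP", ["dinner"]))
print(to_text(swapped))  # Ram read dinner
```

The check only looks at labels, so it accepts “Ram read dinner”: the result is syntactically valid even though it makes little sense semantically.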

The most common constituents in English are Noun Phrases (NP), Verb Phrases (VP), and Prepositional Phrases (PP). The following table summarises these phrases:

Type of Phrase         Definition                                                                                          Examples
Noun Phrase            Has a primary noun and other words that modify it                                                   A crazy white cat, the morning flight, a large elephant
Verb Phrase            Starts with a verb and other words that syntactically depend on it                                  saw an elephant, made a cake, killed the squirrel
Prepositional Phrase   Starts with a preposition and other words (usually a Noun Phrase) that syntactically depend on it   on the table, into the solar system, down the road, by the river

There are various other types of phrases, such as the adverbial phrase and the nominal (Nom), though in most cases you will need to work with only the three phrases above along with the nominal (introduced in a later lecture).

Context-Free Grammars

The most commonly used technique for organising sentences into constituents is the context-free grammar (CFG). A CFG defines a set of grammar rules (or productions) that specify how words can be grouped to form constituents such as noun phrases, verb phrases, etc.

In the following lecture, the professor will explain the elements of a context-free grammar.

To summarise, a context-free grammar is a set of production rules. Let’s understand production rules using some examples. The following production rule says that a noun phrase (NP) can be formed using either a determiner (DT) followed by a noun (N), or a noun phrase (NP) followed by a prepositional phrase (PP):

NP -> DT N | NP PP

Some example phrases that follow this production rule are:

  • The/DT man/N.
  • The/DT man/N over/P the/DT bridge/N.

Both of the above are noun phrases (NP). ‘The man’ is a noun phrase that follows the first rule:

NP -> DT N

The second phrase (‘The man over the bridge’) follows the second rule:

NP -> NP PP

It has a noun phrase (The man) and a prepositional phrase (over the bridge).
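These two rules can also be checked mechanically. The following is a minimal sketch, not the parser taught in the course: it takes a tagged phrase in the `word/TAG` notation used above and tests whether its tag sequence can be derived from NP. The rule `PP -> P NP` is an assumption (the text describes a prepositional phrase as a preposition followed by a noun phrase but does not write it as a production), and the names `RULES`, `tags_of` and `derives` are hypothetical.

```python
from functools import lru_cache

# Each non-terminal maps to its alternative right-hand sides.
RULES = {
    "NP": [("DT", "N"), ("NP", "PP")],  # from the text
    "PP": [("P", "NP")],                # assumed for this sketch
}


def tags_of(tagged_phrase):
    """Extract the tag sequence from text such as 'The/DT man/N'."""
    return tuple(token.rsplit("/", 1)[1] for token in tagged_phrase.split())


def derives(symbol, tags):
    """Return True if `symbol` can be expanded into exactly `tags`."""
    tags = tuple(tags)

    @lru_cache(maxsize=None)
    def spans(sym, start, end):
        if sym not in RULES:  # a tag such as DT, N or P
            return end - start == 1 and tags[start] == sym
        for left, right in RULES[sym]:
            # Try every split point; both halves are strictly shorter spans,
            # so the left-recursive rule NP -> NP PP cannot loop forever.
            for mid in range(start + 1, end):
                if spans(left, start, mid) and spans(right, mid, end):
                    return True
        return False

    return spans(symbol, 0, len(tags))


print(derives("NP", tags_of("The/DT man/N")))
print(derives("NP", tags_of("The/DT man/N over/P the/DT bridge/N")))
print(derives("NP", tags_of("over/P the/DT bridge/N")))
```

The first two calls succeed (via `NP -> DT N` and `NP -> NP PP` respectively), while the third fails because a bare prepositional phrase is not a noun phrase under these rules.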

In this way, using grammar rules, you can parse sentences into different constituents. In general, a production rule has the form A -> B C ..., where A is a single non-terminal symbol (NP, VP, N etc.) and the right-hand side is a sequence of one or more symbols, each of which is either a non-terminal or a terminal symbol (i.e. a word in the vocabulary, such as flight, man etc.).

Some other examples of commonly observed production rules in English grammar are provided in the table below. Note that a nominal (Nom) refers to an entity such as morning, flight etc., which commonly follows the rule Nominal -> Nominal Noun. There is a subtle difference and a significant overlap between a nominal (Nom) and a noun (NN), but you need not worry much about these nuances in this course.

The symbol S represents an entire sentence.

Production Rules

Production Rule                  Example
S -> NP VP                       he + swam
NP -> Pronoun | NP PP | DT Nom   she | a man + across the river | a + river
VP -> VP PP | VBD | VP NP        swam + across the river | enjoyed | ate + the squirrel
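To see how such rules generate a sentence, the table can be written down as a dictionary and expanded step by step. This is only an illustrative sketch: the word-level rules for `Pronoun` and `VBD` are assumptions added so that the derivation ends in actual words, and `leftmost_derivation` is a hypothetical helper name. Symbols without an entry in `GRAMMAR` (such as PP, DT and Nom here) are simply left unexpanded.

```python
# Rules from the table, written as lists of right-hand-side alternatives.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pronoun"], ["NP", "PP"], ["DT", "Nom"]],
    "VP": [["VP", "PP"], ["VBD"], ["VP", "NP"]],
    # Assumed word-level rules, not given in the table.
    "Pronoun": [["he"], ["she"]],
    "VBD": [["swam"], ["enjoyed"], ["ate"]],
}


def leftmost_derivation(choices, start="S"):
    """Expand the leftmost non-terminal once per entry in `choices`.

    Each entry is the index of the alternative to use for that expansion.
    Returns every intermediate form, beginning with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        for position, symbol in enumerate(form):
            if symbol in GRAMMAR:  # first symbol that can still be expanded
                form = form[:position] + GRAMMAR[symbol][choice] + form[position + 1:]
                break
        else:
            raise ValueError("nothing left to expand")
        steps.append(" ".join(form))
    return steps


for step in leftmost_derivation([0, 0, 0, 1, 0]):
    print(step)
```

With the choices shown, the printed forms run from `S` through `NP VP`, `Pronoun VP`, `he VP` and `he VBD` to `he swam`, matching the first example row of the table.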

Further, the professor mentioned two broad approaches for parsing sentences using CFGs:

  • Top-down: Start from the starting symbol S and produce each word in the sentence.
  • Bottom-up: Start from the individual words and reduce them to the sentence S.

You’ll learn both approaches in detail in the next segments.
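As a rough preview, the two directions can be contrasted on the small grammar from the table. This is a sketch under several assumptions, not the algorithms presented in the later segments: the rules `PP -> P NP` and `Nom -> N`, the `LEXICON` mapping words to tags, and the function names `top_down` and `bottom_up` are all introduced here for illustration. Both functions only answer yes or no (they are recognisers) and do not build a parse tree. The top-down version memoises results per span, because plain recursive descent would loop forever on left-recursive rules such as `NP -> NP PP`.

```python
from functools import lru_cache

GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Pronoun",), ("NP", "PP"), ("DT", "Nom")],
    "VP": [("VP", "PP"), ("VBD",), ("VP", "NP")],
    "PP": [("P", "NP")],  # assumed
    "Nom": [("N",)],      # assumed
}
LEXICON = {  # assumed word -> tag mapping
    "he": "Pronoun", "she": "Pronoun", "a": "DT", "the": "DT",
    "man": "N", "river": "N", "squirrel": "N",
    "swam": "VBD", "ate": "VBD", "enjoyed": "VBD", "across": "P",
}


def top_down(words, goal="S"):
    """Start from `goal` and expand rules until they account for every word."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def covers(symbol, start, end):
        if symbol not in GRAMMAR:  # a word-level tag
            return end - start == 1 and LEXICON.get(words[start]) == symbol
        for rhs in GRAMMAR[symbol]:
            if len(rhs) == 1:
                if covers(rhs[0], start, end):
                    return True
            else:
                left, right = rhs
                if any(covers(left, start, mid) and covers(right, mid, end)
                       for mid in range(start + 1, end)):
                    return True
        return False

    return covers(goal, 0, len(words))


def bottom_up(words, goal="S"):
    """Start from the words and combine small constituents into larger ones."""
    n = len(words)
    chart = {}  # (start, end) -> symbols that cover words[start:end]

    def with_unary_parents(symbols):
        # Keep adding A whenever a rule A -> B exists and B is already present.
        changed = True
        while changed:
            changed = False
            for lhs, alternatives in GRAMMAR.items():
                for rhs in alternatives:
                    if len(rhs) == 1 and rhs[0] in symbols and lhs not in symbols:
                        symbols.add(lhs)
                        changed = True
        return symbols

    for i, word in enumerate(words):
        tag = LEXICON.get(word)
        chart[i, i + 1] = with_unary_parents({tag} if tag else set())

    for length in range(2, n + 1):
        for start in range(n - length + 1):
            end = start + length
            found = set()
            for mid in range(start + 1, end):
                for lhs, alternatives in GRAMMAR.items():
                    for rhs in alternatives:
                        if (len(rhs) == 2 and rhs[0] in chart[start, mid]
                                and rhs[1] in chart[mid, end]):
                            found.add(lhs)
            chart[start, end] = with_unary_parents(found)

    return n > 0 and goal in chart[0, n]


for sentence in ["he swam", "a man ate the squirrel",
                 "he swam across the river", "swam he"]:
    words = sentence.split()
    print(sentence, top_down(words), bottom_up(words))
```

Both functions accept the first three sentences and reject “swam he”. They reach the same answer from opposite ends: `top_down` asks whether S can be expanded to fit the words, while `bottom_up` fills a chart from single words up to the whole sentence.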
