In the previous module on lexical processing, you learnt about frequency-based methods such as TF-IDF and the bag-of-words approach for creating word vectors.
Let us revise the bag-of-words representation for creating word vectors.
Bag of Words is a representation of text that describes the occurrence of words within a corpus of text, treating each word independently.
For example, the following text extract from A Tale of Two Cities authored by Charles Dickens can be converted into a bag-of-words representation:
“It was the best of times,
it was the worst of times.
It was the age of wisdom,
it was the age of foolishness.
It was the season of Light,
it was the season of Darkness.
It was the spring of hope,
it was the winter of despair”
| | best | worst | wisdom | foolishness | hope | despair | spring | winter | light | season | times | age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| It was the best of times | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| it was the worst of times | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| It was the age of wisdom | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| it was the age of foolishness | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| It was the season of Light | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| it was the season of Darkness | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| It was the spring of hope | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| it was the winter of despair | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
The table above shows the one-hot encoded vectors for the text extract: if a word is present in a sentence, a value of ‘1’ is assigned; otherwise, ‘0’ is assigned.
Note that attention is not given to the meaning, sequence or context of the words in this representation. In the next video, Jaidev will discuss the limitations of this technique.
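The construction above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the stop-word set is an assumption chosen to mirror the table (note that this sketch also picks up ‘darkness’, which the table omits):

```python
# Minimal binary bag-of-words sketch for the Dickens extract.
# The stop-word set below is an assumption made for this illustration.

sentences = [
    "It was the best of times",
    "it was the worst of times",
    "It was the age of wisdom",
    "it was the age of foolishness",
    "It was the season of Light",
    "it was the season of Darkness",
    "It was the spring of hope",
    "it was the winter of despair",
]
stop_words = {"it", "was", "the", "of"}

# Vocabulary: every non-stop word, in order of first appearance.
vocab = []
for s in sentences:
    for w in s.lower().split():
        if w not in stop_words and w not in vocab:
            vocab.append(w)

# One binary vector per sentence: 1 if the word occurs in it, else 0.
vectors = [[1 if w in s.lower().split() else 0 for w in vocab]
           for s in sentences]

print(vocab)
print(vectors[0])  # vector for "It was the best of times"
```

Each row of `vectors` corresponds to one row of the table: the first sentence sets ‘1’ only for ‘best’ and ‘times’.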
To understand the relationship between words, cosine similarity can be applied to the one-hot encoded representation of this text corpus. For example, taking the columns for the words ‘best’ and ‘worst’ and applying cosine similarity, as shown below, gives 0, which suggests the words are completely unrelated.

S = cos(x, y) = (x · y) / (‖x‖ ‖y‖)

S(best, worst) = 0 / (1 × 1) = 0
However, ‘best’ and ‘worst’ are antonyms, so their cosine similarity should be negative, as you learnt earlier.
Similarly:
S(wisdom, foolishness) = 0
S(wisdom, winter) = 0
S(winter,light) = 0
S(winter, season) = 0
S(spring, season) = 0
The cosine similarity is zero in every case, regardless of whether the words are unrelated, related or antonyms.
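These zero similarities can be reproduced directly from the formula. Below is a minimal sketch where the vectors are the ‘best’ and ‘worst’ columns read off the table above:

```python
import math

def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Columns from the table: 'best' occurs only in sentence 1,
# 'worst' only in sentence 2, so the columns are orthogonal.
best  = [1, 0, 0, 0, 0, 0, 0, 0]
worst = [0, 1, 0, 0, 0, 0, 0, 0]

print(cosine_similarity(best, worst))  # 0.0
```

Because no two of these words ever co-occur in the same sentence, every pair of columns is orthogonal, and the dot product in the numerator is always zero.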
This shows that the bag-of-words representation is not an accurate way to convert words into vectors: the vectors it produces do not capture the meaning of words.