In the previous segment, you recalled the forward pass in neural networks.
The two approaches to Word2Vec are the Skip-gram model and the Continuous Bag of Words (CBOW) model. In the next video, we will discuss the difference between them.
The Skip-gram model predicts the words within a certain range before and after the input word.
Continuous Bag of Words predicts the middle word based on the surrounding words.
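To make the contrast concrete, here is a minimal sketch of how the two models pair inputs with targets. The toy sentence, the window of 1 and the helper names (`context_words`, `skipgram_pairs`, `cbow_samples`) are assumptions made for illustration; they are not taken from the video or from any Word2Vec library.

```python
def context_words(tokens, i, window):
    """Return the words within `window` positions before and after index i."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def skipgram_pairs(tokens, window):
    # Skip-gram: the input is the centre word, the target is one surrounding word.
    return [
        (tokens[i], neighbour)
        for i in range(len(tokens))
        for neighbour in context_words(tokens, i, window)
    ]


def cbow_samples(tokens, window):
    # CBOW: the input is the list of surrounding words, the target is the centre word.
    return [
        (context_words(tokens, i, window), tokens[i])
        for i in range(len(tokens))
    ]


tokens = "the quick brown fox".split()  # hypothetical toy sentence
sg = skipgram_pairs(tokens, window=1)
cb = cbow_samples(tokens, window=1)

print("Skip-gram (input -> target):", sg)
print("CBOW (inputs -> target):", cb)
```

The same windowing logic feeds both models; only the direction of prediction changes.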
Let us test our understanding:
Now, you know that two models can create word embeddings. You will gain a detailed understanding of the CBOW model by assessing the training data, understanding the architecture of neural networks and learning how to extract word embeddings from weight matrices. Finally, you will take a look at the case and understand how Skip-gram differs from CBOW.
In the next video, you will learn how to make the training data for the CBOW model.
We assumed that we have only one sentence in our corpus, which is ‘My experience with upGrad has been wonderful’. The vocabulary size is the number of unique words in the corpus, which is 7 in our case.
The context size determines how many words before and after a given word are included as its context words. This is a parameter that we need to decide before creating the training data. For the sake of simplicity, we have considered the context size to be 1.
The training data for the CBOW model for our sentence with a context size of 1 looks like this.
We can represent our training data as follows.
| Training sample (X, Y) | X | Y |
| ([experience], My) | [experience] | My |
| ([My, with], experience) | [My, with] | experience |
| ([experience, upGrad], with) | [experience, upGrad] | with |
| ([with, has], upGrad) | [with, has] | upGrad |
| ([upGrad, been], has) | [upGrad, been] | has |
| ([has, wonderful], been) | [has, wonderful] | been |
| ([been], wonderful) | [been] | wonderful |
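The table above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the sentence is tokenised by splitting on whitespace; the function name `make_cbow_training_data` is hypothetical and chosen only for this example.

```python
def make_cbow_training_data(sentence, context_size=1):
    """Build (context words, target word) samples for a single sentence."""
    tokens = sentence.split()  # assumption: simple whitespace tokenisation
    samples = []
    for i, target in enumerate(tokens):
        before = tokens[max(0, i - context_size):i]
        after = tokens[i + 1:i + 1 + context_size]
        samples.append((before + after, target))
    return samples


sentence = "My experience with upGrad has been wonderful"
training_data = make_cbow_training_data(sentence, context_size=1)
vocabulary_size = len(set(sentence.split()))

for x, y in training_data:
    print(x, "->", y)
print("Vocabulary size:", vocabulary_size)
```

Words at the two ends of the sentence have only one neighbour, which is why the first and last samples contain a single context word.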
Based on your understanding of the topic so far, please attempt the following question.
We have created the training samples that need to be fed into a neural network, but neural networks do not take words as inputs. We need to encode these words into a numeric input. In the next video, Jaidev will demonstrate it.
One-hot encoding is used to convert words into a numeric format. The words can be arranged in alphabetical order, by frequency, or by any other heuristic. The one-hot encoding of the words in our example can be represented as follows.
| | been | experience | has | My | upGrad | with | wonderful |
| been | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| experience | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| has | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| My | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| upGrad | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| with | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| wonderful | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
From this table, the one-hot encoding of ‘experience’ is as follows. The ‘1’ is in the second position because the mapping follows the word order in the table above.
| 0 | 1 | 0 | 0 | 0 | 0 | 0 |
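The encoding can be sketched in plain Python as follows. This assumes the alphabetical, case-insensitive ordering used in the table; the names `word_to_index` and `one_hot` are hypothetical helpers for this example, not part of any library.

```python
sentence = "My experience with upGrad has been wonderful"

# Assumption: order the vocabulary alphabetically, ignoring case, as in the table.
vocabulary = sorted(set(sentence.split()), key=str.lower)
word_to_index = {word: index for index, word in enumerate(vocabulary)}


def one_hot(word):
    """Return a list of zeros with a single 1 at the word's vocabulary index."""
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector


print(vocabulary)
print(one_hot("experience"))  # [0, 1, 0, 0, 0, 0, 0]
```

Each vector has as many entries as the vocabulary size, so every word in this corpus becomes a 7-dimensional input.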
Based on your understanding of the topic so far, please attempt the following question:
You learnt how to create training data for a given corpus and a chosen context size. Any neural network needs input data, which you have now created using the methods shown above. In the next segment, you will understand the architecture of the neural network.