Apart from the occurrence-context matrix, the other way to create a distributed representation of words is the term-term co-occurrence matrix (or simply the co-occurrence matrix), which you’ll study next.
In the occurrence-context matrix, each row represents a word and each column represents a context (such as a document). In the co-occurrence matrix, both the rows and the columns represent words, which is why it is also called the term-term matrix.
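To make the difference in shape concrete, here is a minimal Python sketch on a hypothetical three-document corpus. It treats each document as the context for both matrices and, as an assumption for illustration, fills each term-term cell with the number of documents in which both words appear. The corpus and variable names are illustrative only.

```python
# Hypothetical toy corpus: each string is one context (a document).
docs = ["the cat sat", "the dog sat", "a cat ran"]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
index = {w: i for i, w in enumerate(vocab)}

# Occurrence-context (term-document) matrix: one row per word, one column per document.
term_doc = [[doc.count(w) for doc in tokenized] for w in vocab]

# Term-term matrix: one row AND one column per word.
# Assumption: a cell counts the documents in which both (distinct) words appear.
term_term = [[0] * len(vocab) for _ in vocab]
for doc in tokenized:
    words = set(doc)
    for w1 in words:
        for w2 in words:
            if w1 != w2:
                term_term[index[w1]][index[w2]] += 1

print(len(term_doc), "x", len(term_doc[0]))    # rows = words, columns = documents
print(len(term_term), "x", len(term_term[0]))  # rows = words, columns = words
```

The term-document matrix gains a column for every new document, whereas the term-term matrix is always square, with one row and one column per vocabulary word.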
There are two ways of creating a co-occurrence matrix:
- Using the occurrence context (e.g. a sentence):
  - Each sentence is treated as one context (other definitions of context are possible). If two terms appear in the same sentence, they are said to have co-occurred.
- Skip-grams (x-skip-n-grams):
  - A sliding window of (x+n) words moves over the text, and each position of the window serves as the context. Terms that appear together within the same window are said to have co-occurred.
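The two ways of defining the context can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function names `sentence_cooccurrence`, `window_cooccurrence` and `to_matrix` are hypothetical, text is tokenized by simple whitespace splitting, and, as an assumption, each pair of distinct words is counted once per context (per sentence, or per pair of positions that fit inside one window of `window_size = x + n` words) and recorded in both directions so the matrix stays symmetric.

```python
from collections import Counter
from itertools import combinations


def sentence_cooccurrence(sentences):
    """Occurrence-context method: each sentence is one context.
    Every pair of distinct words in a sentence co-occurs once."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.split()))  # whitespace tokenization (assumption)
        for w1, w2 in combinations(words, 2):
            counts[(w1, w2)] += 1
            counts[(w2, w1)] += 1  # keep the matrix symmetric
    return counts


def window_cooccurrence(tokens, window_size):
    """Sliding-window method with a window of window_size (= x + n) words.
    Two tokens co-occur if they fit inside one window, i.e. they are fewer
    than window_size positions apart; each pair of positions is counted once."""
    counts = Counter()
    for i, w1 in enumerate(tokens):
        for j in range(i + 1, min(i + window_size, len(tokens))):
            w2 = tokens[j]
            if w1 != w2:
                counts[(w1, w2)] += 1
                counts[(w2, w1)] += 1
    return counts


def to_matrix(counts, vocab):
    """Turn pair counts into a square term-term matrix (rows and columns = vocab)."""
    return [[counts[(row, col)] for col in vocab] for row in vocab]


sentences = ["the cat sat on the mat", "the dog sat on the log"]
by_sentence = sentence_cooccurrence(sentences)

tokens = sentences[0].split()
by_window = window_cooccurrence(tokens, window_size=3)  # e.g. x = 1, n = 2

vocab = sorted(set(tokens))
print(vocab)
for word, row in zip(vocab, to_matrix(by_window, vocab)):
    print(word, row)
```

With sentences as contexts, "sat" and "on" co-occur twice because they share both sentences. With a window of three words, "cat" and "mat" never co-occur, since they are four positions apart.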