Co-occurrence Matrix

Apart from the occurrence-context matrix, the other way to create a distributed representation of words is the term-term co-occurrence matrix (or simply the co-occurrence matrix), which you will study next.

Unlike the occurrence-context matrix, where each column represents a context (such as a document), each column of the co-occurrence matrix represents a word, just as each row does. This is why the co-occurrence matrix is also sometimes called the term-term matrix.

Let’s discuss the co-occurrence matrix now.

There are two ways of creating a co-occurrence matrix:

  1. Using the occurrence context (e.g. a sentence):
    • Each sentence is treated as one context (other definitions of a context are possible as well). Two terms that appear in the same context are said to have co-occurred.
  2. Skip-grams (x-skip-n-grams):
    • A sliding window spanning (x+n) words serves as the context. Terms that appear together within the same window are said to have co-occurred.
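
The two approaches above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function names, the whitespace tokenisation, and the `window_size` parameter are assumptions made for this example (for x-skip-n-grams, `window_size` would be set to x+n). Counts are stored in a dictionary keyed by word pairs, so the matrix entry for row `a` and column `b` is `counts[(a, b)]`.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_by_sentence(sentences):
    """Count, for every pair of distinct terms, the sentences containing both."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: lower-casing and whitespace splitting is enough here.
        terms = sorted(set(sentence.lower().split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the matrix symmetric
    return counts


def cooccurrence_by_window(tokens, window_size):
    """Count pairs of distinct terms lying within window_size consecutive tokens."""
    counts = Counter()
    for i, term in enumerate(tokens):
        # Pair the current token with the tokens that follow it in the window.
        for other in tokens[i + 1 : i + window_size]:
            if other != term:
                counts[(term, other)] += 1
                counts[(other, term)] += 1  # keep the matrix symmetric
    return counts


sentence_counts = cooccurrence_by_sentence(["the cat sat", "the dog sat"])
window_counts = cooccurrence_by_window("the cat sat on the mat".split(), 3)
```

With sentence-level contexts, "the" and "sat" co-occur twice because both sentences contain them, while "cat" and "dog" never co-occur. With the sliding window, only words that are close together in the token sequence are counted, so the result depends on the chosen window size.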
