Variance as Information

Let’s look at a simple example that builds intuition for how the variance in the data corresponds to the information we can extract from it.

As you saw in the example, the first image did not carry much information: its pixels are the same colour throughout. In the second image, however, there are many things you can easily distinguish, so it has much more to offer in terms of information. Its pixels vary widely, so that image has more variance and, equivalently, more information.
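To make the image comparison concrete, here is a minimal sketch in Python with NumPy. The two tiny greyscale "images" are hypothetical stand-ins for the ones in the lecture, and the names flat_image and varied_image are illustrative assumptions.

```python
import numpy as np

# Hypothetical 4x4 greyscale images with pixel values in 0..255.
# They stand in for the lecture's images and are not taken from it.
flat_image = np.full((4, 4), 128.0)  # the same colour everywhere

varied_image = np.array([
    [0.0, 255.0, 0.0, 255.0],
    [255.0, 0.0, 255.0, 0.0],
    [0.0, 255.0, 0.0, 255.0],
    [255.0, 0.0, 255.0, 0.0],
])  # a checkerboard pattern

# Variance of all pixel values in each image.
flat_variance = flat_image.var()
varied_variance = varied_image.var()

print("flat image variance:  ", flat_variance)
print("varied image variance:", varied_variance)
```

The uniform image has zero variance because every pixel equals the mean, while the checkerboard has a large variance because its pixels sit far from the mean.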

So the key takeaway from the above lecture is that you can measure the importance of a column by checking its variance. The more variance a column has, the more information it contains.
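The takeaway above can be sketched in a few lines of NumPy. The data and the column names feature_a and feature_b are hypothetical and chosen only for illustration. The sketch uses the sample variance (dividing by n - 1), which is an assumption, since the lecture does not say which convention it uses.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are features.
# These numbers are illustrative only, not the lecture's dataset.
data = np.array([
    [5.0, 10.0],
    [5.0, 20.0],
    [5.0, 30.0],
    [5.0, 40.0],
])
column_names = ["feature_a", "feature_b"]

# Sample variance of each column (ddof=1 divides by n - 1).
variances = data.var(axis=0, ddof=1)

# Rank the columns from highest to lowest variance.
order = np.argsort(variances)[::-1]
ranked = [(column_names[i], float(variances[i])) for i in order]

for name, variance in ranked:
    print(f"{name}: variance = {variance:.2f}")
```

Here feature_a never changes, so its variance is zero and it cannot help tell the rows apart, whereas feature_b varies from row to row and ranks first.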

Geometrical Interpretation of Variance

In the above example, you saw that the variance of Height was only 14, whereas that of Weight was 311.14. This suggests that Weight is a more important column than Height. There is also an elegant way of looking at variance geometrically. Take a look at the following image.

The red lines on the Height and Weight axes show the spread of the projections of the data vectors onto those axes. As you can see, the spread is much larger along the Weight axis than along the Height axis. Hence, you can say that Weight has more variance than Height. Viewing variance as the spread of the data along a direction is an elegant way to distinguish the important directions from the unimportant ones.
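The geometric picture can be sketched as follows: project each data vector onto a direction and measure the variance of the resulting coordinates. The (height, weight) values below are hypothetical and do not reproduce the lecture's figures of 14 and 311.14. The helper projected_variance is an illustrative name, not part of any library.

```python
import numpy as np

# Hypothetical (height, weight) samples; not the lecture's dataset.
points = np.array([
    [160.0, 50.0],
    [162.0, 65.0],
    [164.0, 80.0],
    [166.0, 95.0],
])

def projected_variance(points, direction):
    """Sample variance of the points after projecting them onto a direction."""
    unit = direction / np.linalg.norm(direction)  # normalise to length 1
    projections = points @ unit                   # coordinate of each point along the direction
    return projections.var(ddof=1)

height_axis = np.array([1.0, 0.0])
weight_axis = np.array([0.0, 1.0])

var_height = projected_variance(points, height_axis)
var_weight = projected_variance(points, weight_axis)

print("spread along Height axis:", var_height)
print("spread along Weight axis:", var_weight)
```

Projecting onto a coordinate axis simply picks out that column, so these values match the ordinary column variances. The same function accepts any other direction vector, which is what makes the geometric view useful for comparing directions.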
