
Directions of Maximum Variance

You saw that when the variance is unequally distributed among the original features or columns, i.e. some columns have much less variance than others, you can simply drop those low-variance columns to perform dimensionality reduction.
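As a quick, minimal sketch of this idea (assuming NumPy; the dataset and numbers below are made up purely for illustration), such a reduction just compares column variances and drops the low-variance column:

```python
import numpy as np

# Made-up dataset: two columns with very unequal spread
data = np.array([
    [150.0, 5.01],
    [160.0, 5.02],
    [170.0, 4.99],
    [180.0, 5.00],
])

# Variance of each column
variances = data.var(axis=0)
print(variances)  # column 0 varies a lot, column 1 barely varies

# Keep only the column(s) with substantial variance
reduced = data[:, variances > 1.0]
print(reduced.shape)  # (4, 1): the low-variance column is dropped
```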

But what about the scenario when the variances are pretty similar? For example, take a look at the following image containing the height and weight information of a different group of patients.

As you can see, the spread along both axes is quite comparable, and therefore you can’t directly say that one direction is more useful than the other. What to do now?

Let’s look at the next lecture to understand this problem further and appreciate how PCA solves it smartly.

Note: At 0:36, the correct spelling is ‘variance’.

After going through the above lecture, you have more or less understood what PCA does: it changes the basis vectors in such a way that the new basis vectors capture the maximum variance, or information. In the next video, we’ll see how this happens visually.

Basically, the steps of PCA for finding the principal components can be summarised as follows (a short code sketch follows this list).

  • First, it finds the basis vector along the best-fit line that maximises the variance. This is the first principal component, or PC1.
  • The second principal component is perpendicular to the first principal component and contains the next highest amount of variance in the dataset.
  • This process continues iteratively, i.e. each new principal component is perpendicular to all the previous principal components and should explain the next highest amount of variance.
  • If the dataset contains n independent features, then PCA will create n principal components.
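Here is a minimal sketch of these steps, assuming scikit-learn’s PCA class and a small synthetic 2-D dataset (the data below is made up for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 2-D data: two correlated features with comparable spread
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([
    x + rng.normal(scale=0.3, size=200),
    x + rng.normal(scale=0.3, size=200),
])

# With 2 independent features, PCA creates 2 principal components
pca = PCA(n_components=2)
pca.fit(data)

# Each row of components_ is a basis vector: PC1 first, then PC2.
# The rows are mutually perpendicular unit vectors.
print(pca.components_)
```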

For a 2-D dataset with the representation shown in the image below:

the principal components can be visually represented as shown in the image below:

Also, once the principal components are found, PCA assigns a percentage of variance to each PC. Essentially, it’s the fraction of the total variance of the dataset explained by that particular PC. This helps in understanding which principal component is more important than another and by how much. This is shown in the images below.
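In code, continuing the sketch above, scikit-learn exposes this fraction through the explained_variance_ratio_ attribute:

```python
# Fraction of the total variance explained by each principal component
print(pca.explained_variance_ratio_)
# e.g. roughly [0.95, 0.05]: PC1 carries most of the information
```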

Since 100% of the total variance, or information, of the entire dataset is present in only one of the columns (PC1), we can safely drop PC2 and still be assured of losing no information.
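Dropping PC2 then amounts to projecting the data onto PC1 alone. With scikit-learn, one way to sketch this is to refit with a single component (reusing the synthetic data from above):

```python
# Keep only the first principal component
pca1 = PCA(n_components=1)
reduced = pca1.fit_transform(data)
print(reduced.shape)  # (200, 1): one column, most of the variance retained
```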
