By now, you know the ins and outs of PCA and how to implement it in Python, so you should have a sense of when to apply it. Let's now look at some practical considerations that need to be kept in mind while applying PCA.
Those were some important points to remember while using PCA. To summarise:
- Most software packages use SVD to compute the principal components and assume that the data is scaled and centred, so it is important to standardise/normalise the data before applying PCA (see the sketch after this list).
- PCA is a linear transformation method and works well in tandem with linear models such as linear regression and logistic regression, though it can also be used for computational efficiency with non-linear models.
- PCA should not be forced onto a dataset just to reduce dimensionality; when the features are not correlated, it offers little benefit.
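To make the first point concrete, here is a minimal sketch using scikit-learn (assuming it is installed) that chains standardisation and PCA into a single pipeline, so the scaling step is never skipped. The wine dataset and the choice of two components are purely illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small example dataset (13 correlated numeric features)
X, _ = load_wine(return_X_y=True)

# Chain standardisation and PCA so scaling always happens before projection
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = pca_pipeline.fit_transform(X)

print(X_reduced.shape)  # (178, 2)
# Fraction of total variance captured by each of the two components
print(pca_pipeline.named_steps["pca"].explained_variance_ratio_)
```

Bundling the scaler and PCA in one pipeline also ensures that, at prediction time, new data is scaled with the training statistics before being projected.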
In the next short lecture, Rahim will talk about some shortcomings of PCA.
You learnt some important shortcomings of PCA:
- PCA is limited to linear transformations, though non-linear techniques such as t-SNE exist (you can read more about t-SNE in the optional reading material below); see the sketch after this list.
- PCA requires the components to be perpendicular (orthogonal), though in some cases that may not be the best solution; the alternative technique is Independent Component Analysis (ICA).
- PCA assumes that columns with low variance are not useful, which might not be true in prediction setups (especially classification problems with high class imbalance).
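As a hedged illustration of these alternatives, the sketch below (assuming scikit-learn is installed) applies PCA, t-SNE, and ICA to the same toy dataset. The digits dataset and the parameter values are illustrative choices, not recommendations from the lecture.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import FastICA, PCA
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)

# PCA: linear projection onto orthogonal components
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: non-linear embedding, useful when a linear projection misses structure
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

# ICA: seeks statistically independent (not necessarily orthogonal) components
X_ica = FastICA(n_components=2, random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape, X_ica.shape)  # all (1797, 2)
```

Note that t-SNE is typically used for visualisation rather than as a general-purpose dimensionality reduction step, since it does not learn a reusable transformation for new data.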
If you are interested in reading about t-SNE (t-Distributed Stochastic Neighbor Embedding) or ICA, you can go through the additional reading provided below.
This brings us to the end of this segment.