Here’s a summary of what you’ve learnt so far.
First, you implemented PCA in Python on the Iris dataset. In that demonstration, you walked through the basic steps of PCA: performing PCA, finding the principal components, choosing the number of components using a scree plot, transforming the data to the new basis, and then visualising it.
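The steps above can be sketched in code. This is a minimal illustration, assuming sklearn's built-in Iris dataset and standardised features; variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the Iris data and standardise it so each feature contributes equally
X = load_iris().data
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA on all four features
pca = PCA()
pca.fit(X_scaled)

# Variance explained by each component -- this is what a scree plot shows
print(pca.explained_variance_ratio_)

# Keep the first two components and transform the data to the PC basis;
# the result can then be visualised as a 2-D scatter plot
pca_2 = PCA(n_components=2)
X_pc = pca_2.fit_transform(X_scaled)
print(X_pc.shape)  # (150, 2)
```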
After that, you saw an implementation where you wanted to improve model efficiency in a logistic regression setup. There you found that with PCA you can maintain the same level of performance without going through the iterative feature-elimination procedure. You also saw how to perform PCA faster by simply specifying how much variance you need explained.
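A hedged sketch of this setup is shown below. The dataset (sklearn's built-in breast cancer data) and the 95% variance threshold are illustrative assumptions, not the exact values from the demonstration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Passing a float between 0 and 1 as n_components tells PCA to keep
# just enough components to explain that fraction of the variance,
# so there is no manual feature elimination
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.95),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

print(model.named_steps['pca'].n_components_)  # components actually kept
print(model.score(X_test, y_test))             # test-set accuracy
```

Note that the model's accuracy stays high even though PCA has reduced the original 30 features to far fewer components.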
Here’s a list of useful functions and attributes you can use after importing the PCA class from sklearn (`from sklearn.decomposition import PCA`).
- pca.fit(X) – Fit PCA on the dataset X.
- pca.components_ – The principal components themselves: one row per component, one column per original feature.
- pca.explained_variance_ratio_ – The fraction of the total variance explained by each component.
- PCA(n_components=k) – Create a PCA object that keeps only the first k components (note that n_components is passed to the PCA constructor, not to fit()).
- pca.fit_transform(X) – Fit PCA and transform the data from the original basis to the PC basis in one step.
- PCA(n_components=var) – Here ‘var’ is a number between 0 and 1. PCA then chooses the number of components automatically such that the variance explained is at least (100*var)%.
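A minimal example tying these calls together. The data here is synthetic random noise, used purely to show the shapes involved; it is not from the lessons.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic dataset: 100 samples, 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

pca = PCA(n_components=3)    # keep only k = 3 components
X_pc = pca.fit_transform(X)  # fit and transform in one step

print(pca.components_.shape)          # (3, 5): one row per component
print(pca.explained_variance_ratio_)  # variance share of each component
print(X_pc.shape)                     # (100, 3): data in the PC basis
```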