
Feature Selection using Cross-Validation

Suppose your model has many features. Now, do you think it is a good idea to build the model on all of these features, or is it better to use only those features that are key predictors of the target variable?

Let’s watch the following video and try to understand this better.

Feature selection is a very important aspect of the model building process. It is always advisable to remove unnecessary features from the model that do not contribute to the prediction of the target variable. Adding irrelevant features to the model only leads to an increase in model complexity and inefficiency.

But how is this done?
 
In your previous modules, you selected features manually using the p-value and VIF criteria. However, that is not always the best way to proceed. Instead, you can use an automated method known as recursive feature elimination (RFE), which removes unnecessary features once you specify the number of best features that you want in your model. But how do you choose this number? Let’s find out in the next video.
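As a quick illustration before the video, here is a minimal sketch of RFE using scikit-learn. The synthetic dataset and the choice of LinearRegression as the estimator are assumptions made purely for illustration; in practice, you would use your own training data and model.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE

# Illustrative synthetic data: 20 features, of which only 5 are informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=42)

# Ask RFE to keep 5 features: it fits the estimator repeatedly,
# dropping the weakest feature at each step until 5 remain
rfe = RFE(estimator=LinearRegression(), n_features_to_select=5)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of the selected features
print(rfe.ranking_)   # rank 1 = selected; higher ranks were eliminated earlier
```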

So, as you saw in the video, another use of cross-validation is feature selection. You can evaluate the model’s performance for each candidate number of features using cross-validation and then pick the number of features that gives the best score.
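The sketch below shows one way to do this with scikit-learn: for each candidate feature count, RFE is placed inside a pipeline and scored with 5-fold cross-validation. The dataset, the estimator, and the R-squared scoring metric are assumptions for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=42)

scores = {}
for n in range(1, X.shape[1] + 1):
    # RFE selects n features, then the model is fit on the selected subset
    pipe = Pipeline([
        ("rfe", RFE(estimator=LinearRegression(), n_features_to_select=n)),
        ("model", LinearRegression()),
    ])
    # Mean R-squared across 5 folds is the score for this feature count
    scores[n] = cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()

best_n = max(scores, key=scores.get)
print(f"Optimal number of features: {best_n} (CV R^2 = {scores[best_n]:.3f})")
```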

But when a model has a large number of features, writing this loop manually becomes tedious. In such cases, RFECV, which combines recursive feature elimination with cross-validation, is the most convenient way to choose the optimal number of features that are also important for model building.
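The following is a minimal RFECV sketch: scikit-learn’s RFECV wraps the loop shown above, performing the recursive elimination and the cross-validation in a single call and selecting the feature count with the best cross-validated score. The dataset and estimator are again assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFECV

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=42)

# step=1 removes one feature per iteration; cv=5 scores each feature count
selector = RFECV(estimator=LinearRegression(), step=1, cv=5, scoring="r2")
selector.fit(X, y)

print("Optimal number of features:", selector.n_features_)
print("Selected feature mask:", selector.support_)
```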
