
Manual Feature Elimination

Recall that you used RFE to select 15 features. However, as the pairwise correlations showed, some of these 15 features are still highly correlated with one another, i.e. some multicollinearity remains. You therefore need to check the VIFs (variance inflation factors) as well to eliminate the redundant variables. Recall that the VIF measures how well one independent variable is explained by all the other independent variables combined. Its formula is:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

where ‘i’ refers to the ith variable, which is being represented as a combination of the rest of the independent variables, and $R_i^2$ is the R-squared of that regression.
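To make the definition concrete, here is a minimal sketch that regresses each column on the remaining columns and converts the resulting R-squared into 1 / (1 - R²). It uses NumPy only, so that the arithmetic is visible. In practice, statsmodels offers `variance_inflation_factor` for the same purpose. The function name `vif` and the synthetic data are assumptions made for illustration and are not taken from the course notebook.

```python
import numpy as np


def vif(X):
    """Return the VIF of every column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    result = []
    for i in range(n_cols):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        # Regress column i on the other columns plus an intercept.
        A = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals = y - A @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        # A perfectly explained column has an infinite VIF.
        result.append(float("inf") if r_squared >= 1.0 else 1.0 / (1.0 - r_squared))
    return np.array(result)


# Synthetic data (illustrative only): the third column is almost the
# sum of the first two, and the fourth column is unrelated to the rest.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.1 * rng.normal(size=500)
x4 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3, x4])

values = vif(X)
print(values)
```

The nearly redundant third column gets a very large VIF, while the unrelated fourth column stays close to 1, the smallest value a VIF can take.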

Let’s watch Rahim eliminate the insignificant variables based on the VIFs and the p-values.

To summarise, you performed an iterative manual feature elimination, rechecking the VIFs and p-values after each step. You also kept checking the accuracy to make sure that dropping a particular feature did not affect it much.
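The VIF half of this loop can be sketched as follows: compute the VIF of every feature, drop the one with the highest value, and repeat until all VIFs fall below a cut-off. This is a simplified illustration, not the exact procedure from the video. The cut-off of 5, the helper names `vif_of` and `drop_high_vif`, and the synthetic feature names are all assumptions. The p-value and accuracy checks, which require refitting the model after each drop, are left out.

```python
import numpy as np


def vif_of(X, i):
    """VIF of column i: 1 / (1 - R^2) from regressing it on the other columns."""
    y = X[:, i]
    A = np.column_stack([np.ones(len(y)), np.delete(X, i, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = float(((y - A @ coef) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r_squared = 1.0 - ss_res / ss_tot
    return float("inf") if r_squared >= 1.0 else 1.0 / (1.0 - r_squared)


def drop_high_vif(X, names, threshold=5.0):
    """Drop the highest-VIF feature repeatedly until all VIFs are <= threshold.

    The threshold of 5 is an assumed rule of thumb, not a fixed rule.
    """
    X = np.asarray(X, dtype=float)
    kept = list(names)
    dropped = []
    while X.shape[1] > 1:
        vifs = [vif_of(X, i) for i in range(X.shape[1])]
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break  # every remaining feature is acceptable
        dropped.append(kept.pop(worst))
        X = np.delete(X, worst, axis=1)
    return kept, dropped


# Synthetic example (illustrative only): "a_plus_b" is nearly a + b.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = rng.normal(size=500)
a_plus_b = a + b + 0.1 * rng.normal(size=500)
X = np.column_stack([a, b, a_plus_b, c])

kept, dropped = drop_high_vif(X, ["a", "b", "a_plus_b", "c"])
print("kept:", kept)
print("dropped:", dropped)
```

Dropping one feature at a time matters because the VIFs of the remaining features change after every removal. Once one member of a collinear group is gone, the others may no longer need to be dropped.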

This was the set of 15 features selected by RFE that you began with:

And this is the final set of features which you arrived at after eliminating features manually:

As you can see, the features ‘PhoneService’ and ‘TotalCharges’ were dropped as part of manual feature elimination.

Interpreting the Model

Refer to the above image, i.e. the final summary statistics after completing manual feature elimination. Now suppose you are a data analyst working for the telecom company, and you want to compare two customers, customer A and customer B. For both of them, the values of the variables tenure, PhoneService, Contract_One year, etc. are all the same, except for the variable PaperlessBilling, which is equal to 1 for customer A and 0 for customer B.

In other words, customer A and customer B have the exact same behaviour as far as these variables are concerned, except that customer A opts for paperless billing, and customer B does not. Now use this information to answer the following questions.
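One way to reason about such a comparison is sketched below. It assumes the final model is a logistic regression, which the use of accuracy and p-values suggests but this page does not state. If all other variables are equal, their contribution to the log odds cancels, so the log odds of the two customers differ by exactly the PaperlessBilling coefficient and the odds differ by a factor of e raised to that coefficient. The coefficient `beta_paperless` and the shared term `shared_log_odds` are hypothetical numbers chosen for illustration, not values from the model summary.

```python
import math


def sigmoid(z):
    """Convert log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values, NOT taken from the course's model summary.
beta_paperless = 0.34    # assumed coefficient of PaperlessBilling
shared_log_odds = -1.0   # assumed contribution of the intercept and all other
                         # variables, identical for both customers

# Customer A has PaperlessBilling = 1, customer B has PaperlessBilling = 0.
log_odds_a = shared_log_odds + beta_paperless * 1
log_odds_b = shared_log_odds + beta_paperless * 0

# The difference in log odds is the coefficient itself.
log_odds_diff = log_odds_a - log_odds_b

# The ratio of the odds is exp(coefficient), whatever the shared term is.
odds_ratio = math.exp(log_odds_a) / math.exp(log_odds_b)

p_a = sigmoid(log_odds_a)
p_b = sigmoid(log_odds_b)

print("difference in log odds:", log_odds_diff)
print("odds ratio (A vs B):", odds_ratio)
print("probabilities:", p_a, p_b)
```

The difference in log odds and the odds ratio do not depend on `shared_log_odds`. The two probabilities do, so the gap in probability between A and B cannot be read off the coefficient alone.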

Now that we have a final model, we can begin with model evaluation and making predictions. We’ll start doing that in the next session.

Coming Up

For now, the next segment summarises what you learnt in this session. Model evaluation is covered separately in the next session.
