
Feature Importance in Decision Trees

Feature importance plays a key role in effective prediction, decision-making and model performance. It helps eliminate the less important variables from a large data set and identify the key features that lead to better prediction results.

In this video, let’s understand the notion of variable importance in decision trees.

Note that in the video below, the comparison of the reduction in Gini impurity for gender and cholesterol should read 0.06 < 0.18, not 0.06 < 0.018.

Decision trees quantify the importance of each feature by calculating the reduction in impurity that splitting on that feature achieves at a node. A feature that produces a large reduction in impurity is an important variable; one that produces only a small reduction is less important.
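This calculation can be sketched in a few lines of Python. The data below is made up purely for illustration, but the split is chosen so that the impurity reduction comes out to 0.18, the cholesterol figure mentioned in the note above:

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# Hypothetical parent node: 5 patients with heart disease (1), 5 without (0).
parent = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Split on a hypothetical binary cholesterol feature.
left = [1, 1, 1, 1, 0]   # high-cholesterol group
right = [1, 0, 0, 0, 0]  # low-cholesterol group

n = len(parent)
weighted_child = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
reduction = gini(parent) - weighted_child
print(round(reduction, 2))  # 0.18
```

The parent node has impurity 0.5 (a 50/50 class mix), each child has impurity 0.32, so the split reduces impurity by 0.5 − 0.32 = 0.18. A less informative split, such as one on gender, would leave the children closer to a 50/50 mix and yield a smaller reduction.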

In the previous example, the variable ‘cholesterol’ produced a higher reduction in impurity than ‘gender’. This implies that cholesterol distinguishes between the two classes better than gender. This is intuitive as well: medically, people with high cholesterol have a higher chance of developing heart disease than those with low cholesterol, so a split on cholesterol is expected to be more informative than a split on gender.
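In practice, libraries such as scikit-learn expose exactly this impurity-based measure through the `feature_importances_` attribute of a fitted tree. The sketch below uses made-up data in which the label is driven almost entirely by a hypothetical cholesterol column, so cholesterol should dominate the importance scores:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: columns are [cholesterol, gender], both binary;
# the target is presence of heart disease.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 2))

# Make the label depend on cholesterol, with a little random noise,
# so gender carries almost no information about the class.
y = (X[:, 0] == 1).astype(int)
flip = rng.random(200) < 0.1
y[flip] = 1 - y[flip]

tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
for name, imp in zip(["cholesterol", "gender"], tree.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

The importances sum to 1 and reflect each feature's total (weighted) impurity reduction across the tree, so cholesterol receives a much larger score than gender here.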
