In this session, you learnt about the learning algorithms behind decision trees. At each step, the algorithm picks an attribute and a rule to split the data into multiple partitions, so that each partition becomes more homogeneous than the original data set.
You also learnt about the various ways in which you can measure the homogeneity of a data set, such as the Gini index, entropy and MSE.
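As a quick illustration, the Gini index and entropy for a set of class labels can be sketched in Python as follows (the function names and toy labels are illustrative, not from the session):

```python
from collections import Counter
import math

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy: sum of -p * log2(p) over the class proportions."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

pure = ["yes", "yes", "yes", "yes"]       # perfectly homogeneous
mixed = ["yes", "yes", "no", "no"]        # maximally mixed (two classes)
print(gini(pure), entropy(pure))          # 0.0 0.0
print(gini(mixed), entropy(mixed))        # 0.5 1.0
```

Both metrics are zero for a pure node and largest when the classes are evenly mixed, which is why either one can serve as a homogeneity measure.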
Now, let’s summarise your learnings so far:
- A decision tree first decides on an attribute to split on.
- To select this attribute, it measures the homogeneity of the nodes before and after the split.
- You can measure homogeneity in various ways with metrics like Gini index and entropy.
- The attribute that increases homogeneity the most is then selected for splitting.
- This whole cycle is then repeated on each partition until the resulting partitions are sufficiently homogeneous.
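The steps above can be sketched in Python, assuming categorical attributes and the Gini index as the homogeneity metric (the attribute names and the toy data set are made up for illustration):

```python
from collections import Counter

def gini(labels):
    """Gini index of a set of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity(partitions):
    """Size-weighted average Gini index of the child partitions."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini(p) for p in partitions)

def best_split(rows, labels, attributes):
    """Pick the attribute whose split reduces impurity the most."""
    parent = gini(labels)
    best_attr, best_gain = None, 0.0
    for attr in attributes:
        # Partition the labels by the value of this attribute.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        gain = parent - weighted_impurity(list(groups.values()))
        if gain > best_gain:
            best_attr, best_gain = attr, gain
    return best_attr, best_gain

# Toy data: "humidity" separates the classes perfectly, "windy" does not.
rows = [
    {"humidity": "high", "windy": "yes"},
    {"humidity": "high", "windy": "no"},
    {"humidity": "low",  "windy": "yes"},
    {"humidity": "low",  "windy": "no"},
]
labels = ["no", "no", "yes", "yes"]
print(best_split(rows, labels, ["humidity", "windy"]))  # ('humidity', 0.5)
```

A full decision tree learner would call `best_split` recursively on each partition, stopping once the partitions are homogeneous enough; that recursion is exactly the cycle described in the last bullet point.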