So far, you have learnt about linear regression for regression problems and logistic regression for classification problems, and you are able to make predictions using these models. So, what is the need for tree models? Aren’t linear models enough for this purpose?

There are certain cases where you cannot directly apply linear regression to solve a regression problem, because linear regression fits only one model to the entire data set. In such cases, you may want to **divide the data set into multiple subsets** and apply the decision tree algorithm to handle the non-linearity.

Let’s understand why exactly we need tree models and what their advantages are in the following video.

Let’s summarise the advantages of tree models one by one in the following order:

- Predictions made by a decision tree are easily **interpretable**.

- A decision tree is **versatile** in nature. It does not assume anything specific about the nature of the attributes in a data set. It can seamlessly handle all kinds of data, such as numeric, categorical, string and Boolean.

- A decision tree is **scale-invariant**. It does not require normalisation, as it only has to compare the values within an attribute, and it handles multicollinearity better.

- Decision trees often give us an idea of the relative **importance** of the explanatory attributes that are used for prediction.

- They are highly **efficient** and **fast** algorithms.

- They can **identify complex relationships** and work well in certain cases where you cannot fit a single linear relationship between the target and feature variables. This is where regression with decision trees comes into the picture.

In regression problems, a decision tree splits the data into multiple subsets. The difference between decision tree classification and decision tree regression lies in the prediction stored at each leaf: in **regression**, each leaf predicts the **average of the target values** of the data points that reach it, as opposed to a **class label** in **classification** trees, where the prediction is assigned to a leaf node by majority voting. This average is calculated using the following formula:

$$y_t = \frac{1}{N_t}\sum_{i \in D_t} y(i)$$

where the $y(i)$'s represent the observations in a node, $D_t$ is the set of data points in that node and $N_t$ is their count.
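To make this concrete, here is a minimal sketch (in pure Python, with made-up data) of how a regression tree might choose a single split: each candidate threshold divides the data into two subsets, each leaf predicts the mean of its target values as per the formula above, and the split with the lowest total squared error wins.

```python
# Hypothetical one-dimensional data: a single feature x and a target y.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.1, 0.9, 1.0, 5.2, 4.8, 5.0]

def sse(values):
    """Sum of squared errors of `values` around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return the threshold minimising SSE(left leaf) + SSE(right leaf)."""
    best_threshold, best_cost = None, float("inf")
    for threshold in sorted(set(xs))[1:]:          # candidate cut points
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold

threshold = best_split(xs, ys)                     # splits at x = 10.0
left_ys = [y for x, y in zip(xs, ys) if x < threshold]
right_ys = [y for x, y in zip(xs, ys) if x >= threshold]
left_pred = sum(left_ys) / len(left_ys)            # leaf mean, about 1.0
right_pred = sum(right_ys) / len(right_ys)         # leaf mean, about 5.0
```

Real implementations (e.g. CART) repeat this split search recursively on each subset, but the core idea, greedily picking the threshold that makes the leaf means fit best, is the same.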

For example, suppose you are predicting the sales your company will have based on various factors such as marketing, number of products, etc. If you use a decision tree to solve this problem (the process of actually building a regression tree is covered in the next session), and one of the leaf nodes contains, say, 5 data points, 1 Cr, 1.3 Cr, 0.97 Cr, 1.22 Cr and 0.79 Cr, then you will just take the average of these five values, which comes out to be about 1.06 Cr, and that becomes your final prediction.
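The arithmetic in this sales example takes only a couple of lines (the values below are the hypothetical leaf contents from the text):

```python
# The five (hypothetical) data points, in Cr, that land in one leaf node
leaf_values = [1.0, 1.3, 0.97, 1.22, 0.79]

# The leaf's prediction is simply the mean of its target values
prediction = sum(leaf_values) / len(leaf_values)
print(f"Leaf prediction: {prediction:.2f} Cr")  # prints "Leaf prediction: 1.06 Cr"
```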

Decision tree classification is what you will most commonly work on. However, remember that if you get a data set where you want to perform regression, decision tree regression is also a good option.

This module includes an optional demonstration on regression trees for those who want to explore and understand the process in detail.

**Additional Readings:**