Now that you have learnt how decision trees can be used to solve regression problems, let’s understand how regression trees are built in Python. For this, you will use the same housing data set that you used in multiple linear regression to predict house prices based on features such as the area, the number of bedrooms, parking space, etc.

Essentially, the aim is to:

- Identify the variables affecting house prices, e.g., the area and the numbers of rooms and bathrooms;
- Build a model that quantitatively relates house prices to these variables; and
- Determine which variables contribute significantly towards predicting house prices.

You can download the Python code file used in the next video to practise along. Before you begin, revisit the Python code file from the multiple linear regression segment to recall the initial data preparation and model building you did earlier. This will help you follow the further steps that this segment covers.
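As a refresher, the initial steps amount to loading the housing data and splitting it into training and test sets. The sketch below uses a synthetic stand-in for the course's housing file (the column names and `Housing.csv` filename are assumptions; substitute your actual data set):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the course's housing data set; if you have the
# course file, replace this block with e.g. pd.read_csv("Housing.csv")
# (hypothetical filename).
rng = np.random.default_rng(42)
n = 500
housing = pd.DataFrame({
    "area": rng.uniform(500, 5000, n),       # square feet
    "bedrooms": rng.integers(1, 6, n),
    "bathrooms": rng.integers(1, 4, n),
    "parking": rng.integers(0, 3, n),
})
# Illustrative price: roughly linear in area and bedrooms, plus noise.
housing["price"] = (
    300 * housing["area"]
    + 50_000 * housing["bedrooms"]
    + rng.normal(0, 50_000, n)
)

# Separate features and target, then hold out 30% of rows for testing.
X = housing.drop(columns="price")
y = housing["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)
```

With this split in place, the same `X_train`/`y_train` pair feeds the regression tree exactly as it fed the linear model earlier.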

Let’s build the model using a **DecisionTreeRegressor()** with a few manually chosen hyperparameters, keeping the model simple for now; you can tune these later for better predictions.
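A minimal sketch of this step is shown below. The data here is synthetic (standing in for the housing features), and the hyperparameter values are illustrative, not tuned:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the housing features
# (area, bedrooms, bathrooms, parking).
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Arbitrary starting hyperparameters: cap the depth and require a
# minimum number of samples per leaf to keep the tree from overfitting.
dt = DecisionTreeRegressor(max_depth=4, min_samples_leaf=20, random_state=42)
dt.fit(X_train, y_train)

print("Tree depth:", dt.get_depth())  # bounded above by max_depth=4
```

Each leaf of the fitted tree predicts the mean target value of the training samples that fall into it, which is why `min_samples_leaf` directly controls how coarse those predictions are.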

To see what the model looks like, let’s plot our decision tree and try to interpret what it conveys about house prices. We also need to evaluate how well the tree performs. The video below walks through model interpretation and evaluation.
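The sketch below shows the same two steps on synthetic stand-in data: rendering the tree's split rules (here as text via `export_text`; the video uses a graphical plot, which `sklearn.tree.plot_tree` with matplotlib can produce) and scoring it on the held-out set. The feature names are assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in for the housing data; feature names are illustrative.
feature_names = ["area", "bedrooms", "bathrooms", "parking"]
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

dt = DecisionTreeRegressor(max_depth=3, random_state=42)
dt.fit(X_train, y_train)

# Text rendering of the fitted tree: each internal node shows a split
# rule, and each leaf shows the mean target value of its samples.
print(export_text(dt, feature_names=feature_names))

# Evaluate on held-out data with R-squared, as in the linear model.
r2 = r2_score(y_test, dt.predict(X_test))
print(f"Test R^2: {r2:.3f}")
```

Reading the printed rules top-down traces exactly how the tree partitions houses into price groups, and comparing the test R-squared against the linear model's tells you whether the tree's piecewise-constant fit is paying off.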

You should now have a good understanding of how decision trees can be used for prediction when the target variable is continuous. As an exercise, try tuning the hyperparameters further to improve the performance of the model you just built.
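One common way to approach this exercise is a cross-validated grid search over the tree's hyperparameters. The parameter grid below is just a starting point, and the data is again a synthetic stand-in:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data.
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Candidate hyperparameter values to search over (illustrative choices).
param_grid = {
    "max_depth": [2, 4, 6, 8],
    "min_samples_leaf": [5, 10, 20],
}

# 5-fold cross-validated search, scored by R-squared on each fold.
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="r2",
)
grid.fit(X_train, y_train)

print("Best hyperparameters:", grid.best_params_)
print(f"Test R^2: {grid.score(X_test, y_test):.3f}")
```

Because the search is scored on cross-validation folds rather than the test set, the final test score remains an honest estimate of how the tuned tree generalises.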