Before proceeding further, please revisit the content on Ensemble models explained by Rahim.
For the purposes of this module, we will be focusing on boosting, but we will start with the difference between bagging and boosting.
“The strength of unity lies in diversity.” This saying holds true in the world of machine learning as well. Ensemble models bring us that flavour of diversity to create powerful models that can handle complex problems. An ensemble is a combination of models, each trained to solve the same problem, whose outputs are combined to produce the best possible result.
For a machine learning task (classification or regression), we need a model that identifies the necessary patterns in the data and does not overfit. In other words, the model should not be so simple that it misses even the important patterns present in the data, but it should also not be so complex that it learns the noise present in the dataset.
We can arrive at such a model either through a single model or through an ensemble, i.e., a collection of models. By combining several models, ensemble learning methods create a strong learner, thus reducing the bias and/or variance of the individual models.
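To make this concrete, here is a small sketch (not part of the original module) of an ensemble built with scikit-learn’s VotingClassifier. The dataset, the three base models and their settings are illustrative assumptions, not prescriptions:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different learners, each trained on the same problem
ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression())),
        ("tree", DecisionTreeClassifier(max_depth=4)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    voting="hard",  # combine the three predictions by majority vote
)
ensemble.fit(X_train, y_train)
print("Ensemble accuracy on held-out data:", ensemble.score(X_test, y_test))

The combined classifier often performs at least as well as its best member, because the three models make somewhat different mistakes.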
Bagging is one such ensemble method. It creates different training subsets from the training data by sampling with replacement (bootstrapping), and then builds the same algorithm, with the same set of hyperparameters, on each of these subsets.
In this way, the same algorithm with the same set of hyperparameters is exposed to different subsets of the training data, resulting in slightly different individual models. The predictions of these individual models are combined by taking the average of the predicted values for a regression problem or a majority vote for a classification problem. Random forest is an example of the bagging method.
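The mechanism can be sketched by hand in a few lines of Python. This is only a rough illustration of bootstrapping and majority voting; the synthetic dataset, the number of models and the choice of plain decision trees are assumptions made for the example:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
n_models = 25

models = []
for _ in range(n_models):
    # Bootstrap: sample row indices with replacement to build a training subset
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier()   # same algorithm, same hyperparameters
    tree.fit(X[idx], y[idx])          # trained on a different subset each time
    models.append(tree)

# Combine the individual predictions by majority vote
# (for a regression problem we would average the predicted values instead)
all_preds = np.stack([m.predict(X) for m in models])    # shape: (n_models, n_samples)
majority = (all_preds.mean(axis=0) >= 0.5).astype(int)  # vote over the binary labels 0/1
print("Training accuracy of the bagged ensemble:", (majority == y).mean())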
Bagging works well when the algorithm we use to build our model has high variance. By this we mean that the model changes a lot even with slight changes in the training data. As a result, such algorithms overfit easily if not controlled. Recall that decision trees are prone to overfitting if we don’t tune their hyperparameters well; this is why bagging works very well for high-variance models like decision trees.
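As a quick illustration of this variance reduction, one could compare a single unpruned decision tree with a bagged ensemble of the same trees. The dataset and the number of estimators below are arbitrary choices made only for the demonstration:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = load_breast_cancer(return_X_y=True)

single_tree = DecisionTreeClassifier(random_state=0)  # high-variance learner
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

for name, model in [("single tree", single_tree), ("bagged trees", bagged_trees)]:
    scores = cross_val_score(model, X, y, cv=10)
    # Bagging typically raises the mean score and shrinks the spread across folds,
    # which is the variance reduction described above.
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")

On most runs the bagged ensemble shows a higher mean score and a smaller spread across the folds than the single tree.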
Boosting is another popular approach to ensembling. This technique combines individual models into a strong learner by building the models sequentially, such that the final model has higher accuracy than the individual models. Let us understand this.
These individual models are connected in such a way that each subsequent model depends on the errors of the previous models and tries to correct those errors.
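To see what “correcting the errors of the previous models” means in code, here is a bare-bones sketch of one flavour of boosting (gradient boosting on squared error for a regression problem), written out by hand. The learning rate, tree depth and number of rounds are arbitrary assumptions for the illustration:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)

learning_rate = 0.1
n_rounds = 100

prediction = np.full(len(y), y.mean())  # start from a trivial model: the mean
stages = []

for _ in range(n_rounds):
    residuals = y - prediction                      # errors of the ensemble so far
    stump = DecisionTreeRegressor(max_depth=2)
    stump.fit(X, residuals)                         # the next model targets those errors
    prediction += learning_rate * stump.predict(X)  # nudge the ensemble towards the truth
    stages.append(stump)

print("Mean squared error after boosting:", np.mean((y - prediction) ** 2))

Each round fits a small tree to the current residuals, so every new model is explicitly trying to fix what the combined previous models got wrong; this sequential dependence is what distinguishes boosting from bagging, where the models are built independently and in parallel.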