The objective of this segment is to learn how to implement the XGBoost algorithm in Python.

But before that, let’s look at some of the hyperparameters used in XGBoost.

**Hyperparameters – Learning Rate, Number of Trees and Subsampling**

λt, the **learning rate**, is also known as **shrinkage**. It can be used to regularize the gradient tree boosting algorithm. λt typically varies from 0 to 1. Smaller values of λt require a larger number of trees T (called *n_estimators* in the Python package XGBoost). This is because, with a slower learning rate, each tree moves the prediction only a small step, so more trees are needed to reach the minimum. This, in turn, leads to longer training time.

On the other hand, if λt is large, we may reach the same point with fewer trees (*n_estimators*), but there's a risk that we overshoot the minimum altogether because of the long stride we take at each iteration.
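To see the trade-off between λt and T concretely, here is a toy gradient boosting loop in plain Python. It is a minimal sketch, not XGBoost's implementation: it assumes squared-error loss, uses one-split decision stumps as the trees, and the helper names `fit_stump` and `rounds_to_converge` are hypothetical. It counts how many trees are needed before the training error drops below a tolerance for different learning rates.

```python
# Toy gradient boosting with decision stumps on squared-error loss.
# Illustrative sketch only; this is not how the xgboost package is implemented.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error on residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def rounds_to_converge(xs, ys, learning_rate, tol=0.01, max_rounds=1000):
    """Count boosting rounds (trees) until the training MSE drops below tol."""
    base = sum(ys) / len(ys)              # F_0: constant initial prediction
    preds = [base] * len(ys)
    for t in range(1, max_rounds + 1):
        # Residuals y - F are the negative gradient of the loss 0.5 * (y - F)^2.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        # Shrinkage: each new tree's output is scaled by the learning rate.
        preds = [p + learning_rate * (left_value if x <= threshold else right_value)
                 for x, p in zip(xs, preds)]
        mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
        if mse < tol:
            return t
    return max_rounds


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 1, 5, 5, 5, 5]
for lr in (1.0, 0.5, 0.1):
    print(lr, rounds_to_converge(xs, ys, lr))
```

On this deliberately simple data a single stump can fit the residuals exactly, so the loop needs 1, 5 and 29 trees for learning rates 1.0, 0.5 and 0.1 respectively: the smaller the step, the more trees are required.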

Other ways to regularize the model are to explicitly specify the **number of trees T** and to use **subsampling**. Note that you shouldn't tune λt and the number of trees T together, since the two trade off against each other: a high λt implies a low value of T and vice versa.

**Subsampling** means training the model in each iteration on a random fraction of the data (similar to how random forests build each tree). The subsampling ratio ranges from 0 to 1, and a typical value is 0.5. In random forests, subsampling is critical to ensure diversity among the trees; otherwise, all the trees would start with the same training data and therefore look similar. This is not a big problem in boosting, since each tree is built on the residuals anyway and therefore gets a significantly different objective function than the previous one.
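The idea of subsampling can be sketched as drawing a fresh set of row indices before each tree is built. This is a minimal sketch assuming sampling without replacement; `subsample_rows` is a hypothetical helper, and in the xgboost package the sampling happens internally when you set its `subsample` parameter, where the exact sampling scheme may differ from this one.

```python
import random


def subsample_rows(n_rows, subsample, rng):
    """Hypothetical helper: row indices used to build one tree (no replacement)."""
    if not 0 < subsample <= 1:
        raise ValueError("subsample must be in (0, 1]")
    k = max(1, int(round(subsample * n_rows)))   # number of rows for this tree
    return sorted(rng.sample(range(n_rows), k))


rng = random.Random(42)          # fixed seed so the draws are reproducible
for t in range(3):
    rows = subsample_rows(10, 0.5, rng)
    print(f"tree {t}: rows {rows}")
```

Each tree sees a different half of the 10 rows, which is the source of the extra randomness (and regularization) that subsampling adds.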

**γ (Gamma)** is a parameter used to control the pruning of the tree. A node is split only when the resulting split gives a positive reduction in the loss function. Gamma specifies the minimum loss reduction required to make a split, so larger values make the algorithm more conservative. Suitable values depend on the loss function and should be tuned.
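The XGBoost paper scores a candidate split with a gain computed from the summed gradients G and Hessians H on each side, and γ is subtracted from that gain, so a split survives only if the gain stays positive. The sketch below assumes squared-error loss (so g_i = prediction − label and h_i = 1); `split_gain` is a hypothetical helper, and `reg_lambda` is the L2 penalty on leaf weights (a different λ from the learning rate λt above).

```python
def split_gain(grad_left, hess_left, grad_right, hess_right, reg_lambda=1.0, gamma=0.0):
    """Gain of a split: 0.5 * [GL^2/(HL+l) + GR^2/(HR+l) - (GL+GR)^2/(HL+HR+l)] - gamma."""
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(grad_left + grad_right, hess_left + hess_right)
    return 0.5 * (score(grad_left, hess_left) + score(grad_right, hess_right) - parent) - gamma


# Squared-error loss: gradient g_i = prediction - label, Hessian h_i = 1.
labels = [1.0, 1.0, 5.0, 5.0]
prediction = 3.0
grads = [prediction - y for y in labels]     # [2.0, 2.0, -2.0, -2.0]
hess = [1.0] * len(labels)

# Candidate split: first two rows go left, last two go right.
G_L, H_L = sum(grads[:2]), sum(hess[:2])
G_R, H_R = sum(grads[2:]), sum(hess[2:])
for gamma in (0.0, 6.0):
    gain = split_gain(G_L, H_L, G_R, H_R, reg_lambda=1.0, gamma=gamma)
    print(gamma, round(gain, 3), "split" if gain > 0 else "no split")
```

Here the raw gain is 16/3 ≈ 5.33, so γ = 0 keeps the split while γ = 6 prunes it, which is exactly the "conservative" effect described above.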

Apart from the hyperparameters mentioned above, there are the usual decision-tree parameters, such as the maximum depth of each tree and the minimum number of samples required for a split.
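The hyperparameters discussed in this segment can be collected into one dictionary using the parameter names of the xgboost scikit-learn wrapper. The values below are illustrative starting points, not tuned recommendations, and `out_of_range` is a hypothetical helper that only mirrors the ranges described in the text.

```python
# Illustrative (untuned) values, keyed by xgboost scikit-learn wrapper parameter names.
params = {
    "learning_rate": 0.1,   # λt, shrinkage (alias: eta)
    "n_estimators": 300,    # T, number of boosted trees
    "subsample": 0.5,       # fraction of rows sampled for each tree
    "gamma": 0.0,           # minimum loss reduction to split (alias: min_split_loss)
    "max_depth": 4,         # maximum depth of each tree
    "min_child_weight": 1,  # minimum sum of Hessians in a child (a row count for squared error)
}


def out_of_range(p):
    """Hypothetical helper: names of parameters outside the ranges given in the text."""
    checks = {
        "learning_rate": 0 < p["learning_rate"] <= 1,
        "n_estimators": p["n_estimators"] >= 1,
        "subsample": 0 < p["subsample"] <= 1,
        "gamma": p["gamma"] >= 0,
    }
    return [name for name, ok in checks.items() if not ok]


print(out_of_range(params))
```

With the `xgboost` package installed, the same dictionary can be unpacked into `XGBClassifier(**params)` or `XGBRegressor(**params)`.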

#### Note

The mathematical understanding behind XGBoost is provided in the optional section.

For a detailed understanding, please attend the live session on Saturday.

With this understanding, let's move to the next video, where Snehanshu will give you a walkthrough of how to do classification with XGBoost.

Now that you have gone through classification, let's dive into the regression problem.

Snehanshu will now explain how to approach a regression problem with XGBoost through a code walkthrough.

Download the notebook attached below and implement the code in it to get an understanding of both XGBoost classification and regression.