The negative gradients at iteration m act as the target variable for fitting the model hm; for a regression setting with a squared loss function, this target variable is simply the residual.
To put this in plain words, at each iteration we add an incremental model that is fit to the negative gradients of the loss function evaluated at the current predictions. This incremental model can be a linear regression model, a decision tree, or any other model.
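For instance, with a squared loss 0.5 * (y − F)², the negative gradient with respect to the current prediction F is exactly the residual y − F, so fitting hm amounts to fitting a regressor to the residuals. Below is a minimal sketch of one such fit; the toy data and the choice of a shallow decision tree are assumptions for illustration, not taken from the notebooks referenced later.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (assumed for illustration): predict y from a single feature x.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

# Current ensemble prediction F (start from the mean, i.e. a constant model).
F = np.full_like(y, y.mean())

# Negative gradient of the squared loss 0.5 * (y - F)**2 w.r.t. F is the residual y - F.
neg_grad = y - F

# Fit the incremental model hm (here a shallow decision tree) to those residuals.
hm = DecisionTreeRegressor(max_depth=2).fit(X, neg_grad)
```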
We stop when the gradients are very close to zero; for a regression setting, this means we stop iterating when the residuals are very close to zero. Here, γm, known as the step multiplier, typically lies between 0 and 1.
In other words, introducing the step multiplier results in taking a small step in the right direction.
γ serves the same purpose as the hyperparameter α (both determine by how much an update should be made), but γ is learned during training, while α is fixed in advance as a hyperparameter.
Here, at each iteration, a new model (weak learner) is added, and with each new addition we can observe a reduction in the pseudo-residuals, which indicates that we are moving in the right direction, towards the target values.
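To make the whole loop concrete, here is a minimal from-scratch sketch; the toy data, shallow trees, and squared loss are all assumptions for illustration. Each iteration fits a tree to the current residuals, finds γm by a small line search, updates the ensemble, and prints the mean absolute residual so you can watch it shrink.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (assumed for illustration).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=300)

F = np.full_like(y, y.mean())      # F0: constant initial model
trees, gammas = [], []

for m in range(50):
    residuals = y - F                          # negative gradient of the squared loss
    print(f"iteration {m}: mean |residual| = {np.mean(np.abs(residuals)):.4f}")
    if np.mean(np.abs(residuals)) < 1e-3:      # stop when residuals are close to zero
        break
    hm = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    h_pred = hm.predict(X)
    # Line search for the step multiplier gamma_m (learned, unlike a fixed learning rate).
    gamma = minimize_scalar(lambda g: np.mean((y - (F + g * h_pred)) ** 2)).x
    F = F + gamma * h_pred
    trees.append(hm)
    gammas.append(gamma)
```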
Refer to the original documentation for both Regression & Classification.
You can try out GBM for classification using the notebook below.
The dataset used for the Attrition prediction can be downloaded here.
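Before opening the notebook, a bare-bones sklearn sketch for such a classification task might look like the following; the file name and column names (e.g. an "Attrition" target with Yes/No values) are assumptions about the dataset layout, so adjust them to the actual file.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical file and column names; adjust them to the downloaded dataset.
df = pd.read_csv("attrition.csv")
y = (df["Attrition"] == "Yes").astype(int)
X = pd.get_dummies(df.drop(columns=["Attrition"]))   # one-hot encode categorical features

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```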
You can try out GBM for regression using the notebook below.
The dataset used for the Sales prediction can be downloaded here.
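Likewise, a minimal regression sketch (again with an assumed file name and a "Sales" target column, not the notebook's actual code) could be:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical file and target column; adjust them to the downloaded dataset.
df = pd.read_csv("sales.csv")
y = df["Sales"]
X = pd.get_dummies(df.drop(columns=["Sales"]))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

reg = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0)
reg.fit(X_train, y_train)
rmse = mean_squared_error(y_test, reg.predict(X_test)) ** 0.5
print("test RMSE:", rmse)
```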
Additional Reading
For practical implementation, please go through this blog.
Visit this kernel to see how each parameter can be tuned for model improvement (a rough tuning sketch follows after this list).
To see how gradient boosting can be implemented from scratch, you can check this article.
To play with different hyperparameters in Gradient Boosting, visit this website.
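As a rough starting point for that kind of tuning, a grid search over the common GBM knobs might look like the sketch below; the grid values are assumptions rather than the kernel's actual search space, and X_train / y_train are reused from the classification sketch above.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Assumed search grid over common GBM hyperparameters.
param_grid = {
    "n_estimators": [100, 200, 400],
    "learning_rate": [0.05, 0.1, 0.2],
    "max_depth": [2, 3, 4],
    "subsample": [0.8, 1.0],
}

# X_train and y_train are reused from the classification sketch above.
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```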