In the previous segment, you learnt that Ridge regression retains all the variables present in the data. Now, when the number of variables is very large, the data may contain irrelevant or noisy variables that we would not want to keep in the model. Lasso regression helps here by performing feature selection. Let’s watch the forthcoming video and understand how it does so.
The primary difference between Lasso and Ridge regression is the penalty term. In Lasso, the penalty is λ times the sum of the absolute values of all the coefficients in the model. As with Ridge regression, Lasso shrinks the coefficient estimates towards 0. However, there is one difference: with Lasso, the penalty pushes some of the coefficient estimates to be exactly 0, provided the tuning parameter λ is large enough.
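For reference, the two objectives can be written side by side (using β for the coefficients and λ for the tuning parameter; this notation is assumed here, not taken from the video):

```latex
% Ridge: squared (L2) penalty on the coefficients
\hat{\beta}^{\,ridge} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2

% Lasso: absolute-value (L1) penalty, which can set some coefficients exactly to 0
\hat{\beta}^{\,lasso} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
```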
Hence, Lasso performs feature selection, which makes models generated by Lasso easier to interpret than those generated by Ridge. Choosing an appropriate value of lambda is critical here as well. Also, just as with Ridge regression, standardisation of the variables is necessary for Lasso. Now, in the forthcoming video, we will see how Lasso is implemented using Python.
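Before the video, here is a minimal sketch of the same idea using scikit-learn (this is not the course notebook; the synthetic dataset and the penalty value are placeholders, and note that scikit-learn calls the tuning parameter lambda `alpha`):

```python
# Minimal sketch: standardise the predictors, then fit a Lasso model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration: 20 predictors, only 5 truly informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=42)

# Standardisation first, as required for Lasso (and Ridge)
model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X, y)

coefs = model.named_steps["lasso"].coef_
print("Coefficients set exactly to 0:", np.sum(coefs == 0), "out of", coefs.size)
```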
So, to summarise:
- The behaviour of Lasso regression is similar to that of Ridge regression.
- As the value of lambda increases, variance reduces, with a slight compromise in terms of bias.
- Just like Ridge regression, Lasso pushes the model coefficients towards 0 in order to handle high variance. In addition, Lasso pushes some coefficients to be exactly 0 and thus performs variable selection.
This variable selection results in models that are easier to interpret.
In the forthcoming video, Anjali explains the variable selection feature of Lasso in Python.
Correction:
Anjali meant to say lambda value of 0.001 instead of 0.01 at 0:27 and 0:58.
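To make the variable-selection effect concrete, here is a hedged sketch (not Anjali's code): as the penalty grows, more Lasso coefficients become exactly 0. The data and the grid of alpha values are illustrative assumptions, not the values used in the video.

```python
# Sketch: count how many predictors Lasso retains as the penalty (alpha,
# i.e. lambda) increases. Larger alpha => more coefficients forced to 0.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)  # standardise before applying Lasso

for alpha in [0.001, 0.01, 0.1, 1.0, 10.0]:
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    n_selected = np.sum(lasso.coef_ != 0)
    print(f"alpha={alpha}: {n_selected} of {X.shape[1]} predictors retained")
```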
Generally, Lasso should perform better in situations where only a few of the predictors used to build the model have a significant influence on the response variable, so feature selection, which removes the irrelevant variables, should help. Ridge, on the other hand, should do better when all the variables have almost the same influence on the response variable.
It is not the case that one of the techniques always performs better than the other – the choice would depend upon the data that is used for modelling.
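One way to let the data decide is to compare the two techniques with cross-validation. The sketch below assumes a synthetic dataset and illustrative alpha values; it is only meant to show the comparison pattern, not a recommended setting.

```python
# Sketch: compare Ridge and Lasso on the same data using 5-fold
# cross-validation and keep whichever generalises better.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=1)

for name, estimator in [("Ridge", Ridge(alpha=1.0)), ("Lasso", Lasso(alpha=1.0))]:
    pipeline = make_pipeline(StandardScaler(), estimator)
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R^2 = {scores.mean():.3f}")
```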
In the next segment, we will look at the Python implementation of Ridge and Lasso regression.