As discussed earlier, we need to minimise the residual sum of squares (RSS) to obtain the best linear regression model. To do this, we need to find the model coefficients b0 and b1 that give the lowest RSS value. Let’s go ahead and watch the forthcoming video and see how we can do this.
The video discussed two methods for obtaining the model coefficients by minimising RSS:
- Using an optimisation algorithm, such as gradient descent
Gradient descent is an iterative optimisation algorithm for finding the minimum of a cost function. We apply the same update rule over and over again, and with each step the model coefficients (the betas) move closer to the values that minimise the cost function.
To perform gradient descent, we initialise the betas to some value (e.g., all zeros) and repeatedly adjust them in the direction that decreases the cost function. We repeat this procedure until the betas converge, that is, until they stop changing appreciably between iterations. The final betas will then be close to the optimum. Note that the above video aims to provide an intuitive understanding of the gradient descent method. To gain a better mathematical understanding, please revise the content on gradient descent.
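The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the video. The function name `gradient_descent`, the learning rate, the iteration cap, the tolerance and the toy data are all assumptions chosen for the example. The cost is taken to be RSS divided by the number of points, so that the step size does not depend on the sample size.

```python
def gradient_descent(x, y, learning_rate=0.01, n_iters=10000, tol=1e-12):
    """Fit y ~ b0 + b1 * x by minimising RSS / n with batch gradient descent."""
    n = len(x)
    b0, b1 = 0.0, 0.0  # initialise the betas to zero

    for _ in range(n_iters):
        # Residuals of the current model on every data point
        residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

        # Partial derivatives of RSS / n with respect to b0 and b1
        grad_b0 = -2.0 / n * sum(residuals)
        grad_b1 = -2.0 / n * sum(r * xi for r, xi in zip(residuals, x))

        # Step in the direction that decreases the cost
        new_b0 = b0 - learning_rate * grad_b0
        new_b1 = b1 - learning_rate * grad_b1

        # Stop once the betas barely change between iterations
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break

    return b0, b1


# Toy data (assumed for illustration), generated from y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

b0, b1 = gradient_descent(x, y)
print(round(b0, 3), round(b1, 3))
```

On this toy data the returned betas should settle very close to 1 and 2. A learning rate that is too large would make the updates diverge instead of converge, so in practice it usually needs some tuning.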
- Using normal equations to solve for model coefficients
Solving normal equations requires a sound knowledge of derivatives. Therefore, refer to this link to revise the basics of derivatives. In the following video, we will look at the same in detail.
In the above video, you learnt how derivatives help with calculating the value of x at which the function is at its minimum. Similarly, in normal equations, we calculate the model coefficients b0 and b1 at which our cost function, i.e., RSS, is minimum by using derivatives. In order to do this:
- Take the derivative of the cost function w.r.t. b0 and b1,
- Set each derivative equal to 0, and
- Solve the two equations to get the best values for the parameters b0 and b1.
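Written out in symbols, the first two steps look as follows. The notation is an assumption made for this sketch: n data points (x_i, y_i) and the model y = b0 + b1·x.

```latex
\mathrm{RSS}(b_0, b_1) = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2

\frac{\partial\,\mathrm{RSS}}{\partial b_0} = -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right) = 0

\frac{\partial\,\mathrm{RSS}}{\partial b_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - b_0 - b_1 x_i \right) = 0
```

These are two linear equations in the two unknowns b0 and b1, which is why they can be solved exactly.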
Using calculus, we have calculated the model coefficients using the following formulae:
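For simple linear regression these take the standard closed form below, where x̄ and ȳ denote the sample means of x and y and n is the number of data points. This notation is assumed here, as it is not fixed elsewhere in this segment.

```latex
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```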
Please note that we have not derived the equations here. If you wish to know how to expand the cost function and get the coefficients using partial derivatives, please go through this link.
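The coefficients obtained from the normal equations and from gradient descent are generally the same; only the methods differ. The normal equations give the minimising coefficients directly in closed form, whereas gradient descent approaches them iteratively, so the two agree up to a small numerical tolerance once gradient descent has converged.
Now, it’s time to find out whether normal equations in Python give us the same answer. Let’s watch the next video and find out with Anjali.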
In the video, we saw that the coefficients we get by applying the normal equations in Python are the same as those we obtained by building the model using scikit-learn. In the next segment, we will see how we can represent the simple linear regression equation in matrix form.
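As a recap of the normal-equation approach, here is a minimal sketch in plain Python. It is not the code from the video. The function name `fit_simple_linear_regression` and the toy data are assumptions made for illustration, and the closed-form expressions used are the standard ones for simple linear regression. The final check confirms that both partial derivatives of RSS are (numerically) zero at the fitted coefficients.

```python
from statistics import fmean


def fit_simple_linear_regression(x, y):
    """Return (b0, b1) minimising RSS for the model y = b0 + b1 * x."""
    x_bar, y_bar = fmean(x), fmean(y)

    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)

    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


# Toy data (assumed for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

b0, b1 = fit_simple_linear_regression(x, y)
print(b0, b1)

# At the minimum, the residuals sum to zero and are uncorrelated with x,
# which is what setting the two partial derivatives to 0 means.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sum_residuals = sum(residuals)
sum_x_residuals = sum(xi * r for xi, r in zip(x, residuals))
print(sum_residuals, sum_x_residuals)
```

Fitting scikit-learn's `LinearRegression` on the same data and reading its `intercept_` and `coef_` attributes should give matching values, which is the comparison made in the video.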