Let’s start by implementing the **cost function** for linear regression, also known as **MSE** (Mean Squared Error). The MSE measures the ‘goodness of fit’ of a fitted line. The cost function takes two values, (m, c), where m is the coefficient (slope) and c is the intercept. It iterates through each point in the dataset and computes the mean of the squared vertical distances between the points and the line.
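To make this concrete, here is a minimal sketch of such a cost function in plain Python (the function name and the `points` format are illustrative, not fixed by the text):

```python
def compute_cost(m, c, points):
    """Mean Squared Error of the line y = m*x + c over the dataset.

    points is a list of (x, y) pairs; for each point we take the
    vertical distance to the line, square it, and average over all points.
    """
    total_error = 0.0
    for x, y in points:
        total_error += (y - (m * x + c)) ** 2
    return total_error / len(points)
```

For example, a line that passes exactly through every point has a cost of zero, and the cost grows as the line moves away from the data.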

Our goal is to minimise the cost function: the lower its value, the smaller the error. Minimising it gives us the best-fit line for our data.

Gradient descent is an optimisation algorithm used to find the values of the parameters (coefficients) of a function (f) that minimises a cost function. To run gradient descent on the error function above, you first need to compute its gradient. To calculate it, you will need to differentiate your error function. Since the function is defined by two parameters (m and c), you need to compute a partial derivative for each.
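For the MSE cost J(m, c) = (1/N) Σ (yᵢ − (m·xᵢ + c))², the two partial derivatives work out to ∂J/∂m = (−2/N) Σ xᵢ(yᵢ − (m·xᵢ + c)) and ∂J/∂c = (−2/N) Σ (yᵢ − (m·xᵢ + c)). A small sketch of computing them in Python (names are illustrative):

```python
def compute_gradients(m, c, points):
    """Partial derivatives of the MSE cost with respect to m and c.

    For J(m, c) = (1/N) * sum((y - (m*x + c))**2):
        dJ/dm = (-2/N) * sum(x * (y - (m*x + c)))
        dJ/dc = (-2/N) * sum(y - (m*x + c))
    """
    n = len(points)
    m_gradient = 0.0
    c_gradient = 0.0
    for x, y in points:
        residual = y - (m * x + c)      # vertical distance to the line
        m_gradient += (-2.0 / n) * x * residual
        c_gradient += (-2.0 / n) * residual
    return m_gradient, c_gradient
```

Note that at a perfect fit every residual is zero, so both partial derivatives vanish, which is exactly the condition for a minimum.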

Now, we will iterate and update **m** and **c** so that each iteration yields a lower error value than the previous one.
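The update at each iteration moves the parameters a small step against the gradient; a sketch of a single step (all names and numeric values here are illustrative):

```python
learning_rate = 0.01                  # small step size
m_current, c_current = 0.0, 0.0       # starting guesses for m and c
m_gradient, c_gradient = -2.0, -4.0   # example partial-derivative values

# Subtracting the gradient moves us downhill on the cost surface.
m_current = m_current - learning_rate * m_gradient
c_current = c_current - learning_rate * c_gradient
```

Because the gradient points in the direction of steepest *increase* of the cost, subtracting it decreases the cost for a sufficiently small step size.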


Let’s now implement the concept above in Python:
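A minimal end-to-end sketch of such an implementation is shown below. The names `learning_rate`, `m_gradient`, `c_gradient`, `m_current` and `c_current` match the ones discussed in this segment; the toy dataset is illustrative only, not the dataset from the course.

```python
def gradient_descent(points, learning_rate=0.01, num_iterations=1000):
    """Fit y = m*x + c to (x, y) pairs by gradient descent on the MSE cost."""
    n = len(points)
    m_current, c_current = 0.0, 0.0
    for _ in range(num_iterations):
        m_gradient = 0.0
        c_gradient = 0.0
        for x, y in points:
            residual = y - (m_current * x + c_current)
            m_gradient += (-2.0 / n) * x * residual
            c_gradient += (-2.0 / n) * residual
        # Step against the gradient to reduce the cost.
        m_current -= learning_rate * m_gradient
        c_current -= learning_rate * c_gradient
    return m_current, c_current

# Toy data lying exactly on y = 2x + 1 (illustrative only).
data = [(0, 1), (1, 3), (2, 5), (3, 7)]
m, c = gradient_descent(data, learning_rate=0.05, num_iterations=5000)
```

With enough iterations on this toy data, the fitted values approach the true slope 2 and intercept 1.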

The ‘learning_rate’ variable controls the size of the step we take downhill in each iteration. If the learning rate is too low, the algorithm may take a long time to reach the minimum; if it is too high, the algorithm may overstep the minimum.

The ‘m_gradient’ variable corresponds to ∂J/∂m, and ‘c_gradient’ corresponds to ∂J/∂c, where J is the cost function. The variables m_current and c_current hold the current values of m and c, which are updated at every step.

The graph above shows how the cost decreases as the number of iterations increases. This illustrates the core idea of gradient descent: finding the values of the coefficients by minimising the cost function.
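One way to reproduce such a curve is to record the cost after every update and plot the resulting list; a sketch (the function name and toy data are illustrative):

```python
def cost_history(points, learning_rate=0.05, num_iterations=100):
    """Run gradient descent and record the MSE cost after every update."""
    n = len(points)
    m, c = 0.0, 0.0
    costs = []
    for _ in range(num_iterations):
        m_grad = sum((-2.0 / n) * x * (y - (m * x + c)) for x, y in points)
        c_grad = sum((-2.0 / n) * (y - (m * x + c)) for x, y in points)
        m -= learning_rate * m_grad
        c -= learning_rate * c_grad
        costs.append(sum((y - (m * x + c)) ** 2 for x, y in points) / n)
    return costs

costs = cost_history([(0, 1), (1, 3), (2, 5), (3, 7)])
# Plotting this list against the iteration index (for example with
# matplotlib: plt.plot(costs)) gives the downward-sloping cost curve.
```

For a small enough learning rate, the recorded costs decrease monotonically towards the minimum.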

In the next segment, you will go through the Python code for implementing multiple linear regression using gradient descent.