The term ‘**multiple**’ in multiple linear regression is self-explanatory; it indicates that the model captures the relationship between two or more independent input variables and a single response variable. Multiple linear regression is needed when one variable is not sufficient to create a good model and make accurate predictions.

Let’s hear Rahim talk about it.

You saw that multiple linear regression proved to be useful in creating a better model, as there was a significant change in the value of the R-squared. Recall that the R-squared for simple linear regression using ‘TV’ as the input variable was 0.816. When you have two variables as input, namely ‘Newspaper’ and ‘TV’, the R-squared increases to 0.836. Using ‘Radio’ along with ‘TV’ increases it further, to 0.910. So, adding a new variable can help the model explain more of the variance in the data.

It is recommended that you check the R-squared after adding these variables to see how much the model has improved.
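The comparison above can be sketched in code. This is a minimal illustration using synthetic data (the column names and effect sizes below are assumptions for demonstration, not the actual advertising dataset from the lesson): fit a least-squares model with one input, then with two, and compare the R-squared values.

```python
import numpy as np

# Synthetic stand-in for the advertising data described in the text
# (effect sizes here are illustrative assumptions, not the real dataset).
rng = np.random.default_rng(0)
n = 200
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
sales = 7.0 + 0.05 * tv + 0.10 * radio + rng.normal(0, 1.0, n)

def r_squared(X, y):
    """Fit by least squares and return the R-squared of the fit."""
    X = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = np.sum(resid ** 2)                    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)           # total sum of squares
    return 1 - ss_res / ss_tot

r2_tv = r_squared(tv.reshape(-1, 1), sales)
r2_tv_radio = r_squared(np.column_stack([tv, radio]), sales)
print(f"R-squared with TV only:      {r2_tv:.3f}")
print(f"R-squared with TV and Radio: {r2_tv_radio:.3f}")
```

Because ‘Radio’ genuinely contributes to ‘Sales’ in this synthetic setup, the two-variable model explains more variance, mirroring the jump from 0.816 to 0.910 described above.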

Let’s now look at the formulation of multiple linear regression. It is just an extension of simple linear regression, so the formulation is largely the same.

Most of the concepts in multiple linear regression are quite similar to those in simple linear regression. The formulation for predicting the response variable now becomes this:

Y = β₀ + β₁X₁ + β₂X₂ + … + βₚXₚ + ε

The main geometric difference is that the model now fits a hyperplane instead of a line. The other aspects remain the same:

- Coefficients are still obtained by minimising the sum of squared errors (the least squares criterion).
- For inference, the assumptions from simple linear regression still hold: the error terms are independent and normally distributed, with zero mean and constant variance.
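The least squares criterion above has a direct closed-form solution, the normal equations β̂ = (XᵀX)⁻¹Xᵀy. This is a minimal sketch on synthetic data with known coefficients (the coefficient values are arbitrary assumptions chosen so the recovered fit can be checked), showing that minimising the sum of squared errors recovers the hyperplane:

```python
import numpy as np

# Solve the least squares criterion directly via the normal equations.
# Data are synthetic with known coefficients so the fit can be verified.
rng = np.random.default_rng(42)
n, p = 500, 3
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, 0.5, -1.5, 3.0])        # [β0, β1, β2, β3], assumed values
X1 = np.column_stack([np.ones(n), X])              # design matrix with intercept
y = X1 @ true_beta + rng.normal(0, 0.5, n)         # ε: zero mean, constant variance

# beta_hat minimises the sum of squared errors Σ(y - Xβ)²
beta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)
print("estimated coefficients:", np.round(beta_hat, 2))
```

With independent, constant-variance errors (the assumptions listed above), the estimates land close to the true coefficients, and they get closer as the sample size grows.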