Before going into the implementation of linear regression, let’s have a quick recap of the basics of the algorithm. Go through the following video, where Sajan will take you through the concepts involved in the linear regression algorithm.
Note:
At 1:03, the Unsupervised Learning point in the slide has a typo. It should be "are not available" instead of "is not available".
As explained by Sajan, machine learning algorithms are classified into:
- Supervised Learning: Linear regression, Logistic Regression
- Unsupervised Learning: Clustering
As explained, the Linear Regression model attempts to explain the relationship between a dependent variable and an independent variable using a straight line. For example, predicting a company's sales based on its marketing budget, where:
- Sales is the dependent variable.
- Marketing budget is the independent variable.
The first step is to visualise the historical data points by drawing a scatter plot.
The next step is to fit a straight line through these data points, one that captures their behaviour and can predict the actual sales for a given marketing budget.
Now the equation of any straight line can be written as:
Y = β0 + β1X
Where,
β0 = Intercept
β1 = Slope
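The straight-line equation above can be sketched directly in code. A minimal illustration (the intercept, slope and input value below are made-up numbers, not taken from the video):

```python
# Minimal sketch of the straight-line equation Y = β0 + β1·X.
# The coefficients and input are illustrative, made-up values.

def predict(x, b0, b1):
    """Return the predicted Y for input X, given intercept b0 and slope b1."""
    return b0 + b1 * x

# Hypothetical fit: intercept 2.0, slope 0.5
print(predict(10, 2.0, 0.5))  # 7.0
```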
Not all the points in a dataset will lie exactly on the regression line. The difference between the actual value and the predicted value is called the residual. In the next video, let’s hear from Sajan about residuals and finding the best-fit line.
As explained above, a residual is the difference between the actual value and the predicted value. The residual (or error) for the i-th data point is written as:
ei = yi − ypred
Now the sum of squared errors is written as:
RSS = e1² + e2² + … + en² (Residual Sum of Squares)
RSS = (Y1 − β0 − β1X1)² + (Y2 − β0 − β1X2)² + … + (Yn − β0 − β1Xn)²
So, RSS = Σᵢ₌₁ⁿ (Yi − β0 − β1Xi)²
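The RSS formula above translates directly into code. A minimal sketch, using made-up data points and coefficients purely for illustration:

```python
# Minimal sketch of the Residual Sum of Squares (RSS) from the formula above.
# The data and coefficients are illustrative, not taken from the video.

def rss(xs, ys, b0, b1):
    """RSS = sum over i of (Yi - b0 - b1*Xi)^2."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [3.1, 4.9, 7.2, 8.8]
print(rss(xs, ys, 1.0, 2.0))  # ≈ 0.10 for this hypothetical line Y = 1 + 2X
```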
In order to get the best-fit line, the RSS should be minimised to obtain the optimal values of β0 and β1. The RSS can be minimised using the following methods:
- Gradient descent
- Differentiation
Gradient Descent is the most common approach used in the industry.
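To show the idea, here is a minimal gradient-descent sketch for minimising RSS with respect to β0 and β1. The learning rate, number of iterations and data are illustrative choices, not values from the video:

```python
# Minimal gradient-descent sketch minimising RSS over (b0, b1).
# Data, learning rate and step count are illustrative, made-up choices.

def fit_gradient_descent(xs, ys, lr=0.01, steps=5000):
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line, e_i = y_i - (b0 + b1*x_i)
        residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
        # Gradients of the (mean) squared error with respect to b0 and b1
        grad_b0 = -2 * sum(residuals) / n
        grad_b1 = -2 * sum(r * x for r, x in zip(residuals, xs)) / n
        # Step downhill against the gradient
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # perfectly linear data: Y = 2X
b0, b1 = fit_gradient_descent(xs, ys)
print(b0, b1)  # converges towards b0 ≈ 0, b1 ≈ 2
```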
As explained by Sajan, the drawback of RSS is that it is an absolute number: its value depends on the scale of the data. In order to mitigate this drawback, the strength of the Linear Regression model is mainly explained using R².
R² = 1 − (RSS/TSS)
Where,
RSS = Residual Sum of Squares.
TSS = Total Sum of Squares.
TSS is a measure of how a dataset varies around a central value (for example, the mean).
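Putting RSS and TSS together, R² can be computed as below. A minimal sketch with made-up actual and predicted values:

```python
# Minimal sketch of R² = 1 - RSS/TSS.
# The actual and predicted values below are illustrative, made-up numbers.

def r_squared(ys, ys_pred):
    mean_y = sum(ys) / len(ys)
    rss = sum((y - yp) ** 2 for y, yp in zip(ys, ys_pred))   # residual sum of squares
    tss = sum((y - mean_y) ** 2 for y in ys)                 # total sum of squares
    return 1 - rss / tss

ys = [3, 5, 7, 9]
ys_pred = [3.1, 4.9, 7.2, 8.8]
print(r_squared(ys, ys_pred))  # ≈ 0.995, close to 1 for a good fit
```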
As discussed by Sajan in the video, the higher the value of R², the better the model.
In the subsequent segments, you will learn about model-building techniques and then study linear regression in a little more detail.