In this session, you built a simple linear regression model in Python using the advertising dataset. Along the way, you also covered some of the underlying theory. Here’s a brief summary of what you learnt in this session.
1. A quick recap of simple linear regression
2. Assumptions of simple linear regression
- Linear relationship between X and y.
- Normal distribution of error terms.
- Independence of error terms.
- Constant variance of error terms.
3. Hypothesis testing in linear regression
- To determine the significance of beta coefficients.
- H0: β1 = 0; HA: β1 ≠ 0.
- T-test on the beta coefficient.
- t-score = β̂i / SE(β̂i).
4. Building a linear model
- OLS (Ordinary Least Squares) method in statsmodels to fit a line.
- Summary statistics
- F-statistic, R-squared, coefficients and their p-values.
5. Residual Analysis
- Histogram of the error terms to check normality.
- Plot of the error terms with X or y to check independence.
6. Predictions
- Making predictions on the test set using the ‘predict()’ function.
7. Linear Regression using scikit-learn (sklearn)
- A second package apart from statsmodels for linear regression.
- A simpler option when you only need to fit a line and do not need the statistical inference output.
Rahim has also answered some common doubts surrounding linear regression in the following additional segment. You can go through the segment here. This part has also been included in the notebook provided to you at the beginning of the session.
Coming Up
In the next session, you will move from simple linear regression to multiple linear regression, in which you will use multiple independent variables to explain a single dependent variable.