Identifying Nonlinearity in Data

The linear regression model assumes a linear relationship between the predictors and the response variable. If the true relationship is nonlinear, virtually all of the conclusions that we draw from the fit become suspect, and the prediction accuracy of the model can drop significantly. In the forthcoming video, Anjali explains how we can identify nonlinearity in data.

In the video, you learnt how to identify nonlinearity in data for both simple and multiple linear regression:

  • For Simple Linear Regression
    • Plot the independent variable against the dependent variable to check for nonlinear patterns.
  • For Multiple Linear Regression, since there are multiple predictors, we instead plot the residuals against the predicted (fitted) values, ŷᵢ. Ideally, the residual plot shows no discernible pattern. A visible pattern, such as a curve or a U-shape, may indicate a problem with some aspect of the linear model, for example a nonlinear relationship that the model has missed. Apart from that:
    • Residuals should be randomly scattered around 0.
    • The spread of the residuals should be constant.
    • There should be no outliers in the data.
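
The residual check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in the video: the data is synthetic, the variable names (x1, x2, bin_means) are made up for this example, and averaging the residuals in three groups is only a rough numerical stand-in for looking at the plot.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Synthetic data (assumed for illustration): y is quadratic in x1, linear in x2.
n = 300
x1 = rng.uniform(0, 5, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 1.5 * x1**2 + 0.5 * x2 + rng.normal(0, 1.0, n)

# Fit the (misspecified) linear model y ~ b0 + b1*x1 + b2*x2 by least squares.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta        # predicted values, y-hat
residuals = y - fitted   # what a residual plot puts on the vertical axis

# Numerical stand-in for looking at the plot: sort by fitted value, split into
# three equal groups (low, middle, high) and average the residuals in each.
order = np.argsort(fitted)
bin_means = [group.mean() for group in np.array_split(residuals[order], 3)]

print("mean residual (low, mid, high fitted):", np.round(bin_means, 2))
```

If the linear model were adequate, all three averages would be close to 0. With this data, the low and high groups should come out positive and the middle group negative, which is the U-shape a residual plot would reveal. To see the plot itself, pass fitted and residuals to a scatter plot in the plotting library of your choice.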

If nonlinearity is present, then we may need to plot the residuals against each predictor in turn to identify which predictor has a nonlinear relationship with the response.
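
As a rough sketch of this per-predictor check, the snippet below repeats the residual comparison once for each predictor. The synthetic data, the predictor names and the helper pattern_strength are all assumptions made for this example. The helper is a crude numerical summary, not a standard diagnostic, and it does not replace inspecting the plots.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Synthetic data (assumed for illustration): only x1 enters nonlinearly.
n = 300
predictors = {"x1": rng.uniform(0, 5, n), "x2": rng.uniform(0, 5, n)}
y = (1.0 + 1.5 * predictors["x1"] ** 2 + 0.5 * predictors["x2"]
     + rng.normal(0, 1.0, n))

# Fit the linear model on all predictors and compute its residuals.
X = np.column_stack([np.ones(n)] + list(predictors.values()))
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta


def pattern_strength(x, resid, groups=3):
    """Hypothetical helper: sort observations by x, split them into equal
    groups, and return the gap between the largest and smallest group-average
    residual. A small gap means no obvious trend or curve against x."""
    order = np.argsort(x)
    means = [g.mean() for g in np.array_split(resid[order], groups)]
    return max(means) - min(means)


# One number per predictor; the larger value points at the likely culprit.
strength = {name: pattern_strength(x, residuals)
            for name, x in predictors.items()}
print(strength)
```

Here the value for x1 should be clearly larger than the value for x2, which matches how the data was generated. On real data, the individual residual plots remain the more reliable guide.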

How to handle nonlinear data?

Once we have the residual plots showing nonlinearity, we might need to make some changes, either to the model or to the data. In the upcoming video, you will learn how to do that.

In the above video, you learnt that there are three methods to handle nonlinear data:

  • Polynomial regression
  • Data transformation
  • Nonlinear regression
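
To make the first two methods concrete, here is a minimal sketch on synthetic data. The two data-generating formulas and all variable names are assumptions chosen for illustration. Nonlinear regression is left out because it generally needs an iterative optimiser rather than a single least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable
x = np.linspace(0.5, 5.0, 100)

# Polynomial regression: assume the true relationship is quadratic.
y_quad = 2.0 + 0.5 * x + 1.5 * x**2 + rng.normal(0, 0.5, x.size)
line_coefs = np.polyfit(x, y_quad, deg=1)  # straight-line fit
quad_coefs = np.polyfit(x, y_quad, deg=2)  # coefficients, highest power first
sse_line = np.sum((y_quad - np.polyval(line_coefs, x)) ** 2)
sse_quad = np.sum((y_quad - np.polyval(quad_coefs, x)) ** 2)
print("SSE, line vs quadratic:", round(sse_line, 1), round(sse_quad, 1))

# Data transformation: assume y = a * exp(b * x) with multiplicative noise.
# Taking logs gives log(y) = log(a) + b * x, which is linear in x.
y_exp = 3.0 * np.exp(0.8 * x) * np.exp(rng.normal(0, 0.1, x.size))
slope, intercept = np.polyfit(x, np.log(y_exp), deg=1)
a_hat, b_hat = np.exp(intercept), slope
print("estimated a, b:", round(a_hat, 2), round(b_hat, 2))
```

In the first case, adding the squared term should reduce the sum of squared errors sharply compared with the straight line. In the second, an ordinary linear fit on the log scale should recover values close to the assumed a = 3.0 and b = 0.8.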

In the next segment, we will look at the first of these methods for handling nonlinearity in data: polynomial regression.
