In the previous segment, you learnt about polynomial regression, which is one of the methods to handle nonlinear data. In this segment, you will learn about the other method that can be used to handle nonlinearity and still satisfy the assumptions of linear regression: transforming the data. Let’s try to understand data transformation through code in the forthcoming video.

Note: In order to explore some of the commonly seen shapes, please refer to the following link.

We saw in the code that the relationship between the predictor and the response variable was nonlinear. We also saw that using linear regression did not give good results, since the true fit is not a straight line but instead follows the data points along a nonlinear curve, which here resembles a logarithmic function.

However, linear regression assumes that the relationship between the response and predictor variables is linear. In order to continue using the linear regression framework, we transformed the data so that the relationship does become linear.

If the residual plot indicates the presence of nonlinear relations in the data, then a simple approach is to use nonlinear transformations of the predictors. For instance, for a predictor x, these transformations can be log(x), sqrt(x), exp(x), etc., in the regression model.
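As a quick sketch of this idea (a hypothetical NumPy example on synthetic data, not the course's own code — the true relationship y = 2 + 3·ln(x) is invented for illustration):

```python
import numpy as np

# Synthetic data whose true relationship is logarithmic: y = 2 + 3*ln(x)
rng = np.random.default_rng(0)
x = np.linspace(1, 100, 200)
y = 2 + 3 * np.log(x) + rng.normal(scale=0.1, size=x.size)

# Fit a straight line to the raw predictor x ...
slope_raw, intercept_raw = np.polyfit(x, y, 1)
resid_raw = y - (intercept_raw + slope_raw * x)

# ... and to the transformed predictor log(x).
slope_log, intercept_log = np.polyfit(np.log(x), y, 1)
resid_log = y - (intercept_log + slope_log * np.log(x))

# The log-transformed model leaves much smaller residuals.
print(resid_raw.std(), resid_log.std())
```

The model is still fitted with ordinary linear regression; only the predictor has changed from x to log(x).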

We need to remember that although log is the most commonly used function for transformations, it is not the only one that we can use. There are a few other functions that we can use depending upon the shape of the data.

Let’s take a look at some of these functions.

ln(y) = β0 + β1x

We know that both the response and the predictor variables can be transformed. If the data looks something like above, then you may want to try taking a log transform of the response variable. Here, you can fit the model with ln(y) as the response variable and x as the predictor variable.
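For instance (a hypothetical NumPy sketch; the intercept 0.5 and slope 0.3 are made up for illustration):

```python
import numpy as np

# Simulated exponential growth: ln(y) = 0.5 + 0.3*x, plus a little noise
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = np.exp(0.5 + 0.3 * x + rng.normal(scale=0.05, size=x.size))

# A straight line in x fits ln(y), not y itself.
b1, b0 = np.polyfit(x, np.log(y), 1)
print(b0, b1)  # close to the true intercept 0.5 and slope 0.3
```

Note that the fitted coefficients describe ln(y); to predict y itself, you would exponentiate the fitted values.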

y = β0 + β1e^(−x)

In this case, when the y-values rise quickly for small values of x and flatten out for larger values of x, we can use an exponential transformation: fit the model with y as the response variable and e^(−x) as the predictor variable.

Note: The solid blue line is when β1 is less than 0 and the dotted line is when β1 is greater than 0.
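A small sketch of this transformation (a hypothetical NumPy example; the coefficients 4 and −3 are invented, with β1 < 0 as in the solid curve):

```python
import numpy as np

# Simulated data that rises quickly, then flattens: y = 4 - 3*e^(-x)
rng = np.random.default_rng(2)
x = np.linspace(0, 5, 150)
y = 4 - 3 * np.exp(-x) + rng.normal(scale=0.05, size=x.size)

# Regress y on the transformed predictor e^(-x); the model is
# linear in e^(-x) even though it is nonlinear in x.
beta1, beta0 = np.polyfit(np.exp(-x), y, 1)
print(beta0, beta1)  # roughly 4 and -3
```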

#### How do we decide when and what to transform?

Essentially, to handle nonlinear data, we may have to try different transformations on the data to determine a model that fits it well. Hence, we may try polynomial models or transformations of the x-variable(s) or the y-variable, or both. These transformations can be square root, logarithmic or reciprocal transformations, although this is not an exhaustive list. Generally, one of these transformations helps. In the forthcoming video, Anjali explains when and how to perform data transformation.

#### Transform the predictors or the response?

When we transform the response variable, we change both its error distribution and its variance. Hence, we should transform the response variable only if the error terms are not normal or if the residuals exhibit non-constant variance, as seen in the residual plots. Transformations that can be applied to the response variable include the natural log, square root or inverse, i.e., ln(y), √y or 1/y, respectively.
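To illustrate the non-constant-variance case (an invented NumPy example with multiplicative noise, so the residual spread grows with the fitted values):

```python
import numpy as np

# Simulated heteroscedastic data: multiplicative noise, so the residual
# spread grows with y — a classic case for a log transform of the response.
rng = np.random.default_rng(3)
x = np.linspace(1, 10, 300)
y = (2 + 1.5 * x) * rng.lognormal(sigma=0.2, size=x.size)

def spread_ratio(response):
    """Residual std in the upper half of x divided by the lower half."""
    slope, intercept = np.polyfit(x, response, 1)
    resid = response - (intercept + slope * x)
    half = x.size // 2
    return resid[half:].std() / resid[:half].std()

# Raw y: residual spread grows with x; ln(y): spread is far more uniform.
print(spread_ratio(y), spread_ratio(np.log(y)))
```

A ratio well above 1 for the raw response signals the fanning-out pattern you would also see in a residual plot; after ln(y), the ratio moves toward 1.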

However, if nonlinear trends are observed between the predictor and response variables, then we should first try to transform the predictor variable(s). Despite these guidelines, the transformations may not always work. In the next segment, you will learn about nonlinear regression.
