
Model Evaluation – Regression Metrics

Now that you have learnt about the different evaluation metrics for a classification problem, let’s look at some of the commonly used regression evaluation metrics. The formulas are provided just for your reference; you don’t need to know them by heart, only what they represent.

Root Mean Square Error (RMSE):

RMSE is the most commonly used regression metric. It represents the sample standard deviation of the differences between the actual and the predicted values (the residuals). It is calculated using the formula:

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\widehat{y}_i\right)^2}$$

Where $y_i$ and $\widehat{y}_i$ are the actual and predicted values respectively, and $n$ is the number of observations.
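As a quick illustration, the formula above can be computed directly in NumPy. The arrays below are made-up example values, not data from the course:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values (y_i), illustrative
y_pred = np.array([2.8, 5.4, 2.9, 6.1])   # predicted values (y_hat_i), illustrative

# RMSE: square root of the mean of squared residuals
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)
```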

Mean Absolute Error (MAE):

MAE is the average of the absolute difference between the actual and the predicted values. It is calculated using the formula:

$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\widehat{y}_i\right|$$

Where $y_i$ and $\widehat{y}_i$ are the actual and predicted values respectively, and $n$ is the number of observations.
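A minimal sketch of the same calculation in NumPy, using the same illustrative arrays as before:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values, illustrative
y_pred = np.array([2.8, 5.4, 2.9, 6.1])   # predicted values, illustrative

# MAE: mean of the absolute residuals
mae = np.mean(np.abs(y_true - y_pred))
print(mae)
```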

Mean Absolute Percentage Error (MAPE):

MAPE is the percentage equivalent of the Mean Absolute Error. It can be defined as the percentage ratio of the residual to the actual value. Because it is expressed as a percentage, it is easy for people to interpret, which makes it a commonly used metric. It is calculated using the formula:

$$MAPE=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\widehat{y}_i}{y_i}\right|\times 100\%$$

Where $y_i$ and $\widehat{y}_i$ are the actual and predicted values respectively, and $n$ is the number of observations.
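A sketch of the MAPE calculation in NumPy, again on made-up example values (note that the actual values must be non-zero for the ratio to be defined):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values, illustrative
y_pred = np.array([2.8, 5.4, 2.9, 6.1])   # predicted values, illustrative

# MAPE: mean absolute residual as a percentage of the actual value
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
print(mape)
```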

R-Squared (R²)

R², also known as the coefficient of determination, measures the proportion of variation in the dependent variable that is explained by all the independent variables in the model. The higher the value of R², the better the model. It is calculated using the formula:

$$R^2=1-\frac{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\widehat{y}_i\right)^2}{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}$$

Where $\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\widehat{y}_i\right)^2$ is the Mean Square Error (MSE) and $\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2$ is the variance in the y values.
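The same ratio can be sketched in NumPy as one minus MSE divided by the variance of the actual values (illustrative arrays as before):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values, illustrative
y_pred = np.array([2.8, 5.4, 2.9, 6.1])   # predicted values, illustrative

mse = np.mean((y_true - y_pred) ** 2)            # mean squared error
var_y = np.mean((y_true - y_true.mean()) ** 2)   # variance of y
r2 = 1 - mse / var_y
print(r2)
```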

Adjusted R-Squared

Adjusted R² is a modified version of R² that adjusts for the number of independent variables in the model. It represents the proportion of variation that is explained by only those independent variables that actually affect the dependent variable. It is calculated using the formula:

$$R^2_{adjusted}=1-\frac{\left(1-R^2\right)\left(n-1\right)}{n-k-1}$$

Where n is the total number of observations and k is the number of independent variables in the model.
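A minimal sketch of the adjustment, assuming illustrative values for R², n and k:

```python
r2 = 0.85    # R-squared of a fitted model (illustrative value)
n = 100      # total number of observations (illustrative value)
k = 5        # number of independent variables (illustrative value)

# Adjusted R-squared penalises R-squared for every extra independent variable
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(adj_r2)
```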

The value of adjusted R² is always less than or equal to the value of R². R² keeps increasing as independent variables are added to the model, irrespective of how well they correlate with the dependent variable, whereas adjusted R² penalises the model for the inclusion of insignificant variables that do not improve it. Hence, it is recommended to assess the performance of a multiple linear regression model using adjusted R² rather than R² alone, since R² can be misleading.

Please refer to the link provided below to understand which metric you should choose for a regression problem, and the reasons behind the choice.
