Error Metrics for Regression Tasks
Evaluating the performance of the trained model plays an important role in the machine learning pipeline.
In a regression task, the model is trained on the input features X^{n \times m} to predict a continuous target variable y. The model outputs the prediction \hat{y}.
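As a minimal, illustrative setup for the code sketches below (assuming NumPy and scikit-learn are available; the synthetic data and the names X, y and y_hat are purely hypothetical), such a model could look like this:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # input features X^{n x m}: n = 100, m = 3
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=100)

model = LinearRegression().fit(X, y)   # train on the input features
y_hat = model.predict(X)               # continuous predictions \hat{y}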
Mean Squared Error (MSE)
A popular error metric for regression tasks is the Mean Squared Error (MSE). The idea behind this metric is simple: it takes the average of the squared differences between the actual target values y and the predicted values \hat{y} :
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}
Because the metric squares the difference between the actual target value and the prediction, it pays more attention to larger errors and penalizes them severely. A big advantage of this metric is that it is easily differentiable and can therefore serve as an optimization objective for many models. On the other hand, the metric is not on the same scale as the target variable and is therefore difficult to interpret.
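As a minimal NumPy sketch of the formula above (y and y_hat are assumed to be arrays of actual and predicted values, e.g. from the setup in the introduction):

import numpy as np

def mse(y, y_hat):
    # average of the squared differences between actual and predicted values
    return np.mean((y - y_hat) ** 2)

scikit-learn provides the same computation as sklearn.metrics.mean_squared_error.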
Root Mean Squared Error (RMSE)
The Root Mean Squared Error is closely related to the MSE and is probably the most popular metric for regression tasks. Technically, the MSE is calculated first and the square root is taken afterwards:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}
By taking the square root of the MSE, the metric is brought back to the same scale as the target variable, which facilitates both comparison and interpretation.
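The same sketch extended by the square root (again assuming NumPy arrays y and y_hat):

import numpy as np

def rmse(y, y_hat):
    # square root of the MSE, back on the scale of the target variable
    return np.sqrt(np.mean((y - y_hat) ** 2))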
Mean Absolute Error (MAE)
After having learned about the MSE and the RMSE, the Mean Absolute Error is easy to understand. Instead of squaring the difference between the actual and the predicted value, the MAE simply takes the absolute difference:
MAE = \frac{1}{n} \sum_{i=1}^{n} |y_{i} - \hat{y}_{i}|
The MAE is thus a linear error metric and penalizes large differences between the actual and the predicted value less severely than the MSE or RMSE. The metric can therefore be considered more robust to outliers than the metrics mentioned before.
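A corresponding sketch for the MAE (sklearn.metrics.mean_absolute_error offers an equivalent implementation):

import numpy as np

def mae(y, y_hat):
    # average of the absolute differences; grows only linearly with the error
    return np.mean(np.abs(y - y_hat))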
R-Squared
Another popular error measure is the so-called R^{2}. In contrast to the error measures presented before, the R-Squared compares the prediction against a baseline, namely the mean \bar{y} of the target variable. There are two ways to calculate the measure (for a linear model fitted by ordinary least squares with an intercept, both yield the same value):
R^{2} = \frac{\sum_{i=1}^{n} (\hat{y}_{i} - \bar{y})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}} = \frac{explained \, variation}{total \, variation}
or
R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}} = 1 - \frac{unexplained \, variation}{total \, variation}
The R-Squared thus measures what percentage of the total variation is explained by the model. Normally the measure lies between 0 and 1: if it is close to 1, the model explains the data quite well, and if it is close to 0, the model performs poorly. The R^{2} is quite popular because it is easy to interpret.
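A minimal sketch of the second formula (assuming NumPy arrays y and y_hat; sklearn.metrics.r2_score computes the same quantity):

import numpy as np

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)        # unexplained variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation
    return 1 - ss_res / ss_tot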
One problem can arise when using the R-Squared: the measure can be artificially pushed close to 1 by adding more and more (even irrelevant) features. The resulting model would be very large, complex and unstable, but the R^{2} would suggest that it performs really well. The corrected (adjusted) R-Squared therefore takes a penalty for too many independent variables (p) into account:
R_{corr}^{2} = 1 - (1 - R^{2}) \cdot \frac{n-1}{n-p-1}
The corrected R-Squared penalizes the model more severely the more features are taken into account to calculate the prediction.
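A sketch of the corrected measure (the number of independent variables p has to be supplied by the caller; the function name is illustrative):

import numpy as np

def adjusted_r_squared(y, y_hat, p):
    n = len(y)
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
    # the penalty term grows with the number of features p
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)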
There are several more error metrics for continuous targets, but this overview should give an idea of the most common metrics used to evaluate regression models in machine learning.