
Model Evaluation

Evaluation of a model

Evaluating a model is one of the most important steps in the Machine Learning process. It tells us how good the model is, how much it has learned from the training sample (train), and how it is likely to perform on new, never-before-seen data (test and/or validation).
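As a rough illustration of the train versus test distinction, the sketch below fits a classifier on one part of a dataset and scores it on both the training part and a held-out part. The synthetic dataset, the choice of `LogisticRegression`, and the 75/25 split are assumptions made for the example, not something the lesson prescribes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data, used here only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 25% of the rows as never-before-seen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# For scikit-learn classifiers, score() returns the accuracy.
train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

A training score that is much higher than the test score usually suggests that the model has memorized the training sample rather than learned something that generalizes.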

The metrics used to evaluate a model depend on whether the model performs classification or regression.

Metrics for classification models

A classification model is used to predict a category or the class of an observation. For example, we might have a model that predicts whether an email is spam (1) or not spam (0), or whether an image contains a dog, a cat, or a bird. Classification models are useful when the output variable is categorical.

Metrics that can be applied to these types of models are as follows:

  • Accuracy. Measures the percentage of predictions that the model got right out of the total it made. For example, how many emails, spam or not, did the model classify correctly?
  • Recall. Measures the proportion of actual positives that the model was able to identify. For example, of all the emails that really are spam, how many did the model flag as spam?
  • F1 score. The harmonic mean of precision (the proportion of predicted positives that are actually positive) and recall. It is useful when classes are unbalanced.
  • Area Under the Curve (AUC). Describes the probability that the model assigns a higher score to a randomly chosen positive instance than to a randomly chosen negative one.
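The metrics above can be sketched with the functions in `sklearn.metrics`. The labels and scores below are made-up values for a hypothetical spam classifier, and the 0.5 decision threshold is an assumption chosen for the example.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up ground truth: 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
# Made-up scores produced by a hypothetical model (higher = more likely spam).
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.55]
# Turn scores into class predictions with an assumed 0.5 threshold.
y_pred = [int(s >= 0.5) for s in y_score]

accuracy = accuracy_score(y_true, y_pred)    # 5 correct out of 8 -> 0.625
precision = precision_score(y_true, y_pred)  # 3 of 5 predicted spam -> 0.6
recall = recall_score(y_true, y_pred)        # 3 of 4 actual spam -> 0.75
f1 = f1_score(y_true, y_pred)                # harmonic mean -> about 0.667
# AUC uses the raw scores, not the thresholded predictions.
auc = roc_auc_score(y_true, y_score)         # 14 of 16 pairs ranked right -> 0.875

print(accuracy, precision, recall, round(f1, 3), auc)
```

Note that accuracy, precision, recall and F1 depend on the chosen threshold, while AUC is computed from the ranking of the scores and does not.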

Metrics for Regression Models

A regression model is used to predict a continuous value. For example, we might have a regression model that predicts the price of a house based on characteristics such as its size, number of bedrooms, and location. Regression models are useful when the output variable is continuous and numeric.

Metrics that can be applied to this type of model are as follows:

  • Mean Absolute Error (MAE). Mean absolute difference between predictions and actual values.
  • Mean Squared Error (MSE). Similar to the above, but squares the differences before averaging them, so large errors are penalized more heavily.
  • Root Mean Squared Error (RMSE). It is the square root of the MSE.
  • Coefficient of determination ($R^2$). Proportion of the variation in the target that is predictable from the features.
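The regression metrics above can be sketched in the same way. The actual and predicted values below are made-up numbers, and the RMSE is obtained by taking the square root of the MSE by hand so that the example does not depend on a particular scikit-learn version.

```python
import math

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual values and predictions from a hypothetical model.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0 + 1.5 + 1) / 4 = 0.75
mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0 + 2.25 + 1) / 4 = 0.875
rmse = math.sqrt(mse)                      # about 0.935, same units as the target
r2 = r2_score(y_true, y_pred)              # 1 - 3.5 / 12.6875, about 0.724

print(mae, mse, round(rmse, 3), round(r2, 3))
```

For MAE, MSE and RMSE, lower is better. For $R^2$, a value of 1 means perfect predictions and 0 means the model does no better than always predicting the mean of the target.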

The scikit-learn package makes it easy to apply these metrics to models: ready-made implementations are provided in its `sklearn.metrics` module, and the details are covered in the scikit-learn documentation.