
Summary of Supervised Learning models

The following is a brief review of the models studied so far, with when and why each one is used. It is meant as a practical guide for choosing a suitable option:

Classifiers and regressors

Depending on its nature and mathematical definition, a model can be used for classification, for regression (predicting a numerical value), or for both:

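Logistic Regression: classification
Linear Regression: regression
Regularized Linear Regression: regression
Decision Tree: classification and regression
Random Forest: classification and regression
Naive Bayes: classification
K Nearest Neighbors: classification and regression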

Each of these models can be implemented with the following scikit-learn classes:

| Model | Classification | Regression |
| --- | --- | --- |
| Logistic Regression | sklearn.linear_model.LogisticRegression | - |
| Linear Regression | - | sklearn.linear_model.LinearRegression |
| Regularized Linear Regression | - | sklearn.linear_model.Lasso |
| Decision Tree | sklearn.tree.DecisionTreeClassifier | sklearn.tree.DecisionTreeRegressor |
| Random Forest | sklearn.ensemble.RandomForestClassifier | sklearn.ensemble.RandomForestRegressor |
| Naive Bayes | sklearn.naive_bayes.BernoulliNB | - |
| K Nearest Neighbors | sklearn.neighbors.KNeighborsClassifier | sklearn.neighbors.KNeighborsRegressor |
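As a minimal sketch of how these classes are used, the example below fits each one on synthetic data and reports its test score. The datasets (`make_classification`, `make_regression`), the hyperparameters, and the `evaluate` helper are assumptions chosen for illustration, not part of the original lesson.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification models from the table (hyperparameters are illustrative).
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "Naive Bayes": BernoulliNB(),
    "K Nearest Neighbors": KNeighborsClassifier(),
}

# Regression models from the table.
regressors = {
    "Linear Regression": LinearRegression(),
    "Regularized Linear Regression": Lasso(alpha=1.0),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "K Nearest Neighbors": KNeighborsRegressor(),
}


def evaluate(models, X, y):
    """Hypothetical helper: fit each model and return its score on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42
    )
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


# Synthetic datasets, used here only so the example is self-contained.
X_clf, y_clf = make_classification(n_samples=300, n_features=8, random_state=42)
X_reg, y_reg = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

clf_scores = evaluate(classifiers, X_clf, y_clf)  # accuracy per classifier
reg_scores = evaluate(regressors, X_reg, y_reg)   # R^2 per regressor

for name, score in clf_scores.items():
    print(f"[classification] {name}: accuracy = {score:.3f}")
for name, score in reg_scores.items():
    print(f"[regression] {name}: R^2 = {score:.3f}")
```

All of these classes share the same `fit` / `predict` / `score` interface, which is why a single helper can loop over them. For classifiers `score` returns accuracy, and for regressors it returns the coefficient of determination (R^2).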

Description and when to use

Knowing the role of each model and when we can or should use it is vital to working efficiently and professionally. The comparison chart below summarizes this information:

| Model | Utility | Recommended use | Examples of use cases |
| --- | --- | --- | --- |
| Logistic Regression | Classifies binary or (less commonly) multiclass events. | Useful when the relationship between the features and the log-odds of the target variable is approximately linear. It works best when the features are not strongly collinear. | Classifying emails as spam or not spam. Detecting disease from symptoms and medical tests. Predicting the probability that a customer buys a product. |
| Linear Regression | Predicts continuous numerical values. | Useful when the relationship between the features and the target variable is linear. It gives good results only when the features are meaningfully correlated with the target variable. | Predicting the price of a house from its size, number of rooms and location. Estimating a student's academic performance from hours of study and previous grades. |
| Regularized Linear Regression | Similar to linear regression, with an added penalty term that limits overfitting. | Useful when there is multicollinearity between features or when plain linear regression overfits. | Predicting the price of a car from features such as year of manufacture, make and model, with regularization to limit overfitting. Estimating an employee's salary from work experience and education level, with regularization to reduce the influence of irrelevant features. |
| Decision Tree | Classifies or predicts continuous numerical values. | Useful when the relationships between the features and the target variable are nonlinear or complex. It can handle numerical and categorical features without standardization. | Predicting customer loyalty from purchase history. Classifying movies by genre and characteristics. Detecting fraud in financial transactions. |
| Random Forest | Classifies or predicts continuous numerical values. It combines multiple decision trees. | Useful when the dataset is large and complex. Combining many trees reduces overfitting and improves accuracy compared with a single tree. | Classifying images for object recognition. Predicting housing prices from multiple features. Diagnosing diseases from multiple medical tests. |
| Boosting | Classifies or predicts continuous numerical values. It combines multiple decision trees built sequentially, each one correcting the errors of the previous ones. | Useful when higher accuracy than a single model can provide is wanted and sufficient computational power is available. | Sentiment analysis of text to classify opinions as positive or negative. Detecting anomalous behavior in security systems. Predicting customer revenue from multiple factors. |
| Naive Bayes | Classifies binary or multiclass events. | Useful when the features are conditionally independent given the class, since this is the assumption the model is built on. It works well when the dataset contains categorical features or word frequencies. | Text classification and categorization tasks. Classifying product reviews as positive or negative. |
| K Nearest Neighbors | Classifies or predicts continuous numerical values. | Useful when the dataset has nonlinear relationships and the local structure of the data is important. The features should be standardized, because predictions depend on distances between points. | Recommending similar products on an e-commerce site. Classifying diseases from symptoms and medical history. Predicting the price of a house from the prices of similar nearby properties. |
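The standardization requirement noted for K nearest neighbors can be illustrated with a small sketch. The synthetic data below is an assumption made for the example: one feature determines the class, and a second, irrelevant feature has a much larger scale. Without scaling, the large feature dominates the distance computation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative assumption): the first feature decides the
# class, the second is pure noise measured on a scale about 1000x larger.
rng = np.random.default_rng(0)
n_samples = 400
informative = rng.normal(size=n_samples)
large_scale_noise = rng.normal(scale=1000.0, size=n_samples)
X = np.column_stack([informative, large_scale_noise])
y = (informative > 0).astype(int)

# KNN on the raw features: distances are dominated by the noisy column.
raw_knn = KNeighborsClassifier(n_neighbors=5)

# KNN preceded by standardization: both columns contribute on the same scale.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

raw_accuracy = cross_val_score(raw_knn, X, y, cv=5).mean()
scaled_accuracy = cross_val_score(scaled_knn, X, y, cv=5).mean()

print(f"KNN without scaling: {raw_accuracy:.3f}")
print(f"KNN with StandardScaler: {scaled_accuracy:.3f}")
```

Placing the scaler inside a pipeline means it is fitted only on the training folds during cross-validation, so no information from the validation fold leaks into the scaling step.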

Beyond these definitions and example cases, the practitioner's own judgment prevails. Depending on the use case and the characteristics of the data, a model that is not the textbook choice for a task can still turn out to be useful.
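One way to support that judgment with evidence is to compare several candidate models on the same data using cross-validation. The sketch below is only an illustration: the dataset (`make_moons`), the list of candidates, and their settings are assumptions, not a recommendation from the lesson.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class dataset with a nonlinear boundary (illustrative only).
X, y = make_moons(n_samples=400, noise=0.25, random_state=0)

# Candidate models; the selection and settings are assumptions for the example.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "K Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# Mean 5-fold cross-validated accuracy for each candidate.
cv_scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

best_model = max(cv_scores, key=cv_scores.get)

for name, score in cv_scores.items():
    print(f"{name}: mean accuracy = {score:.3f}")
print(f"Best candidate on this dataset: {best_model}")
```

The ranking produced this way holds only for the dataset at hand. On different data the order can change, which is why the final choice remains a matter of judgment.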