Follow the instructions below:
Once you have finished solving the exercises, be sure to commit your changes, push them to your repository, and go to 4Geeks.com to upload the repository link.
Naive Bayes models are very useful when we want to analyze sentiment, classify texts into topics, or make recommendations, since the characteristics of these problems fit the model's theoretical and methodological assumptions very well.
In this project you will practice with a dataset to create a review classifier for the Google Play store.
The dataset can be found in this project folder under the name playstore_reviews.csv. You can load it into your code directly from the link (https://raw.githubusercontent.com/4GeeksAcademy/naive-bayes-project-tutorial/main/playstore_reviews.csv) or download it and add it by hand to your repository. In this dataset you will find the following variables:
package_name. Name of the mobile application (categorical)
review. Comment about the mobile application (categorical)
polarity. Class variable (0 or 1), where 0 is a negative comment and 1 is a positive one (numeric)
In this case, we have only 3 variables: 2 predictors and a dichotomous label. Of the two predictors, we are really only interested in the comment part, since classifying a comment as positive or negative will depend on its content, not on the application from which it was written. Therefore, the package_name variable should be removed.
When we work with text, as in this case, it does not make sense to do an EDA; the process is different, since the only variable we are interested in is the one containing the text. In other cases, where the text is part of a larger set with other numeric predictor variables and the prediction objective is different, applying an EDA does make sense.
However, we cannot work with plain text; it must first be processed through several preprocessing steps. Once we have finished, we will have the predictors ready to train the model.
Start solving the problem by training a Naive Bayes model, choosing which of the three implementations (GaussianNB, MultinomialNB, or BernoulliNB) to use according to what we have studied in the module. Then train it with the other two implementations and confirm whether the model you chose is the right one.
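Comparing the three implementations can be sketched like this. The tiny texts and labels below are synthetic stand-ins for the review/polarity columns; note that GaussianNB expects a dense array, while the other two accept the sparse count matrix directly. With word-count features, MultinomialNB is usually the natural fit, but the point of the exercise is to verify that empirically.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Tiny synthetic stand-in for the review/polarity columns
texts = ["love this app", "great and useful", "awful crashes",
         "worst app ever", "very helpful tool", "terrible waste of time"]
labels = [1, 1, 0, 0, 1, 0]

X = CountVectorizer().fit_transform(texts)

for Model in (MultinomialNB, BernoulliNB, GaussianNB):
    # GaussianNB needs a dense array; the sparse matrix works for the other two
    features = X.toarray() if Model is GaussianNB else X
    clf = Model().fit(features, labels)
    print(Model.__name__, clf.score(features, labels))
```

In your project you would score each model on a held-out test split rather than on the training data, and keep the implementation with the best test metric.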
After training the model in its three implementations, choose the best one and, if possible, try to optimize its results with a random forest.
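A random forest can be trained on the same vectorized features for comparison. This is a hedged sketch on synthetic data; the hyperparameters shown (n_estimators, random_state) are illustrative defaults, and in the project you would tune them and compare test-set scores against your chosen Naive Bayes model.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Same synthetic stand-in data as before
texts = ["love this app", "great and useful", "awful crashes",
         "worst app ever", "very helpful tool", "terrible waste of time"]
labels = [1, 1, 0, 0, 1, 0]

X = CountVectorizer().fit_transform(texts)

# Random forests accept the same sparse count matrix as input
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X, labels)
print(forest.score(X, labels))
```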
Store the model in the appropriate folder.
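Saving the trained model can be done with pickle from the standard library. The "models/" folder name below is an assumption; use whatever layout your repository expects.

```python
import pickle
from pathlib import Path

from sklearn.naive_bayes import MultinomialNB

# A trivially trained model standing in for your final classifier
model = MultinomialNB().fit([[1, 0], [0, 1]], [0, 1])

# "models/" is an assumed folder name for this sketch
Path("models").mkdir(exist_ok=True)
with open("models/naive_bayes.pkl", "wb") as f:
    pickle.dump(model, f)

# Later: reload the model and reuse it for predictions
with open("models/naive_bayes.pkl", "rb") as f:
    restored = pickle.load(f)
print(restored.predict([[1, 0]]))
```

Note that to classify new raw text you must also persist the fitted vectorizer (or bundle both in a Pipeline), since the model only understands the count matrix the vectorizer produces.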
Which other models of the ones we have studied could you use to try to beat the results of a Naive Bayes? Argue your choice and train the model.