This project aims to build a supervised classification model that, based on demographic and socioeconomic data of an adult (age, education level, occupation, marital status, country of origin, etc.), predicts whether the person will earn more or less than $50,000 per year.
Based on the model's results, students must develop an interpretable recommendation system that suggests strategies or changes likely to increase a person's chances of surpassing that income threshold.
Follow these instructions:
Load the dataset. We will use the Adult Income Dataset, also known as "Census Income". The data was collected by the U.S. Census Bureau and is stored in this project folder as adult-census-income.csv. Alternatively, you can load it directly in your code from the following link:
https://raw.githubusercontent.com/4GeeksAcademy/predicting-your-future-with-data/main/adult-census-income.csv
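A minimal sketch of the loading step with pandas. The helper name `load_adult`, the sample column names, and the treatment of `?` as a missing-value marker are assumptions for illustration; check them against the actual file. The demo reads a tiny in-memory sample so it runs offline.

```python
import io

import pandas as pd

DATA_URL = (
    "https://raw.githubusercontent.com/4GeeksAcademy/"
    "predicting-your-future-with-data/main/adult-census-income.csv"
)


def load_adult(source=DATA_URL):
    """Read the CSV from a local path, a URL, or a file-like object."""
    # Assumption: missing values are written as "?", so map them to NaN.
    return pd.read_csv(source, na_values=["?"], skipinitialspace=True)


# Hypothetical three-row sample; the real file has more columns and rows.
sample_csv = io.StringIO(
    "age,workclass,education,hours.per.week,income\n"
    "39,Private,Bachelors,40,<=50K\n"
    "25,?,HS-grad,20,<=50K\n"
    "52,Self-emp-inc,Masters,60,>50K\n"
)
df = load_adult(sample_csv)

print(df.shape)         # rows and columns read
print(df.isna().sum())  # missing values per column
```

Calling `load_adult()` with no argument would read from the link above, and `load_adult("adult-census-income.csv")` would read the local copy.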
This dataset includes variables such as:
Data preprocessing. Clean null or incorrectly encoded values, encode the categorical variables, and normalize the numerical variables.
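One possible way to sketch this step is a scikit-learn `ColumnTransformer` that imputes missing values, one-hot encodes the categorical columns, and standardizes the numerical ones. The toy rows and the choice of columns below are hypothetical; the imputation strategies are assumptions, not requirements of the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows with one missing categorical value.
df = pd.DataFrame({
    "age": [39, 25, 52, 31],
    "hours.per.week": [40, 20, 60, 45],
    "workclass": ["Private", np.nan, "Self-emp-inc", "Private"],
    "education": ["Bachelors", "HS-grad", "Masters", "HS-grad"],
})
numeric_cols = ["age", "hours.per.week"]
categorical_cols = ["workclass", "education"]

numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),  # zero mean, unit variance
])
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocessor.fit_transform(df)
print(X.shape)
```

Keeping the steps inside one transformer means the same imputation and scaling can later be applied to a simulated user profile with `preprocessor.transform`.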
Define the recommendation problem. Plan how you will structure your recommendation system:
Build the recommendation system. Use one of the following approaches:
Content-based filtering. Represent each user as a vector and calculate similarities between users and recommendations.
Collaborative filtering. Simulate a user vs. trajectory matrix. Apply k-NN, Pearson correlation, or matrix factorization.
Hybrid system. Combine both approaches.
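As an illustration of the content-based option, the sketch below represents each person as a vector, finds the high earner most similar to a user by cosine similarity, and reports the attributes in which the two differ. This is only one interpretation of the approach; the toy data, the feature list, and the `to_vectors` helper are hypothetical.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data standing in for the preprocessed dataset.
people = pd.DataFrame({
    "education": ["HS-grad", "Bachelors", "Masters", "HS-grad", "Bachelors"],
    "occupation": ["Sales", "Tech-support", "Exec-managerial", "Sales", "Sales"],
    "hours.per.week": [20, 40, 50, 45, 40],
    "income": ["<=50K", ">50K", ">50K", ">50K", ">50K"],
})
feature_cols = ["education", "occupation", "hours.per.week"]


def to_vectors(frame, columns=None):
    """One-hot encode categoricals and rescale hours to a comparable range."""
    vecs = pd.get_dummies(frame[feature_cols], dtype=float)
    vecs["hours.per.week"] = vecs["hours.per.week"] / 100.0
    if columns is not None:
        # Align with a reference layout; unseen categories become zeros.
        vecs = vecs.reindex(columns=columns, fill_value=0.0)
    return vecs


high_earners = people[people["income"] == ">50K"]
earner_vecs = to_vectors(high_earners)

user = pd.DataFrame(
    [{"education": "HS-grad", "occupation": "Sales", "hours.per.week": 20}]
)
user_vec = to_vectors(user, columns=earner_vecs.columns)

# Similarity between the user and every high earner.
sims = cosine_similarity(user_vec, earner_vecs)[0]
best = high_earners.iloc[sims.argmax()]

# Attributes where the closest high earner differs from the user.
changes = {
    col: (user.iloc[0][col], best[col])
    for col in feature_cols
    if user.iloc[0][col] != best[col]
}
print(changes)
```

The differing attributes of the nearest high earner serve as the suggested changes. A collaborative or hybrid variant would replace or complement the similarity step.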
Test with simulated cases. Build simulated profiles of hypothetical users and observe what trajectories (education, occupation, etc.) the system would recommend to improve their estimated income.
# Example: 25-year-old user, high school graduate, works part-time
user_profile = {...}
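A simulated case like the one above could be evaluated roughly as follows: train a classifier, score the baseline profile, then score the same profile with one attribute changed at a time and rank the changes by predicted probability. Everything here is a hypothetical sketch: the training rows are invented, the profile fields are an assumed filling of the example rather than the course's own, and logistic regression is just one convenient model.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training rows; use the real dataset in the project.
train = pd.DataFrame({
    "age": [25, 38, 45, 29, 52, 33, 41, 23],
    "education": ["HS-grad", "Bachelors", "Masters", "HS-grad",
                  "Masters", "Bachelors", "HS-grad", "HS-grad"],
    "hours.per.week": [20, 40, 50, 35, 45, 40, 60, 25],
    "income": ["<=50K", ">50K", ">50K", "<=50K",
               ">50K", ">50K", "<=50K", "<=50K"],
})
features = ["age", "education", "hours.per.week"]
y = (train["income"] == ">50K").astype(int)  # 1 means income above 50K

model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), ["age", "hours.per.week"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["education"]),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(train[features], y)


def prob_high_income(profile):
    """Predicted probability that a profile earns more than 50K."""
    row = pd.DataFrame([profile])[features]
    return float(model.predict_proba(row)[0, 1])


# Assumed profile: 25 years old, high school graduate, part-time hours.
user_profile = {"age": 25, "education": "HS-grad", "hours.per.week": 20}
# Hypothetical single-attribute changes to try.
candidate_changes = {"education": ["Bachelors", "Masters"], "hours.per.week": [40]}

baseline = prob_high_income(user_profile)
ranked = sorted(
    (
        (field, value, prob_high_income({**user_profile, field: value}))
        for field, values in candidate_changes.items()
        for value in values
    ),
    key=lambda item: item[2],
    reverse=True,
)

print(f"baseline: {baseline:.2f}")
for field, value, prob in ranked:
    print(f"{field} -> {value}: {prob:.2f}")
```

Ranking single changes keeps the recommendation easy to explain. With only eight invented rows the probabilities are not meaningful, so treat the output as a check that the pipeline runs.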
Once you have completed the practical case, make sure to commit your changes, push them to your repository, and go to 4Geeks.com to submit the repository link.