Difficulty
easy
Average duration
2 hrs
Technologies
Data Science
Numpy
sklearn
matplotlib
Pandas
Machine Learning
Python
SQL Alchemy
nltk
Follow these instructions:
Train a K-Nearest Neighbors (KNN) model to predict the quality of red wine based on its chemical properties. Could AI help you choose a sommelier-worthy wine?
We will use the following red wine dataset, extracted from the Wine Quality Data Set (UCI):
https://raw.githubusercontent.com/4GeeksAcademy/k-nearest-neighbors-project-tutorial/refs/heads/main/winequality-red.csv
Each row represents a wine. The columns describe its chemical composition:
fixed acidity, volatile acidity, citric acid
residual sugar, chlorides
free sulfur dioxide, total sulfur dioxide
density, pH, sulphates, alcohol
The target column is label:
0 = Low quality
1 = Medium quality
2 = High quality
Load the data. Load the CSV with Pandas and explore its structure.
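As a rough sketch of the loading step, the snippet below reads the CSV with Pandas and prints a few structural summaries. The helper names `load_wine_data` and `summarize` are hypothetical, and the three in-memory rows are illustrative values (not taken from the real file) so the example runs offline. Because it is not stated here whether the file is comma- or semicolon-separated, the sketch lets Pandas sniff the delimiter.

```python
import io

import pandas as pd

DATA_URL = (
    "https://raw.githubusercontent.com/4GeeksAcademy/"
    "k-nearest-neighbors-project-tutorial/refs/heads/main/winequality-red.csv"
)


def load_wine_data(source=DATA_URL):
    """Read the wine CSV from a URL, a local path, or a file-like object."""
    # sep=None with the python engine asks pandas to detect the delimiter
    # from the first line (assumption: we do not know if it is ',' or ';').
    return pd.read_csv(source, sep=None, engine="python")


def summarize(df):
    """Print a quick overview: size, column types, missing values, classes."""
    print("Shape:", df.shape)
    print(df.dtypes)
    print("Missing values per column:")
    print(df.isna().sum())
    if "label" in df.columns:
        print("Class counts:")
        print(df["label"].value_counts().sort_index())


# Illustrative rows only, so the sketch runs without network access.
sample_csv = io.StringIO(
    "fixed acidity,alcohol,label\n"
    "7.4,9.4,1\n"
    "7.8,9.8,1\n"
    "11.2,9.8,2\n"
)
sample = load_wine_data(sample_csv)
summarize(sample)
```

For the project itself, calling `load_wine_data()` with no argument should fetch the file from the URL above, assuming network access.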
Train the KNN model:
Separate the independent variables (X) from the target (y).
Split into training and testing sets (80/20).
Scale the data if necessary (highly recommended with KNN, because it is distance-based and features with larger ranges would otherwise dominate!).
Train the model with an initial k value.
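The training steps above could look roughly like this. To keep the sketch self-contained, it uses synthetic data from `make_classification` as a stand-in for the wine table (11 numeric features, 3 classes); the initial `k = 5`, the stratified split, and the fixed `random_state` are assumptions, not requirements of the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (assumption, not the real dataset).
# With the real DataFrame: X = df.drop(columns="label"); y = df["label"]
X, y = make_classification(
    n_samples=600,
    n_features=11,
    n_informative=6,
    n_redundant=0,
    n_classes=3,
    random_state=42,
)

# 80/20 split; stratify keeps the class proportions similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training set only, then apply it to both sets,
# so no information from the test set leaks into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initial k value; it is tuned in a later step.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

print("Train shape:", X_train_scaled.shape, "Test shape:", X_test_scaled.shape)
```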
Evaluate performance using:
accuracy_score
confusion_matrix
classification_report
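A minimal sketch of the evaluation step with the three metrics listed above. It again relies on synthetic stand-in data, and it wraps the scaler and classifier in a pipeline for brevity (a design choice of this sketch); the class names passed to `target_names` follow the label mapping given earlier.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (assumption, not the real dataset).
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6,
    n_redundant=0, n_classes=3, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling and KNN chained together, so the scaler is fit on training data only.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Overall fraction of correct predictions.
acc = accuracy_score(y_test, y_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])

# Per-class precision, recall, and F1.
report = classification_report(
    y_test, y_pred, labels=[0, 1, 2], target_names=["Low", "Medium", "High"]
)

print(f"Accuracy: {acc:.3f}")
print(cm)
print(report)
```

The diagonal of the confusion matrix counts correct predictions per class, so its sum divided by the total equals the accuracy.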
Optimize k. Create a loop to test different k values (e.g., from 1 to 20).
Save the results in a list.
Plot accuracy vs k to find the best value.
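One possible way to write the k search and the plot is sketched below, still on synthetic stand-in data. The `Agg` backend line is only there so the snippet also runs without a display; in a notebook it can be dropped.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (assumption, not the real dataset).
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6,
    n_redundant=0, n_classes=3, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

k_values = list(range(1, 21))
accuracies = []  # one test accuracy per k, in the same order as k_values

for k in k_values:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    model.fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, model.predict(X_test)))

# First k that reaches the highest accuracy.
best_k = k_values[accuracies.index(max(accuracies))]
print(f"Best k: {best_k} (accuracy {max(accuracies):.3f})")

fig, ax = plt.subplots()
ax.plot(k_values, accuracies, marker="o")
ax.set_xlabel("k (number of neighbors)")
ax.set_ylabel("Test accuracy")
ax.set_title("Accuracy vs k")
ax.set_xticks(k_values)
# plt.show()  # uncomment in an interactive session
```

Picking k from test-set accuracy, as the exercise describes, is simple but slightly optimistic; cross-validating on the training set would be a more careful alternative.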
Create a function that takes numerical values and predicts the quality:
predict_wine_quality([7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4])
>>> "This wine is likely of medium quality 🍷"
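A hedged sketch of such a function follows. The model here is fit on synthetic stand-in data, so the class it prints for the example wine is not meaningful; in the project you would pass in the pipeline trained on the real dataset. The `model` parameter, the `QUALITY_NAMES` mapping, and the length check are choices made for this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature order, matching the columns described in the exercise.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
QUALITY_NAMES = {0: "low", 1: "medium", 2: "high"}

# Synthetic stand-in model (assumption); replace with your trained pipeline.
X, y = make_classification(
    n_samples=600, n_features=len(FEATURES), n_informative=6,
    n_redundant=0, n_classes=3, random_state=42,
)
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)


def predict_wine_quality(values, model=model):
    """Return a human-readable quality prediction for one wine."""
    if len(values) != len(FEATURES):
        raise ValueError(f"Expected {len(FEATURES)} values, got {len(values)}")
    # The pipeline expects a 2D array: one row per wine.
    row = np.asarray(values, dtype=float).reshape(1, -1)
    label = int(model.predict(row)[0])
    return f"This wine is likely of {QUALITY_NAMES[label]} quality 🍷"


message = predict_wine_quality(
    [7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4]
)
print(message)
```

Keeping the scaler inside the pipeline means the raw chemical values can be passed directly, without scaling them by hand first.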
Note: We also provide solution samples in ./solution.ipynb, which we honestly suggest you only use if you're stuck for more than 30 minutes or if you've already finished and want to compare it with your approach.
You worked with a real dataset from the UCI Machine Learning Repository, applied supervised classification models, analyzed chemical features, and developed a function that simulates a sommelier's judgment using AI. That deserves to be shared!
"Can artificial intelligence predict the quality of wine? 🍷 I trained a KNN model with real data from the UCI ML Repo and achieved 73% accuracy in classifying wines as low, medium, or high quality using only their chemical composition. The data doesn't lie: alcohol and sulfate are more revealing than a label! 😉 #MachineLearning #DataScience #WineLovers #AI #scikitLearn"
Once you have finished solving the case study, make sure to commit your changes, push to your repository, and go to 4Geeks.com to submit the repository link.