You will not be forking this time. Please take some time to read these instructions:
Once you are finished creating your decision tree model, make sure to commit your changes, push to your repository and go to 4Geeks.com to upload the repository link.
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict based on diagnostic measurements whether a patient has diabetes.
Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
Insulin: 2-Hour serum insulin (mu U/ml)
BMI: Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction: Diabetes pedigree function
Age: Age (years)
Outcome: Class variable (0 or 1); a class value of 1 is interpreted as "tested positive for diabetes"
(a) Original owners: National Institute of Diabetes and Digestive and Kidney Diseases (b) Donor of database: Vincent Sigillito (email@example.com) Research Center, RMI Group Leader Applied Physics Laboratory The Johns Hopkins University
Go to the following online dataset (https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv) and download the data.
Save it in your project's 'data/raw' folder. Time to work on it!
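If you prefer to download the file from code rather than by hand, the step above could look roughly like the sketch below. It uses only the standard library. The function name `download_dataset` and the file name `diabetes.csv` inside 'data/raw' are assumptions made for illustration, not something the project requires.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL taken from the instructions above.
DATA_URL = (
    "https://raw.githubusercontent.com/4GeeksAcademy/"
    "decision-tree-project-tutorial/main/diabetes.csv"
)


def download_dataset(url: str = DATA_URL, dest: str = "data/raw/diabetes.csv") -> Path:
    """Download the file at `url` to `dest`, creating parent folders if needed.

    `dest` is a hypothetical default; use whatever file name suits your project.
    """
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, dest_path)  # copies the remote file to dest_path
    return dest_path
```

Calling `download_dataset()` from the project root would then leave the CSV at 'data/raw/diabetes.csv'.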
Use the explore.ipynb notebook to find patterns and valuable information that will help with your cleaning process.
Don't forget to write your observations.
Use app.py to create your cleaning pipeline. Save your clean data in the 'data/processed' folder.
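As one possible starting point for the cleaning pipeline, here is a minimal sketch using pandas. It assumes your exploration shows that a value of 0 in the listed measurement columns means "not recorded" rather than a real reading; check that in explore.ipynb before adopting it. The function name `clean_diabetes`, the list `ZERO_AS_MISSING`, and the choice of median imputation are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Assumption: in these columns a 0 is treated as a missing measurement.
# Confirm this against your own exploration before relying on it.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def clean_diabetes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: duplicate rows dropped, zeros replaced by the column median."""
    clean = df.drop_duplicates().copy()
    for col in ZERO_AS_MISSING:
        if col in clean.columns:
            # Cast to float so NaN can be stored in integer columns.
            clean[col] = clean[col].astype(float).replace(0, np.nan)
            # median() ignores NaN, so it is computed from the recorded values only.
            clean[col] = clean[col].fillna(clean[col].median())
    return clean.reset_index(drop=True)
```

In app.py this could be used as `clean_diabetes(pd.read_csv("data/raw/diabetes.csv")).to_csv("data/processed/diabetes_clean.csv", index=False)`, where both file names are hypothetical. Note that computing the median on the full dataset before splitting lets a little information from the test set reach the training data; fitting the imputation on the training split only is the stricter option.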
Now that you have a better knowledge of the data, in your exploratory notebook create a first decision tree model with your clean data.
Change your decision tree to use 'entropy' as the criterion.
Tune your model's hyperparameters with a grid search to find the best combination.
Train your model with the optimal hyperparameters.
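The modeling steps above (a first tree, an entropy tree, a grid search, and a final fit) can be sketched with scikit-learn as follows. This is only one way to arrange them: the function name `fit_trees`, the 80/20 split, the seed, and the values in `param_grid` are illustrative assumptions, so adjust them to what your exploration suggests.

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_trees(X, y, seed: int = 42):
    """Fit a default tree, an entropy tree and a grid-searched tree.

    Returns the tuned model, its hyperparameters and the test accuracy of each tree.
    """
    # Hold out 20% for testing, keeping the class balance in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # First model: default settings (criterion="gini").
    baseline = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)

    # Same model, but splitting on entropy.
    entropy_tree = DecisionTreeClassifier(
        criterion="entropy", random_state=seed
    ).fit(X_train, y_train)

    # Illustrative grid; these values are assumptions, not recommendations.
    param_grid = {
        "criterion": ["gini", "entropy"],
        "max_depth": [None, 3, 5, 10],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    }
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=seed),
        param_grid,
        cv=5,
        scoring="accuracy",
    )
    # With the default refit=True, the best combination is retrained
    # on the whole training split and exposed as best_estimator_.
    search.fit(X_train, y_train)

    scores = {
        "baseline": baseline.score(X_test, y_test),
        "entropy": entropy_tree.score(X_test, y_test),
        "tuned": search.best_estimator_.score(X_test, y_test),
    }
    return search.best_estimator_, search.best_params_, scores
```

With the clean data loaded into a DataFrame `df`, a call could look like `model, params, scores = fit_trees(df.drop(columns="Outcome"), df["Outcome"])`. Comparing the three accuracies gives you something concrete to write about in your observations.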
Again, use app.py to create your final machine learning model.
Save your final model in the 'models' folder.
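Saving the model can be done in several ways; the sketch below uses the standard library's `pickle` module. The helper names `save_model` and `load_model` and the file name `decision_tree.pkl` are hypothetical, chosen only for this example.

```python
import pickle
from pathlib import Path


def save_model(model, path: str = "models/decision_tree.pkl") -> Path:
    """Serialize `model` to `path`, creating the folder if it does not exist."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("wb") as f:
        pickle.dump(model, f)
    return out


def load_model(path: str = "models/decision_tree.pkl"):
    """Load a model previously written by save_model."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

A pickled scikit-learn model should be loaded with the same scikit-learn version that saved it, and pickle files should only be loaded from sources you trust.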
In your README file, write a brief summary of your cleaning and modeling process.