This project's dataset contains many features describing sociodemographic and health resource data by county in the United States, collected right before the COVID-19 pandemic started (data from 2018 and 2019).
We want to discover whether there is any relationship between health resources and sociodemographic data. Choose one target variable (related to health resources), and use the LASSO model to reduce the features to the most important ones for your target.
Find the parameters for your linear regression between your selected features and your chosen target.
You will not be forking this time, so please take some time to read these instructions:
Once you are finished creating your model, make sure to commit your changes, push to your repository and go to 4Geeks.com to upload the repository link.
U.S.A. county level sociodemographic and health resource data (2018-2019)
There is a 'data-dictionary' that explains the meaning of each feature. You need to select one of the features related to health resources as your target variable and then use LASSO regression to discover which features matter most in explaining your target variable.
The dataset can be found in this project folder as the 'dataset.csv' file. You are welcome to load it directly from the link (https://raw.githubusercontent.com/4GeeksAcademy/regularized-linear-regression-project-tutorial/main/dataset.csv), or to download it and add it to your data/raw folder. If you download it, don't forget to add the data folder to the .gitignore file.
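As a rough sketch of the loading step, the snippet below wraps pandas' `read_csv` in a small helper that accepts the URL above, a local path, or a file-like object. The helper name `load_dataset` and the tiny in-memory sample (with made-up column names) are assumptions for illustration only; the sample is there so the sketch runs without network access.

```python
import io

import pandas as pd

DATASET_URL = (
    "https://raw.githubusercontent.com/4GeeksAcademy/"
    "regularized-linear-regression-project-tutorial/main/dataset.csv"
)


def load_dataset(source=DATASET_URL):
    """Read the county-level CSV from a URL, a local path, or a file-like object."""
    return pd.read_csv(source)


# Offline stand-in with hypothetical columns; the real file has many more.
sample_csv = io.StringIO("fips,feature_a,feature_b\n1001,1.0,2.0\n1003,3.0,4.0\n")
sample = load_dataset(sample_csv)
print(sample.shape)

# With the real data you would call, for example:
#   df = load_dataset()                        # straight from the URL
#   df = load_dataset("data/raw/dataset.csv")  # after downloading it
```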
Time to work on it!
Use the explore.ipynb notebook to find correlations between features, or between features and your chosen target.
Don't forget to write your observations.
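One possible way to look at correlations with the target is sketched below. It ranks every numeric column by the absolute value of its Pearson correlation with the target. The helper name `correlations_with_target` and the synthetic columns `x1`, `x2` and `target` are hypothetical; in the notebook you would pass the real DataFrame and your chosen target column.

```python
import numpy as np
import pandas as pd


def correlations_with_target(df, target):
    """Pearson correlation of each numeric column with the target, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Synthetic demo: the target depends strongly on x1 and not on x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
demo = pd.DataFrame(
    {"x1": x1, "x2": x2, "target": 3 * x1 + 0.1 * rng.normal(size=200)}
)
ranked = correlations_with_target(demo, "target")
print(ranked)
```

Correlation only captures pairwise linear relationships, so it is a starting point for your observations rather than a substitute for the LASSO step.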
Consider doing feature scaling before applying LASSO.
Now that you have a better knowledge of the data, apply the LASSO model, which already includes feature selection, to obtain the most important features that influence your target variable.
We are not going to predict anything, but don't forget to drop all the features related to health resources from your X (features) dataset, and define your chosen target as your 'y'.
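The scaling, column-dropping and LASSO steps above could be sketched as follows, using scikit-learn's `StandardScaler` and `Lasso`. This is a minimal sketch on synthetic data, not the expected solution: the column names (`socio_0` to `socio_4`, `health_target`, `health_other`), the helper `select_features_with_lasso` and the value `alpha=0.1` are all assumptions, and you would need to identify the real health resource columns from the data dictionary and tune `alpha` yourself.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler


def select_features_with_lasso(X, y, alpha=0.1):
    """Scale X, fit LASSO, and return the non-zero coefficients, largest |value| first."""
    X_scaled = StandardScaler().fit_transform(X)
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X_scaled, y)
    coefs = pd.Series(model.coef_, index=X.columns)
    kept = coefs[coefs != 0]
    return kept.reindex(kept.abs().sort_values(ascending=False).index)


# Synthetic stand-in: only socio_0 and socio_2 actually drive the target.
rng = np.random.default_rng(42)
df = pd.DataFrame(
    rng.normal(size=(300, 5)), columns=[f"socio_{i}" for i in range(5)]
)
df["health_target"] = (
    4 * df["socio_0"] - 2 * df["socio_2"] + rng.normal(scale=0.1, size=300)
)
df["health_other"] = rng.normal(size=300)

# Drop every health resource column from X; keep the chosen one as y.
health_columns = ["health_target", "health_other"]  # hypothetical names
X = df.drop(columns=health_columns)
y = df["health_target"]

selected = select_features_with_lasso(X, y, alpha=0.1)
print(selected)
```

Scaling matters here because the L1 penalty acts on coefficient size, so features measured in large units would otherwise be penalized differently from features measured in small units.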
Use ordinary least squares regression to choose the parameters that minimize the error of a linear function.
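To illustrate what "choosing the parameters that minimize the error" means, here is a small sketch that solves the least squares problem directly with NumPy's `np.linalg.lstsq`. The helper name `fit_ols` and the noise-free demo data are assumptions for illustration; scikit-learn's `LinearRegression` would be an equally reasonable choice for the project.

```python
import numpy as np


def fit_ols(X, y):
    """Return (intercept, coefficients) minimizing the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first parameter is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]


# Noise-free demo, so the fit should recover the generating parameters.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 2))
y_demo = 1.5 + 2.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1]

intercept, coefs = fit_ols(X_demo, y_demo)
print(intercept, coefs)
```

In the project, `X` would hold only the features that LASSO kept, and `y` would be your chosen health resource target.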
Use app.py to create your pipeline that selects the most important features.
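One way such a pipeline might look is sketched below, chaining scikit-learn's `StandardScaler`, `SelectFromModel` wrapped around `Lasso`, and a final `LinearRegression`. The function name `build_pipeline`, the value `alpha=0.1` and the synthetic data are assumptions; the assignment does not prescribe this exact structure.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(alpha=0.1):
    """Scale, keep the features LASSO gives non-zero weight, then fit OLS on them."""
    return make_pipeline(
        StandardScaler(),
        SelectFromModel(Lasso(alpha=alpha, max_iter=10_000)),
        LinearRegression(),
    )


# Synthetic demo: only columns 0 and 2 influence the target.
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(300, 5))
y_demo = 4 * X_demo[:, 0] - 2 * X_demo[:, 2] + rng.normal(scale=0.1, size=300)

pipeline = build_pipeline().fit(X_demo, y_demo)

# Boolean mask of the columns that survived the LASSO selection step.
support = pipeline.named_steps["selectfrommodel"].get_support()
r2 = pipeline.score(X_demo, y_demo)
print(support, r2)
```

Refitting ordinary least squares on the selected columns avoids the shrinkage that LASSO applies to the coefficients it keeps.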
Save your final model in the 'models' folder.
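Saving and reloading the model could be done with the standard library's `pickle`, as in the sketch below. The helper names and the file name `lasso_model.pkl` are hypothetical, and a plain dictionary stands in for the fitted model so the example runs on its own; in the project you would pass your fitted estimator or pipeline and a path inside the 'models' folder.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Pickle the model to path, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_model(path):
    """Load a previously pickled model."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Demo in a temporary directory; a dict stands in for the fitted model.
stand_in_model = {"intercept": 1.5, "coefs": [2.0, -0.5]}
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_model(stand_in_model, Path(tmp) / "models" / "lasso_model.pkl")
    restored = load_model(saved_path)

print(restored)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.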
In your README file write a brief summary.