
Regularized Linear Regression Project Tutorial

Goal

4Geeks Coding Projects tutorials and exercises for people learning to code or improving their coding skills

Difficulty

beginner

Repository

Click to open

Video

Not available

Live demo

Not available

Average duration

2 hrs

Technologies

  • This project's dataset contains many features describing sociodemographic and health resource data by county in the United States, collected just before the Covid-19 pandemic started (data from 2018 and 2019).

  • We want to discover whether there is any relationship between health resources and sociodemographic data. Choose one target variable (related to health resources), and use the LASSO model to reduce the features to the most important ones for your target.

  • Find the parameters of a linear regression between your selected features and your chosen target.

🌱 How to start this project

You will not be forking this time; please take some time to read these instructions:

  1. Create a new repository based on the machine learning project by clicking here.
  2. Open the newly created repository on Gitpod by using the Gitpod button extension.
  3. Once Gitpod VSCode has finished opening, start your project by following the Instructions below.

🚛 How to deliver this project

Once you have finished creating your model, commit your changes, push them to your repository, and go to 4Geeks.com to upload the repository link.

πŸ“ Instructions

U.S. county-level sociodemographic and health resource data (2018-2019)

There is a 'data-dictionary' (click here to open) that explains the meaning of each feature. Select one of the features related to health resources as your target variable, and then use LASSO regression to discover which features matter most in explaining your target variable.
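For reference, LASSO finds important features by adding an L1 penalty to the least squares objective. The form below is the one scikit-learn's Lasso minimizes; the symbols are assumptions introduced here for illustration: n is the number of counties (rows), X the feature matrix, y the chosen target, w the coefficient vector, and α the penalty strength.

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
```

A larger α pushes more coefficients to exactly zero, and the features whose coefficients remain nonzero are the ones LASSO keeps.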

Step 1:

The dataset can be found in this project folder as the 'dataset.csv' file. You can load it directly from the link (https://raw.githubusercontent.com/4GeeksAcademy/regularized-linear-regression-project-tutorial/main/dataset.csv), or download it and add it to your data/raw folder. If you download it, don't forget to add the data folder to the .gitignore file.
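Loading the file could look like the minimal sketch below, assuming pandas is installed. The helper names load_dataset and summarize are hypothetical, and the small in-memory CSV (with made-up column names) only stands in for the real file so the sketch runs without network access.

```python
import io

import pandas as pd

# URL taken from the instructions above.
DATASET_URL = (
    "https://raw.githubusercontent.com/4GeeksAcademy/"
    "regularized-linear-regression-project-tutorial/main/dataset.csv"
)


def load_dataset(source=DATASET_URL):
    """Read the CSV from a URL, a local path, or a file-like object."""
    return pd.read_csv(source)


def summarize(df):
    """Return a few basic facts worth checking right after loading."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Tiny stand-in for the real file; these column names are invented.
sample_csv = io.StringIO("county,population,doctors\nA,100,3\nB,250,\nC,80,2\n")
sample = load_dataset(sample_csv)
summary = summarize(sample)
print(summary)
```

In the project itself, calling load_dataset() with no argument would read from the link, and load_dataset("data/raw/dataset.csv") would read a downloaded copy.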

Time to work on it!

Step 2:

Use the explore.ipynb notebook to find correlations between features, or between each feature and your chosen target.

Don't forget to write your observations.

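Consider scaling your features before applying LASSO, because the L1 penalty acts on coefficient magnitudes and therefore depends on the units of each feature.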
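The exploration and scaling described in this step can be sketched as follows, assuming pandas, NumPy, and scikit-learn are available. The data is synthetic and every column name is hypothetical; in the notebook you would use the real dataset and your own target instead.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; all column names here are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "median_income": rng.normal(50_000, 10_000, n),
    "pct_over_65": rng.normal(15, 3, n),
    "population": rng.normal(100_000, 30_000, n),
})
df["target_health_resource"] = 0.001 * df["median_income"] + rng.normal(0, 2, n)

# Correlation of every feature with the target, strongest (by absolute value) first.
target_corr = (
    df.corr()["target_health_resource"]
    .drop("target_health_resource")
    .sort_values(key=np.abs, ascending=False)
)
print(target_corr)

# Standardize the features so each has mean 0 and unit variance.
features = df.drop(columns="target_health_resource")
scaler = StandardScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(features),
    columns=features.columns,
    index=features.index,
)
print(scaled.describe().loc[["mean", "std"]])
```

df.corr() without selecting a column gives the full feature-to-feature correlation matrix, which helps spot redundant features.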

Step 3:

Now that you know the data better, apply the LASSO model, which performs feature selection as part of fitting, to obtain the most important features that influence your target variable.

We are not going to predict anything, but don't forget to drop all the features related to health resources from your X (features) dataset, and define your chosen target as your 'y'.

Use ordinary least squares regression on the selected features to find the parameters that minimize the squared error of a linear function.
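One possible way to chain the two parts of this step is sketched below, assuming scikit-learn. The data is synthetic (only features 0 and 3 actually drive the target), and alpha=0.1 is an arbitrary illustrative value, not a recommendation for the real dataset.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 8 candidate features, only features 0 and 3 matter.
rng = np.random.default_rng(42)
n_samples, n_features = 300, 8
X = rng.normal(size=(n_samples, n_features))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=n_samples)

# Scale first so the L1 penalty treats every feature on the same footing.
X_scaled = StandardScaler().fit_transform(X)

# LASSO drives the coefficients of unhelpful features to exactly zero.
lasso = Lasso(alpha=0.1)  # alpha chosen only for illustration
lasso.fit(X_scaled, y)
selected = np.flatnonzero(lasso.coef_)
print("selected feature indices:", selected)

# Refit with ordinary least squares on the surviving features only,
# which removes the shrinkage LASSO applies to the kept coefficients.
ols = LinearRegression().fit(X_scaled[:, selected], y)
print("OLS coefficients:", ols.coef_)
print("OLS intercept:", ols.intercept_)
```

With real data, the value of alpha would usually be picked by cross-validation, for example with scikit-learn's LassoCV.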

Step 4:

Use app.py to create a pipeline that selects the most important features.

Save your final model in the 'models' folder.
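A pipeline along these lines is sketched below, assuming scikit-learn. The function name build_pipeline, the file name lasso_ols_pipeline.pkl, and alpha=0.1 are all hypothetical choices, and a temporary directory stands in for the project's 'models' folder so the sketch leaves no files behind.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(alpha=0.1):
    """Scale, keep the features LASSO gives nonzero weight, then fit OLS."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectFromModel(Lasso(alpha=alpha))),
        ("ols", LinearRegression()),
    ])


# Synthetic data: only feature 1 influences the target.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 6))
y = 4.0 * X[:, 1] + rng.normal(scale=0.3, size=200)

pipeline = build_pipeline().fit(X, y)
kept = pipeline.named_steps["select"].get_support(indices=True)
print("features kept by LASSO:", kept)

# Save and reload the fitted pipeline (temporary 'models' folder for the demo).
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "models" / "lasso_ols_pipeline.pkl"
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as f:
        pickle.dump(pipeline, f)
    with model_path.open("rb") as f:
        restored = pickle.load(f)

same = np.allclose(pipeline.predict(X), restored.predict(X))
print("restored model matches:", same)
```

In app.py, the temporary directory would be replaced by the repository's own 'models' folder. joblib.dump is a common alternative to pickle for scikit-learn models.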

In your README file, write a brief summary.

Solution guide: https://github.com/4GeeksAcademy/regularized-linear-regression-project-tutorial/blob/main/solution_guide.ipynb

©4Geeks Academy LLC 2019
