# Regularized Linear Models

## Overcoming overfit with regularization

## What hyperparameters can be tuned in regularized linear models?

Before explaining regularized linear models, let's recap some important information about linear regression.

Every machine learning problem is basically an optimization problem. That is, you wish to find either a maximum or a minimum of a specific function. The function that you want to optimize is usually called the loss function (or cost function). Each machine learning algorithm defines its own loss function, which measures how far the model's predictions are from the actual values and is therefore the quantity that training tries to minimize.

This is the most basic form of the loss for a single data point, used mostly in linear regression:

$l_i = (\hat{Y}_i - Y_i)^2$

Where:

$\hat{Y}_i$ is the predicted value

$Y_i$ is the actual value

The loss function as a whole can be denoted as:

$L = \sum_i (\hat{Y}_i - Y_i)^2$

This loss function, in particular, is called quadratic loss or least squares. We wish to minimize the loss function (L) as much as possible so the prediction will be as close as possible to the ground truth.
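As a quick illustration, here is a minimal sketch of the quadratic loss in plain Python. The function names `squared_loss` and `total_loss` and the sample values are made up for this example.

```python
def squared_loss(y_pred, y_true):
    """Quadratic loss for a single data point."""
    return (y_pred - y_true) ** 2


def total_loss(y_preds, y_trues):
    """Sum of the per-point quadratic losses over the whole dataset."""
    return sum(squared_loss(p, t) for p, t in zip(y_preds, y_trues))


y_true = [3.0, 5.0, 7.0]  # made-up actual values
y_pred = [2.5, 5.0, 8.0]  # made-up predictions

print(total_loss(y_pred, y_true))  # 0.25 + 0.0 + 1.0 = 1.25
```

The closer the predictions get to the actual values, the closer this number gets to zero.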

Remember, every machine learning algorithm defines its own loss function according to its goal in life.

We finished the last lesson talking about the importance of avoiding overfitting. One of the most common mechanisms for avoiding overfitting is called regularization. A regularized machine learning model is a model whose loss function contains an additional term, a penalty on the size of the coefficients, that should be minimized as well. Let's see an example:

$L = \sum_i (\hat{Y}_i - Y_i)^2 + \lambda \sum_j \beta_j^2$

**Ridge Regression** - linear regression that adds an L2-norm penalty (regularization term) to the cost function. The λ parameter is a scalar that is not learned from the training loss itself; it is usually chosen using cross-validation. A super important fact we need to notice about ridge regression is that it shrinks the β coefficients toward zero, but it does not set them exactly to zero. That is, it will not get rid of irrelevant features but rather minimize their impact on the trained model.
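To see the shrinking effect in practice, here is a minimal NumPy sketch of ridge regression using its closed-form solution. It is only an illustration under a few assumptions: the data is synthetic, the features and target are centered so that no intercept is needed, and the helper name `ridge_fit` is made up for this example.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge solution: beta = (X^T X + lam * I)^-1 X^T y.

    Assumes X and y are centered, so no intercept is fitted or penalized.
    """
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Only the first two features drive y; the third one is irrelevant.
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)
X = X - X.mean(axis=0)
y = y - y.mean()

beta_ols = ridge_fit(X, y, lam=0.0)     # no penalty: ordinary least squares
beta_ridge = ridge_fit(X, y, lam=50.0)  # penalized fit

print("lam = 0 :", beta_ols)
print("lam = 50:", beta_ridge)
```

If you run it, you should see that all three coefficients get smaller with the penalty, yet the coefficient of the irrelevant third feature stays small rather than becoming exactly zero.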

**Lasso** - linear regression that adds an L1-norm penalty (regularization term) to the cost function. The only difference from ridge regression is that the penalty uses the absolute values of the coefficients instead of their squares. But this difference has a huge impact on the trade-off we've discussed before. The lasso method overcomes the disadvantage of ridge regression by not only punishing high values of the coefficients β but actually setting them to zero if they are not relevant. Therefore, you might end up with fewer features included in the model than you started with, which is a huge advantage.
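The following sketch shows how lasso can set a coefficient to exactly zero. It uses coordinate descent with soft-thresholding, which is one common way to solve the lasso problem (not necessarily what any particular library does internally). The data is synthetic, and the names `soft_threshold` and `lasso_fit` are made up for this example.

```python
import numpy as np


def soft_threshold(rho, t):
    """Shrink rho toward zero by t; returns exactly 0 when |rho| <= t."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)


def lasso_fit(X, y, lam, n_iters=200):
    """Coordinate descent for sum((X @ beta - y)^2) + lam * sum(|beta|).

    Assumes X and y are centered, so no intercept is fitted.
    """
    n_features = X.shape[1]
    beta = np.zeros(n_features)
    for _ in range(n_iters):
        for j in range(n_features):
            # Residual with feature j's current contribution added back in.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            z = X[:, j] @ X[:, j]
            # Minimizing over beta[j] alone gives a soft-thresholded update.
            beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Only the first two features drive y; the third one is irrelevant.
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)
X = X - X.mean(axis=0)
y = y - y.mean()

beta_lasso = lasso_fit(X, y, lam=100.0)
print(beta_lasso)
```

With this penalty weight, the two relevant coefficients are shrunk a little, while the irrelevant third feature is dropped from the model entirely.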

**Elastic Net** - linear regression that adds a mix of both L1- and L2-norm penalty terms to the cost function.
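To compare the penalties side by side, here are the lasso and elastic net losses written in the same form as the ridge loss above. The symbols $\lambda_1$ and $\lambda_2$ are assumptions introduced here for the two separate penalty weights; some libraries instead expose one overall strength plus a mixing ratio (for example, scikit-learn's `ElasticNet` takes `alpha` and `l1_ratio`).

Lasso:

$L = \sum_i (\hat{Y}_i - Y_i)^2 + \lambda \sum_j |\beta_j|$

Elastic Net:

$L = \sum_i (\hat{Y}_i - Y_i)^2 + \lambda_1 \sum_j |\beta_j| + \lambda_2 \sum_j \beta_j^2$

Setting $\lambda_1 = 0$ gives back ridge regression, and setting $\lambda_2 = 0$ gives back lasso.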

You can tune the weight of the regularization term for regularized models (typically denoted as alpha, the λ in the loss above), which affects how strongly the model shrinks the feature coefficients.

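- alpha = 0 ---> the regularized model is identical to the original, non-regularized model.

- very large alpha ---> the coefficients are shrunk toward zero and the regularized model reduces to (almost) a constant value. If the penalty weight is normalized to the range 0 to 1, this corresponds to alpha = 1.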
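Here is a small sketch of that effect, assuming the unnormalized penalty from the ridge loss above, where alpha plays the role of λ and can be any non-negative number. The data is synthetic and the helper `ridge_fit` is made up for this example.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge fit on centered data (no intercept)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


rng = np.random.default_rng(1)
X = rng.normal(size=(80, 4))
y = X @ np.array([3.0, -1.5, 0.0, 2.0]) + rng.normal(scale=0.3, size=80)
X = X - X.mean(axis=0)
y_mean = y.mean()
y_c = y - y_mean

alphas = [0.0, 1.0, 100.0, 1e6]
norms = []
for alpha in alphas:
    beta = ridge_fit(X, y_c, alpha)
    norms.append(float(np.linalg.norm(beta)))
    print(f"alpha={alpha:>9}: ||beta|| = {norms[-1]:.4f}")

# With centered data the prediction is y_mean + X @ beta, so as beta
# approaches 0 the model predicts (almost) the constant y_mean everywhere.
```

At alpha = 0 the fit matches ordinary least squares, and the size of the coefficient vector keeps dropping as alpha grows.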

**Regularized models performance**

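Regularized models tend to outperform non-regularized linear models on unseen data, especially when there are many features relative to the number of samples or when features are strongly correlated, so it is suggested that you at least try using ridge regression.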
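One way to check this on your own data is k-fold cross-validation: fit the model on part of the data, measure the error on the held-out part, and compare different penalty weights, including 0 (no regularization). The sketch below does this for ridge with NumPy only. The data is synthetic, and the names `ridge_fit`, `cv_mse` and the candidate values are assumptions made for this example.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge fit on centered data (no intercept)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def cv_mse(X, y, alpha, k=5):
    """Mean validation MSE of ridge regression over k folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errors = []
    for val_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[val_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        # Center with training statistics only, so nothing leaks
        # from the validation fold into the fit.
        x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
        beta = ridge_fit(X_tr - x_mean, y_tr - y_mean, alpha)
        y_hat = y_mean + (X[val_idx] - x_mean) @ beta
        errors.append(np.mean((y_hat - y[val_idx]) ** 2))
    return float(np.mean(errors))


rng = np.random.default_rng(2)
X = rng.normal(size=(40, 15))  # few rows, many features
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=40)

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]  # 0.0 is the non-regularized baseline
scores = {alpha: cv_mse(X, y, alpha) for alpha in candidates}
best_alpha = min(scores, key=scores.get)

print(scores)
print("best alpha:", best_alpha)
```

The alpha with the lowest validation error is the one you would keep. Which value wins depends on the data, so treat the candidate list as a starting point.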

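Lasso can be effective when you want to automatically do feature selection in order to create a simpler model, but it can be dangerous since it may be erratic and remove features that contain useful signal. For example, when several features are strongly correlated, lasso may keep one of them and drop the others more or less arbitrarily.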

Elastic net is a balance of ridge and lasso, and it can be used to the same effect as lasso with less erratic behaviour.

Source:

https://medium.com/hackernoon/practical-machine-learning-ridge-regression-vs-lasso-a00326371ece


©4Geeks Academy LLC 2019