Before explaining regularized linear models, let's recap some important information about linear regression.
Most machine learning problems are, at their core, optimization problems. That is, you wish to find either a maximum or a minimum of a specific function. The function that you want to optimize is usually called the loss function (or cost function). The loss function is defined for each machine learning algorithm you use, and it is the quantity that training tries to minimize, so it measures how well the trained model fits the data.
This is the most basic form of the loss for a single data point, used mostly for linear regression algorithms:
$l = (\hat{Y}_i - Y_i)^2$
$\hat{Y}_i$ is the predicted value
$Y_i$ is the actual value
The loss function as a whole, summed over all data points, can be denoted as:
$L = \sum_i (\hat{Y}_i - Y_i)^2$
This loss function, in particular, is called quadratic loss or least squares. We wish to minimize the loss function (L) as much as possible so the prediction will be as close as possible to the ground truth.
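As a minimal sketch of the quadratic loss, the snippet below computes L for a handful of predictions. The function name `quadratic_loss` and the sample values are assumptions made up for illustration only.

```python
import numpy as np

def quadratic_loss(y_pred, y_true):
    """Sum of squared differences between predicted and actual values."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sum((y_pred - y_true) ** 2))

# Hypothetical values, chosen only for illustration.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]

loss = quadratic_loss(y_pred, y_true)
print(loss)  # (-0.5)^2 + 0^2 + 1^2 = 1.25
```

The closer the predictions are to the actual values, the smaller this number becomes, and it reaches 0 only when every prediction is exact.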
Remember, every machine learning algorithm defines its own loss function according to its goal in life.
We finished the last lesson talking about the importance of avoiding overfitting. One of the most common mechanisms for avoiding overfitting is called regularization. A regularized machine learning model is a model whose loss function contains an additional term that should be minimized as well. Let’s see an example:
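$L = \sum_i (\hat{Y}_i - Y_i)^2 + \lambda \sum_j \beta_j^2$
$\beta_j$ are the coefficients of the model
$\lambda$ is the weight of the regularization term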
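To make the extra term concrete, here is a small sketch that evaluates this regularized loss for a fixed set of coefficients. The function name `ridge_loss`, the toy data, and the omission of an intercept are all simplifying assumptions for illustration.

```python
import numpy as np

def ridge_loss(X, y, beta, lam):
    """Quadratic loss plus lam times the sum of squared coefficients."""
    residuals = X @ beta - y
    return float(np.sum(residuals ** 2) + lam * np.sum(beta ** 2))

# Toy data (hypothetical): these coefficients fit y exactly.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])

print(ridge_loss(X, y, beta, lam=0.0))  # 0.0, no penalty and a perfect fit
print(ridge_loss(X, y, beta, lam=0.5))  # 0.5 * (1 + 4) = 2.5
```

Even with a perfect fit, the penalized loss is not zero, so the optimizer is pushed to trade a little accuracy on the training data for smaller coefficients.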
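Ridge - linear regression that adds an L2-norm penalty term (the sum of the squared coefficients, as in the example above) to the cost function.
Lasso - linear regression that adds an L1-norm penalty term (the sum of the absolute values of the coefficients) to the cost function.
Elastic Net - linear regression that adds a mix of both L1- and L2-norm penalty terms to the cost function.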
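In the same notation as the loss functions above, one common way to write the elastic net loss is shown below. The symbols $\lambda_1$ and $\lambda_2$ are assumed names for the two penalty weights and are not defined elsewhere in this lesson:
$L = \sum_i (\hat{Y}_i - Y_i)^2 + \lambda_1 \sum_j |\beta_j| + \lambda_2 \sum_j \beta_j^2$
Setting $\lambda_1 = 0$ leaves only the L2 penalty, and setting $\lambda_2 = 0$ leaves only the L1 penalty.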
You can tune the weight of the regularization term for regularized models (λ in the formula above, typically called alpha in libraries), which affects how much the model shrinks its coefficients.
-alpha = 0 ---> regularized model is identical to the original model.
-alpha very large ---> the coefficients shrink toward zero and the regularized model approaches a constant value.
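The effect of alpha can be checked numerically. The sketch below fits an L2-penalized linear regression in closed form for several alpha values. It assumes synthetic data, an unpenalized intercept, and a hypothetical helper named `ridge_fit`. It is an illustration, not the way any particular library implements it.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form L2-penalized fit on centered data (intercept not penalized)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept

# Synthetic data (hypothetical): three features plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=50)

for alpha in [0.0, 1.0, 100.0, 1e6]:
    beta, intercept = ridge_fit(X, y, alpha)
    print(alpha, np.round(beta, 3), round(float(intercept), 3))
```

With alpha = 0 the coefficients match ordinary least squares. As alpha grows, they shrink, and for a very large alpha the prediction is essentially the mean of y for every input.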
Regularized models performance
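Regularized models tend to generalize better to unseen data than non-regularized linear models, particularly when there are many features or the features are correlated, so it is suggested that you at least try using ridge regression.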
Lasso can be effective when you want to do feature selection automatically in order to create a simpler model, but it can be dangerous since it may be erratic and remove features that contain useful signal.
Elastic net is a balance of ridge and lasso, and it can be used to the same effect as lasso with less erratic behaviour.
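To see these differences in practice, the following sketch fits the three regularized models and a plain linear regression with scikit-learn on synthetic data where only the first three of ten features carry signal. The data, the alpha values, and `l1_ratio=0.5` are arbitrary assumptions for illustration. Note also that scikit-learn scales its penalties somewhat differently from the formulas in this lesson, so its `alpha` is not numerically the same as λ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

# Synthetic data (hypothetical): only the first 3 of 10 features matter.
rng = np.random.default_rng(42)
n_samples, n_features = 100, 10
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    # Count coefficients that were set exactly to zero (removed features).
    zero_counts[name] = int(np.sum(model.coef_ == 0))
    print(name, np.round(model.coef_, 2), "zeroed:", zero_counts[name])
```

Ridge shrinks coefficients without removing any feature, while the L1 part of lasso and elastic net can set some coefficients exactly to zero. In a real project you would compare the models on held-out data and tune alpha, for example with cross-validation.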