Model Hyperparameters Optimization

What is a model hyperparameter?

A model hyperparameter is a parameter whose value is set before the model starts training. Hyperparameters cannot be learned by fitting the model to the data.

Examples of model hyperparameters in different models:

  • Learning rate in gradient descent

  • Number of iterations in gradient descent

  • Number of layers in a Neural Network

  • Number of neurons per layer in a Neural Network

  • Number of clusters (k) in k-means clustering
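As a rough illustration of the examples above, and assuming scikit-learn, hyperparameters like these are passed to the estimator's constructor before any training happens. The specific values below (3 clusters, layers of 16 and 8 neurons, and so on) are arbitrary choices for the sketch, not recommendations.

```python
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

# Hyperparameters are fixed when the estimator is created, before training
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)

mlp = MLPClassifier(
    hidden_layer_sizes=(16, 8),  # two hidden layers with 16 and 8 neurons
    learning_rate_init=0.01,     # initial learning rate for the solver
    max_iter=300,                # maximum number of training iterations
    random_state=0,
)

# get_params() returns the hyperparameter values currently set on an estimator
kmeans_params = kmeans.get_params()
mlp_params = mlp.get_params()

print(kmeans_params["n_clusters"])
print(mlp_params["hidden_layer_sizes"])
```

Neither estimator has been fitted here, which is the point: these values exist before the model sees any data.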

Difference between parameter and hyperparameter

A model parameter is a variable of the selected model that can be estimated by fitting the model to the given data. For example, in linear regression, the slope and the intercept of the line are two parameters estimated by fitting a straight line to the data by minimizing the RMSE.

And, as we already mentioned, a model hyperparameter's value is set before the model starts training, and it cannot be learned by fitting the model to the data.
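The distinction can be sketched in a few lines. This example uses scikit-learn's `Ridge` regression rather than plain linear regression, purely because it has an obvious hyperparameter (`alpha`, the regularization strength); the toy data and the value of `alpha` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy data lying exactly on the line y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# alpha is a hyperparameter: we choose it before training
model = Ridge(alpha=0.001)
model.fit(X, y)

# coef_ and intercept_ are parameters: they are estimated from the data
slope = model.coef_[0]
intercept = model.intercept_
print(slope, intercept)
```

With such a small `alpha`, the estimated slope and intercept land very close to 2 and 1, while `alpha` itself is untouched by `fit`.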

The best part is that you get to choose these values for your model. Of course, the set of available hyperparameters varies from model to model, so you must select from the specific list that a given model exposes.

Often, we do not know in advance which hyperparameter values will produce the best model output. So, we let an automated procedure explore candidate values and select the best-performing configuration. This selection procedure is known as hyperparameter tuning.

What are two common ways to automate hyperparameter tuning?

Hyperparameter tuning is an optimization technique and is an essential aspect of the machine learning process. A good choice of hyperparameters may make your model meet your desired metric. Yet, the plethora of hyperparameters, algorithms, and optimization objectives can lead to an unending cycle of continuous optimization effort.

  1. Grid Search - test every possible combination of pre-defined hyperparameter values and select the best one.


```python
# Import libraries
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

# Load the data
iris = datasets.load_iris()

# Establish the hyperparameter values to try
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}

# Choose the model
svc = svm.SVC()

# Search all possible combinations
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)

# List the keys of the cross-validation results
sorted(clf.cv_results_.keys())
```

For full details, see the scikit-learn documentation for GridSearchCV.
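After fitting, the search object also exposes which combination performed best. The following is a minimal sketch on the same iris setup; `cv=5` is stated explicitly here as an assumption about how many folds to use, and no particular winning combination is claimed.

```python
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}

# 2 kernels x 2 values of C = 4 candidate combinations, each cross-validated
clf = GridSearchCV(svm.SVC(), parameters, cv=5)
clf.fit(iris.data, iris.target)

best_params = clf.best_params_   # combination with the best mean CV score
best_score = clf.best_score_     # mean cross-validated score of that combination
n_candidates = len(clf.cv_results_['params'])

print(best_params, best_score, n_candidates)
```

`clf.best_estimator_` holds a model refitted on the full data with `best_params`, so it can be used for prediction directly.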


  2. Randomized Search - randomly test possible combinations of pre-defined hyperparameter values and select the best tested one.


```python
# Import libraries
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform

# Load the data
iris = load_iris()

# Choose the model
logistic = LogisticRegression(solver='saga', tol=1e-2, max_iter=200, random_state=0)

# Establish possible hyperparameters
distributions = dict(C=uniform(loc=0, scale=4), penalty=['l2', 'l1'])

# Run a random search over combinations of the established hyperparameters
clf = RandomizedSearchCV(logistic, distributions, random_state=0)
search = clf.fit(iris.data, iris.target)

# Get the best hyperparameter values
search.best_params_
# {'C': 2..., 'penalty': 'l1'}
```

For full details, see the scikit-learn documentation for RandomizedSearchCV.


What are the pros and cons of grid search?


Pros: Grid search is great when we need to fine-tune hyperparameters over a small search space automatically. For example, if we have 100 different datasets that we expect to be similar, such as when solving the same problem repeatedly with different populations, we can use grid search to automatically fine-tune the hyperparameters for each model.

Cons: Grid search is computationally expensive and inefficient. It often searches regions of the parameter space that have very little chance of being useful, which makes it extremely slow. It is especially slow when we need to search a large space, since the number of combinations grows exponentially as more hyperparameters are optimized.
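The exponential growth is easy to see by counting combinations. The sketch below uses scikit-learn's `ParameterGrid` with made-up hyperparameter names (`param_0`, `param_1`, ...) and assumes each hyperparameter has 5 candidate values.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: every hyperparameter gets 5 candidate values
values_per_hyperparameter = 5

grid_sizes = []
for n_hyperparameters in range(1, 5):
    grid = {
        f"param_{i}": list(range(values_per_hyperparameter))
        for i in range(n_hyperparameters)
    }
    # len(ParameterGrid) is the number of combinations grid search would try
    grid_sizes.append(len(ParameterGrid(grid)))

print(grid_sizes)  # [5, 25, 125, 625]
```

Each combination is also trained once per cross-validation fold, so with 5 folds the last grid would already require 3125 model fits.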

What are the pros and cons of randomized search?


Pros: Randomized search does a good job of finding near-optimal hyperparameters over a very large search space relatively quickly, and it doesn't suffer from the same exponential scaling problem as grid search.

Cons: Randomized search does not fine-tune the results as much as grid search does, since it typically does not test every possible combination of parameters.
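One way to see why randomized search avoids the scaling problem is that its cost is fixed by a sampling budget, not by the size of the search space. The sketch below uses scikit-learn's `ParameterSampler` with the same distributions as the earlier example; the budget of 8 samples is an arbitrary assumption.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# C is drawn from a continuous range [0, 4], so the space is effectively infinite
distributions = {'C': uniform(loc=0, scale=4), 'penalty': ['l2', 'l1']}

# n_iter sets how many candidate combinations are drawn, whatever the space size
samples = list(ParameterSampler(distributions, n_iter=8, random_state=0))

for candidate in samples:
    print(candidate)
```

`RandomizedSearchCV` accepts the same `n_iter` argument, so the number of candidates it evaluates stays constant even if more hyperparameters are added to `distributions`.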

Examples of questions that hyperparameter tuning will answer for us

  • What should be the value for the maximum depth of the Decision Tree?

  • How many trees should I select in a Random Forest model?

  • Should I use a single-layer or a multi-layer Neural Network? If multiple layers, how many layers should there be?

  • How many neurons should I include in the Neural Network?

  • What should be the minimum sample split value for Decision Tree?

  • What value should I select for the minimum sample leaf for my Decision Tree?

  • How many iterations should I select for Neural Network?

  • What should be the value of the learning rate for gradient descent?

  • Which solver method is best suited for my Neural Network?

  • What should be the value of K in K-nearest Neighbors?

  • What should be the value for C and sigma in Support Vector Machine?
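As a sketch of how a few of these questions could be answered in practice, the example below tunes the Decision Tree questions from the list (maximum depth, minimum sample split, minimum sample leaf) with grid search on the iris dataset. The candidate values in `param_grid` are assumptions chosen for illustration, and the best combination will depend on the data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Candidate values for three Decision Tree hyperparameters (illustrative only)
param_grid = {
    'max_depth': [2, 3, 5, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

# 4 x 3 x 3 = 36 combinations, each evaluated with 5-fold cross-validation
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(iris.data, iris.target)

tree_best_params = search.best_params_
n_tree_candidates = len(search.cv_results_['params'])
print(tree_best_params, search.best_score_)
```

The same pattern applies to the other questions: swap in the relevant estimator and list its hyperparameters as keys in the grid.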