Next we will see how we can implement this model in Python. To do so, we will use the scikit-learn library.

To exemplify the implementation of a Naive Bayes classifier, we will use the same dataset as in the cases of decision trees, random forests, and boosting.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y = True, as_frame = True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

X_train.head()
```
| sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
| --- | --- | --- | --- |
The train set will be used to train the model, while the test set will be used to evaluate its effectiveness. Furthermore, it is not necessary to normalize the predictor variables: these models are based on Bayes' theorem, make specific assumptions about the distribution of the data, and are not directly affected by the scale of the features.
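As a quick sketch of this point (reusing the same iris split as above; the `StandardScaler` step is only for the demonstration, not part of the lesson's pipeline), scaling the features should leave the predictions of a Gaussian Naive Bayes model unchanged, since the per-class means and variances rescale along with the data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

# Fit on the raw, unscaled features
pred_raw = GaussianNB().fit(X_train, y_train).predict(X_test)

# Fit on standardized features (mean 0, std 1 per feature)
scaler = StandardScaler().fit(X_train)
pred_scaled = GaussianNB().fit(scaler.transform(X_train), y_train).predict(scaler.transform(X_test))

# The class-conditional Gaussians rescale with the features, so the
# posterior ranking (and hence the predictions) should match
print(np.array_equal(pred_raw, pred_scaled))
```

This is why, unlike distance-based models such as KNN, no scaling step appears in the code below.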
```python
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)
```
GaussianNB()
The training time of a model depends first on the size of the dataset (number of instances and features), and then on the model type and its configuration.
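If you want to measure this yourself, a minimal sketch is to wrap the `fit` call with a timer (here with the standard library's `perf_counter`; on a dataset as small as iris the time will be a tiny fraction of a second):

```python
from time import perf_counter
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y = True)

model = GaussianNB()
start = perf_counter()
model.fit(X, y)
elapsed = perf_counter() - start

# Elapsed wall-clock training time in seconds
print(f"Training took {elapsed:.4f} s")
```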
Once the model has been trained, it can be used to predict with the test data set.
```python
y_pred = model.predict(X_test)
y_pred
```
array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2, 0, 2, 2, 2, 2, 2, 0, 0])
From the raw predictions alone it is very difficult to know whether the model is getting it right. To find out, we must compare them with the true labels. There are a large number of metrics for measuring a model's predictive effectiveness, including accuracy, which is the fraction of predictions that the model made correctly.
```python
from sklearn.metrics import accuracy_score

accuracy_score(y_test, y_pred)
```
The model is perfect!
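Accuracy alone can hide which classes a model confuses with each other, so a useful complementary check (sketched here on the same split) is scikit-learn's `confusion_matrix`, which counts predictions per true class:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes;
# off-diagonal entries count misclassifications
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

For a perfect model, all counts fall on the diagonal.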
Once we have the model we were looking for (presumably after hyperparameter optimization), in order to be able to use it in the future we need to save it to our directory.
```python
from pickle import dump

dump(model, open("naive_bayes_default.sav", "wb"))
```
Giving the model an explanatory name is vital, since if we lose the code that generated it, the filename still tells us its configuration (here we say default because we have not customized any of the model's hyperparameters; we kept the ones the function uses by default).
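To reuse the stored model in a later session, load it back with `pickle`. A minimal end-to-end sketch (training a model here so the example is self-contained, then saving and restoring it exactly as above):

```python
from pickle import dump, load
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y = True)
model = GaussianNB().fit(X, y)

# Save the trained model to disk, as above
with open("naive_bayes_default.sav", "wb") as f:
    dump(model, f)

# ...later, restore it and use it like the original estimator
with open("naive_bayes_default.sav", "rb") as f:
    restored = load(f)

print(restored.predict(X[:1]))
```

The restored object produces the same predictions as the original, with no need to retrain.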