Module Map

The following lessons explain different programming concepts and have been published by members of 4Geeks.

The pre-work covered Python, NumPy, Pandas and Matplotlib. We understand that was a lot, so let's go over some of the concepts with the rest of your cohort and mentors.

Matrices and vectors from linear algebra are heavily used in machine learning, mainly to store and manipulate large amounts of information. Also, linear algebra functions "f(x)" help us understand the relationship between target variables "y" and their predictors "x".

Machine learning algorithms rely heavily on probability: you have to predict how likely an event or value is to occur. That is why we need to go over the basic probability functions before building our future models.

When you have a big dataset, you cannot inspect every value individually because there is too much data. What you can do is compute summary statistics: measures of central tendency such as the mean and median, and measures of dispersion such as the standard deviation.
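As a small illustration, the sketch below computes the mean, median and standard deviation with Python's built-in `statistics` module. The `charges` values are made up for this example; the single large value shows how an outlier pulls the mean while leaving the median almost unchanged.

```python
import statistics

# Hypothetical sample (invented for illustration): five typical values and one outlier.
charges = [20, 22, 21, 23, 19, 250]

mean = statistics.mean(charges)      # sensitive to the outlier
median = statistics.median(charges)  # robust to the outlier
stdev = statistics.stdev(charges)    # sample standard deviation (a measure of dispersion)

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
```

In a real project you would more likely call the equivalent Pandas or NumPy methods on a whole column, but the ideas are the same.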

In order to predict better, we first need to understand how the values in our dataset are distributed. Based on that, we can better identify outliers, fill in missing values and do better data mining in general.

Before creating an algorithm or model based on your assumptions, it's recommended to use hypothesis testing: define a null and an alternative hypothesis, and test your data against them.
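One way to make this concrete is a permutation test, sketched below with the standard library only. This is just one of many possible tests, chosen here because it needs no extra packages; the function name `permutation_test` and the two sample groups are hypothetical. The null hypothesis is that both groups come from the same distribution, and the alternative is that their means differ.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    H0: both samples come from the same distribution.
    H1: the means of the two samples differ.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the observations at random
        diff = abs(_mean(pooled[:len(a)]) - _mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical measurements for two groups (invented for illustration).
group_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
group_b = [14.0, 13.8, 14.2, 13.9, 14.1, 14.3]

p_value = permutation_test(group_a, group_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value (commonly below 0.05) suggests rejecting the null hypothesis. The threshold is a convention you choose before running the test, not something the test decides for you.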

Algorithms need time and space to run. As a machine learning engineer, you should learn how to optimize your code so that it uses as little memory as possible without taking too long to execute.
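The sketch below shows one simple space trade-off in Python, assuming you only need to pass over the values once: a list stores every element in memory, while a generator produces elements on demand. The variable names are illustrative.

```python
import sys

n = 100_000

squares_list = [i * i for i in range(n)]  # all values are stored in memory
squares_gen = (i * i for i in range(n))   # values are produced one at a time

# getsizeof reports the size of the container itself, not of the integers inside it.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(f"list: {list_size} bytes, generator: {gen_size} bytes")

# Both give the same result when summed.
total_from_gen = sum(squares_gen)
total_from_list = sum(squares_list)
```

The generator can only be iterated once, so it saves space at the cost of flexibility. Whether that is worth it depends on how the data is used afterwards.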

If you are going to deploy your machine learning models, you need to know (at least) the command line, the Python package manager (pip) and the Cookiecutter boilerplate. Today you will get your first Machine Learning Operations lesson.

SQL is the language for data: the most popular relational database engines use it to query and manipulate the data they store. Let's get familiar with the most basic concepts and instructions, and connect to our first real database.
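To give a first taste of those basic instructions, here is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database. The `customers` table and its rows are invented for this example, and the database used in the lesson may be a different engine.

```python
import sqlite3

# An in-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# CREATE: define a table (hypothetical schema for illustration).
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)

# INSERT: add a few rows, using placeholders instead of string formatting.
conn.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Ana", "ES"), ("Bob", "US"), ("Carla", "ES")],
)

# SELECT: query the rows that match a condition.
rows = conn.execute(
    "SELECT name FROM customers WHERE country = ? ORDER BY name", ("ES",)
).fetchall()
print(rows)

conn.close()
```

The `?` placeholders let the driver handle quoting, which avoids SQL injection when values come from user input.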

Scraping is one of the most widely used ways of gathering data in the world of machine learning. This technique allows you to download almost anything publicly available on the internet, even without an API! Let's do our first scraping and learn the basics of it.

The last way to fetch or retrieve data that we will learn during the course is API integration. There are millions of public APIs on the internet with very valuable information. As a machine learning engineer, you will sometimes need to use the Python requests package to connect to an API and get the extra data you need.

Find patterns in your data in order to get insights and valuable information. Use that information to make decisions and generate better predictions. If your data is garbage, the output will be garbage: clean your data to avoid poor-quality outputs.

During this module you will learn the basics of machine learning, the evaluation metrics and how to optimize your ML algorithms. We will start our journey with logistic regression.

Read the linear regression theory and run the code in the exploring linear regression notebook to practice. Then go to your project and predict the cost of medical insurance using linear regression.

It is very important to avoid overfitting, so in this lesson you will learn about regularized linear regression models, which are a common way to avoid it.

This is one of the most used algorithms in the industry. Decision trees are used for both classification and regression problems. The algorithm makes predictions by building a tree of nodes, branches and leaves: each internal node tests a feature, and each leaf holds an outcome.

In this module we will add some randomness to our trees and build machine learning models using Random Forest.

In this lesson, we will learn about boosting techniques, specifically the gradient boosting algorithm and XGBoost (Extreme Gradient Boosting).

Were you wondering when you were going to apply Bayes' theorem? Now is the time. The Naive Bayes algorithm is one of the fastest algorithms, and it is based on Bayes' theorem. We will use it for classification and also as a brief and simple introduction to NLP, which we will explore in more depth in another module.
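As a reminder of the theorem itself, here is a small worked example in Python. The probabilities are invented for illustration: suppose 20% of emails are spam, a given word appears in 50% of spam emails and in 5% of non-spam emails. Bayes' theorem then gives the probability that an email containing the word is spam.

```python
# Hypothetical probabilities (invented for illustration).
p_spam = 0.20             # P(spam)
p_word_given_spam = 0.50  # P(word | spam)
p_word_given_ham = 0.05   # P(word | not spam)

# Law of total probability: P(word)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(f"P(spam | word) = {p_spam_given_word:.3f}")
```

Naive Bayes applies the same rule to many words at once, under the "naive" assumption that the words are independent of each other given the class.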

In this module we will learn the basics of a new algorithm, the Support Vector Machine (SVM), and we will also have an introduction to Natural Language Processing. We will combine both by practicing with an email spam classifier in the exploring NLP notebook, and then you will work on a URL spam classifier in your project.

In this module we will learn about the k-nearest neighbors algorithm and we will dive into a very simple recommender system built with k-nearest neighbors.
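To show the core idea before the lesson, the sketch below classifies a point by majority vote among its k closest training points, using only the standard library. The function name `knn_predict` and the toy data are hypothetical; in the project you would more likely rely on a library implementation.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` from `train`, a list of (features, label) pairs."""
    # Sort the training points by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote among the labels of those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data (invented for illustration): two well-separated groups.
train = [
    ((1.0, 1.0), "A"), ((1.5, 1.2), "A"), ((0.8, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.3), "B"),
]

print(knn_predict(train, (1.2, 0.9)))
print(knn_predict(train, (4.8, 5.1)))
```

A recommender built on the same idea would return the neighbors themselves (for example, similar movies) instead of a voted label.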

In this module we will learn about a couple of unsupervised algorithms, but we will focus on k-means for clustering, with a very simple project to help you understand how to group data into clusters.

In this lesson, we will learn how to recognize and deal with time series when they are present in our datasets. This lesson's project will be a real-time competition, so get all your skills ready!

This is our last module on algorithms. We will close this part with a brief introduction to deep learning and neural networks so that we can learn a more complex but efficient model.

In this lesson, we will learn how to build a machine learning web application using one of your best models and the Flask tool. Then we will deploy it to Heroku so that our model can be shown to the world.

In this lesson, we will learn how to build a machine learning web application using one of your best models and the Streamlit tool. Then we will deploy it to Heroku so that our model can be shown to the world.

This is our last lesson, a brief introduction to cloud computing resources for machine learning. When using large datasets, it is important to know what resources are available to us in the cloud. They can help us reduce training times. Read this lesson at home and discuss it in class. Feel free to dive deeper into your preferred platform with the learning links mentioned in the lesson.