One of the most widely used models in Machine Learning is the Artificial Neural Network (ANN), a model inspired by the structure and function of the human brain.
Deep Learning is a branch of Artificial Intelligence that focuses on building systems based on Deep Neural Networks (DNN). These neural networks are called "deep" because they have many layers of artificial neurons, or "nodes", that can learn and represent very complex data patterns. Simpler networks, with fewer layers, may be able to learn simpler patterns, but deep networks are capable of learning patterns that are too complex for humans to design manually.
Deep learning techniques have driven many advances in AI over the past decade, particularly in areas such as speech recognition, image recognition, natural language processing, and game playing. For example, deep learning techniques are the basis for voice assistants such as Siri and Alexa, the recommendation systems of Netflix and Amazon, and Tesla's autonomous driving software.
Deep learning requires large amounts of data and computational power to train models efficiently. This is because deep neural networks have many parameters that need to be tuned, and these parameters are iteratively tuned through a process called backpropagation, which requires large amounts of mathematical computation.
Despite its complexity and resource requirements, deep learning has proven to be an extremely powerful tool for solving complex AI problems and is expected to continue to drive many advances in AI in the future.
An Artificial Neural Network (ANN) is a machine learning model inspired by the structure and function of the human brain. It consists of a large number of interconnected processing units called neurons. These neurons are organized in layers: an input layer that receives the data, one or more hidden layers that process the data, and an output layer that produces the final prediction or classification.
A neuron is a basic processing unit of the network. It is also known as a "node" or "unit". Its name comes from the neurons in the human brain, which were the inspiration for the concept of neural networks.
A neuron in a neural network takes a set of inputs (x1, ..., xn), performs a computation on them, and produces an output (y). The computation typically involves taking a weighted combination of the inputs (i.e., multiplying each input by a weight and summing the results) and then applying an activation function to the result. The weights (w1, ..., wn) are the main components that the network changes during training to learn from the data. They are adjusted so that the output of the network is as close as possible to the desired result.
The activation function introduces nonlinearity into the model. This allows the neural network to model complex relationships between inputs and outputs, beyond what it could do with only linear combinations of the inputs. Common examples of activation functions include the sigmoid function, the hyperbolic tangent function and the rectified linear unit (ReLU).
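To make this concrete, here is a minimal sketch in Python (using NumPy) of the computation a single neuron performs: a weighted sum of the inputs, plus a bias term that is commonly included, followed by a sigmoid activation. The variable names and values are illustrative, not from the text.

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b):
    # Weighted combination of the inputs: w1*x1 + ... + wn*xn + b
    z = np.dot(w, x) + b
    # Nonlinearity applied to the result
    return sigmoid(z)

# Illustrative neuron with three inputs
x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.1, -0.6])   # weights, adjusted during training
b = 0.2                          # bias term

print(neuron_output(x, w, b))    # output between 0 and 1
```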
One of the most commonly used techniques for optimizing the parameters of a network (which are, after all, the weights of the neuron connections) is one we mentioned at the beginning of the course: gradient descent. Here we analyze it in more detail.
Gradient descent is an optimization algorithm used in neural networks (and many other machine learning algorithms) to minimize a cost or loss function. The cost or loss function measures how far the model's prediction is from the true value for a training data set. The goal of training a neural network is to find the values of the weights that minimize this cost function.
Gradient descent does this iteratively. It starts with random initial values for the parameters and then, at each iteration, calculates the gradient of the cost function with respect to each parameter. The gradient at a point is a vector pointing in the direction of steepest ascent at that point, so moving in the opposite direction (i.e., "descending" the gradient) reduces the cost function.
The algorithm then updates the parameters by moving a small amount in the direction of the negative gradient. This process is repeated until the algorithm converges to a minimum of the cost function, i.e., a point where the cost cannot be further reduced by moving the parameters in any direction.
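As a sketch of the idea (the data, learning rate, and number of iterations below are illustrative assumptions, not values from the text), here is gradient descent applied to a one-parameter model y = w * x with a mean squared error cost:

```python
import numpy as np

# Toy training data generated by y = 2 * x, so the optimal weight is w = 2
x = np.array([1.0, 2.0, 3.0, 4.0])
y_true = np.array([2.0, 4.0, 6.0, 8.0])

def cost(w):
    # Mean squared error between the predictions w * x and the true values
    return np.mean((w * x - y_true) ** 2)

def gradient(w):
    # Analytic derivative of the cost with respect to w
    return np.mean(2.0 * (w * x - y_true) * x)

w = np.random.randn()       # random initial value for the parameter
learning_rate = 0.01

for step in range(200):
    w -= learning_rate * gradient(w)   # move against the gradient

print(w, cost(w))           # w approaches 2 and the cost approaches 0
```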
Artificial neural networks generally consist of three main types of layers: the input layer, the hidden layers and the output layer.
It is usually said that when we use networks with a single hidden layer, we are doing Machine Learning, and when we use more than one hidden layer, we are doing Deep Learning.
Given the variety of problems and challenges to be solved, a large number of models have been developed, most of them neural network architectures tailored to particular kinds of tasks. Some of the best known and most versatile are:
| Model | Description | Typical use |
| --- | --- | --- |
| Fully Connected Neural Networks (FCNN) | Basic deep learning model in which every neuron in one layer is connected to every neuron in the next layer. | Classification, regression |
| Convolutional Neural Networks (CNN) | Model designed to process data with a grid-like topology, such as an image. | Image processing and classification |
| Recurrent Neural Networks (RNN) | Model designed to process sequences of data, taking the order of the data into account. | Natural language processing, time series |
| Autoencoders | Model that learns to copy its input to its output; used to learn a representation of the data. | Dimensionality reduction, generation of new images |
| Generative Adversarial Networks (GAN) | System of two competing neural networks: one generates new data and the other evaluates its authenticity. | Generation of new images, super-resolution |
| Transformers | Attention-based model that processes input data in parallel rather than sequentially, improving efficiency. | Natural language processing (e.g., BERT, GPT) |
Deep learning models can be implemented using various libraries and frameworks. Some of the most popular are TensorFlow and Keras.
TensorFlow and Keras are two popular and powerful tools for implementing deep learning models. TensorFlow offers more flexibility and control, while Keras is easier to use and faster to prototype. Since TensorFlow version 2.0, Keras is officially integrated into TensorFlow, which means you can use the high-level features of Keras while maintaining the powerful capabilities of TensorFlow.
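As an example of that integration, the following minimal sketch defines and compiles a small fully connected network with the Keras API included in TensorFlow 2. The layer sizes, input dimension, and number of classes are illustrative assumptions:

```python
import tensorflow as tf

# Hypothetical task: classify 784-dimensional inputs
# (e.g., flattened 28x28 images) into 10 classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                     # input layer
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer
])

# A gradient-descent-based optimizer and the loss function to minimize
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.summary()
# Training would then be, e.g.: model.fit(x_train, y_train, epochs=5)
```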
In addition to TensorFlow and Keras, there are other libraries and frameworks such as PyTorch, Caffe, MXNet, etc., which are also very popular for implementing deep learning models.