Applied AI concepts

Artificial Intelligence, Training, and Models

Artificial Intelligence (AI) is a technology that allows machines to learn, "reason" (I'll tell you why I put that in quotes later), and make decisions autonomously.

You could say it's about training a machine to accomplish a specific task.

The training process consists of collecting data about the task you want the machine to perform and then using mathematical and statistical algorithms so the machine can learn to solve it.
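
For example, this is roughly what "collecting data about the task" can look like in practice. The sketch below shows a tiny, made-up dataset for a hypothetical spam-detection task: each example pairs an input with the answer we want the machine to learn.

```python
# A tiny, made-up training dataset for a hypothetical spam-detection task.
# Each example pairs an input (the text) with the desired answer (the label).
training_data = [
    ("Win a FREE prize now!!!", "spam"),
    ("Meeting moved to 3pm tomorrow", "not spam"),
    ("Cheap pills, limited time offer", "spam"),
    ("Can you review my pull request?", "not spam"),
]

# A mathematical or statistical algorithm would then adjust a model until
# it maps inputs like these to the correct labels.
```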

But wait, what do you mean by "learn"? Can machines actually learn?

Well, not really. Machines don't have brains on their own. A computer does nothing more than process zeros and ones—it doesn't think, it doesn't analyze, it just executes.

But we can make a machine behave as if it had a brain by making it run a special type of calculator called an artificial intelligence model.

An artificial intelligence model is a set of adjusted mathematical functions. It can range from a single function, like Linear Regression, to a set of functions that run one after another, like a Neural Network.
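
To make that concrete, here is a minimal sketch of a Linear Regression model in Python. The parameter values are hypothetical, chosen only for illustration; the point is that "running the model" is nothing more than evaluating an adjusted function.

```python
# A Linear Regression model is a single mathematical function, f(x) = w*x + b,
# where w and b are the parameters that training has adjusted.
# The values below are hypothetical placeholders.
def linear_regression(x: float, w: float = 2.0, b: float = 1.0) -> float:
    """Predict an output from an input using the adjusted parameters."""
    return w * x + b

print(linear_regression(3.0))  # prints 7.0: just arithmetic, no "thinking"
```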

A Neural Network is a type of artificial intelligence model inspired by the structure of biological neurons. Data enters the network through its input nodes, is transformed as it passes from node to node, and exits through its output nodes.
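
Here is a minimal sketch of that flow, with random placeholder weights standing in for the adjusted ones a trained network would have:

```python
import numpy as np

# Random placeholder weights; a real network would use trained values.
rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 3.0])    # input: three values enter the network
W1 = rng.normal(size=(3, 4))      # weights connecting input to a hidden layer
W2 = rng.normal(size=(4, 1))      # weights connecting hidden layer to output

hidden = np.maximum(0, x @ W1)    # hidden layer: linear step plus ReLU activation
output = hidden @ W2              # the prediction exits through the output node

print(output)                     # functions run one after another, nothing more
```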

So how is it possible that a mathematical function can understand things like a person does?

Not all artificial intelligence models are the same, and not all artificial intelligence models are capable of understanding things or producing text like GPT-5 from OpenAI, the company behind ChatGPT.

However, thanks to advances in artificial intelligence models, today there is a type of model known as the LLM (Large Language Model).

LLMs, the Revolution That Brought Us Here

Think of AI as a very large branch of science. It spans everything from models used in medicine to the financial sector, physics, chemistry, and human language. Theoretically, you could train a model to do anything if you have the right data—the key word is data.

Large Language Models (LLMs) in particular are models trained on enormous amounts of text: emails, posts, conversations, books, documents, code, mathematics, and any other kind of text found on the internet that might help the model better understand the structure of human language.

The year 2017 marked the beginning of the LLM era, with a paper by Google researchers titled "Attention Is All You Need".

This paper presented a new model architecture, the Transformer, which uses attention to process text. Attention significantly improved the performance of these models and became the standard for language models.

Attention is a concept used in natural language processing to give more importance to certain parts of a text.
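
As a rough illustration, here is a minimal sketch of scaled dot-product attention, the core operation of that paper. The random matrices below are placeholders standing in for learned projections of the input text:

```python
import numpy as np

# Placeholders: in a real model Q, K, and V are learned projections of the text.
rng = np.random.default_rng(0)
seq_len, d = 4, 8                       # 4 tokens, each an 8-dimensional vector

Q = rng.normal(size=(seq_len, d))       # queries: what each token is looking for
K = rng.normal(size=(seq_len, d))       # keys: what each token offers
V = rng.normal(size=(seq_len, d))       # values: the information each token carries

scores = Q @ K.T / np.sqrt(d)           # relevance of every token to every other
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: importance as proportions

attended = weights @ V                  # each token becomes a weighted mix of all tokens
print(weights.round(2))                 # row i: how much token i "attends" to each token
```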

One thing led to another: companies started training larger and larger models, realizing that these understood human language better and could be used to solve tasks that weren't possible before. They seemed to "comprehend" things like a person would.

It's important to keep in mind that this happens thanks to the adjustment of the model's parameters, and these parameters are adjusted with training data. The more parameters a model has (the bigger it is), the better it seems to comprehend. But at its core, a model is still just a mathematical algorithm.
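
As a rough illustration of "adjusting parameters with training data", here is a minimal sketch of gradient descent fitting a single parameter. The data and learning rate are made up for the example:

```python
# Made-up training data generated by the "true" rule y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0     # the model f(x) = w * x starts with an unadjusted parameter
lr = 0.05   # learning rate: how far to move w on each step

for _ in range(100):
    # gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # adjust the parameter to reduce the error

print(round(w, 3))  # close to 2.0: "learning" is just this adjustment, repeated
```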

These models can have hallucinations (we'll talk about this later), present incorrect data, and make decisions that aren't the most appropriate, because by their nature they simply receive text and produce a response accordingly.


Let's keep moving forward to understand the mystery of Artificial Intelligence.