Next we will see how we can implement an ANN (Artificial Neural Network) in Python. For this, we will use the keras library on top of tensorflow, which is the most common approach.
We are going to use the Pima Indian diabetes onset dataset. This is a standard Machine Learning dataset from the UCI Machine Learning repository. It describes the medical record data of Pima Indian patients and whether they had an onset of diabetes within five years.
import pandas as pd
from sklearn.model_selection import train_test_split
# Load the clean Pima Indians diabetes dataset
total_data = pd.read_csv("https://raw.githubusercontent.com/4GeeksAcademy/machine-learning-content/master/assets/clean-pima-indians-diabetes.csv")
# Column "8" is the target (diabetes onset); columns "0" to "7" are the predictors
X = total_data.drop("8", axis = 1)
y = total_data["8"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
X_train.head()
The train set will be used to train the model, while the test set will be used to evaluate its effectiveness. In addition, it is generally good practice to normalize the data before training an artificial neural network (ANN). Two common scalings can be applied: to the range 0 to 1 or to the range -1 to 1.
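As a reference, here is a minimal sketch of both scalings using scikit-learn's MinMaxScaler (the variable names scaler_01, X_train_01, etc. are just illustrative; the rest of this example continues with X_train as loaded):
from sklearn.preprocessing import MinMaxScaler
# Scale each feature to the range [0, 1] (fit only on the training set)
scaler_01 = MinMaxScaler(feature_range = (0, 1)).fit(X_train)
X_train_01 = scaler_01.transform(X_train)
X_test_01 = scaler_01.transform(X_test)
# Alternatively, scale each feature to the range [-1, 1]
scaler_11 = MinMaxScaler(feature_range = (-1, 1)).fit(X_train)
X_train_11 = scaler_11.transform(X_train)
X_test_11 = scaler_11.transform(X_test)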
Models in Keras are defined as a sequence of layers. We create a sequential model and add layers one by one until we are satisfied with our network architecture.
The input layer will always have as many neurons as there are predictor variables. In this case, we have a total of 8 (columns 0 to 7). Next, we add two hidden layers, one of 12 neurons and one of 8. Finally, the fourth layer, the output layer, will have a single neuron, since the problem is dichotomous. If it had n classes, the network would have n outputs.
Note: We have created a default network with an arbitrary number of hidden layers and of neurons in each hidden layer. Normally you would start this way and then perform a hyperparameter optimization.
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import set_random_seed
# Fix the seed so the random weight initialization is reproducible
set_random_seed(42)
model = Sequential()
# First hidden layer: 12 neurons, receiving the 8 input features
model.add(Dense(12, input_shape = (8,), activation = "relu"))
# Second hidden layer: 8 neurons
model.add(Dense(8, activation = "relu"))
# Output layer: 1 neuron with sigmoid activation (binary classification)
model.add(Dense(1, activation = "sigmoid"))
Then, once the model is defined, we can compile it. The backend automatically chooses the best way to represent the network for training and for making predictions on your hardware, such as CPU, GPU, or even distributed across devices.
When compiling, we must specify some additional properties required when training the network. Recall that training a network means finding the best set of weights to map inputs to outputs in our dataset.
model.compile(loss = "binary_crossentropy", optimizer = "adam", metrics = ["accuracy"])
model
We will use the optimizer known as adam. This is a popular version of gradient descent because it tunes itself automatically and gives good results on a wide range of problems. We will also collect and report the classification accuracy, specified through the metrics argument.
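If you need to adjust how adam is tuned (for example, its learning rate), an equivalent alternative is to pass an optimizer object instead of the string; the value 0.001 below is simply Keras' default made explicit:
from tensorflow.keras.optimizers import Adam
# Equivalent to optimizer = "adam", but with the learning rate written out
model.compile(loss = "binary_crossentropy", optimizer = Adam(learning_rate = 0.001), metrics = ["accuracy"])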
Training occurs in epochs, and each epoch is divided into batches.
The training process will run for a fixed number of iterations, which are the epochs. We must also set the number of rows of the dataset that are considered before the model weights are updated within each epoch; this is called the batch size and is set by the batch_size argument.
For this problem, we will run a small number of epochs (150) and use a relatively small batch size of 10:
# Fit the keras model on the data set
model.fit(X_train, y_train, epochs = 150, batch_size = 10)
_, accuracy = model.evaluate(X_train, y_train)
print(f"Accuracy: {accuracy}")
The training time of a model will depend, first of all, on the size of the dataset (instances and features), and also on the type of model and its configuration.
The accuracy on the training set is 84.20%.
y_pred = model.predict(X_test)
y_pred[:15]
As we can see, the model does not return the classes 0 and 1 directly; instead, it returns probabilities that require some post-processing:
y_pred_round = [round(x[0]) for x in y_pred]
y_pred_round[:15]
With raw data, it is very difficult to know whether the model is getting it right or not. To do this, we must compare it with reality. There are many metrics to measure the effectiveness of a model in predicting, including accuracy, which is the fraction of predictions that the model makes correctly.
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred_round)
Once we have the model we were looking for (presumably after hyperparameter optimization), to be able to use it in the future it is necessary to store it in our directory.
model.save("keras_8-12-8-1_42.keras")
Adding an explanatory name to the model is vital, since if we lose the code that generated it, we will still know its architecture (in this case we say 8-12-8-1 because it has 8 neurons in the input layer, 12 and 8 in the two hidden layers, and one neuron in the output layer) as well as the seed used to replicate the random components of the model, which here we indicate by adding a number to the file name, 42.
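To reuse the stored model later, a minimal sketch (assuming the .keras file is in the working directory):
from tensorflow.keras.models import load_model
# Restore the trained network from disk and use it to predict again
loaded_model = load_model("keras_8-12-8-1_42.keras")
loaded_model.predict(X_test)[:5]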
The following is a simple example of how to train a neural network to classify images from the MNIST dataset. MNIST is a dataset of images of handwritten digits, from 0 to 9.
from tensorflow.keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Normalize the data (transform pixel values from 0-255 to 0-1)
X_train, X_test = X_train / 255.0, X_test / 255.0
The pixel values of the images are normalized to be in the range 0 to 1 instead of 0 to 255.
The architecture of the neural network is defined. In this case, we are using a simple sequential model with a flattening layer that transforms 2D images into 1D vectors, a dense layer with 128 neurons, and an output layer with 10 neurons.
An alternative way to create an ANN to the above is provided below. Both are valid:
from tensorflow.keras.layers import Flatten
set_random_seed(42)
model = Sequential([
# Layer that flattens the 28x28 pixel input image to a vector of 784 elements
Flatten(input_shape = (28, 28)),
# Dense hidden layer with 128 neurons and ReLU activation function
Dense(128, activation = "relu"),
# Output layer with 10 neurons (one for each digit from 0 to 9)
Dense(10)
])
We also added the network compiler to define the optimizer and the loss function, as we did before:
from tensorflow.keras.losses import SparseCategoricalCrossentropy
model.compile(optimizer = "adam", loss = SparseCategoricalCrossentropy(from_logits = True), metrics = ["accuracy"])
The model is trained on the training set for a certain number of epochs. Here we do not set the batch_size argument, so Keras uses its default value (32):
model.fit(X_train, y_train, epochs = 5)
_, accuracy = model.evaluate(X_train, y_train)
print(f"Accuracy: {accuracy}")
The training time of a model will depend, first of all, on the size of the dataset (instances and features), and also on the type of model and its configuration.
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print('\nTest accuracy:', test_acc)
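Since the output layer returns raw scores (logits, hence from_logits = True in the loss), obtaining the predicted digit for each image requires applying a softmax and taking the most probable class. A minimal sketch:
import numpy as np
import tensorflow as tf
# Convert the network's raw outputs (logits) into probabilities and then into digit labels
logits = model.predict(X_test)
probabilities = tf.nn.softmax(logits).numpy()
y_pred = np.argmax(probabilities, axis = 1)
y_pred[:15]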
Once we have the model we were looking for (presumably after hyperparameter optimization), to be able to use it in the future, it is necessary to store it in our directory.
model.save("keras_28x28-128-10_42.keras")
Adding an explanatory name to the model is vital, since if we lose the code that generated it, we will still know its architecture (in this case we say 28x28-128-10 because it has an input layer of 28 x 28 pixels, 128 neurons in its only hidden layer, and 10 neurons in the output layer) as well as the seed used to replicate the random components of the model, which here we indicate by adding a number to the file name, 42.