Python
Keras
Deep Learning
TensorFlow
Transfer Learning
Data Augmentation
Early Stopping
Training Deep Learning models from scratch can be costly in terms of data, time, and computational resources. In many cases, projects lack millions of images or weeks of training time. However, there are techniques that can reduce these costs, improve model performance, and prevent common errors like overfitting.
In this article, we will explore three strategies widely used by deep learning practitioners:
Transfer Learning
Data Augmentation
Early Stopping
These three techniques are not mutually exclusive. In fact, they are often used together as part of a robust strategy for efficient model training.
Transfer Learning involves taking a model previously trained on a large-scale task and reusing it, fully or partially, to solve a new one.
Models like VGG, ResNet, or BERT have been trained on large datasets (such as ImageNet or Wikipedia). During this process, they learn general representations (like detecting edges, shapes, or text patterns). These representations can be reused for similar tasks, reducing training costs and improving accuracy, especially when the new dataset is small.
There are two common strategies: feature extraction, where the pre-trained weights are frozen and only the new layers are trained, and fine-tuning, where some of the pre-trained layers are unfrozen and trained further with a low learning rate. The example below uses feature extraction:
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Load the pre-trained model without its classification head
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False  # Freeze the pre-trained weights (feature extraction)

# Add new layers on top for the new task
model = Sequential([
    base_model,
    Flatten(),
    Dense(128, activation='relu'),
    Dense(1, activation='sigmoid')  # Binary classification
])
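If the new dataset is large enough, the frozen base can later be partially unfrozen for fine-tuning. A minimal sketch, assuming the base_model and model defined above; the number of layers to unfreeze and the learning rate are illustrative choices, not fixed values:

from tensorflow.keras.optimizers import Adam

# Unfreeze only the last few layers of the pre-trained base
base_model.trainable = True
for layer in base_model.layers[:-4]:
    layer.trainable = False

# Recompile with a low learning rate so the pre-trained weights change only slightly
model.compile(
    optimizer=Adam(learning_rate=1e-5),
    loss='binary_crossentropy',
    metrics=['accuracy']
)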
Data Augmentation is a technique for generating new training samples from existing ones by applying transformations that do not change the sample's class. It is especially useful in computer vision problems, where label-preserving transformations can be applied, such as:
Rotations
Shifts
Scaling and zooming
Horizontal flips
Random cropping
Brightness or contrast adjustments
When the model sees multiple versions of the same image with slight variations, it learns to generalize better and is less prone to memorizing specific details of the training set.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Generator that applies random transformations to each batch on the fly
datagen = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)

train_generator = datagen.flow_from_directory(
    'data/train/',
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary'
)
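The generator can then be passed straight to model.fit, so the augmented batches are produced on the fly during training. A short usage sketch, assuming the model built and compiled earlier and a hypothetical, non-augmented validation directory data/val/:

# Validation data should not be augmented; use a plain generator
val_generator = ImageDataGenerator().flow_from_directory(
    'data/val/',            # hypothetical validation directory
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary'
)

model.fit(train_generator, validation_data=val_generator, epochs=20)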
This technique can also be adapted for text, audio, and other domains using specific strategies (synonym replacement, noise addition, pitch shifting, etc.).
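As a simple illustration outside of images, a sketch of noise-based augmentation for audio using NumPy; the noise level is an assumed value that would be tuned per dataset:

import numpy as np

def add_noise(waveform, noise_level=0.005):
    # Add small Gaussian noise to a 1-D waveform without changing its label
    noise = np.random.randn(len(waveform)) * noise_level
    return waveform + noise

# Example: noisy = add_noise(clean_waveform), where clean_waveform is any 1-D NumPy array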
Early Stopping is a simple and effective technique to prevent overfitting during model training.
While training, the model is evaluated on a validation set. If the validation loss (val_loss) stops improving for a certain number of consecutive epochs, training is automatically stopped. This way, training does not continue when the model is no longer learning anything useful and starts overfitting.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',            # Watch the validation loss
    patience=3,                    # Stop after 3 epochs without improvement
    restore_best_weights=True      # Roll back to the best weights seen
)

model.fit(X_train, y_train, validation_split=0.2, epochs=50, callbacks=[early_stop])
This approach helps optimize resources and results in a more generalizable model without manual intervention.
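Putting it together, the three techniques fit naturally into a single training run: the frozen pre-trained model, the augmented generator, and the early-stopping callback from the sections above. A sketch, assuming those objects are already defined; the optimizer and epoch count are placeholders:

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(
    train_generator,                # augmented training batches (Data Augmentation)
    validation_data=val_generator,  # clean validation batches
    epochs=50,
    callbacks=[early_stop]          # stop when val_loss stalls (Early Stopping)
)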