
Recurrent Neural Networks

In many real-world problems, data is presented in sequential form: words in a sentence, stock prices over time, musical notes in a melody, or physiological signals measured at regular intervals. To address such problems, it is not enough to analyze data in isolation; models must be capable of capturing the dependencies between elements in a sequence.

Recurrent Neural Networks (RNNs) were specifically designed with this goal in mind: enabling a model to learn temporal dependencies in data by incorporating a form of internal memory.

What is a Recurrent Neural Network?

A recurrent neural network is a network architecture that, unlike a traditional feedforward neural network, incorporates cycles in its structure. This allows it to maintain an internal state that is updated with each new input and influences the model's future decisions.

The key idea behind an RNN is that the model's output at a given time depends not only on the current input but also on the accumulated internal state up to that point.

How does an RNN work?

In a recurrent neural network, the computation process at each time step can be simplified as follows:

  • x_t: input at time t.
  • h_t: hidden state at time t, updated as a function of x_t and the previous state h_{t-1}.
  • y_t: model output at time t.

The state update and output generation can be formalized with these equations:

```
h_t = tanh(W_hh * h_{t-1} + W_xh * x_t + b_h)
y_t = W_hy * h_t + b_y
```
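
As a concrete illustration, this recurrence can be written directly in NumPy. The dimensions and random weights below are assumed purely for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: 3 input features, 4 hidden units, 1 output.
W_xh = rng.normal(size=(4, 3))  # input-to-hidden weights
W_hh = rng.normal(size=(4, 4))  # hidden-to-hidden weights
W_hy = rng.normal(size=(1, 4))  # hidden-to-output weights
b_h = np.zeros(4)
b_y = np.zeros(1)

h_t = np.zeros(4)                   # initial hidden state h_0
sequence = rng.normal(size=(5, 3))  # 5 time steps, 3 features each

for x_t in sequence:
    h_t = np.tanh(W_hh @ h_t + W_xh @ x_t + b_h)  # state update
    y_t = W_hy @ h_t + b_y                        # output at time t
```

Note that the same weight matrices are reused at every time step; only the state h_t changes as the sequence is consumed.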

The state h_t acts as a dynamic memory that adjusts as the model progresses through the sequence.

RNNs are particularly useful in tasks where context is essential. Some examples include:

  • Natural Language Processing (NLP): language modeling, machine translation, text generation, sentiment analysis.
  • Time Series Prediction: forecasting financial prices, energy demand, temperature, etc.
  • Speech Recognition: transcribing audio to text.
  • Biological Sequence Analysis: identifying patterns in DNA or protein sequences.

Limitations of Classic RNNs

Although RNNs can theoretically learn long-term dependencies, in practice, they suffer from a phenomenon known as vanishing or exploding gradients during training, which hinders their ability to remember distant information in the sequence.
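
Intuitively, each backward step through time multiplies the gradient by (roughly) the same factor derived from the recurrent weights, so after many steps its magnitude shrinks or grows exponentially. A toy illustration with assumed per-step scales:

```python
# Illustrative only: backpropagating through T steps scales the gradient
# roughly by the recurrent weights' dominant singular value at each step.
for scale in (0.9, 1.1):  # assumed per-step scales, just below and above 1
    grad_norm = 1.0
    for _ in range(50):   # 50 time steps
        grad_norm *= scale
    print(f"per-step scale {scale}: gradient norm after 50 steps = {grad_norm:.2e}")
```

With a scale of 0.9 the gradient all but vanishes after 50 steps, while with 1.1 it grows by two orders of magnitude, even though both scales are close to 1.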

This limits their effectiveness in tasks where capturing relationships between events far apart in the sequence is crucial.

To overcome the limitations of traditional RNNs, improved architectures were developed:

  • LSTM (Long Short-Term Memory): introduces gating mechanisms to regulate what information is retained, updated, and forgotten, allowing the state to be preserved over more time steps.
  • GRU (Gated Recurrent Unit): similar to the LSTM but with a simpler structure, maintaining competitive performance while reducing the number of parameters.

Both variants are now the de facto standard when working with sequences.

Basic Implementation with Keras

The following example demonstrates how to build a simple RNN using the Keras library in Python:

```python
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

timesteps, features = 10, 3  # sequence length and features per time step

model = Sequential()
model.add(SimpleRNN(64, input_shape=(timesteps, features)))  # 64 recurrent units
model.add(Dense(1, activation='sigmoid'))  # single sigmoid unit for binary output
model.compile(optimizer='adam', loss='binary_crossentropy')
```
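
To check that the model runs end to end, it can be fitted on random placeholder data. The shapes below are assumed to match the timesteps and features values defined above:

```python
import numpy as np

# Random placeholder data, assumed only as a smoke test:
# 32 sequences, each 10 time steps of 3 features, with binary labels.
X = np.random.random((32, timesteps, features)).astype("float32")
y = np.random.randint(0, 2, size=(32, 1))

model.fit(X, y, epochs=3, batch_size=8)
```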

This model can be adapted for tasks such as sentiment classification or binary event prediction from sequences.
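
For longer sequences, the same pipeline typically works better with the gated variants described earlier. A sketch replacing SimpleRNN with LSTM (GRU is a drop-in alternative), using the same assumed toy dimensions:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense  # swap in GRU the same way

timesteps, features = 10, 3  # assumed toy dimensions, as above

model = Sequential()
model.add(LSTM(64, input_shape=(timesteps, features)))  # gated recurrent layer
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')
```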