Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks

Authors: Khurram Javed, Haseeb Shah, Richard S. Sutton, Martha White; 24(256):1–34, 2023.

Abstract

Constructing states from sequences of observations is a crucial component of reinforcement learning agents. One solution for state construction is to use recurrent neural networks. Back-propagation through time (BPTT) and real-time recurrent learning (RTRL) are two popular gradient-based methods for recurrent learning. BPTT requires complete trajectories of observations before it can compute gradients, making it unsuitable for online updates. RTRL, on the other hand, can perform online updates but scales poorly to large networks. In this paper, we propose two constraints that make RTRL scalable. We show that by either decomposing the network into independent modules or learning the network in stages, RTRL can scale linearly with the number of parameters. Unlike prior scalable gradient-estimation algorithms, such as UORO and Truncated-BPTT, our algorithms do not introduce noise or bias into the gradient estimate. Instead, they trade off the functional capacity of the network for computationally efficient learning. We demonstrate the effectiveness of our approach over Truncated-BPTT on a prediction benchmark inspired by animal learning and on policy evaluation of pre-trained policies for Atari 2600 games.
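
To make the modular (columnar) constraint concrete, below is a minimal NumPy sketch; it is not the paper's implementation, and the Column class, the tanh recurrence, and all parameter shapes are illustrative assumptions. The key point it demonstrates: because the columns do not feed into one another, each column maintains its own small RTRL sensitivity matrix, so per-step memory and compute grow linearly with the number of columns, and hence with the total parameter count for a fixed column width.

```python
import numpy as np

rng = np.random.default_rng(0)

class Column:
    """One independent recurrent column of width m driven by a shared input of size d.

    Columns do not connect to one another, so each column's RTRL sensitivity
    involves only its own parameters: total cost is linear in the number of
    columns, and hence in the total parameter count for fixed m.
    """
    def __init__(self, m, d):
        self.m, self.d = m, d
        self.W = rng.normal(0.0, 1.0 / np.sqrt(m), (m, m))  # recurrent weights
        self.U = rng.normal(0.0, 1.0 / np.sqrt(d), (m, d))  # input weights
        self.h = np.zeros(m)
        # Sensitivity S = dh/d[vec(W); vec(U)] (column-major vec),
        # shape m x (m*m + m*d); small because it never involves other columns.
        self.S = np.zeros((m, m * m + m * d))

    def step(self, x):
        """Advance one step and update the exact RTRL sensitivity."""
        m = self.m
        a = self.W @ self.h + self.U @ x            # pre-activation
        h_new = np.tanh(a)
        D = np.diag(1.0 - h_new ** 2)               # tanh'(a) on the diagonal
        # Immediate Jacobian of a w.r.t. the parameters:
        # d(W h)/dvec(W) = h^T (kron) I_m and d(U x)/dvec(U) = x^T (kron) I_m.
        P = np.hstack([np.kron(self.h[None, :], np.eye(m)),
                       np.kron(x[None, :], np.eye(m))])
        self.S = D @ (self.W @ self.S + P)          # RTRL recursion
        self.h = h_new
        return h_new

    def grad(self, dL_dh):
        """Exact online gradient of the loss w.r.t. this column's parameters."""
        return dL_dh @ self.S

# Usage: k columns plus a linear readout, all trained fully online.
k, m, d = 8, 4, 3
cols = [Column(m, d) for _ in range(k)]
v = rng.normal(0.0, 0.1, k * m)                     # readout weights
lr = 0.01
for t in range(1000):
    x = rng.normal(size=d)
    target = np.sin(0.1 * t)                        # toy prediction target
    h_all = np.concatenate([c.step(x) for c in cols])
    err = v @ h_all - target                        # dL/dy for L = 0.5*err^2
    for i, c in enumerate(cols):
        g = c.grad(err * v[i * m:(i + 1) * m])      # this column's slice of dL/dh
        theta = np.concatenate([c.W.ravel(order='F'), c.U.ravel(order='F')])
        theta -= lr * g
        c.W = theta[:m * m].reshape(m, m, order='F')
        c.U = theta[m * m:].reshape(m, d, order='F')
    v -= lr * err * h_all                           # readout gradient step
```

For a fully connected RNN with n hidden units, the RTRL sensitivity is an n-by-O(n^2) matrix, which is what makes vanilla RTRL intractable; restricting the recurrence to k independent columns of width m replaces it with k blocks of size m-by-m(m + d), which is the source of the linear scaling described in the abstract.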
