Minimal Width for Universality of Deep RNNs
Chang hoon Song, Geonho Hwang, Jun ho Lee, Myungjoo Kang; 24(121):1−41, 2023.
Abstract
Recurrent neural networks (RNNs) are widely used deep learning models for sequential data. Mimicking a dynamical system, an infinite-width RNN can approximate any open dynamical system on a compact domain. In practice, deep narrow networks with bounded width and arbitrary depth are generally more effective than wide shallow networks with arbitrary width and bounded depth. However, universal approximation theorems for deep narrow architectures have not been extensively studied. In this study, we prove the universality of deep narrow RNNs and show that the upper bound on the minimum width for universality can be independent of the length of the data. Specifically, we show that a deep RNN with ReLU activation can approximate any continuous function or $L^p$ function with widths $d_x+d_y+3$ and $\max\{d_x+1,d_y\}$, respectively, where the target function maps a finite sequence of vectors in $\mathbb{R}^{d_x}$ to a finite sequence of vectors in $\mathbb{R}^{d_y}$. We also calculate the additional width required if the activation function is sigmoid or higher. Additionally, we establish the universality of other recurrent networks such as bidirectional RNNs. By bridging the multi-layer perceptron and the RNN, our theory and techniques provide insights for further research on deep RNNs.
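To make the width bounds concrete, the following is a minimal sketch of a deep narrow ReLU RNN whose hidden width is fixed at $d_x+d_y+3$ regardless of sequence length. It assumes a standard Elman-style recurrence $h_t = \mathrm{ReLU}(A h_{t-1} + B x_t + c)$ with a linear readout; the paper's exact cell definition may differ. The helper names (`width_bounds`, `relu_rnn_layer`, `deep_narrow_rnn`, `build_random_network`) are hypothetical, and the random weights only illustrate the shapes involved, not an approximating construction.

```python
import random

def width_bounds(d_x, d_y):
    """Width bounds quoted in the abstract for deep ReLU RNNs."""
    return {"continuous": d_x + d_y + 3, "Lp": max(d_x + 1, d_y)}

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rand_matrix(rng, rows, cols, scale=0.5):
    """Random matrix with entries in [-scale, scale]; illustrative only."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def relu_rnn_layer(seq, A, B, c):
    """One recurrent layer (assumed form): h_t = ReLU(A h_{t-1} + B x_t + c)."""
    h = [0.0] * len(A)
    out = []
    for x_t in seq:
        pre = [a + b + ci for a, b, ci in zip(matvec(A, h), matvec(B, x_t), c)]
        h = [max(p, 0.0) for p in pre]
        out.append(h)
    return out

def deep_narrow_rnn(seq, layers, W_out):
    """Stack recurrent layers, then apply a linear readout at every time step."""
    for A, B, c in layers:
        seq = relu_rnn_layer(seq, A, B, c)
    return [matvec(W_out, h) for h in seq]

def build_random_network(rng, d_x, d_y, depth):
    """Build `depth` layers, each of width d_x + d_y + 3 (continuous-function bound)."""
    w = width_bounds(d_x, d_y)["continuous"]
    layers, d_in = [], d_x
    for _ in range(depth):
        layers.append((rand_matrix(rng, w, w), rand_matrix(rng, w, d_in), [0.0] * w))
        d_in = w  # later layers read the previous layer's hidden states
    return layers, rand_matrix(rng, d_y, w)

rng = random.Random(0)
d_x, d_y, T = 2, 3, 5
layers, W_out = build_random_network(rng, d_x, d_y, depth=4)
xs = [[rng.uniform(-1, 1) for _ in range(d_x)] for _ in range(T)]
ys = deep_narrow_rnn(xs, layers, W_out)
print(width_bounds(d_x, d_y), len(ys), len(ys[0]))
```

Only the depth and the weights would change to approximate a different target; the width stays at $d_x+d_y+3$ whether the sequence has length 5 or 5000, which is the sense in which the bound is independent of the data length.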