Discrete Variational Calculus for Accelerated Optimization

Cédric M. Campos, Alejandro Mahillo, David Martín de Diego; 24(25):1−33, 2023.

Abstract

Gradient-based optimization methods have driven significant advances in machine learning. A recent line of work studies these methods from a variational perspective (Betancourt et al., 2018), which has led to variational and symplectic methods based on geometric integration. This paper introduces variational integrators (Marsden and West, 2001) to derive different optimization methods. Using Hamilton’s and Lagrange-d’Alembert’s principles, we develop two families of optimization methods that generalize Polyak’s heavy ball (Polyak, 1964) and Nesterov’s accelerated gradient (Nesterov, 1983). The second family reduces the oscillations observed in classical momentum methods and mimics the behavior of Nesterov’s accelerated gradient. However, since the systems considered are explicitly time-dependent, the symplecticity of autonomous systems is preserved only on the fibers. Several experiments exemplify the results.
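As a point of reference for the two classical baselines the abstract names, below is a minimal sketch of Polyak’s heavy ball and Nesterov’s accelerated gradient updates on a simple quadratic. It is not the paper’s variational-integrator construction; the test objective, step size `lr`, and momentum coefficient `mu` are illustrative choices, not values from the paper.

```python
# Sketch of the classical momentum baselines generalized in the paper:
# Polyak's heavy ball (1964) and Nesterov's accelerated gradient (1983).
# The quadratic objective and hyperparameters are illustrative assumptions.
import numpy as np

A = np.diag([1.0, 50.0])  # ill-conditioned quadratic f(x) = 0.5 * x^T A x


def grad(x):
    return A @ x


def heavy_ball(x0, lr=0.01, mu=0.9, steps=200):
    # v_{k+1} = mu * v_k - lr * grad(x_k);  x_{k+1} = x_k + v_{k+1}
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        v = mu * v - lr * grad(x)
        x = x + v
    return x


def nesterov(x0, lr=0.01, mu=0.9, steps=200):
    # Same update, but the gradient is evaluated at the look-ahead point
    # x_k + mu * v_k, which damps the oscillations seen with heavy ball.
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        v = mu * v - lr * grad(x + mu * v)
        x = x + v
    return x


if __name__ == "__main__":
    x0 = np.array([5.0, 5.0])
    print("heavy ball:", heavy_ball(x0))
    print("nesterov:  ", nesterov(x0))
```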
