ANALYSIS OF CLOSED-LOOP INERTIAL GRADIENT DYNAMICS

Subhransu S. Bhattacharjee*
School of Engineering, The Australian National University, Canberra, Australia 2601

Ian R. Petersen
Professor, School of Engineering, The Australian National University, Canberra, Australia 2601

Abstract—In this paper, we analyse the performance of the closed-loop Whiplash gradient descent algorithm [1] for L-smooth convex cost functions. Using numerical experiments, we study the algorithm's performance on convex cost functions with different condition numbers. We analyse the convergence of the momentum sequence using symplectic integration and introduce the concept of relaxation sequences, which captures the non-classical character of the whiplash method. Under the additional assumption of invexity, we establish a momentum-driven adaptive convergence rate. Furthermore, we introduce an energy method for predicting the convergence rate of closed-loop inertial gradient dynamics with convex cost functions, using an integral-anchored energy function and a novel lower-bound asymptotic notation that exploits the bounded nature of the solutions. Using this method, we establish a polynomial convergence rate for the whiplash inertial gradient system for a family of scalar quadratic cost functions, and an exponential rate for a quadratic scalar cost function.

Index Terms—Optimization; Non-linear dynamics and control

1. INTRODUCTION

In the field of continuous optimization, we study unconstrained minimisation in a finite-dimensional setting. We revisit classical problems in optimization theory, which form the heart of popular deep learning algorithms. Though Nemirovsky [2] and Nesterov [3] introduced an oracle-machine perspective into the field of optimization decades ago, it did not receive due attention.
It is only with the recent treatment of optimization methods as ordinary differential equations (ODEs) that the oracle-machine perspective of solving black-box problems has become popular in the field of systems theory [4]. While studying such algorithms in continuous time, we consider finite-dimensional global minimisation problems with optima at x* ∈ X and the optimal cost

    f* = min_{x ∈ R^d} f(x),    (1)

where f : R^d → R. A central aspect of analysing gradient-based learning methods is making the assumptions necessary to solve a class of problems. One such assumption is Lipschitz continuity of the gradient of the cost function: there exists a constant L, termed the Lipschitz constant, such that

    ‖∇f(y) − ∇f(x)‖ ≤ L‖y − x‖    ∀ x, y ∈ R^d,    (2)

where ‖·‖ denotes the ℓ2 norm. We make this assumption for our cost functions throughout our analysis, unless specified otherwise. When we discuss iterative gradient schemes in optimization, a sufficient condition for learning the gradient of such a cost function is to take a small enough step size s such that

    0 < s ≤ 1/L.    (3)

We consider computational models that make queries to an oracle containing a finite-dimensional vector map of the entire linear span of gradients of a cost function [2]. Unlike other optimization methods, we use a black-box model of optimization, which yields the gradient of the cost function only at the point of the query. In this paper, we specifically investigate optimisation methods in the accelerated framework [3]. The question of why acceleration speeds up rates of convergence, and how it ties to the exact design structure of the algorithms, is still an active area of research. We gain some intuition from the method of estimate sequences, introduced by Nesterov [3].

* Subhransu S. Bhattacharjee is the corresponding author for this paper. Please direct all queries to him at u7143478@anu.edu.au.
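As a concrete illustration of conditions (2) and (3), consider the following minimal sketch (our own example, not from the paper): for a quadratic cost f(x) = ½ xᵀAx with A symmetric positive definite, the Lipschitz constant L of ∇f is the largest eigenvalue of A, and plain gradient descent with step size s = 1/L converges to the minimiser. The gradient is exposed only through a query-point oracle, mirroring the black-box model described above.

```python
import numpy as np

# Hypothetical L-smooth quadratic cost f(x) = 0.5 * x^T A x (illustrative only).
# For this f, the Lipschitz constant L in (2) equals the largest eigenvalue of A.
A = np.array([[10.0, 0.0],
              [0.0,  1.0]])        # condition number kappa = 10

def grad_oracle(x):
    """Black-box oracle: returns grad f only at the query point x."""
    return A @ x

L = np.linalg.eigvalsh(A).max()    # L = 10 for this A
s = 1.0 / L                        # step size satisfying 0 < s <= 1/L, as in (3)

x = np.array([1.0, 1.0])           # arbitrary starting point
for _ in range(500):
    x = x - s * grad_oracle(x)     # plain gradient descent step

# x is now very close to the unique minimiser x* = 0
print(np.linalg.norm(x))
```

The choice s = 1/L is the largest step size guaranteed safe by (3); larger steps can cause the iterates to diverge along the stiffest eigendirection of A.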
However, continuous-time methods of analysing optimization algorithms provide us with more intuition. A variational approach developed by Wilson et al. [5] for continuous-time systems described a generalised Lyapunov analysis framework. Using this framework, they showed that "there is an equivalence between the technique of estimate sequences and a family of Lyapunov functions in both continuous and discrete-time". Hence, carefully designed time-scaled energy functions have become a staple for proving convergence rates of optimization algorithms that can be studied as continuous-time inertial gradient flow dynamical systems. Shi et al. [6] demonstrate the impracticality of using discrete schemes to find discrete Lyapunov functions. Though in continuous time this is much more intuitive, the task of finding such energy functions is complicated and can often only be approached heuristically, using physical energy-based analogies. For closed-loop non-autonomous ODEs, which use damping coefficients, this task often becomes very difficult, because the damping terms themselves depend on the dynamics of the system. In their recent paper, Attouch et al. [7] use a parametric manufactured solution for a particular cost function to consider a specific family of dynamical systems. However, this method is limited from a control-theoretic perspective, as it does not provide an energy method to prove the convergence rate for the cost function. The aim of our study is to realise a generalised framework for solving unconstrained global optimization problems, using a control-theoretic acceleration framework. The objectives of our immediate study are as follows:
➀ To understand the phenomenon of oscillation attenuation of the objective value for the whiplash method [1].
➁ To develop a Lyapunov analysis for closed-loop inertial gradient systems.

arXiv:2203.02140v2 [math.OC] 10 Mar 2022