A Constrained Optimization Approach to Bilevel Optimization with Multiple Inner Minima

Daouda Sow 1, Kaiyi Ji 2, Ziwei Guan 1, Yingbin Liang 1
1 Department of ECE, The Ohio State University
2 Department of EECS, University of Michigan, Ann Arbor
sow.53@osu.edu, kaiyiji@umich.edu, liang889@osu.edu

March 3, 2022

Abstract

Bilevel optimization has found extensive applications in modern machine learning problems such as hyperparameter optimization, neural architecture search, and meta-learning. While bilevel problems with a unique inner minimal point (e.g., where the inner function is strongly convex) are well understood, bilevel problems with multiple inner minimal points remain challenging and open. Existing algorithms designed for such problems apply only to restricted situations and do not come with full convergence guarantees. In this paper, we propose a new approach that converts the bilevel problem into an equivalent constrained optimization problem, which can then be solved by a primal-dual algorithm. This approach enjoys several advantages: (a) it addresses the multiple inner minima challenge; (b) it is fully first-order, avoiding the second-order Hessian and Jacobian computations required by most existing gradient-based bilevel algorithms; (c) it admits a convergence guarantee via constrained nonconvex optimization. Our experiments further demonstrate the desired performance of the proposed approach.

1 Introduction

Bilevel optimization has received intensive attention recently due to its applications in a variety of modern machine learning problems.
Typically, parameters handled by bilevel optimization are divided into two different types, such as meta and base learners in few-shot meta-learning (Bertinetto et al., 2018; Rajeswaran et al., 2019), hyperparameters and model parameters in automated hyperparameter tuning (Franceschi et al., 2018; Shaban et al., 2019), actors and critics in reinforcement learning (Konda & Tsitsiklis, 2000; Hong et al., 2020), and model architectures and weights in neural architecture search (Liu et al., 2018). Mathematically, bilevel optimization captures intrinsic hierarchical structures in these machine learning models and can be formulated as the following two-level problem:

$$\min_{x \in \mathcal{X},\, y \in S_x} f(x, y) \quad \text{with} \quad S_x = \arg\min_{y \in \mathcal{Y}} g(x, y), \tag{1}$$

arXiv:2203.01123v1 [math.OC] 1 Mar 2022
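To make formulation (1) concrete, the following toy sketch (an illustrative example of ours, not from the paper) uses an inner objective $g(x, y) = (y^2 - 1)^2$ whose minimizer set $S_x = \{-1, +1\}$ contains two points for every $x$, and approximates $S_x$ by a grid search before minimizing the outer objective $f$ over it:

```python
import numpy as np

def f(x, y):
    # Outer (upper-level) objective
    return (x - y) ** 2

def g(x, y):
    # Inner (lower-level) objective; it has two global minimizers,
    # y = -1 and y = +1, for every x, so S_x is not a singleton.
    return (y ** 2 - 1) ** 2

def inner_minimizers(x, ys, atol=1e-8):
    """Approximate S_x: keep grid points where g attains its minimum."""
    vals = g(x, ys)
    return ys[np.isclose(vals, vals.min(), atol=atol)]

ys = np.linspace(-2.0, 2.0, 401)     # grid of step 0.01 containing +/-1
x = 0.5
S_x = inner_minimizers(x, ys)        # roughly {-1.0, +1.0}
best = min(f(x, y) for y in S_x)     # bilevel objective value at this x
```

Here the ambiguity the paper targets is visible directly: different elements of $S_x$ give different outer values ($f(0.5, 1) = 0.25$ versus $f(0.5, -1) = 2.25$), so an algorithm that tracks only one inner minimizer may converge to the wrong one.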