Bayesian Variational Optimization for Combinatorial Spaces

Tony C. Wu*, Daniel Flam-Shepherd*, Alán Aspuru-Guzik
University of Toronto, Vector Institute
tonyc.wu@utoronto.ca, danielfs@cs.toronto.edu, alan@aspuru.com

Abstract

This paper focuses on Bayesian optimization in combinatorial spaces. In many applications in the natural sciences, including the study of molecules, proteins, DNA, device structures, and quantum circuit designs, optimization over combinatorial categorical spaces is needed to find optimal or Pareto-optimal solutions. However, only a limited number of methods have been proposed to tackle this problem, and many of them rely on Gaussian processes for combinatorial Bayesian optimization. Gaussian processes suffer from scalability issues on large datasets, as their cost scales cubically with the number of data points; this is often impractical when optimizing large search spaces. Here, we introduce a variational Bayesian optimization method that combines variational optimization and continuous relaxations with the optimization of the acquisition function in Bayesian optimization. Critically, this method allows for gradient-based optimization and can handle problems with large data sizes and dimensions. We show that the performance of our method is comparable to state-of-the-art methods while maintaining its scalability advantages. We also apply our method to molecular optimization.

Introduction

Bayesian optimization (BO) is a powerful framework for tackling global optimization problems involving black-box functions (Jones, Schonlau, and Welch 1998). BO seeks to identify an optimal solution with the minimal possible incurred cost. It has been widely applied, yielding impressive results on many problems in areas ranging from automatic chemical design (Gómez-Bombarelli et al.
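The cubic scaling mentioned in the abstract comes from factorizing the n × n kernel matrix in exact Gaussian process inference. A minimal sketch of where that O(n³) cost appears, using toy data and an RBF kernel chosen purely for illustration (not taken from the paper):

```python
import numpy as np

def gp_posterior_mean(X_train, y_train, X_test, lengthscale=1.0, noise=1e-6):
    """Exact GP posterior mean with an RBF kernel.

    The Cholesky factorization of the n x n kernel matrix costs O(n^3),
    which is the scalability bottleneck for large datasets.
    """
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale ** 2)

    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)  # O(n^3) in the number of training points
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return rbf(X_test, X_train) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0])
Xs = np.linspace(-3, 3, 5)[:, None]
mu = gp_posterior_mean(X, y, Xs)
print(mu.shape)  # (5,)
```

Doubling the number of observations roughly octuples the cost of the Cholesky step, which is why exact GPs become impractical for the large evaluation budgets that big combinatorial search spaces demand.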
2018) to hyperparameter optimization (Snoek, Larochelle, and Adams 2012). However, this success does not extend to all types of search spaces, in particular discrete ones. Consider, for example, optimizing some black-box function on a discrete grid of integers as in Figure 1. In this work we focus on Bayesian optimization of objective functions on combinatorial search spaces consisting of discrete variables, where the number of possible configurations quickly explodes: for n categorical variables with k categories each, the number of possible combinations scales as O(k^n).

*Equal contributions.

Figure 1: Optimization of a black box function over a discrete, non-differentiable input space. The path from an initial input point (green) is traced with arrows ending at the red input with the optimal value.

Combinatorial BO (Baptista and Poloczek 2018) aims to find the global optima of highly non-linear, black-box objectives for which simple and exact solutions are unavailable and which are not amenable to gradient-based optimizers. These objectives typically have expensive and noisy evaluations, and thus require optimizers with high sample efficiency. Common examples of combinatorial optimization problems include the traveling salesman problem, integer linear programming, Boolean satisfiability, and scheduling.

The vast majority of the BO literature focuses on continuous search spaces. The reason is that BO relies on Gaussian processes and the smoothness of the kernels used to model functional uncertainty. One first specifies a "belief" over possible explanations of the underlying function f using a probabilistic surrogate model, and then combines this with an acquisition function that assesses the expected utility of a set of novel inputs X, chosen by solving an inner optimization problem.

arXiv:2011.02004v1 [cs.LG] 3 Nov 2020
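The inner optimization problem over a discrete space is exactly where continuous relaxations can restore gradients. A toy sketch of the idea for a single categorical variable: relax the one-hot choice to a softmax over logits and run gradient ascent on the relaxed acquisition. Everything here (the acquisition values `w`, the plain softmax relaxation, the step size) is a hypothetical stand-in for illustration, not the authors' implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical acquisition values for k = 5 categories (a stand-in for
# an expensive acquisition function evaluated on one-hot inputs).
w = np.array([0.1, 0.7, 0.3, 2.0, 0.5])

def relaxed_acquisition(theta):
    # Continuous relaxation: replace the one-hot category with softmax probs.
    return w @ softmax(theta)

def grad(theta):
    # Gradient of w @ softmax(theta): (diag(p) - p p^T) applied to w.
    p = softmax(theta)
    return p * w - p * (p @ w)

theta = np.zeros(5)
for _ in range(200):
    theta += 0.5 * grad(theta)  # gradient ascent on the relaxed objective

best = int(np.argmax(softmax(theta)))  # discretize the relaxed solution
print(best)
```

The relaxed objective concentrates mass on the category with the largest acquisition value, so rounding via argmax recovers the discrete maximizer; in practice, variational or relaxation-based approaches must additionally manage the gap between the relaxed and discrete objectives.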