Optimization of Stochastic Strategies for Spatially Inhomogeneous Robot Swarms: A Case Study in Commercial Pollination

Spring Berman, Radhika Nagpal, and Ádám Halász

Abstract—We present a scalable approach to optimizing robot control policies for a target collective behavior in a spatially inhomogeneous robotic swarm. The approach can incorporate robot feedback to maintain system performance in an unknown environmental flow field. We consider systems in which the robots follow both deterministic and random motion and transition stochastically between tasks. Our methodology is based on an abstraction of the swarm to a macroscopic continuous model, whose dimensionality is independent of the population size, that describes the expected time evolution of swarm subpopulations over a discretization of the environment. We incorporate this model into a stochastic optimization method and map the optimized model parameters onto the robot motion and task transition control policies to achieve a desired global objective. We illustrate our methodology with a scenario in which the behaviors of a swarm of robotic bees are optimized for both uniform and nonuniform pollination of a blueberry field, including in the presence of an unknown wind.

I. INTRODUCTION

A robotic swarm is a system that would consist of hundreds or thousands of autonomous, relatively expendable robots with limited sensing, communication, and computational capabilities. This kind of system has the potential to perform tasks with a high degree of parallelism, redundancy, flexibility, and adaptability to dynamic, possibly hazardous environments. A key challenge in robotic swarms is the development of approaches to design robot control policies that can provably produce a specific macroscopic outcome which is robust to disturbances in the system.
S. Berman and R. Nagpal are with the School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138 (e-mail: {sberman,rad}@eecs.harvard.edu). Á. Halász is with the Department of Mathematics, West Virginia University, Morgantown, WV 26506 (e-mail: halasz@math.wvu.edu).

Fully centralized control strategies for a robot collective can provide globally optimal solutions but are computationally infeasible for such enormous populations. We use the paradigm of a broadcast architecture [18] to ensure scalability of the control approach with the swarm population size while enabling guarantees on system performance. A supervisory agent computes parameters that govern the robots' behaviors and transmits them to the swarm, without requiring information about the individual robot activities. Each robot in the entire swarm or a large subset is identical, unidentified, and follows the same set of decentralized algorithms, which rely only on local information from sensors and/or communication without knowledge of the global system state.

Our main contribution in this work is a top-down approach to synthesizing robot control policies that are optimized for a particular global objective to be accomplished by a spatially inhomogeneous swarm, whose members are arbitrarily distributed throughout their environment. We consider systems in which the robots transition stochastically between tasks at constant rates and follow a velocity field while executing random motion that can be modeled as Brownian motion. We describe optimization strategies that are open-loop, which compute the control policies independently of robot measurements, and closed-loop, which use this feedback to adapt the control policies to an unknown bulk motion of the medium, such as wind or water, in which the robots operate.

Our approach relies on the development of an abstraction of the physical system, which is facilitated by the stochasticity of the robot behaviors.
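The robot-level behavior considered here (drift along a velocity field, Brownian motion, and task transitions at a constant rate) can be illustrated with a minimal sketch. This is not the authors' controller: the one-dimensional setting, the Euler-Maruyama time stepping, the two task labels, and all names (`simulate_robot`, `velocity`, `diffusion`, `rate`) are assumptions made only for illustration.

```python
import math
import random


def simulate_robot(x0, velocity, diffusion, rate, dt, steps, rng):
    """Simulate one robot on a line (hypothetical 1-D simplification).

    velocity:  callable x -> drift speed (the velocity field)
    diffusion: diffusion coefficient of the Brownian motion
    rate:      constant transition rate from "travelling" to "working"
    Returns (final position, final task label).
    """
    x, task = x0, "travelling"
    # A constant rate gives a switching probability of 1 - exp(-rate * dt)
    # over one time step of length dt.
    p_switch = 1.0 - math.exp(-rate * dt)
    for _ in range(steps):
        # Euler-Maruyama step: dx = v(x) dt + sqrt(2 D dt) * N(0, 1).
        noise = math.sqrt(2.0 * diffusion * dt) * rng.gauss(0.0, 1.0)
        x += velocity(x) * dt + noise
        # Stochastic task transition; in this sketch the task label does
        # not affect the motion.
        if task == "travelling" and rng.random() < p_switch:
            task = "working"
    return x, task
```

Because each robot needs only its own position, the broadcast parameters, and a random number generator, the same routine could run unchanged on every member of the swarm, which is the property the broadcast architecture relies on.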
We employ a modeling technique from the stochastic simulation of reaction-diffusion chemical systems [9]. From this model, we derive the macro-continuous model, a set of ordinary differential equations (ODEs) that govern the time evolution of average swarm subpopulations in each cell of a discretization of the environment. The dimensionality of this model is independent of the swarm population size; hence, for a fairly coarse discretization, it is much faster to numerically solve the ODEs than to simulate individual robots. This makes the model suitable for use in a stochastic optimization technique as a tool for quickly predicting system performance under a certain set of parameters. A supervisory agent can use such a technique to optimize the model parameters for a target objective in terms of swarm subpopulations. When these parameters are mapped to the robot motion controllers and stochastic policies for task transitions, the swarm produces the desired collective behavior.

In prior work on this topic [3], we described an advection-diffusion-reaction partial differential equation (PDE) model of a spatially inhomogeneous swarm and discussed the mapping between its parameters and the robot controllers. Similarly, [11], [22] develop PDE models of swarms based on the Fokker-Planck equation; they do not address the problem of controller optimization. Existing methods of optimizing control policies for swarms of robots whose behavioral rules are stochastic, or can be modeled as stochastic, apply to problems of task allocation [2], [5], [16], [20] and robotic assembly and self-assembly [7], [13], [17] that do not incorporate spatial descriptions of swarms. Optimization of a spatial model is considered in [19] for the specific purpose of directing a swarm to a desired location.

As in [3], we apply our methodology to a scenario of interest for the Robobees project [1], whose objective is to develop a colony of insect-inspired micro air vehicles [23].
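To make the idea of a macro-continuous model concrete, the following minimal sketch integrates ODEs for the average subpopulations in each cell of a one-dimensional grid. It is an illustration under stated assumptions, not the model derived in this paper: the two subpopulations (`moving`, `working`), the per-cell hopping rates `hop_right` and `hop_left` (unequal rates stand in for advection), the irreversible transition rate `k`, the closed boundaries, and the forward Euler integrator are all hypothetical choices.

```python
def macro_step(moving, working, hop_right, hop_left, k, dt):
    """One forward Euler step of the cell-population ODEs (1-D sketch)."""
    n = len(moving)
    new_moving = list(moving)
    new_working = list(working)
    for i in range(n):
        m = moving[i]
        if i + 1 < n:
            # Hop to the right-hand neighbour cell.
            flux = hop_right * m * dt
            new_moving[i] -= flux
            new_moving[i + 1] += flux
        if i > 0:
            # Hop to the left-hand neighbour cell.
            flux = hop_left * m * dt
            new_moving[i] -= flux
            new_moving[i - 1] += flux
        # Task transition within the cell at constant rate k.
        switched = k * m * dt
        new_moving[i] -= switched
        new_working[i] += switched
    return new_moving, new_working


def integrate(moving, working, hop_right, hop_left, k, dt, steps):
    """Apply macro_step repeatedly; dt must be small for stability."""
    for _ in range(steps):
        moving, working = macro_step(moving, working, hop_right, hop_left, k, dt)
    return moving, working
```

The state holds two numbers per cell regardless of how many robots the populations represent, so the cost of one prediction depends on the grid resolution rather than on the swarm size. A call such as `integrate([100.0, 0, 0, 0, 0], [0.0] * 5, 1.5, 1.0, 0.2, 0.01, 500)` returns the predicted cell populations, which an optimizer could score against a target distribution.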
We address the problem of designing control policies for a