Cloud-based collaborative learning of optimal feedback controllers

Valentina Breschi ∗ Laura Ferrarotti ∗∗ Alberto Bemporad ∗∗

∗ Dipartimento di Elettronica e Informazione, Politecnico di Milano, 20133 Milan, Italy (e-mail: valentina.breschi@polimi.it).
∗∗ IMT School for Advanced Studies Lucca, 55100 Lucca, Italy (e-mail: {laura.ferrarotti,alberto.bemporad}@imtlucca.it)

Abstract: Industrial systems deployed in mass production, such as automobiles, can greatly benefit from sharing selected data among them through the cloud to self-adapt their control laws. The reason is that, in mass production, systems are clones of each other: they are designed, constructed, and calibrated by the manufacturer in the same way, and thus share the same nominal dynamics. Hence, sharing information during closed-loop operations can dramatically help each system adapt its local control law so as to attain its own goals, in particular when optimal performance is sought. This paper proposes an approach to learn optimal feedback control laws for reference tracking via a policy search technique that exploits the similarities between systems. By using resources available locally and on the cloud, global and local control laws are concurrently synthesized through the combined use of the alternating direction method of multipliers (ADMM) and stochastic gradient descent (SGD). The enhancement of learning performance due to sharing knowledge on the cloud is shown in a simple numerical example.

Keywords: Consensus, Reinforcement learning control, Control over networks.

1. INTRODUCTION

Research on data-driven control is now gaining renewed interest within the control community, given the limitations of model-based design in coping with uncertain real-world systems and varying operating conditions.
Data-driven control strategies, such as virtual reference feedback tuning (Campi et al., 2002), aim at synthesizing controllers from batches of input/output data, bypassing the model identification phase. These approaches rely on the choice of a reference model representing the desired closed-loop behavior. Although this choice is of paramount importance for closed-loop performance, the reference model is generally selected a priori, without any guarantee that the desired behavior is actually attainable with the chosen class of controllers.

While sharing the philosophy of data-driven control strategies, model-free reinforcement learning (RL) does not require the selection of a reference model: it exploits information collected from interactions between a process and its environment to determine optimal policies, i.e., control laws that maximize a given reward. RL methods are generally classified as actor-critic, critic-only, and actor-only (or policy search) approaches (Konda and Tsitsiklis, 2003), with the latter exploiting a specific parameterization of the policy instead of resorting to successive approximations of the cost to be optimized.

Although investigated within both the control and the machine learning communities, methods for data-driven design of optimal controllers generally leverage information gathered from a single plant. As collecting informative data from a single plant might require running it for a long time, exploration throughout the learning phase is generally quite limited, and additional effort is often needed to satisfactorily search the state and action spaces (e.g., adding exploration noise to the best decision). Thanks to recent advances in cloud computing, it is now technically and economically feasible to collect and store information gathered from different plants with the same nominal behavior, for example in a large-volume production setting.
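To fix ideas, the actor-only (policy search) paradigm mentioned above can be sketched for a single plant. The scalar dynamics, cost weights, and step sizes below are illustrative assumptions, not the paper's setup: a linearly parameterized tracking policy is tuned by stochastic gradient steps on a cost estimated from rollouts, with exploration noise injected into the applied input.

```python
# Minimal single-agent policy search sketch (all numbers are assumptions).
import numpy as np

rng = np.random.default_rng(0)
a, b = 0.9, 0.5          # assumed nominal scalar dynamics: x+ = a*x + b*u
r = 1.0                  # constant reference to track

def rollout_cost(theta, T=50, sigma_e=0.05):
    """Average closed-loop tracking cost for policy u = -k*x + g*r."""
    k, g = theta
    x, cost = 0.0, 0.0
    for _ in range(T):
        u = -k * x + g * r + sigma_e * rng.standard_normal()  # exploration noise
        x = a * x + b * u
        cost += (x - r) ** 2 + 0.01 * u ** 2
    return cost / T

theta = np.zeros(2)                   # policy parameters [k, g]
for it in range(400):
    # zeroth-order (finite-difference) gradient estimate from noisy rollouts
    grad = np.zeros(2)
    for j in range(2):
        d = np.zeros(2); d[j] = 1e-2
        grad[j] = (rollout_cost(theta + d) - rollout_cost(theta - d)) / 2e-2
    theta -= 0.01 * grad              # stochastic gradient step

print(theta)                          # learned feedback and feedforward gains
```

Because the gradient is estimated from noisy rollouts of a single plant, many episodes are needed before the estimate becomes reliable, which is precisely the limited-exploration issue the cloud-aided scheme is meant to alleviate.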
Since these plants (or agents) are likely to share similar objectives while operating within different environments, the design of local control policies might be improved by exploiting the additional information contained in the experiences shared by other agents. In fact, each plant may explore regions of the state and action spaces different from those visited by the others, so that the union of such explorations can provide a wide coverage of the operating space. When the agents have limited embedded computing capabilities but access to resources on the cloud, we can further assume that each agent locally performs only simple operations.

In this paper, we present a policy search approach to design optimal feedback controllers for systems whose dynamics are only known to be similar and that are allowed to share their experiences through the cloud. Differently from multi-agent RL approaches, such as the one in (Dimakopoulou et al., 2018), the proposed method does not aim at finding local policies for systems interacting within the same (non-stationary) environment and cooperating to perform a common task. Instead, the advantage of a cloud-aided framework is exploited here by introducing a global policy, related to the local control laws by (known) constraints that reflect the similarity among the agents. We will handle such constraints via the Alternating Direction Method of Multipliers (ADMM) (Boyd et al., 2011). Differently from what is generally
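The cloud-aided scheme just outlined can be sketched in consensus-ADMM form. Everything below is an illustrative assumption rather than the paper's algorithm: the agents' scalar dynamics, costs, penalty parameter, and step sizes are hypothetical, and deterministic zeroth-order gradient estimates stand in for the paper's SGD updates. Each agent i keeps a local policy theta_i, the cloud maintains a global policy z, and the coupling constraint theta_i = z is handled by scaled-form ADMM.

```python
# Hedged consensus-ADMM sketch: N similar (not identical) scalar agents
# learn local tracking policies tied to a global cloud policy z.
import numpy as np

rng = np.random.default_rng(1)
N = 4
a = 0.9 + 0.02 * rng.standard_normal(N)   # similar nominal dynamics per agent
b = 0.5 * np.ones(N)
r = 1.0

def local_cost(i, theta, T=40):
    """Agent i's average tracking cost for policy u = -k*x + g*r."""
    k, g = theta
    x, c = 0.0, 0.0
    for _ in range(T):
        u = -k * x + g * r
        x = a[i] * x + b[i] * u
        c += (x - r) ** 2 + 0.01 * u ** 2
    return c / T

def fd_grad(f, theta, h=1e-2):
    """Central finite-difference gradient estimate of f at theta."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        d = np.zeros_like(theta); d[j] = h
        g[j] = (f(theta + d) - f(theta - d)) / (2 * h)
    return g

rho = 1.0                      # ADMM penalty parameter (assumed)
theta = np.zeros((N, 2))       # local policies, updated on each agent
z = np.zeros(2)                # global policy, updated on the cloud
lam = np.zeros((N, 2))         # scaled dual variables
for it in range(60):
    for i in range(N):
        # local step: a few gradient iterations on the augmented Lagrangian
        for _ in range(5):
            grad = fd_grad(lambda th: local_cost(i, th), theta[i]) \
                   + rho * (theta[i] - z + lam[i])
            theta[i] -= 0.01 * grad
    z = np.mean(theta + lam, axis=0)   # cloud step: consensus by averaging
    lam += theta - z                   # dual update

print(z)                               # shared global policy parameters
```

The division of labor matches the paper's motivation: each agent runs only cheap local gradient steps, while the cloud performs the averaging and dual bookkeeping that couple the agents through the constraint theta_i = z.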