Gaussian Processes and Reinforcement Learning for Identification and Control of an Autonomous Blimp

Jonathan Ko, Dept. of Computer Science & Engineering, University of Washington, Seattle, WA
Daniel J. Klein, Dept. of Aeronautics & Astronautics, University of Washington, Seattle, WA
Dieter Fox, Dept. of Computer Science & Engineering, University of Washington, Seattle, WA
Dirk Haehnel, Intel Research Seattle, Seattle, WA

Abstract— Blimps are a promising platform for aerial robotics and have been studied extensively for this purpose. Unlike other aerial vehicles, blimps are relatively safe and also possess the ability to loiter for long periods. These advantages, however, have been difficult to exploit because blimp dynamics are complex and inherently non-linear. The classical approach to system modeling represents the system as an ordinary differential equation (ODE) based on Newtonian principles. A more recent modeling approach is based on representing state transitions as a Gaussian process (GP). In this paper, we present a general technique for system identification that combines these two modeling approaches into a single formulation. This is done by training a Gaussian process on the residual between the non-linear model and ground truth training data. The result is a GP-enhanced model that provides an estimate of uncertainty in addition to giving better state predictions than either ODE or GP alone. We show how the GP-enhanced model can be used in conjunction with reinforcement learning to generate a blimp controller that is superior to those learned with ODE or GP models alone.

I. INTRODUCTION AND MOTIVATION

Unmanned aerial vehicles (UAVs) have become a helpful component for many applications where human operation is considered unnecessary or too dangerous. Blimps effectively combine the capabilities of airplanes with those of hot air balloons into one aircraft.
This unique combination of maneuverability and the ability to float with relatively low power requirements makes a blimp an ideal research platform for sensor and control technology. Blimps have been studied in various contexts. So far, blimp controllers have mainly been based on PID controllers [14], [15], [16] or non-linear dynamic models [1], [2], [3], [5].

System identification is the first step towards designing a controller for an autonomous blimp, and for dynamical systems in general. A system model describes how the state changes from one instant to the next. The quality of a model can be measured by how well it predicts the next state given the current state and control input. A higher-fidelity model results in improved state estimation and controller performance.

The result of classical dynamic modeling is an ordinary differential equation that describes the evolution of the state. The model can be formulated without collecting any training data; however, extensive human knowledge is required. Another disadvantage of this approach is that the system noise is generally difficult to model.

Fig. 1. The left image shows the blimp used in our test environment equipped with a motion capture system. It has a customized gondola (right images) that includes an XScale based computer with sensors, two ducted fans that can be rotated by 360 degrees, and a webcam.

Gaussian process (GP) regression models have recently been applied to the problem of learning dynamic models from training data [4], [6]. GPs have several key properties that make them ideally suited to our problem. They are non-parametric, which lets them model a wide range of dynamical systems. Furthermore, they can automatically learn the smoothness and noise levels of the underlying system. Finally, they provide a notion of uncertainty about the learned process. This uncertainty can be very valuable when learning a controller.
However, standard GPs assume that the process underlying the data is zero-mean, which is clearly not the case when learning a model of blimp dynamics. In order to overcome this problem, we combine dynamical and data-driven modeling to form a single GP-enhanced model. The GP-enhanced model begins with a classical non-linear dynamical model created by a human expert. The parameters of this blimp model are learned using ground truth data. Then a Gaussian process is used to model the residual between the prediction of the dynamical model and ground truth data. Experiments with an indoor blimp show that the GP-enhanced model outperforms both the classical non-linear approach and the pure GP-based approach. The GP-enhanced model is then used in reinforcement learning to learn a controller for the blimp.

This paper is organized as follows. The blimp hardware testbed used in the experiments is described in Section II. In Section III, the non-linear dynamics of the blimp are derived. Our approach to using Gaussian processes for learning predictive models is described in Section IV. A blimp controller is built using reinforcement learning in Section V. Finally, experimental results illustrating the advantages of the GP-enhanced model are presented in Section VI.
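
The residual-learning idea behind the GP-enhanced model can be illustrated with a minimal sketch. This is not the implementation used in the paper: the one-dimensional toy system `true_dynamics`, the deliberately incomplete expert model `parametric_model`, the squared-exponential kernel, and the fixed hyperparameters are all assumptions made for illustration. The sketch fits a zero-mean GP to the difference between observed state changes and the parametric prediction, then adds the GP mean back onto the parametric prediction and reports the GP predictive variance as an uncertainty estimate.

```python
import numpy as np


def true_dynamics(x, u):
    """Hypothetical ground-truth state change (unknown to the modeler)."""
    return -0.5 * x + u + 0.3 * np.sin(3.0 * x)


def parametric_model(x, u):
    """Hypothetical expert model: captures only the linear part."""
    return -0.5 * x + u


def sq_exp_kernel(A, B, length_scale=0.5, signal_var=1.0):
    """Squared-exponential kernel between row-wise inputs A (n, d) and B (m, d)."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def fit_residual_gp(X, residuals, noise_var=1e-4):
    """Precompute the Cholesky factor and weights for zero-mean GP prediction."""
    K = sq_exp_kernel(X, X) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, residuals))
    return X, L, alpha


def predict_enhanced(gp, X_new):
    """Return parametric prediction plus GP residual mean, and the GP variance."""
    X_train, L, alpha = gp
    K_star = sq_exp_kernel(X_new, X_train)
    residual_mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = sq_exp_kernel(X_new, X_new).diagonal() - np.sum(v**2, axis=0)
    mean = parametric_model(X_new[:, 0], X_new[:, 1]) + residual_mean
    return mean, np.maximum(var, 0.0)  # clip tiny negative values from round-off


# Synthetic training data; columns of X are (state x, control u).
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(60, 2))
y_train = true_dynamics(X_train[:, 0], X_train[:, 1]) + 0.01 * rng.standard_normal(60)

# The GP is trained on what the parametric model fails to explain.
residuals = y_train - parametric_model(X_train[:, 0], X_train[:, 1])
gp = fit_residual_gp(X_train, residuals)

# Compare prediction error on held-out, noise-free test points.
X_test = rng.uniform(-1.0, 1.0, size=(200, 2))
y_test = true_dynamics(X_test[:, 0], X_test[:, 1])
enhanced_mean, enhanced_var = predict_enhanced(gp, X_test)
parametric_pred = parametric_model(X_test[:, 0], X_test[:, 1])
rmse_parametric = np.sqrt(np.mean((parametric_pred - y_test) ** 2))
rmse_enhanced = np.sqrt(np.mean((enhanced_mean - y_test) ** 2))
print("RMSE, parametric model only:", rmse_parametric)
print("RMSE, GP-enhanced model:   ", rmse_enhanced)
```

Because the GP only has to explain the residual, its zero-mean prior is a far milder assumption than it would be for the raw state change: away from the training data the combined prediction falls back to the parametric model rather than to zero.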