Self-monitoring Reinforcement Metalearning for Energy Conservation in Data-ferried Sensor Networks

Ben Pearre and Timothy X. Brown
University of Colorado, Boulder, CO, USA
{benjamin.pearre,timxb}@colorado.edu

Abstract: Given multiple widespread stationary data sources such as ground-based sensors, an unmanned aircraft can fly over the sensors and retrieve their data via a wireless link. When sensors have limited energy resources, they can reduce the energy used in data transmission if the ferry aircraft is allowed to extend its flight time. Complex vehicle and communication dynamics and imperfect knowledge of the environment confound planning since accurate system models are difficult to acquire and maintain, so we present a reinforcement learning approach that allows the ferry aircraft to optimise data collection trajectories and sensor energy use in situ, obviating the need for system identification. We address a key problem of reinforcement learning, the high cost of acquiring sufficient experience, by introducing a metalearner that transfers knowledge between tasks, thereby reducing the number of flights required and the frequency of significantly suboptimal flights. The metalearner monitors the quality of its own output in order to ensure that its recommendations are used only when they are likely to be beneficial. We find that allowing the ferry aircraft to double its range can reduce sensor radio transmission energy by 60% or better, depending on the accuracy of the aircraft’s information about sensor locations.

Keywords: Sensor networks; data ferries; energy optimisation; reinforcement learning; metalearning

I. INTRODUCTION

We consider the problem of collecting data from widespread energy-limited stationary data sources such as ground-based sensors. Our approach uses a fixed-wing unmanned aircraft (UA) to fly over the sensors and gather the data via a wireless link [1].
We assume that the UA has a known range limit and can be recharged/refuelled at a base station, and that the sensors may continuously generate data over long periods, so that the UA needs to ferry the data to a collection site over repeated flights. The goal is to trade energy used by the UA against energy saved by the sensor nodes. The system is difficult to model, so the challenge is to develop a model-free approach that can quickly learn to minimise the sensors’ radio transmission energy subject to the UA’s range constraints.

The problem may be subdivided into the following pieces:

Aircraft trajectory optimisation seeks to discover a flight path over the sensor nodes (a so-called tour) that minimises some mission cost. We decompose this piece as follows:
● Tour Design decides in what order to visit sensor nodes of known location, or establishes a search pattern when the locations are unknown.
● Trajectory Optimisation finds a sequence of waypoints the UA should follow in order to visit the sensor nodes.
● Vehicle Control translates the waypoints into control surface and engine commands.

Radio energy optimisation consists of the following:
● Radio Design chooses radio hardware and protocols to support high-efficiency communication.
● Power Management varies the transmission power of nodes’ radios during interaction with the ferry aircraft.

This paper focuses on the Aircraft Trajectory Optimisation and Radio Power Management layers. We assume that the tour is given and that the nodes’ locations are known only approximately, as when, for example, the sensors have been deployed from an aircraft.

Vehicle control to track a set of waypoints requires an autopilot, whose behaviour is a complex function of the waypoints, weather, aircraft dynamics, and the control models within the autopilot. Similarly, communication system performance is a complex function of the radio protocols, antenna patterns, noise, and interference.
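
The trade between ferry flight time and sensor transmission energy can be illustrated with a deliberately idealised link model. The sketch below assumes a Shannon-capacity link with a fixed channel gain; this model, the function names, and all numeric values are illustrative assumptions and are not taken from the paper, which treats the radio as a system too complex to model. Under that assumption, lowering transmit power reduces the energy spent per bit but lengthens the time the aircraft must remain in contact.

```python
import math


def link_rate_bps(tx_power_w, channel_gain, noise_w, bandwidth_hz):
    """Shannon-capacity data rate (assumed link model, not the paper's)."""
    snr = tx_power_w * channel_gain / noise_w
    return bandwidth_hz * math.log2(1.0 + snr)


def energy_per_bit(tx_power_w, channel_gain, noise_w, bandwidth_hz):
    """Sensor transmission energy in joules per bit at the given power."""
    return tx_power_w / link_rate_bps(tx_power_w, channel_gain, noise_w, bandwidth_hz)


def transfer_time_s(bits, tx_power_w, channel_gain, noise_w, bandwidth_hz):
    """Contact time the ferry needs to collect `bits` at the given power."""
    return bits / link_rate_bps(tx_power_w, channel_gain, noise_w, bandwidth_hz)


# Hypothetical values chosen only to make the trend visible.
GAIN = 1e-9        # channel power gain (dimensionless)
NOISE = 1e-12      # noise power in watts
BANDWIDTH = 1e6    # bandwidth in hertz
PAYLOAD = 8e6      # bits waiting at the sensor

for power in (1.0, 0.1, 0.01):
    e = energy_per_bit(power, GAIN, NOISE, BANDWIDTH)
    t = transfer_time_s(PAYLOAD, power, GAIN, NOISE, BANDWIDTH)
    print(f"power={power:5.2f} W  energy/bit={e:.3e} J  contact time={t:.2f} s")
```

In this toy model, energy per bit falls monotonically as power is reduced while contact time rises, which is the qualitative trade-off the learner has to balance against the UA's range limit.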
We assume autopilot and communication systems are black boxes whose specific functionality is unknown to the upper layers. Only aggregate performance of the ferry system is reported to the learner.

In [2], we examined model-free minimisation of UA trajectory length. Here we extend the technique: since network lifetime or maintenance costs may depend on the energy reserves of the sensor nodes, we seek to minimise their transmission energy cost per bit.

Data ferries can be highly effective for reducing radio energy requirements. Jun et al. [3] compare ferry-assisted networks with hopping networks in simulation and find that a ferry can reduce node energy consumption by up to 95% (further gains would have been possible with a broader configuration space). Tekdas et al. [4] reach a similar conclusion on a real toy network in which wheeled robots represent ferries. Anastasi et al. [5] consider the total energy requirement per message, including overhead associated with turning a node’s radio on in order to search for a fixed-trajectory ferry. Ma and Yang [6] optimise the lifetime of nodes by choosing between multi-hop node-to-node routing and changing the ferry’s route and speed. Optimal solutions under the trade-off between energy use and latency have been examined for

Copyright (c) IARIA, 2012. ISBN: 978-1-61208-207-3
SENSORCOMM 2012: The Sixth International Conference on Sensor Technologies and Applications