Using reinforcement learning to control life support systems

Theresa J. Klein, Devika Subramanian, David Kortenkamp, Scott Bell

January 2004

Abstract

Advanced life support systems have many interacting processes and limited resources. Controlling and optimizing advanced life support systems presents unique challenges that are addressed in this paper. In particular, advanced life support systems are nonlinear coupled dynamical systems, and it is difficult for humans to take all interactions into account when designing an effective control strategy. We have developed a controller using reinforcement learning [1] that actively explores the space of possible control strategies, guided by rewards from a user-specified long-term objective function. We evaluated this controller using a discrete event simulation of an advanced life support system. This simulation, called BioSim, has multiple interacting life support modules, including crew, food production, air revitalization, water recovery, solid waste incineration, and power. These are implemented in a consumer/producer relationship in which certain modules produce resources that are consumed by other modules. Stores hold resources between modules. The simulation is controlled by adjusting flows of resources between modules and into and out of stores. This paper describes the results of using reinforcement learning to control the flow of resources in BioSim. Our technique discovered unobvious strategies for maximizing mission length. By exploiting non-linearities in the simulation dynamics, the learned controller outperforms a handwritten controller.

1 Introduction

Keeping human beings alive in space is a complex task, particularly given the constraints imposed by launch costs from the surface of the earth. Weight considerations necessitate small buffers and low margins on consumables, so control policies that allocate energy and resources optimally are essential for success.
Thus, optimal recycling of air and water is of paramount importance. For this reason, NASA is developing Advanced Life Support systems (ALSs) that are designed to optimize recycling capabilities for maximum mission length. Because of the interactions of the various recycling systems in a closed environment, ALSs can be characterized as coupled dynamical systems. ALSs exhibit emergent behavior that cannot be explained simply as linear combinations of subsystem behavior. In addition, due to the presence of adaptive biological elements such as plants and humans, their dynamics change over time. The resulting non-linearity in the interactions makes designing controllers difficult. One potential solution is to use machine learning techniques to search the space of possible ALS controllers and identify optimal control policies. We do this search in simulation, allowing rapid convergence on interesting results. This paper describes a specific machine learning technique called reinforcement learning and a specific ALS simulation called BioSim [8]. We apply reinforcement learning to BioSim and compare its performance to that of a hand-written controller.