IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 1, NO. 4, OCTOBER 2014

Adaptive Pinpoint and Fuel Efficient Mars Landing Using Reinforcement Learning

Brian Gaudet, Roberto Furfaro

Abstract—Future unconstrained and science-driven missions to Mars will require advanced guidance algorithms that are able to adapt to more demanding mission requirements, e.g., landing on selected locales with pinpoint accuracy while autonomously flying fuel-efficient trajectories. In this paper, a novel guidance algorithm designed by applying the principles of reinforcement learning (RL) theory is presented. The goal is to devise an adaptive guidance algorithm that enables robust, fuel-efficient, and accurate landing without the need for offline trajectory generation and real-time tracking. Results from a Monte Carlo simulation campaign show that the algorithm is capable of autonomously following trajectories that are close to the optimal minimum-fuel solutions with an accuracy that surpasses that of past and future Mars missions. The proposed RL-based guidance algorithm exhibits a high degree of flexibility and can easily accommodate autonomous retargeting while maintaining accuracy and fuel efficiency. Although reinforcement learning and other similar machine learning techniques have previously been applied to aerospace guidance and control problems (e.g., autonomous helicopter control), this appears, to the best of the authors' knowledge, to be the first application of reinforcement learning to the problem of autonomous planetary landing.

Index Terms—Mars landing guidance, reinforcement learning, policy iteration, Markov decision process.

Manuscript received September 6, 2013; accepted March 24, 2014. This work was supported by the University of Arizona. Recommended by Associate Editor Dixon Warren.

Citation: Brian Gaudet, Roberto Furfaro. Adaptive pinpoint and fuel efficient Mars landing using reinforcement learning. IEEE/CAA Journal of Automatica Sinica, 2014, 1(4): 397-411

Brian Gaudet is a Research Engineer with the Department of Systems and Industrial Engineering, University of Arizona, 1127 E. Roger Way, Tucson, Arizona 85721, USA (e-mail: briangaudet@me.com).

Roberto Furfaro is an Assistant Professor with the Department of Systems and Industrial Engineering and the Department of Aerospace and Mechanical Engineering, University of Arizona, 1127 E. James E. Rogers Way, Tucson, AZ 85721, USA (e-mail: robertof@email.arizona.edu).

I. INTRODUCTION

Future unconstrained, science-driven, robotic and human missions to Mars will require a higher degree of landing accuracy.
Indeed, the next generation of Mars landers will require more advanced guidance and control capabilities to satisfy the increasingly stringent accuracy requirements driven by the desire to explore Mars regions that have the potential to yield the highest scientific return. The Mars Science Laboratory (MSL) [1], which landed on Mars during the summer of 2012, is a clear example of a mission where the scientific desire to explore regions that had never previously been accessed required the development and implementation of a novel guidance approach able to deliver the mobility system to the Martian surface with higher precision than previous attempts. For the earlier Phoenix Mars lander [2], the entry, descent and landing (EDL) mission profile included two unguided phases followed by a powered descent segment designed to close the loop only on the altitude while attempting to reduce the velocity to zero. As a result, the landing error ellipse was estimated to be on the order of 120 km [2].
The MSL lander was expected to reduce the landing error to less than 10 km, which has been achieved by implementing active bank angle control during the hypersonic entry phase [1]. Moreover, according to the EDL timeline, after the MSL unguided parachute deceleration phase is completed, a fifth-order polynomial powered descent trajectory is computed onboard, and a linear, closed-loop, trajectory-following law steers the spacecraft along it to land safely on the surface within the predicted error ellipse [3]. Importantly, the powered descent guidance algorithm does not have the ability to select the desired location within the landing ellipse, i.e., the landing point on the surface is located at a pre-fixed downrange and crossrange distance from the point where the liquid rockets are activated. Although a 10 km landing precision had never previously been achieved on Mars, this accuracy is still far from the precision required by upcoming missions.

Future missions to Mars may also require the ability to execute real-time retargeting while en route toward the planet's surface. Such maneuvers may be necessary to avoid obstacles that would interfere with the safety of the spacecraft during landing. Since such obstacles may not be apparent when the landing site is initially selected, it is important that during the powered descent segment the lander autonomously identifies hazardous landing sites and dynamically retargets itself to a safer site. In practice, and within the lander's retargeting capabilities, a pattern recognition algorithm coupled with some combination of radar and optical inputs could be employed to determine the target site that provides the highest probability of a safe landing. The estimate of the best landing site would be continuously updated until a safe landing is achieved.

To date, most of the powered descent algorithms for planetary landing have been based on variations of the Apollo guidance algorithm [4].
The original guidance scheme, which was used to drive the Lunar Excursion Module (LEM) to the lunar surface, relied on an iterative procedure that computed offline a flyable reference trajectory in the form of a quartic polynomial [5]. The real-time guidance algorithm would subsequently generate an acceleration command that tracks the pre-computed reference trajectory. While effective for past missions, this class of guidance algorithms may not be able to satisfy the more stringent landing requirements imposed by novel mission architectures designed for precision landing.
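
To make the reference-tracking idea concrete, the following Python sketch fits a quartic polynomial per axis and tracks it with a feedback acceleration command. It is an illustration only, not the Apollo flight algorithm or the method proposed in this paper. The boundary conditions used to fix the five coefficients (initial position and velocity; final position, velocity, and acceleration), the proportional-derivative tracking law, the gains kp and kv, the constant disturbance, and all function names are assumptions introduced for this example.

```python
import numpy as np

# Boundary-condition matrix in normalized time s = t / T for
# r(s) = c0 + c1*s + c2*s**2 + c3*s**3 + c4*s**4 (assumed constraint set).
_A = np.array([
    [1, 0, 0, 0, 0],    # r at s = 0
    [0, 1, 0, 0, 0],    # dr/ds at s = 0
    [1, 1, 1, 1, 1],    # r at s = 1
    [0, 1, 2, 3, 4],    # dr/ds at s = 1
    [0, 0, 2, 6, 12],   # d2r/ds2 at s = 1
], dtype=float)


def fit_quartic(r0, v0, rf, vf, af, T):
    """Return a (5, n_axes) coefficient array for the reference trajectory."""
    r0, v0, rf, vf, af = (np.asarray(x, dtype=float) for x in (r0, v0, rf, vf, af))
    # Time derivatives are rescaled because the polynomial is in s = t / T.
    b = np.vstack([r0, v0 * T, rf, vf * T, af * T**2])
    return np.linalg.solve(_A, b)


def reference(c, t, T):
    """Reference position, velocity, and acceleration at time t."""
    s = t / T
    p = np.array([1.0, s, s**2, s**3, s**4])
    dp = np.array([0.0, 1.0, 2 * s, 3 * s**2, 4 * s**3])
    ddp = np.array([0.0, 0.0, 2.0, 6 * s, 12 * s**2])
    return p @ c, (dp @ c) / T, (ddp @ c) / T**2


def accel_command(c, t, T, r, v, kp=1.0, kv=2.0):
    """Net commanded acceleration: feedforward plus PD tracking feedback.

    The thrust acceleration would be this value minus gravity.
    """
    r_ref, v_ref, a_ref = reference(c, t, T)
    return a_ref + kp * (r_ref - r) + kv * (v_ref - v)


def simulate(r0, v0, rf, vf, af, T, dt=0.01, disturbance=(0.2, 0.0, 0.0)):
    """Integrate point-mass dynamics under the tracking law (semi-implicit Euler)."""
    c = fit_quartic(r0, v0, rf, vf, af, T)
    r, v = np.array(r0, dtype=float), np.array(v0, dtype=float)
    d = np.asarray(disturbance, dtype=float)  # unmodeled acceleration, m/s^2
    for i in range(int(round(T / dt))):
        a = accel_command(c, i * dt, T, r, v) + d
        v = v + a * dt
        r = r + v * dt
    return r, v


if __name__ == "__main__":
    # Hypothetical initial state: downrange, crossrange, altitude (m, m/s).
    r_end, v_end = simulate([1000.0, 500.0, 1500.0], [-40.0, -10.0, -60.0],
                            [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], 40.0)
    print("final position error (m):", np.linalg.norm(r_end))
    print("final velocity error (m/s):", np.linalg.norm(v_end))
```

Because the reference is fixed once it has been fitted, the target point is baked into the coefficients; changing the landing site means refitting the polynomial, which illustrates why this class of schemes does not lend itself naturally to in-flight retargeting.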