1014 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 38, NO. 4, AUGUST 2008

Improved Adaptive–Reinforcement Learning Control for Morphing Unmanned Air Vehicles

John Valasek, Senior Member, IEEE, James Doebbler, Monish D. Tandale, and Andrew J. Meade

Abstract—This paper presents an improved Adaptive–Reinforcement Learning Control methodology for the problem of unmanned air vehicle morphing control. The reinforcement learning morphing control function that learns the optimal shape change policy is integrated with an adaptive dynamic inversion control trajectory tracking function. An episodic unsupervised learning simulation using the Q-learning method is developed to replace an earlier and less accurate Actor-Critic algorithm. Sequential Function Approximation, a Galerkin-based scattered data approximation scheme, replaces a K-Nearest Neighbors (KNN) method and is used to generalize the learning from previously experienced quantized states and actions to the continuous state-action space, parts of which may never have been experienced. The improved method showed smaller errors and better learning of the optimal shape than the KNN method.

Index Terms—Adaptive control, approximation methods, learning control systems, shape control, unmanned air vehicles.

I. INTRODUCTION

Morphing research has led to a series of breakthroughs in a wide variety of disciplines that, when fully realized for air vehicle applications, have the potential to produce large increments in aviation safety, affordability, and environmental compatibility. Valasek et al. developed an Adaptive–Reinforcement Learning Control (A-RLC) methodology for the morphing air vehicle control problem [1]. Structured Adaptive Model Inversion (SAMI) was used as the controller for tracking trajectories and for handling time-varying properties, parametric uncertainties, unmodeled dynamics, and disturbances.
A reinforcement learning (RL) module using an Actor-Critic algorithm was used to learn how to produce the optimal shape at every flight condition. While the A-RLC methodology worked well, the learning was found to depend on the performance of the function approximator. The learning algorithm itself was found to be decoupled from the particular approximation method used, but the data to be approximated is sparse in some regions. The K-Nearest Neighbors (KNN) method is not accurate in those regions because it is an “averager”-type approximator, and at times this produced significant errors in the achievable morphed shapes.

Manuscript received July 31, 2007. The morphing research was supported by the National Aeronautics and Space Administration (NASA) under Award NCC-1-02038, and the sequential function approximation work was supported by the NASA Ames Research Center under Grant NCC-2-8077 and NASA Cooperative Agreement NCC-1-02038. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration. This paper was recommended by Guest Editor F. Lewis.

J. Valasek and J. Doebbler are with the Department of Aerospace Engineering, Texas A&M University, College Station, TX 77843 USA (e-mail: valasek@tamu.edu; james.doebbler@tamu.edu).

M. D. Tandale is with Optimal Synthesis, Palo Alto, CA 94303 USA (e-mail: monish@optisyn.com).

A. J. Meade is with the Department of Mechanical Engineering and Materials Science, William Marsh Rice University, Houston, TX 77251-1892 USA (e-mail: meade@rice.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCB.2008.922018
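To illustrate why an “averager”-type approximator such as KNN can be inaccurate where the data is sparse, the sketch below fits an unweighted KNN average to samples of a smooth function and evaluates it both inside a densely sampled region and inside gaps in the data. It is not the authors' implementation; the target function, the sample locations, and k = 3 are hypothetical choices made only for illustration.

```python
import math


def knn_average(xs, ys, query, k):
    """Unweighted KNN regression: the mean of the k samples nearest the query."""
    # Stable sort of sample indices by distance to the query point
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return sum(ys[i] for i in order[:k]) / k


def target(f):
    # Hypothetical smooth function standing in for the quantity being learned
    return 3.0 + math.cos(math.pi * f / 5.0)


# Dense samples on [0, 2]; beyond that, only two isolated samples (a sparse region)
xs = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 5.0, 8.0]
ys = [target(x) for x in xs]

errors = {}
for q in (1.0, 3.5, 6.5):
    est = knn_average(xs, ys, q, k=3)
    errors[q] = abs(est - target(q))
    print(f"F={q:4.1f}  KNN={est:.3f}  true={target(q):.3f}  |error|={errors[q]:.3f}")
```

In the dense region the three neighbors lie close to the query and the average is accurate, while in the gaps the neighbors include samples several units away, so the averaged estimate is pulled far from the true value.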
This paper extends and improves upon the methods and results in [1] by using improved nonlinear Shape Memory Alloy (SMA) dynamics, a Q-learning algorithm in place of the Actor-Critic algorithm, and a Galerkin Sequential Function Approximation (SFA) in place of the KNN function approximation.

II. MORPHING AIR VEHICLE MODEL AND SIMULATION

A simplified morphing air vehicle model in the shape of an ellipsoid is used to obtain the results in this paper. The morphing considered here changes the dimensions of the ellipsoid axes while keeping the total volume constant. The RL module specifies the y- and z-axis dimensions corresponding to the current flight condition, and the x dimension is calculated by enforcing the constant-volume condition $x = 6\,\mathrm{Vol}/(\pi y z)$. The ellipsoidal air vehicle is composed of an SMA whose shape can be modulated by applying a voltage. The morphing dynamics are the nonlinear differential equations

$$\ddot{y} + 2\dot{y} + 2.5y + 0.4\sin\big(\pi(y - 2)\big) - 5 = \mathrm{Volt}_y \tag{1}$$

$$\ddot{z} + \dot{z} + 2z + 0.6(z - 2)(z - 4) - 4 = \mathrm{Volt}_z. \tag{2}$$

The model relating the y and z dimensions to the applied voltages $\mathrm{Volt}_y$ and $\mathrm{Volt}_z$ is given by (1) and (2). Note that the coefficients are arbitrarily selected to form a conceptual model of the morphing dynamics. The optimal y and z dimensions are likewise arbitrarily selected to be nonlinear functions of the flight condition $F$:

$$S_y(F) = 3 + \cos\left(\frac{\pi}{5}F\right) \tag{3}$$

$$S_z(F) = 2 + 2e^{-0.5F}. \tag{4}$$

With the optimal y and z dimensions given by (3) and (4), the costs associated with the y and z dimensions are summed to give the total cost

$$J = J_y + J_z = \big(y - S_y(F)\big)^2 + \big(z - S_z(F)\big)^2 \tag{5}$$

where y and z are arbitrary dimensions. For the simulation, flight conditions are specified at various locations along a predesignated flight path. For this simplified example, the optimal shapes are not correlated with the flight path and depend only on the flight condition.

1083-4419/$25.00 © 2008 IEEE
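As a sketch of how the morphing model can be exercised numerically, the Python code below encodes the constant-volume condition for x, the morphing dynamics (1) and (2) (assuming a unit coefficient on the z-rate term and the minus signs shown in the reconstructed equations), the optimal shapes (3) and (4), and the cost (5). The fourth-order Runge-Kutta integrator, the step size, the volume value of 4.0, and the open-loop "holding" voltages (the constant voltages that make a target shape a static equilibrium) are illustrative assumptions; they are not the paper's SAMI controller or its RL module.

```python
import math


def x_from_volume(vol, y, z):
    """x-axis length that keeps the ellipsoid volume pi*x*y*z/6 equal to vol."""
    return 6.0 * vol / (math.pi * y * z)


def morph_accel(y, ydot, z, zdot, volt_y, volt_z):
    """Eqs. (1)-(2) solved for the second derivatives of y and z."""
    yddot = volt_y - 2.0 * ydot - 2.5 * y - 0.4 * math.sin(math.pi * (y - 2.0)) + 5.0
    zddot = volt_z - zdot - 2.0 * z - 0.6 * (z - 2.0) * (z - 4.0) + 4.0
    return yddot, zddot


def rk4_step(state, volts, dt):
    """One classical Runge-Kutta step for state = (y, ydot, z, zdot)."""
    def f(s):
        ya, za = morph_accel(s[0], s[1], s[2], s[3], *volts)
        return (s[1], ya, s[3], za)

    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def optimal_y(F):
    """Optimal y dimension S_y(F), eq. (3)."""
    return 3.0 + math.cos(math.pi * F / 5.0)


def optimal_z(F):
    """Optimal z dimension S_z(F), eq. (4)."""
    return 2.0 + 2.0 * math.exp(-0.5 * F)


def cost(y, z, F):
    """Total cost J = (y - S_y(F))^2 + (z - S_z(F))^2, eq. (5)."""
    return (y - optimal_y(F)) ** 2 + (z - optimal_z(F)) ** 2


def holding_voltages(y, z):
    """Hypothetical open-loop voltages making (y, z) a static equilibrium of (1)-(2)."""
    volt_y = 2.5 * y + 0.4 * math.sin(math.pi * (y - 2.0)) - 5.0
    volt_z = 2.0 * z + 0.6 * (z - 2.0) * (z - 4.0) - 4.0
    return volt_y, volt_z


def simulate(F, t_final=20.0, dt=0.01):
    """Apply constant holding voltages for the optimal shape at F; return final (y, z)."""
    volts = holding_voltages(optimal_y(F), optimal_z(F))
    state = (2.0, 0.0, 2.0, 0.0)  # start at rest at the zero-voltage equilibrium y = z = 2
    for _ in range(int(round(t_final / dt))):
        state = rk4_step(state, volts, dt)
    return state[0], state[2]


y, z = simulate(0.0)
print(f"y={y:.4f}  z={z:.4f}  x={x_from_volume(4.0, y, z):.4f}  J={cost(y, z, 0.0):.2e}")
```

Because the holding voltages are open loop, the shape settles on the target only when that target is a stable equilibrium of (1) and (2), which is the case at F = 0 for these coefficients; tracking in the paper is instead handled by the adaptive dynamic inversion (SAMI) controller.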