1014 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 38, NO. 4, AUGUST 2008
Improved Adaptive–Reinforcement Learning Control
for Morphing Unmanned Air Vehicles
John Valasek, Senior Member, IEEE, James Doebbler, Monish D. Tandale, and Andrew J. Meade
Abstract—This paper presents an improved Adaptive–
Reinforcement Learning Control methodology for the problem
of unmanned air vehicle morphing control. The reinforcement
learning morphing control function that learns the optimal
shape change policy is integrated with an adaptive dynamic
inversion control trajectory tracking function. An episodic
unsupervised learning simulation using the Q-learning method
is developed to replace an earlier and less accurate Actor-Critic
algorithm. Sequential Function Approximation, a Galerkin-based
scattered data approximation scheme, replaces a K-Nearest
Neighbors (KNN) method and is used to generalize the learning
from previously experienced quantized states and actions to
the continuous state-action space, not all of which may have
been experienced before. The improved method showed smaller
errors and improved learning of the optimal shape compared to
the KNN method.
Index Terms—Adaptive control, approximation methods,
learning control systems, shape control, unmanned air vehicles.
I. INTRODUCTION
Morphing research has led to a series of breakthroughs
in a wide variety of disciplines that, when
fully realized for air vehicle applications, have the potential
to produce large increments in aviation safety, affordability,
and environmental compatibility. Valasek et al. developed an
Adaptive–Reinforcement Learning Control (A-RLC) methodology
for the morphing air vehicle control problem [1]. Structured
Adaptive Model Inversion (SAMI) was used as the
controller for tracking trajectories and handling time-varying
properties, parametric uncertainties, unmodeled dynamics, and
disturbances. A reinforcement learning (RL) module using an
Actor-Critic algorithm was used to learn how to produce the
optimal shape at every flight condition. While the A-RLC
methodology worked well, the learning was found to be dependent
on the performance of the function approximator. The
learning itself was found to be decoupled from the actual
approximation method used; the data to be approximated is
sparse in some regions. The K-Nearest Neighbors (KNN) is
not accurate in those regions because it is an “averager”-type
approximator, and at times, this produced significant errors in
the achievable morphed shapes.
Manuscript received July 31, 2007. The morphing research was supported
by the National Aeronautics and Space Administration (NASA) under Award
NCC-1-02038, and the sequential function approximation work was provided
by the NASA Ames Research Center under Grant NCC-2-8077 and NASA
Cooperative Agreement NCC-1-02038. Any opinions, findings, and conclusions
or recommendations expressed in this paper are those of the author(s)
and do not necessarily reflect the views of the National Aeronautics and Space
Administration. This paper was recommended by Guest Editor F. Lewis.
J. Valasek and J. Doebbler are with the Department of Aerospace Engineering,
Texas A&M University, College Station, TX 77843 USA (e-mail:
valasek@tamu.edu; james.doebbler@tamu.edu).
M. D. Tandale is with Optimal Synthesis, Palo Alto, CA 94303 USA (e-mail:
monish@optisyn.com).
A. J. Meade is with the Department of Mechanical Engineering and Materials
Science, William Marsh Rice University, Houston, TX 77251-1892 USA
(e-mail: meade@rice.edu).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSMCB.2008.922018
This paper extends and improves upon the methods and
results in [1] by using improved nonlinear Shape Memory Alloy
(SMA) dynamics, a Q-learning algorithm in place of the Actor-
Critic algorithm, and a Galerkin Sequential Function Approxi-
mation (SFA) in place of the KNN function approximation.
II. MORPHING AIR VEHICLE MODEL AND SIMULATION
A simplified morphing air vehicle model in the shape of
an ellipsoid is used to obtain the results in this paper. The
morphing used in this paper involves a change in the dimen-
sions of the ellipsoid axes while maintaining a constant total
volume. The RL module specifies the y- and z-axis dimensions,
corresponding to the current flight condition, and the x dimension
is calculated by enforcing the constant volume condition
$x = 6\,\mathrm{Vol}/(\pi y z)$. The ellipsoidal air vehicle is composed of an
SMA whose shape can be modulated by applying voltage.
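As an illustration of the constant volume condition, the following minimal Python sketch computes the x dimension from given y and z dimensions. It assumes that x, y, and z are full axis lengths, so that the ellipsoid volume is πxyz/6; this is inferred from the form of the condition and is not stated explicitly in the text. The function names and the numerical volume are hypothetical.

```python
import math


def ellipsoid_volume(x: float, y: float, z: float) -> float:
    """Volume of an ellipsoid whose full axis lengths are x, y, z (assumed convention)."""
    return math.pi * x * y * z / 6.0


def x_dimension(y: float, z: float, volume: float) -> float:
    """Return the x dimension that keeps the total volume constant: x = 6 Vol / (pi y z)."""
    if y <= 0.0 or z <= 0.0:
        raise ValueError("y and z dimensions must be positive")
    return 6.0 * volume / (math.pi * y * z)


if __name__ == "__main__":
    VOLUME = 10.0  # hypothetical total volume, for illustration only
    for y, z in [(2.0, 2.0), (3.0, 2.5), (4.0, 4.0)]:
        x = x_dimension(y, z, VOLUME)
        print(f"y={y:.1f}, z={z:.1f} -> x={x:.4f}, "
              f"volume check={ellipsoid_volume(x, y, z):.4f}")
```

Increasing y or z shortens x, so the shape change never alters the enclosed volume.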
The morphing dynamics are nonlinear differential equations
given by
$$\ddot{y} + 2.5\dot{y} + 2.5y + 0.4\sin(\pi(y - 2)) - 5 = \mathrm{Volt}_y \tag{1}$$
$$\ddot{z} + 1.8\dot{z} + 2z + 0.6(z - 2)(z - 4) - 4 = \mathrm{Volt}_z. \tag{2}$$
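The dynamics (1) and (2) can be explored numerically. The sketch below is one possible way to do so, not the simulation used to obtain the results in this paper: it solves each equation for the acceleration and integrates it with a classical fourth-order Runge-Kutta step under a constant applied voltage. The integrator, step size, final time, and function names are all assumptions made for illustration.

```python
import math


def y_accel(y: float, y_dot: float, volt_y: float) -> float:
    """Equation (1) solved for the y acceleration."""
    return volt_y - 2.5 * y_dot - 2.5 * y - 0.4 * math.sin(math.pi * (y - 2.0)) + 5.0


def z_accel(z: float, z_dot: float, volt_z: float) -> float:
    """Equation (2) solved for the z acceleration."""
    return volt_z - 1.8 * z_dot - 2.0 * z - 0.6 * (z - 2.0) * (z - 4.0) + 4.0


def rk4_step(accel, pos, vel, volt, dt):
    """Advance one axis by dt with classical RK4, holding the voltage constant."""
    k1p, k1v = vel, accel(pos, vel, volt)
    k2p = vel + 0.5 * dt * k1v
    k2v = accel(pos + 0.5 * dt * k1p, vel + 0.5 * dt * k1v, volt)
    k3p = vel + 0.5 * dt * k2v
    k3v = accel(pos + 0.5 * dt * k2p, vel + 0.5 * dt * k2v, volt)
    k4p = vel + dt * k3v
    k4v = accel(pos + dt * k3p, vel + dt * k3v, volt)
    pos_next = pos + dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    vel_next = vel + dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return pos_next, vel_next


def simulate(accel, pos0, vel0, volt, t_final=20.0, dt=0.01):
    """Integrate one axis from (pos0, vel0) to t_final; returns the final state."""
    pos, vel = pos0, vel0
    for _ in range(int(round(t_final / dt))):
        pos, vel = rk4_step(accel, pos, vel, volt, dt)
    return pos, vel


if __name__ == "__main__":
    # With zero voltage, y = 2 and z = 2 satisfy (1) and (2) at rest.
    print(simulate(y_accel, 2.5, 0.0, volt=0.0))
    print(simulate(z_accel, 2.5, 0.0, volt=0.0))
```

Starting each axis slightly away from 2 with zero voltage gives a quick check that the integration settles back to the rest point.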
The model for relating the y and z dimensions to the applied
voltages is given by (1) and (2). Note that the coefficients are
arbitrarily selected to form a conceptual model for the morph-
ing dynamics. The optimal y and z dimensions, respectively,
are arbitrarily selected to be nonlinear functions of the flight
condition F, i.e.,
$$S_y(F) = 3 + \cos\left(\frac{\pi}{5}F\right) \tag{3}$$
$$S_z(F) = 2 + 2e^{-0.5F}. \tag{4}$$
With the optimal y and z dimensions given by (3) and (4), the
costs associated with the y and z dimensions are summed to
give the total cost
$$J = J_y + J_z = (y - S_y(F))^2 + (z - S_z(F))^2 \tag{5}$$
where y and z are any arbitrary dimensions. For the simulation,
flight conditions are specified at various locations along a
predesignated flight path. For this simplified example, the opti-
mal shapes are not correlated to the flight path but depend only
on the flight condition.
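The optimal shape functions (3) and (4) and the total cost (5) translate directly into code. The following Python sketch evaluates them for a few flight conditions; the function names and the sampled values of F are hypothetical and chosen only for illustration, since the range of flight conditions is not specified here.

```python
import math


def optimal_y(flight_condition: float) -> float:
    """Optimal y dimension S_y(F) from (3)."""
    return 3.0 + math.cos(math.pi / 5.0 * flight_condition)


def optimal_z(flight_condition: float) -> float:
    """Optimal z dimension S_z(F) from (4)."""
    return 2.0 + 2.0 * math.exp(-0.5 * flight_condition)


def total_cost(y: float, z: float, flight_condition: float) -> float:
    """Total cost J = J_y + J_z from (5) for arbitrary dimensions y and z."""
    j_y = (y - optimal_y(flight_condition)) ** 2
    j_z = (z - optimal_z(flight_condition)) ** 2
    return j_y + j_z


if __name__ == "__main__":
    # Hypothetical sample of flight conditions along a flight path.
    for f in [0.0, 1.0, 2.5, 5.0]:
        s_y, s_z = optimal_y(f), optimal_z(f)
        print(f"F={f:.1f}: S_y={s_y:.4f}, S_z={s_z:.4f}, "
              f"J at (3, 3)={total_cost(3.0, 3.0, f):.4f}")
```

The cost is zero only when both dimensions match the optimal shape for the current flight condition, and it grows quadratically with the error in either dimension.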
1083-4419/$25.00 © 2008 IEEE