RLS Algorithms and Convergence Analysis Method for Online DLQR Control
Design via Heuristic Dynamic Programming
Watson R. M. Santos, Jonathan A. Queiroz, João Viana da F. Neto, Patrícia H. M. Rêgo,
Ewaldo Santana and Gustavo Andrade
Federal University of Maranhão, Federal Institute of Maranhão, State University of Maranhão
Embedded Systems and Intelligent Control Laboratory
São Luís - Maranhão - Brazil
e-mail: watson.itz@ifma.edu.br jviana@dee.ufma.br
Abstract — In this paper, a method to design online optimal
policies that encompasses Hamilton-Jacobi-Bellman (HJB)
equation solution approximation and heuristic dynamic
programming (HDP) approach is proposed. Recursive least
squares (RLS) algorithms are developed to approximate the
HJB equation solution that is supported by a sequence of
greedy policies. The proposal investigates the convergence
properties of a family of RLS algorithms and its numerical
complexity in the context of reinforcement learning and
optimal control. The algorithms are computationally evaluated
in an electric circuit model that represents a MIMO dynamic
system. The results presented herein emphasize the
convergence behaviour of the RLS, projection and Kaczmarz
algorithms that are developed for online applications.
Keywords — Recursive Least Squares; Heuristic Dynamic
Programming; RLS Convergence; MIMO Dynamic Systems;
Optimal Control; Adaptive Dynamic Programming.
I. INTRODUCTION
In order to overcome the curse-of-dimensionality problem
in applications of the dynamic programming approach, such
as the online design of optimal control [1] [2] [3], much
effort has been devoted to developing methods that
approximate the solution of the Hamilton-Jacobi-Bellman
(HJB) equation [4] [1]. Recently, approximate dynamic
programming (ADP) methods and algorithms have been
proposed to improve the numerical stability [5] and increase
the convergence speed of the RLS algorithm family [6] [7].
In this article, a proposal is presented to reduce the
computational complexity of the approximate solution of the
HJB equation underlying the discrete linear quadratic
regulator (DLQR) problem via derivatives of the recursive
least squares (RLS) method, such as the Kaczmarz and
projection algorithms.
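To make the complexity trade-off concrete, the projection and Kaczmarz updates below avoid propagating a covariance matrix, so each step costs O(n) instead of the O(n^2) of full RLS. This is a minimal illustrative sketch: the step-size values and the toy identification problem are assumptions for demonstration, not the paper's experiment.

```python
import numpy as np

def projection_update(theta, phi, y, a=0.5, c=1e-6):
    """One projection-algorithm step: move theta toward the
    hyperplane phi^T theta = y with step size 0 < a < 2."""
    return theta + a * phi * (y - phi @ theta) / (c + phi @ phi)

def kaczmarz_update(theta, phi, y):
    """Kaczmarz step: exact orthogonal projection of theta onto
    the hyperplane phi^T theta = y."""
    return theta + phi * (y - phi @ theta) / (phi @ phi)

# Toy noiseless identification problem: recover theta_true from
# scalar measurements y_k = phi_k^T theta_true.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
for _ in range(200):
    phi = rng.standard_normal(3)
    y = phi @ theta_true
    theta = kaczmarz_update(theta, phi, y)
print(np.round(theta, 4))
```

With sufficiently varied regressor directions, both updates drive the estimate to the true parameter vector in the noiseless case; the Kaczmarz step satisfies each new equation exactly, while the projection step's damping (a, c) trades speed for robustness to noise.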
The online DLQR optimal control design based on the
heuristic dynamic programming (HDP) scheme is the
general context of the main contributions presented in this
paper, which include recursive least squares (RLS)
algorithms together with a convergence analysis method.
The RLS algorithms are developed to approximate the
Hamilton-Jacobi-Bellman (HJB) equation solution, and the
convergence analysis provides a general procedure for
selecting the parameters of these algorithms. Specifically, an
approximation method based on the RLS approach is
presented to solve the discrete algebraic Riccati equation
(DARE), which is a particular form of the HJB equation for
the discrete linear quadratic regulator (DLQR). A
convergence analysis method for the state and value
functions of heuristic dynamic programming algorithms is
based on the non-singularity of the Kronecker
transformation of the state regressor matrix of the RLS
estimators. From our point of view, online control design
means that the controller gains are adjusted automatically
while the dynamic process is in operation.
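The non-singularity condition just mentioned can be checked numerically: the quadratic value function V(x) = x^T P x is linear in vec(P) through the Kronecker regressor x ⊗ x, and the RLS estimate is unique only when the stacked regressor matrix has full rank over the identifiable (symmetric) parameters. The snippet below is a minimal sketch with assumed dimensions and random data, not the paper's experiment.

```python
import numpy as np

def quad_basis(x):
    """Kronecker regressor: V(x) = x^T P x = (x kron x)^T vec(P)."""
    return np.kron(x, x)

# Stack regressors from a set of visited states and test the
# non-singularity (rank) condition for the RLS estimator of vec(P).
rng = np.random.default_rng(1)
n = 2
states = [rng.standard_normal(n) for _ in range(10)]
Phi = np.vstack([quad_basis(x) for x in states])
rank = np.linalg.matrix_rank(Phi)
# Because x kron x is symmetric (x_i x_j = x_j x_i), only
# n(n+1)/2 of the n^2 entries of vec(P) are identifiable.
print(rank, n * (n + 1) // 2)
```

When the rank falls below n(n+1)/2 (for example, states confined to a line), the least-squares problem is singular and the value-function estimate is not unique; this is the excitation requirement behind the convergence analysis.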
The present proposal lies in the context of reinforcement
learning (RL), approximate dynamic programming (ADP)
and policy iteration to develop online algorithms for the
DLQR solution. These algorithms are based on the RLS
method to approximate the solution of the Hamilton-Jacobi-
Bellman equation for the DLQR parameterizations. The
solutions are given for the discrete algebraic Riccati
equation and for the optimal gain in the state value function
approximations, respectively.
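For reference, the DARE solution and optimal gain that the RLS approximations target can be computed by a simple HDP-style fixed-point iteration on the Riccati recursion. The sketch below uses assumed illustrative plant matrices, not the paper's circuit model.

```python
import numpy as np

def hdp_dare(A, B, Q, R, iters=200):
    """Value-iteration (HDP-style) fixed point of the DARE:
    P <- Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A,
    started from P = 0, with the DLQR gain K recovered at the end."""
    P = np.zeros_like(Q)
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ G
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Assumed stable two-state plant with two inputs (MIMO example).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)
P, K = hdp_dare(A, B, Q, R)
print(np.round(P, 4))
```

Under the usual stabilizability and detectability assumptions the iteration converges to the unique stabilizing DARE solution, and u = -Kx is the optimal DLQR policy; online schemes aim to reach the same P and K from measured data instead of the model-based recursion.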
The reinforcement learning agent (control) and the
environment (process) are associated with ADP, as shown in
Figure 1, which represents the control system with classical
feedback in the context of adaptive control performed by the
critic and actor blocks. Specifically, the RL approach is
based on minimizing the Bellman error, and heuristic
dynamic programming is realized by a scheme that
combines RLS value function approximation with policy
improvement. This approach is directed at the online
solution of optimal control problems in the sense that the
policy improvements are performed at every time step along
the realization of the state trajectory towards the optimal
policy.
Figure 1. Actor Critic Schematic in Control System
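A minimal sketch of one such critic-actor cycle, assuming a simple stable plant and an initial zero policy (illustrative values, not the authors' circuit example): the RLS critic fits V(x) = x^T P x to the Bellman equation x^T P x = r(x, u) + x'^T P x' along the state trajectory, after which a greedy policy improvement is applied.

```python
import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.8]])   # assumed open-loop-stable plant
B = np.eye(2)
Q, R = np.eye(2), np.eye(2)
n = 2

theta = np.zeros(n * n)        # estimate of vec(P)
S = 1e6 * np.eye(n * n)        # RLS covariance (large = weak prior)
rng = np.random.default_rng(3)
x = rng.standard_normal(n)

for k in range(600):
    x_next = A @ x                                  # policy u = 0
    r = x @ Q @ x                                   # stage cost
    phi = np.kron(x, x) - np.kron(x_next, x_next)   # Bellman regressor
    g = S @ phi / (1.0 + phi @ S @ phi)             # RLS gain
    theta += g * (r - phi @ theta)                  # Bellman-error update
    S -= np.outer(g, phi @ S)
    # Restart the trajectory when the state decays, to keep excitation.
    x = x_next if np.linalg.norm(x_next) > 1e-2 else rng.standard_normal(n)

P = theta.reshape(n, n)
P = 0.5 * (P + P.T)                                 # enforce symmetry
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # greedy improvement
print(np.round(P, 3))
```

Here the critic converges to the value of the current policy (a discrete Lyapunov solution), and the final line is one greedy actor update; repeating evaluation and improvement yields the policy-iteration path toward the DLQR optimum described in the text.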
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation
978-1-4799-4923-6/14 $31.00 © 2014 IEEE
DOI 10.1109/UKSim.2014.109