Online Partially Model-Free Solution of Two-Player Zero Sum Differential Games

Praveen P *, Shubhendu Bhasin **

* Department of Electrical Engineering, Indian Institute of Technology Delhi, India (e-mail: eez138263@ee.iitd.ac.in)
** Department of Electrical Engineering, Indian Institute of Technology Delhi, India (e-mail: sbhasin@ee.iitd.ac.in)
Abstract: An online adaptive dynamic programming based iterative algorithm is proposed for a two-player zero sum linear differential game problem arising in the control of process systems affected by disturbances. The objective in such a scenario is to obtain an optimal control policy that minimizes the specified performance index or cost function in the presence of the worst case disturbance. Conventional algorithms for the solution of such problems require full knowledge of the system dynamics. The algorithm proposed in this paper is partially model-free and solves the two-player zero sum linear differential game problem without knowledge of the state and control input matrices.
Keywords: Two-player zero sum differential game, Adaptive dynamic programming, Approximate dynamic programming
1. INTRODUCTION
In a two-player zero sum game, each player's gain is exactly balanced by the loss of his competitor, and the net gain of the players at any point in time is zero [Isaacs (1965)]. In scenarios of perfect competition, such as those that exist in a zero sum game, each player tries to make the best possible decision, taking into account the fact that his opponent also tries to do the same. The theory of zero sum differential games finds applications in a wide variety of disciplines, including control theory [Tomlin et al. (2000); Wei and Liu (2012)].
The problem of designing an optimal controller for a process system subject to the worst possible disturbance is a minimax optimization problem, where the controller tries to minimize and the disturbance tries to maximize the infinite horizon quadratic cost. Using a game theoretic framework, this minimax optimization problem can be viewed as a zero sum differential game, where the controller acts as the 'minimizer' or minimizing player, while the disturbance acts as the 'maximizer' or maximizing player [Basar and Bernhard (1995); Basar and Olsder (1995)]. The optimal control in such a scenario is equivalent to finding the Nash equilibrium or saddle point equilibrium of the corresponding two-player zero sum differential game [Basar and Bernhard (1995)].
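The minimax formulation above can be made concrete. In standard notation (the symbols below are generic illustrations, not definitions taken from this paper), the linear zero sum game and its saddle point policies take the form:

```latex
% Linear dynamics with control u (minimizer) and disturbance w (maximizer)
\dot{x} = Ax + Bu + Dw, \qquad x(0) = x_0,
% Infinite horizon quadratic cost; gamma is the disturbance attenuation level
J(u, w) = \int_0^{\infty} \left( x^{\top} Q x + u^{\top} R u
          - \gamma^{2} w^{\top} w \right) \, dt,
% Saddle point (Nash equilibrium) policies in terms of a matrix P
u^{*} = -R^{-1} B^{\top} P x, \qquad w^{*} = \gamma^{-2} D^{\top} P x.
```

Here $P = P^{\top} \geq 0$ is the stabilizing solution of the associated Riccati equation, and the $-\gamma^{2} w^{\top} w$ term in the cost is what makes that equation's quadratic term sign indefinite.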
To obtain the saddle point equilibrium strategy of each player in the two-player zero sum linear differential game, one needs to solve an Algebraic Riccati Equation (ARE) with a sign indefinite quadratic term, known as the Game Algebraic Riccati Equation (GARE). [Kleinman (1968)] proposed an iterative method for solving the ARE with a sign definite quadratic term: a sequence of Lyapunov equations is constructed, one at each iteration, and their positive semi-definite solutions are shown to converge to the solution of the ARE. However, the algorithm proposed in [Kleinman (1968)] cannot be directly extended to the GARE due to the presence of the sign indefinite quadratic term.
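Kleinman's scheme is simple to state: given a stabilizing gain, alternately solve a Lyapunov equation for the cost matrix and improve the gain. A minimal SciPy sketch follows; the example matrices and the use of SciPy's solvers are illustrative assumptions, not taken from this paper:

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

def kleinman(A, B, Q, R, K0, iters=30):
    """Kleinman's iteration; K0 must stabilize A - B @ K0."""
    Rinv = np.linalg.inv(R)
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        # Policy evaluation: solve Ak^T P + P Ak + Q + K^T R K = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # Policy improvement
        K = Rinv @ B.T @ P
    return P, K

# Illustrative second-order example: A is stable, so K0 = 0 is stabilizing.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
P, K = kleinman(A, B, Q, R, K0=np.zeros((1, 2)))
```

The iterates converge monotonically (and quadratically) to the stabilizing ARE solution, which can be checked against `scipy.linalg.solve_continuous_are`.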
Subsequently, several Newton-type algorithms were proposed for the solution of the GARE [Arnold and Laub (1984); Damm and Hinrichsen (2001); Mehrmann and Tan (1988)]. [Mehrmann (1991)] and [Sima (1996)] proposed a matrix sign function method for solving the GARE. A more robust iterative algorithm was proposed by [Lanzon et al. (2008)], where the GARE with a sign indefinite quadratic term is replaced by a sequence of AREs with sign definite quadratic terms. Each of these AREs can then be solved sequentially using Kleinman's algorithm or any other existing algorithm, and the recursive solution of these AREs converges to the solution of the GARE. However, all the above results require knowledge of the full system dynamics, which is a severe restriction owing to the uncertainties in system modelling.
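The idea of replacing the indefinite GARE by a sequence of definite sub-problems can be sketched numerically. In the recursion below, each increment solves a standard ARE whose state-weight is the current GARE residual; this is a simplified sketch in the spirit of [Lanzon et al. (2008)], with illustrative matrices and tolerances, and it omits the conditions (stabilizability, sufficiently large gamma) under which the iterates are well defined:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def gare_residual(P, A, Q, S, Dg):
    # GARE residual: A^T P + P A + Q - P S P + P Dg P
    # (-P S P is sign definite, +P Dg P makes the equation indefinite)
    return A.T @ P + P @ A + Q - P @ S @ P + P @ Dg @ P

def iterative_gare(A, B, D, Q, R, gamma, iters=25, tol=1e-9):
    n = A.shape[0]
    S = B @ np.linalg.inv(R) @ B.T      # minimizer (control) term
    Dg = D @ D.T / gamma**2             # maximizer (disturbance) term
    P = np.zeros((n, n))
    for _ in range(iters):
        Res = gare_residual(P, A, Q, S, Dg)
        if np.linalg.norm(Res) < tol:
            break
        # Each sub-problem is an ARE with a sign definite quadratic term,
        # solvable by Kleinman's algorithm (here: SciPy's ARE solver).
        Ahat = A - S @ P + Dg @ P
        P = P + solve_continuous_are(Ahat, B, Res, R)
    return P

# Illustrative data: stable A, gamma well above the attenuation limit.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
D = np.array([[1.0], [0.0]])
P = iterative_gare(A, B, D, Q=np.eye(2), R=np.eye(1), gamma=5.0)
```

A short calculation shows why this works: if the increment solves the sub-ARE exactly, the next GARE residual equals the (positive semi-definite) quadratic form of the increment with the disturbance term, so the residuals shrink toward zero.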
The concept of Adaptive Dynamic Programming (ADP) was proposed by [Werbos (1992)] for solving the dynamic programming problems related to classical optimal control in a forward-in-time fashion. ADP is based on the concepts of Dynamic Programming [Bellman (2003)] and Reinforcement Learning [Sutton and Barto (1998)], and has been widely used to obtain approximate solutions of optimal control problems [Abu-Khalaf and Lewis (2005); Lewis and Liu (2013)]. The classical optimal control problem is a single player linear differential game problem [Isaacs (1965)], and a detailed analysis of the application of ADP to the solution of single player linear differential game problems (optimal control problems) is provided in [Wang et al. (2009)] and [Lewis et al. (2012)].
Preprints of the 10th IFAC International Symposium on Dynamics and Control of Process Systems, December 18-20, 2013, Mumbai, India. The International Federation of Automatic Control. Copyright © 2013 IFAC.