Online Partially Model-Free Solution of Two-Player Zero Sum Differential Games

Praveen P * Shubhendu Bhasin **

* Department of Electrical Engineering, Indian Institute of Technology Delhi, India (e-mail: eez138263@ee.iitd.ac.in)
** Department of Electrical Engineering, Indian Institute of Technology Delhi, India (e-mail: sbhasin@ee.iitd.ac.in)

Abstract: An online adaptive dynamic programming based iterative algorithm is proposed for the two-player zero sum linear differential game problem arising in the control of process systems affected by disturbances. The objective in such a scenario is to obtain an optimal control policy that minimizes a specified performance index, or cost function, in the presence of the worst case disturbance. Conventional algorithms for the solution of such problems require full knowledge of the system dynamics. The algorithm proposed in this paper is partially model-free and solves the two-player zero sum linear differential game problem without knowledge of the state and control input matrices.

Keywords: Two-player zero sum differential game, Adaptive dynamic programming, Approximate dynamic programming

1. INTRODUCTION

In a two-player zero sum game, each player's gain is exactly balanced by the loss of his competitor, and the net gain of the players at any point in time is zero [Isaacs (1965)]. In scenarios of perfect competition, such as those that exist in a zero sum game, each player tries to make the best possible decision, taking into account the fact that his opponent also tries to do the same. The theory of zero sum differential games finds applications in a wide variety of disciplines, including control theory [Tomlin et al. (2000); Wei and Liu (2012)].

The problem of designing an optimal controller for a process system subject to the worst possible disturbance is a minimax optimization problem, where the controller tries to minimize, and the disturbance tries to maximize, an infinite horizon quadratic cost.
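For a linear process system, the minimax problem described above is commonly posed as follows (a standard formulation given here for concreteness; the symbols x, u, d and the attenuation level γ are generic notation, not taken from this paper):

```latex
\[
\dot{x} = Ax + Bu + Dd,
\qquad
J(u,d) = \int_{0}^{\infty}
\left( x^{\top} Q x + u^{\top} R u - \gamma^{2} d^{\top} d \right) dt ,
\]
```

where the control u acts as the minimizer, the disturbance d acts as the maximizer, Q ⪰ 0 and R ≻ 0 are weighting matrices, and the saddle point value is min_u max_d J(u,d).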
Using a game theoretic framework, the above minimax optimization problem can be viewed as a zero sum differential game, where the controller acts as the 'minimizer' or minimizing player, while the disturbance acts as the 'maximizer' or maximizing player [Basar and Bernhard (1995); Basar and Olsder (1995)]. The optimal control in such a scenario is equivalent to finding the Nash equilibrium, or saddle point equilibrium, of the corresponding two-player zero sum differential game [Basar and Bernhard (1995)].

To obtain the saddle point equilibrium strategy of each player in the two-player zero sum linear differential game, one needs to solve an Algebraic Riccati Equation (ARE) with a sign indefinite quadratic term, known as the Game Algebraic Riccati Equation (GARE). [Kleinman (1968)] proposed an iterative method for solving the ARE with a sign definite quadratic term. A series of Lyapunov equations is constructed at each iteration, and the positive semi-definite solutions of the Lyapunov equations are shown to converge to the solution of the ARE. However, the algorithm proposed in [Kleinman (1968)] for solving the ARE could not be extended to the GARE due to the presence of the sign indefinite quadratic term.

Subsequently, several Newton-type algorithms were proposed for the solution of the GARE [Arnold and Laub (1984); Damm and Hinrichsen (2001); Mehrmann and Tan (1988)]. [Mehrmann (1991)] and [Sima (1996)] proposed a matrix sign function method for solving the GARE. A more robust iterative algorithm for solving the GARE was proposed by [Lanzon et al. (2008)], where the GARE with a sign indefinite quadratic term is replaced by a sequence of AREs with sign definite quadratic terms. Each of these AREs can then be sequentially solved using Kleinman's algorithm, or any other existing algorithm, and the recursive solution of these AREs converges to the solution of the GARE.
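The building block used by [Lanzon et al. (2008)], Kleinman's iteration for a sign-definite ARE, can be sketched as follows. This is a minimal illustration, not code from any of the cited works; the example system matrices and the use of SciPy's Lyapunov solver are assumptions made here for concreteness.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman(A, B, Q, R, K0, iters=20):
    """Kleinman's Newton iteration for the ARE
        A'P + PA + Q - P B R^{-1} B' P = 0.
    K0 must be a stabilizing initial gain for (A, B).
    Each step solves one Lyapunov equation:
        (A - B K)' P + P (A - B K) = -(Q + K' R K),
    then updates the gain K = R^{-1} B' P."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        # solve_continuous_lyapunov(a, q) solves a X + X a' = q,
        # so pass a = Ak' to obtain Ak' P + P Ak = -(Q + K' R K)
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# hypothetical second-order example; A is Hurwitz, so K0 = 0 stabilizes
A = np.array([[0., 1.], [-1., -2.]])
B = np.array([[0.], [1.]])
Q = np.eye(2)
R = np.eye(1)
P, K = kleinman(A, B, Q, R, np.zeros((1, 2)))
```

Under [Lanzon et al. (2008)], a routine of exactly this form is invoked once per outer iteration, each time on an ARE with a sign definite quadratic term.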
However, all the above results require knowledge of the full system dynamics, which is a severe restriction owing to uncertainties in system modelling.

The concept of Adaptive Dynamic Programming (ADP) was proposed by [Werbos (1992)] for solving the dynamic programming problems related to classical optimal control in a forward-in-time fashion. ADP is based on the concepts of Dynamic Programming [Bellman (2003)] and Reinforcement Learning [Sutton and Barto (1998)] and has been widely used to obtain approximate solutions of optimal control problems [Abu-Khalaf and Lewis (2005); Lewis and Liu (2013)]. The classical optimal control problem is a single player linear differential game problem [Isaacs (1965)], and a detailed analysis of the application of ADP to the solution of single player linear differential game problems (optimal control problems) is provided in [Wang et al. (2009)] and [Lewis et al. (2012)].

Preprints of the 10th IFAC International Symposium on Dynamics and Control of Process Systems, The International Federation of Automatic Control, December 18-20, 2013, Mumbai, India. Copyright © 2013 IFAC.