IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 6, NO. 3, MAY 2019

Optimal Fixed-Point Tracking Control for Discrete-Time Nonlinear Systems via ADP

Ruizhuo Song, Member, IEEE, and Liao Zhu

Abstract—Based on adaptive dynamic programming (ADP), the fixed-point tracking control problem is solved by a value iteration (VI) algorithm. First, a class of discrete-time (DT) nonlinear systems with disturbance is considered. Second, the convergence of a VI algorithm is established: it is proven that the iterative cost function converges to the optimal value, and that the control input and disturbance input also converge to their optimal values. Third, a novel analysis of the admissible range of the discount factor is presented, in which the cost function serves as a Lyapunov function. Finally, neural networks (NNs) are employed to approximate the cost function, the control law, and the disturbance law. Simulation examples illustrate the effectiveness of the proposed method.

Index Terms—Adaptive dynamic programming (ADP), fixed-point tracking, optimal control.

I. INTRODUCTION

DYNAMIC programming (DP) is one of the principal methods for solving optimal control problems of nonlinear systems [1], [2]. The optimal control laws of nonlinear systems can be obtained by solving the Hamilton-Jacobi-Bellman (HJB) equation. However, this equation is difficult to solve because of the "curse of dimensionality" [3]. In [4] and [5], Werbos proposed adaptive dynamic programming (ADP), which overcomes this weakness of DP. In ADP, approximate structures such as neural networks (NNs) and polynomials are employed to estimate the cost function and control laws forward in time.
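To illustrate the kind of value-iteration scheme discussed above, the following is a minimal sketch of discounted VI for fixed-point tracking on a hypothetical scalar plant x_{k+1} = 0.8 x_k + u_k with quadratic tracking cost. The plant, grids, weights, and discount factor are all illustrative assumptions, not the system studied in this paper.

```python
import numpy as np

# Hypothetical scalar plant x_{k+1} = 0.8*x_k + u_k, tracking x_ref = 1.0.
# All numerical choices below are illustrative assumptions.
gamma = 0.95                       # discount factor
x_ref = 1.0                        # fixed point to be tracked
xs = np.linspace(-2.0, 3.0, 101)   # state grid
us = np.linspace(-2.0, 2.0, 81)    # control grid

def stage_cost(x, u):
    # quadratic tracking cost: (x - x_ref)^2 + 0.1*u^2
    return (x - x_ref) ** 2 + 0.1 * u ** 2

X, U = np.meshgrid(xs, us, indexing="ij")
Xn = np.clip(0.8 * X + U, xs[0], xs[-1])   # next state, saturated to the grid

V = np.zeros_like(xs)              # VI starts from the zero cost function V_0 = 0
converged = False
for _ in range(600):
    Vn = np.interp(Xn, xs, V)      # interpolate V at the next state
    V_new = np.min(stage_cost(X, U) + gamma * Vn, axis=1)  # Bellman update
    if np.max(np.abs(V_new - V)) < 1e-6:
        converged = True
        V = V_new
        break
    V = V_new
```

Because the discounted Bellman operator is a contraction, the iterates converge geometrically; near x_ref only a small steady control effort remains, so the converged cost there is small.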
Manuscript received May 30, 2018; revised August 17, 2018; accepted October 6, 2018. This work was supported in part by the National Natural Science Foundation of China (61873300, 61722312) and in part by the Fundamental Research Funds for the Central Universities (FRF-GF-17-B45). Recommended by Associate Editor Huaguang Zhang. (Corresponding author: Ruizhuo Song.) Citation: R. Z. Song and L. Zhu, "Optimal fixed-point tracking control for discrete-time nonlinear systems via ADP," IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 657-666, May 2019. The authors are both with the School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China (e-mail: ruizhuosong@ustb.edu.cn; liao.zhu@foxmail.com). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JAS.2019.1911453

Up to now, the main architectures of ADP include heuristic dynamic programming (HDP), dual heuristic dynamic programming (DHP), action-dependent HDP (ADHDP), action-dependent DHP (ADDHP), globalized dual heuristic dynamic programming (GDHP), action-dependent globalized DHP (ADGDHP), and the single network adaptive critic (SNAC) architecture [5]-[7]. In recent years, ADP has made new progress in solving optimal control problems of nonlinear systems. To relax the initial condition of the value iteration (VI) algorithm, Wei et al. [8] proposed a novel method called the "local value iteration ADP algorithm" and proved the convergence of the iterative cost function by a novel analysis. In [9], the uniqueness of the solution of the Bellman equation was established by Bertsekas et al., and the convergence results of VI and policy iteration (PI) were proved. In [10], Fan et al. went beyond the usual optimal control results to show that the output of the plant always remains within user-defined bounds. Liu et al.
used the ADHDP method to solve the residential energy scheduling problem in [11]. In [12], an optimal battery sequential control scheme was obtained iteratively for smart home energy systems. Wang et al. [13] dealt with the fault detection, estimation, and fault-tolerant control problems of a nonlinear single-input single-output model-free system.

It is well known that a large class of real systems is driven by more than one controller or disturbance, each employing an individual strategy. The quality of these strategies is assessed through a cost function, so the players effectively operate as in a game [14]-[17]. In a two-player zero-sum (ZS) game, the choice of one player depends strictly on the other players' strategy selections. The theory of H∞ control can be applied to a ZS game, in which the controller is used to attenuate the effect of the disturbance [18], [19]. Obtaining the Nash equilibrium solution of a ZS game is equivalent to solving the Hamilton-Jacobi-Isaacs (HJI) equation of the system [20]. Many researchers are devoted to obtaining the optimal control laws of nonlinear systems and to discussing criteria for the existence of an optimal solution. Adaptive critic approximate dynamic programming designs were derived to solve the discrete-time (DT) ZS game in [21]. Reference [22] was concerned with a new data-driven zero-sum neuro-optimal control problem for continuous-time unknown nonlinear systems with disturbances. Zhu et al. proposed an iterative ADP algorithm to solve the continuous-time, unknown nonlinear ZS game with only online data in [23]. Based on [24], Wei et al. presented a novel iterative ZS ADP algorithm to solve an infinite-horizon DT two-player ZS game for nonlinear systems in [25].

One of the most important problems in control theory is the tracking problem, i.e., obtaining control laws that achieve asymptotic tracking of prescribed trajectories. It has been studied by many researchers [26]-[29].
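The min-max structure behind the HJI equation can be illustrated with a toy discounted zero-sum VI, in which the control u minimizes and the disturbance w maximizes an H∞-style utility. The scalar plant x_{k+1} = 0.7 x_k + u_k + w_k, the attenuation level, and all grids below are assumptions for illustration only, not the paper's system.

```python
import numpy as np

# Toy zero-sum value iteration: u minimizes, w maximizes the discounted sum of
# the stage utility x^2 + u^2 - rho^2*w^2 (H-infinity style). The scalar plant
# x_{k+1} = 0.7*x + u + w and all constants are illustrative assumptions.
gamma = 0.95
rho2 = 4.0                          # assumed attenuation level rho^2
xs = np.linspace(-2.0, 2.0, 81)     # state grid
us = np.linspace(-1.0, 1.0, 41)     # control grid
ws = np.linspace(-0.5, 0.5, 21)     # disturbance grid

def utility(x, u, w):
    return x ** 2 + u ** 2 - rho2 * w ** 2

X, U, W = np.meshgrid(xs, us, ws, indexing="ij")
Xn = np.clip(0.7 * X + U + W, xs[0], xs[-1])   # next state, saturated to grid

V = np.zeros_like(xs)
converged = False
for _ in range(600):
    Vn = np.interp(Xn, xs, V)
    q = utility(X, U, W) + gamma * Vn
    V_new = np.min(np.max(q, axis=2), axis=1)  # inner max over w, outer min over u
    if np.max(np.abs(V_new - V)) < 1e-6:
        converged = True
        V = V_new
        break
    V = V_new
```

With a sufficiently large rho^2 the disturbance is penalized out of acting near the origin, so the game value at x = 0 is zero, mirroring the saddle-point (Nash equilibrium) interpretation of the HJI solution.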
Recently, in [30], the decentralized tracking control problem was investigated for unknown large-scale systems based on ADP. For completely unknown nonlinear systems, Lv et al. used an identifier NN to approximate the unknown system and solved the H∞ tracking control problem based on an augmented matrix [31]. The optimal output tracking control problem of DT nonlinear systems was considered in [32], which proposed multistep