Iterative Learning Heuristic Dynamic Programming (ILHDP) design of a Steam Power Plant Controller

Udhay Ravishankar, Member IEEE, and Milos Manic, Member IEEE
University of Idaho, Idaho Falls
ravi4736@vandals.uidaho.edu, misko@ieee.org

Abstract-This paper presents a new dynamic programming method called Iterative Learning Heuristic Dynamic Programming (ILHDP). ILHDP is an Iterative Learning Control (ILC) based Neural Dynamic Programming (NDP) algorithm. Its NDP aspect is borrowed from traditional Adaptive Critic Design (ACD) algorithms. Typical NDP algorithms in the ACD class train a Model Network beforehand and use a Critic Network as the gradient approximator, trained back-and-forth with the Action Network in each iteration so that the Action Network converges towards the optimal control policy. The proposed ILHDP algorithm updates the Model Network continually, using newly obtained data sampled during each Action Network optimization step on the same experiment. This continual updating of the Model Network ensures a better gradient approximation from the Model Network itself. The presented ILHDP is used to design a Steam Power Plant controller with respect to the Active-Power-to-Frequency droop characteristics. Test results indicated that the ILHDP-designed controller was capable of stabilizing the output power of the Steam Power Plant to track the load with a maximum tracking error of 0.011 for abrupt load changes as fast as 15 s. The Steam Power Plant was also subjected to large transient spikes, from which the designed controller was shown to return the system to stability.

I. INTRODUCTION

Power Grid optimization is a growing field in the realm of Smart Grid Research and Development. The demand for optimal control has heightened as the Electric Power Grid has become more complex and unmanageable under harsh load conditions.
Examples of work done in the field of optimal control for Smart Grids can be found in [1] – [3]. Beyond the realm of Smart Grids, optimal control is also becoming popular in fields such as Robotics, Industrial Processes, and Engine Control, as in [4] – [6].

The most popular optimal controller design methods belong to the class of Adaptive Critic Design (ACD) algorithms introduced by Werbos in [7]. They are popular because they are neural network based, so the complex mathematics behind dynamic programming can be approximated using simple neural network properties. A typical ACD setup uses three neural networks connected in cascade: an Action Network, followed by a Model Network, followed by a Critic Network. The Action Network approximates the optimal control policy, the Model Network approximates the dynamics of the system concerned, and the Critic Network approximates the Hamilton-Jacobi-Bellman (HJB) equation, or its gradient, typically found in the dynamic programming literature [8] – [9]. In typical ACD algorithms, the Model Network is trained beforehand from previously sampled data, after which the Critic and Action Networks are trained back-and-forth, optimizing the Action Network in the process. The neural networks are trained using the Error-Backpropagation (EBP) algorithm introduced by Werbos in 1974 [10]. The theoretical details of the different ACD dynamic programming algorithms can be found in [11]. Other developments in ACD include the greedy HDP iteration method for optimizing the Action Network. Examples of work published using the greedy HDP iteration method can be found in [12] – [14].

Iterative Learning Control (ILC) is also a popular optimal control design method. Its principle is to improve the controller iteratively, using information about the system learned in previous iterations. While ACD is purely neural network based, ILC is more analytical.
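As a rough illustration of the ILC principle, the sketch below applies the classic P-type update law, u_{k+1}[t] = u_k[t] + gamma * e_k[t+1], to a first-order discrete plant. The plant, its coefficients `a` and `b`, the learning gain `gamma`, and the function names are assumptions made for illustration only and are not taken from this paper.

```python
import numpy as np

def simulate(u, a=0.3, b=1.0):
    """Run one trial of the hypothetical plant y[t+1] = a*y[t] + b*u[t], y[0] = 0."""
    y = np.zeros(len(u) + 1)
    for t in range(len(u)):
        y[t + 1] = a * y[t] + b * u[t]
    return y

def p_type_ilc(reference, gamma=0.8, trials=30):
    """Refine a feedforward input over repeated trials with the P-type ILC law."""
    u = np.zeros(len(reference) - 1)
    max_errors = []
    for _ in range(trials):
        y = simulate(u)                  # repeat the same experiment
        error = reference - y            # tracking error of this trial
        max_errors.append(float(np.max(np.abs(error[1:]))))
        # Reuse what this trial revealed: u_{k+1}[t] = u_k[t] + gamma * e_k[t+1]
        u = u + gamma * error[1:]
    return u, max_errors

# Unit step reference; reference[0] matches the initial output y[0] = 0.
reference = np.concatenate(([0.0], np.ones(20)))
u, max_errors = p_type_ilc(reference)
print(max_errors[0], max_errors[-1])
```

With these assumed values, |1 - gamma*b| = 0.2 < 1, which is the usual sufficient condition for the trial-to-trial error of a P-type law to shrink.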
Examples of published work on ILC can be found in [15] – [16].

The proposed ILHDP is a purely neural-network-based optimal control design algorithm built on the principle of ILC. In the proposed ILHDP algorithm, the Model Network is continuously updated from newly obtained data sampled during the Action Network optimization process. This continual updating of the Model Network ensures a better gradient approximation from the Model Network itself. The ILHDP algorithm was applied to the design of an optimal neural controller for a Steam Power Plant. The Steam Power Plant is modeled as a third order linear system that converts power from the boiler to mechanical power on the turbine. The neural controller acts as an Automatic Generation Controller (AGC) that regulates the output power to track the load.

The rest of this paper proceeds as follows: Section II gives background on Iterative Learning Control (ILC) and Adaptive Critic Design (ACD), followed by the introduction of the ILHDP algorithm in Section III. The Steam Power Plant model and its associated Grid model for loading conditions are introduced in Section IV, along with the ILHDP design implementation of the Steam Power Plant controller. Test results are discussed in Section V, and the paper concludes with future work in Section VI.
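Ahead of the formal treatment, the model-updating idea outlined in this introduction can be sketched in a heavily simplified form. The example below is not the ILHDP algorithm itself. It assumes a hypothetical first-order plant rather than the third order Steam Power Plant, replaces the Model and Action Networks with two-parameter linear maps, and uses a myopic one-step tracking cost in place of a Critic Network. All names, coefficients, and learning rates are illustrative assumptions. It shows only the loop structure: the model is refreshed from each newly sampled transition, and the action is then adjusted with the gradient supplied by that refreshed model.

```python
import numpy as np

# Hypothetical plant x[t+1] = a*x[t] + b*u[t]; coefficients are unknown to the learner.
A_TRUE, B_TRUE = 0.8, 0.5

def run_design(iterations=30, steps=40, load=1.0, lr_action=0.3, lr_model=0.5):
    model = np.array([0.5, 0.2])   # deliberately wrong initial estimates [a_hat, b_hat]
    action = np.zeros(2)           # linear "action" weights: u = action @ [load, x]
    mean_errors = []
    for _ in range(iterations):    # the same experiment is repeated every iteration
        x, abs_errors = 0.0, []
        for _ in range(steps):
            features = np.array([load, x])
            u = action @ features
            x_next = A_TRUE * x + B_TRUE * u   # sample the real plant

            # 1) Update the model from the sample just obtained (normalized LMS step).
            phi = np.array([x, u])
            prediction_error = x_next - model @ phi
            model = model + lr_model * prediction_error * phi / (1e-8 + phi @ phi)

            # 2) Update the action using the refreshed model's gradient
            #    d(x_next)/d(u) = b_hat, applied to a one-step tracking cost.
            tracking_error = x_next - load
            action = action - lr_action * tracking_error * model[1] * features

            abs_errors.append(abs(tracking_error))
            x = x_next
        mean_errors.append(float(np.mean(abs_errors)))
    return model, action, mean_errors

model, action, mean_errors = run_design()
print("mean |error|, first vs last iteration:", mean_errors[0], mean_errors[-1])
print("model estimate [a_hat, b_hat]:", model)
```

In this sketch the action update never sees the true plant gain. It relies only on `model[1]`, so the quality of the gradient depends directly on how well the model has been refreshed from recent data.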