Optimal Process Control of Symbolic Transfer Functions

C. Griffin and E. Paulson
Applied Research Laboratory
Penn State University
University Park, PA 16802
griffinch@ieee.org, ecp141@psu.edu

ABSTRACT

Transfer function modeling is a standard technique in classical Linear Time Invariant and Statistical Process Control. The work of Box and Jenkins was seminal in developing methods for identifying parameters associated with classical (r, s, k) transfer functions. Computing systems are often fundamentally discrete, and feedback control in these situations may require discrete event systems for modeling control structures and process flow. In these situations, a discrete transfer function in the form of an accurate hidden Markov model of input/output relations can be used to derive optimally responding controllers. In this paper, we extend work begun by the authors on identifying symbolic transfer functions for discrete event dynamic systems (Griffin et al., Determining A Purely Symbolic Transfer Function from Symbol Streams: Theory and Algorithms, in Proc. 2008 American Control Conference, pp. 1166-1171, Seattle, WA, June 11-13, 2008). We assume an underlying input/output system that is purely symbolic and stochastic. We show how to use algorithms for estimating a symbolic transfer function and then use a Markov Decision Process representation to find an optimal symbolic control function for the symbolic system.

1. INTRODUCTION

Transfer function modeling is critical in minimum mean square error (MMSE) control [1]. The work of Box and Jenkins was seminal [2] and has been extended and enhanced over the years by several authors. We contrast this with the discrete event control literature, in which plant models are often developed by hand. This may be reasonable in some cases, but for real-world applications controllers need to be synthesized for complex (e.g., computational) systems that are not fully known a priori.
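As a concrete illustration of the kind of purely symbolic, stochastic input/output system considered here, the following sketch simulates a plant whose output symbol and next state are drawn from a distribution conditioned on the current state and input symbol. The states, symbols, and probabilities are invented for illustration; they are not taken from the paper or from [3].

```python
import random

# Hypothetical two-state symbolic plant: in each state, an input symbol
# ('a' or 'b') selects a distribution over (output symbol, next state)
# pairs.  All probabilities here are illustrative only.
PLANT = {
    ("s0", "a"): [(0.9, "0", "s0"), (0.1, "1", "s1")],
    ("s0", "b"): [(0.5, "0", "s0"), (0.5, "1", "s1")],
    ("s1", "a"): [(0.2, "0", "s0"), (0.8, "1", "s1")],
    ("s1", "b"): [(0.7, "0", "s0"), (0.3, "1", "s1")],
}

def step(state, symbol, rng=random):
    """Sample one symbolic transition: return (output, next_state)."""
    r, acc = rng.random(), 0.0
    for p, out, nxt in PLANT[(state, symbol)]:
        acc += p
        if r < acc:
            return out, nxt
    return out, nxt  # guard against floating-point round-off in acc

def run(inputs, state="s0"):
    """Feed an input symbol string to the plant; collect the output string."""
    outputs = []
    for symbol in inputs:
        out, state = step(state, symbol)
        outputs.append(out)
    return "".join(outputs)

print(run("abba"))  # output string varies from run to run
```

A controller for such a plant must reason about input/output symbol streams rather than continuous signals, which is the setting the symbolic transfer function formalizes.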
In particular, it is difficult to be certain that manually created models accurately reflect plant dynamics. This is especially true when system transitions follow probability distributions. If we could automatically derive a plant model whose outputs are observed responses to a known set of inputs, the resulting model of input-output relationships would be a discrete event transfer function. This transfer function could be used to synthesize a discrete event controller that could optimize some objective function defined in terms of the discrete event dynamical system.

In [3], we showed how to extend Crutchfield and Shalizi's CSSR algorithm [4-6] to identify an asymptotically optimal Mealy Machine representation when three parameters are supplied: l1, the maximal input history length; l2, the maximal output history length; and k, the delay. We called this the symbolic transfer function. In this paper, we show how to use the derived system to optimize the return in a discounted reward context, essentially showing that this is equivalent to finding the solution to a Markov Decision Process [7]. It should be noted that this work is similar in spirit to the work of Watkins [8] and the Q-learning literature [9], which derive an optimal response to a Markov decision process when limited initial information on the reward structure is available.
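The final optimization step, solving the resulting discounted-reward MDP, can be carried out by standard value iteration. The sketch below is a generic illustration on an invented toy MDP, not the construction from [3]: it iterates the Bellman optimality update V(s) = max_a sum_s' P(s'|s,a)[R(s,a,s') + g V(s')] to convergence and reads off a greedy policy.

```python
# Value iteration on a hypothetical two-state, two-action discounted MDP.
# T[(state, action)] -> list of (probability, next_state, reward) triples.
# All numbers are illustrative; only the algorithm mirrors the text.
T = {
    (0, "u"): [(1.0, 1, 0.0)],
    (0, "d"): [(1.0, 0, 1.0)],
    (1, "u"): [(0.5, 0, 5.0), (0.5, 1, 0.0)],
    (1, "d"): [(1.0, 0, 0.0)],
}
states, actions, gamma = [0, 1], ["u", "d"], 0.9

def value_iteration(tol=1e-8):
    """Return (optimal values, greedy policy) for the MDP above."""
    V = {s: 0.0 for s in states}
    while True:
        # Q-values under the current value estimate.
        Q = {s: {a: sum(p * (r + gamma * V[s2]) for p, s2, r in T[(s, a)])
                 for a in actions} for s in states}
        V_new = {s: max(Q[s].values()) for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            policy = {s: max(Q[s], key=Q[s].get) for s in states}
            return V_new, policy
        V = V_new

V, policy = value_iteration()
print(policy)  # -> {0: 'u', 1: 'u'} for this toy instance
```

In our setting, the MDP states would be the causal states of the estimated symbolic transfer function and the actions the input symbols; the sketch above only shows the solution machinery.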
This work is different from the Q-learning literature in that: (i) the Q-learning literature assumes an underlying Markov Decision Process (MDP) structure with only reward outputs, i.e., there is no input-output assumption; (ii) in Q-learning, reward is only a function of input, not input and output; (iii) there is no underlying assumption of lagged outputs as a function of inputs; and (iv) Q-learning is a fundamentally online process that attempts to learn a system as it evolves. In this paper, we investigate a learning and optimization framework that operates offline. Thus the work is in perfect analogy to the Box-Jenkins control work [2].

2. PRELIMINARIES

In this section we provide the notation and preliminaries necessary for the proposed approach. Our notation is derived from Box and Jenkins [2] and symbolic dynamical systems [10], using time series expressed as strings of symbols. We discuss probabilistic automata, including probabilistic labeled transition systems (which are essentially Markov chains with labels) and probabilistic Mealy machines.

2.1 Time Series of Symbols

Let A (the input alphabet) and B (the output alphabet) be finite sets of symbols. A symbolic time series is a sequence

x = ... x(-2) x(-1) x(0) x(1) x(2) ...,

where x(t) represents the symbol that occurred at discrete time t in x. If x(t) is