Skill reconstruction as induction of LQ controllers with subgoals

Dorian Suc
dorian.suc@fri.uni-lj.si
Faculty of Computer and Information Sciences, University of Ljubljana, Trzaska 25, 1000 Ljubljana, Slovenia

Ivan Bratko
ivan.bratko@fri.uni-lj.si
Faculty of Computer and Information Sciences, University of Ljubljana, Trzaska 25, 1000 Ljubljana, Slovenia

Abstract

Controlling a complex dynamic system, such as a plane or a crane, usually requires a skilled operator. Such a control skill is typically hard to reconstruct through introspection. Therefore an attractive approach to the reconstruction of control skill involves machine learning from operators' control traces, also known as behavioural cloning. In the most common approach to behavioural cloning, a controller is induced in the form of a rule set or a decision or regression tree that maps system states to actions. Unfortunately, induced controllers usually suffer from lack of robustness and lack typical elements of human control strategies, such as subgoals and substages of the control plan. In this paper we present a new approach to behavioural cloning which involves the induction of a model of the controlled system and enables the identification of subgoals that the operator is pursuing at various stages of the execution trace. The underlying formal basis for the present approach to behavioural cloning is the theory of LQ controllers. Experimental results show that this approach greatly improves the robustness of the induced controllers and also offers a new way of understanding the operator's subcognitive skill.

1 Introduction

Controllers can be designed by machine learning using different kinds of information available to the learning system. Approaches like reinforcement learning, genetic algorithms and neural networks typically don't use prior knowledge about the system to be controlled. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies from their prior knowledge of the system or from demonstrations by experienced operators. Control theory makes use of the former, but it doesn't consider the operator's skill.

On the other hand, the idea of behavioural cloning (a term introduced by Donald Michie [Michie, 93]) is to make use of the operator's skill in the development of an automatic controller. A skilled operator's control traces are used as examples for machine learning to reconstruct the underlying control strategy that the operator executes subconsciously. The goal of behavioural cloning is not only to induce a successful controller, but also to achieve better understanding of the human operator's subconscious skill [Urbancic and Bratko, 94]. Behavioural cloning was successfully used in problem domains such as pole balancing, production line scheduling, piloting [Sammut et al., 92] and operating cranes. These experiments are reviewed in [Bratko et al., 95]. Controllers were usually induced in the form of decision or regression trees. Although successful clones have been induced in the form of trees or rule sets, the following problems have generally been observed with this approach:

• Typically, induced clones are brittle with respect to small changes in the control task.
• The clone induction process typically has low yield: the proportion of successful controllers among all the induced clones is low, typically well below 50%.
• Resulting clones are purely reactive and inadequately structured as conceptualisations of the human skill. They lack typical elements of human control strategies: goals, subgoals, phases and causality.

In this paper we propose a different approach to behavioural cloning which exploits some results from control theory. In particular, our clones take the form of LQ controllers. The approach also involves induction of approximate models of controlled systems.
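As background on the term "LQ controller" used above, the following minimal sketch computes the optimal feedback gain for a scalar, discrete-time linear system with a quadratic cost. It is a textbook illustration and not the formulation used in this paper: the discrete-time setting, the symbols `a`, `b`, `q`, `r`, and the function names `scalar_lq_gain` and `simulate` are all assumptions introduced here.

```python
# Illustrative sketch only: scalar discrete-time LQ regulator.
# Assumed system:  x[t+1] = a*x[t] + b*u[t]
# Assumed cost:    sum over t of q*x[t]**2 + r*u[t]**2
# The optimal control is linear state feedback u[t] = -k*x[t].


def scalar_lq_gain(a, b, q, r, iterations=200):
    """Iterate the scalar Riccati recursion and return (gain k, cost weight p)."""
    p = q  # start from the one-step cost weight
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return k, p


def simulate(a, b, k, x0, steps):
    """Run the closed loop x[t+1] = (a - b*k)*x[t] and return the state trace."""
    xs = [x0]
    for _ in range(steps):
        u = -k * xs[-1]  # linear state feedback
        xs.append(a * xs[-1] + b * u)
    return xs


if __name__ == "__main__":
    k, p = scalar_lq_gain(a=1.0, b=1.0, q=1.0, r=1.0)
    print("gain:", k, "cost weight:", p)
    print("trace:", simulate(1.0, 1.0, k, x0=1.0, steps=5))
```

With `a = b = q = r = 1`, the recursion settles at a gain of about 0.618, and the simulated state decays toward zero. The point of the sketch is that an LQ controller is fully described by a system model and a quadratic cost, in contrast to a tree or rule set that maps states directly to actions.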
It is experimentally demonstrated that this approach dramatically improves both the clones' robustness with respect to the changes of the control task, and the yield of the cloning process. Also, the approach provides a way of interpreting the induced clones in terms of the operator's goals