LEARNING TO FLY WITH CHURPs

David Stirling
BHP Research, Port Kembla Laboratories
P.O. Box 77, Port Kembla, NSW 2505
davids@resptk.bhp.com.au

Abstract: This paper examines a new technique for uncovering and synthesising the control skills that human agents evolve when controlling complex machines or devices. Repeated application often turns such skills into automatic, sub-cognitive responses, as in driving a car or riding a bicycle. Uncovering and imitating such sub-symbolic abilities has recently received much attention in machine learning, particularly through the technique of behavioural cloning. To date, however, the advantages afforded by this technique, typically the transformation of such implicit skills into symbolic form, are often eclipsed by a lack of robustness, particularly in domains involving dynamic control skills. Compressed Heuristic Universal Reaction Planners, or CHURPs, offer a different perspective on control skills, and by employing them these shortcomings can be overcome. CHURPs also provide a human-like synthetic form of knowledge and substantially better control performance. In addition, they offer a range of further benefits, such as surrogate control and goal-sharing behaviours. In this paper the structure of CHURPs is explained and comparative performance results on Sammut’s flight domain [1] are given.

1. Introduction

Mankind has always influenced its environment by using various tools, artifacts or machines constructed for the purpose. Using such devices to achieve certain goals forms the general notion of control. This notion of control is ubiquitous in everyday life, in tasks ranging from driving a car or filling a bath to the desired level and temperature, to conducting or guiding a manufacturing process or flying an aeroplane.
Appearing in the Proceedings of the 8th Australian Joint Conference on Artificial Intelligence, Australian Defence Force Academy, Canberra, 15-17 November 1995.

The environment’s response to such causal influences is dynamic; that is, inertias and stored energies disturbed from a state of equilibrium result in a further chain of events and behaviours. If control problems with multiple degrees of freedom can be modelled using the techniques that traditional control practitioners are apt to invoke (such as model reduction, simplification and abstraction), then an optimal controller may be found. However, real-world events can generate unexpected disturbances that were neither anticipated nor modelled by such methods. The result is less than optimal performance: at best sub-optimal control, at worst complete failure. Yet in many such situations human operators, drawing on their accrued knowledge and skills, cope admirably. This fact underpins much current research into the skills of expert human agents, particularly where such skills could be used to develop vastly improved intelligent, autonomous control systems. However, expert agents who have developed their skills over numerous interactions with a machine or artifact often undergo a process of automatisation. Skills that were once possibly in a symbolic, cognitive form are then replaced by a non-symbolic, sub-cognitive or implicit form [2]. In other cases, these skills may have evolved directly from the sensory-motor feedback of the agent’s body without any preceding symbolic existence. Recovery of