Learning Procedures for Autonomic Computing

Tessa Lau, Daniel Oblinger, Lawrence Bergman, and Vittorio Castelli
IBM T.J. Watson Research Center
P.O. Box 704
Yorktown Heights, NY 10598
tessalau@us.ibm.com

Corin Anderson
Google
2400 Bayshore Parkway
Mountain View, CA 94043

1 Introduction

Today’s skilled IT professionals bring to bear an enormous amount of knowledge about how systems are configured, how they function on a day-to-day basis, and how to repair them when they break. However, there are not enough skilled IT professionals to meet the ever-growing demand. Autonomic computing offers a way out of this dilemma: offload the responsibility of managing complex systems onto the systems themselves, rather than relying on limited human resources.

This problem raises a large challenge: how will we transfer the knowledge about systems management and configuration from the human experts to the software managing the systems? We believe this problem is fundamentally a knowledge acquisition problem. Our approach to solving this problem draws on machine learning and knowledge representation. Our core idea is based on programming by demonstration: by observing several human experts each solve a similar problem on different systems, we generalize from traces of their activity to create a robust procedure that is capable of automatically performing the same task in future instances.

What will make it work is the observation that solutions to similar problems share similar sub-procedures. By capturing these nuggets of problem-solving knowledge from multiple experts, we form a robust procedure that captures the important parts of the procedures executed by all of the experts.

We are currently employing this approach to acquire desk-side technical support procedures, such as upgrading a network card, troubleshooting email problems, and installing a new printer.
Our system captures traces of multiple desk-side support representatives as they perform one task, such as diagnosing a dysfunctional network adapter, under a variety of operational conditions. Our system generalizes and aligns these traces into a single general procedure for repairing network adapters. An important feature of our approach is that it works across applications, by instrumenting at the Windows operating system level.

This paper describes our formulation of this problem as a machine learning problem. First we define the problem and describe how various problem characteristics affect the difficulty of the learning problem. We then outline the subproblems we have identified, and describe our approach to each one. Finally, we conclude with a summary of current results and directions for future work.

2 Procedural knowledge acquisition

We formulate the problem of procedural knowledge acquisition as follows. Given as input one or more traces of an expert’s keyboard and mouse actions as she demonstrates a procedure, output a procedure model that, when executed on a new system, performs the same task.

Our approach to this problem is based on machine learning: given traces of a procedure’s execution behavior, induce the procedure. We are also concerned with knowledge representation (how to represent a procedure, a procedure step, and the state of the world) and procedure execution (taking a generalized procedure model and mapping it into the concrete actions required to perform the procedure on a new system).

Clearly, the type of procedure as well as the quality of the traces determine how difficult it will be to construct the procedure model. We have identified a number of problem characteristics that affect problem difficulty:

• Procedure structure complexity: A straight-line procedure with no deviations from the main path will be easier to learn than a procedure that has many conditional actions or alternative paths.
• Trace noisiness: Execution traces in which the expert performs extraneous steps, or in which unexpected events happen asynchronously, will make the procedure more difficult to learn.
• Incremental or batch learning: The choice of learning algorithm depends on how it is going to be used. Incremental learning, where a procedure model is updated dynamically as traces are created rather than overnight in a batch process, places different constraints on the algorithms that may be used to learn procedures.
• State observability: The choice of what action to perform at each step of the procedure depends on how much information is available to the system to make that decision. If the choice can be made based on some information displayed on the user’s screen, the problem is easier than if the choice is made based on some hidden variable, perhaps some state stored in the expert’s mind.

In the next section, we describe some of the research challenges we have identified in working on this problem, and outline the approaches we have taken on each challenge.
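
To make the input/output formulation of this section concrete, the following is a minimal Python sketch and not the system described in this paper. It assumes a hypothetical `Action` record for one observed keyboard or mouse event, and it uses an iterated longest common subsequence as a naive stand-in for trace alignment. The names `Action`, `Trace`, `lcs`, and `shared_steps` are illustrative assumptions. The sketch handles only the straight-line case with fully observable actions; it drops extraneous steps that do not appear in every trace, and it does not model conditionals or hidden state.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Action:
    """One observed expert action (hypothetical schema, not from the paper)."""
    kind: str    # e.g. "click" or "type"
    target: str  # e.g. the name of the widget acted upon


Trace = List[Action]


def lcs(a: Sequence[Action], b: Sequence[Action]) -> List[Action]:
    """Longest common subsequence of two traces (classic dynamic programming)."""
    m, n = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one LCS.
    out: List[Action] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def shared_steps(traces: Sequence[Trace]) -> List[Action]:
    """Steps common to every trace, in order.

    Folding pairwise LCS over the traces yields a common subsequence,
    though not necessarily the longest one across all traces.
    """
    if not traces:
        return []
    common = list(traces[0])
    for trace in traces[1:]:
        common = lcs(common, trace)
    return common
```

Given two demonstrations of the same task in which one expert dismisses an unrelated pop-up, `shared_steps` returns only the actions both experts performed, which loosely mirrors the idea that solutions to similar problems share sub-procedures.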