Failure handling in a design expert system

David C Brown

This paper is concerned with how to handle the failures that occur during design problem-solving. Failure handlers and redesigners are introduced. Failure recovery action and the knowledge involved is presented for each agent. The role of suggestions and redesign strategies is discussed. The handling of plan failures is also presented. The paper concludes by surveying other methods of failure handling from the literature.

failure handling, expert systems, redesign strategies

This research is concerned with the design of mechanical components. It views design as a problem-solving activity (see Chandrasekaran 1). The theory explains the activity of a human designer when solving a problem that falls into a particular subclass of mechanical design.

Recently there has been an increasing amount of research on design or design-related problem-solving systems. This work includes Birmingham and Siewiorek 2, Bowen 3, Dixon et al 4, Grinberg 5, Kowalski and Thomas 6, Latombe 7, McDermott 8,9, Mitchell 10 and Sussman 11, amongst others. Much effort has concentrated on electronics, but some attention is starting to be paid to research on design in other areas.

Some of this research has been included in the field of expert systems. We feel that this has come to mean the production of applications systems using existing artificial intelligence (AI) techniques; in particular, the use of rules plus an inference engine. However expert the performance of such systems may be, there is still need to explore the structure of the knowledge and the problem-solving strategies that underlie an expert's performance. Our research is motivated by this need.

Design activity in general has many components, such as planning, the use of prestored plans, refinement of descriptions and the use of large amounts of knowledge. Not all designing involves all of these.
We have described three classes of design activity (see Brown and Chandrasekaran 12), which vary according to their knowledge and problem-solving components. Our work refers only to the third class, which requires that at every stage of the design the designer knows both what sequences of design steps are appropriate and also what knowledge is required. This is a particular type of 'routine' design.

The theory hypothesizes that the design activity is organized around a hierarchy of concepts, where each concept is active in the design. A concept may be considered to be a specialist about some subproblem of the design. The hierarchy reflects the way that the designer thinks of the object during design.

Artificial Intelligence Group, Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA

Each specialist can select from its own set of plans. Selection depends on the current state of the design. Each plan is a sequence of design actions. An action may be a request to attempt portions of the design using another specialist lower in the hierarchy. Alternatively, an action could use a task to make small additions to the design itself. A task uses steps. A step decides the value of one member of the group of attributes for which the task is responsible. For example, a hole might be designed by a task, while a step would decide the radius. Constraints may be planted at any point in order to test the validity of the design.

The complete design process proceeds by first obtaining and checking the requirements. It then does a rough design to establish whether full design is worth pursuing. If the rough design succeeds, then the full design is attempted by requesting a design from the topmost specialist. Communication between active design agents is done by passing messages that give instructions or report on success or failure.

This theory has been used to construct an expert problem-solver for the design of a type of air cylinder.
The system, called AIR-CYL, has been reported in Brown 13 and Brown and Chandrasekaran 14. It takes only a few minutes to design an air cylinder that involves about 120 design decisions. AIR-CYL incorporates the failure handling theory that is presented below. The system is written in DSPL, and is implemented in ELISP on a DEC System-20. DSPL (see Brown 15) is a language specifically tailored for writing design problem-solvers of this type.

The rest of this paper will discuss an approach to handling failures that occur during this class of design activity. We will start by presenting the additional knowledge that a person, and consequently a system, has in order to deal with failures. Next, we describe how agents fail and attempt to correct themselves. Finally, we discuss some other research in AI that addresses failure handling and compare it with our research.

KNOWLEDGE FOR FAILURE HANDLING

Guidelines

The proposed structure of design problem-solving (ie a hierarchical organization of specialists, plans, tasks and steps) provides the context in which to structure failure handling. We view failure handling as a complex structured activity. We will assume that at any point in the structure only the minimum knowledge is available locally about the global state of the problem-solving activity. This is motivated by human memory limitations.

We are proposing that all design agents detect their own failure, and be able to determine what went wrong (at least superficially). They will attempt to fix it locally, and report failure only if all attempts fail. Agents that have some control over other agents can use those agents in their attempt to correct the detected problem.

436 0010-4485/85/090436-07 $03.00 © 1985 Butterworth & Co (Publishers) Ltd computer-aided design
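The guideline that each agent detects its own failure, tries to fix it locally, and reports upward only when every local attempt has failed can be illustrated with a minimal sketch. It is written in Python, not DSPL, and is not the AIR-CYL implementation. The names `Failure`, `Step`, `Task` and `make_hole_task`, the plate-width constraint and all numeric values are assumptions introduced for illustration; only the hole and radius example comes from the text. The sketch omits the case in which a controlling agent uses its subordinate agents to attempt a correction.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Design = Dict[str, float]  # attribute name -> decided value (hypothetical representation)


@dataclass
class Failure:
    """Message reporting which agent failed and a superficial reason."""
    agent: str
    reason: str


@dataclass
class Step:
    """Decides one attribute, tests a constraint, and tries local fixes first."""
    name: str
    attribute: str
    decide: Callable[[Design], float]
    constraint: Callable[[Design], bool]
    fixes: List[Callable[[Design], float]] = field(default_factory=list)

    def run(self, design: Design) -> Optional[Failure]:
        # Try the normal decision, then each local fix in turn.
        for choose in [self.decide, *self.fixes]:
            design[self.attribute] = choose(design)
            if self.constraint(design):
                return None  # success: nothing to report
        # All local attempts failed, so report failure to the controlling agent.
        return Failure(self.name, f"constraint on {self.attribute} violated")


@dataclass
class Task:
    """Runs its steps in order and relays the first unrecovered failure upward."""
    name: str
    steps: List[Step]

    def run(self, design: Design) -> Optional[Failure]:
        for step in self.steps:
            failure = step.run(design)
            if failure is not None:
                return Failure(self.name, f"{failure.agent}: {failure.reason}")
        return None


def make_hole_task(with_fix: bool) -> Task:
    """Hypothetical hole task whose single step decides the radius."""
    radius = Step(
        name="hole-radius",
        attribute="hole_radius",
        decide=lambda d: 6.0,  # assumed first choice, too large for the plate
        constraint=lambda d: 2 * d["hole_radius"] < d["plate_width"],
        fixes=[lambda d: d["plate_width"] / 4] if with_fix else [],
    )
    return Task("hole", [radius])


if __name__ == "__main__":
    design = {"plate_width": 10.0}
    print(make_hole_task(with_fix=True).run(design), design)
    print(make_hole_task(with_fix=False).run({"plate_width": 10.0}))
```

With a local fix available, the step recovers silently and the task sees no failure. Without one, the step's failure message is wrapped by the task and passed up, so each level adds only its own name and needs no knowledge of the global state.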