A Reactive Approach to Classifier Systems

José M. Molina, Carlos Sevilla, Pedro Isasi, Araceli Sanchis
Grupo de Vida Artificial. Departamento de Informática. Universidad Carlos III de Madrid, Spain. Avda. Butarque 15, 28911 Leganés. Madrid.

ABSTRACT

The navigation problem is how to reach a goal while avoiding obstacles in dynamic environments. It can be approached through reactions and/or sequences of actions. Classifier Systems (CS) have proven their ability for continuous learning; however, they have some problems in reactive systems. A modified CS is proposed to overcome these problems. Two special mechanisms are included in the developed CS to allow the learning of both reactions and sequences of actions. This learning process involves two main tasks: first, discriminating between rules, and second, discovering new rules to operate successfully in dynamic environments. Different experiments have been carried out using a Khepera mini-robot to find a generalized solution. The results show the ability of the system for continuous learning and adaptation to new situations.

1. INTRODUCTION

A Classifier System (CS) [4] is well suited to learn multiple different concepts incrementally under payoff. These systems have been widely implemented and tested on a large number of theoretical problems [15, 16], but there are few cases in which they have been included in real systems [2, 14, 15]. In the most recent literature, especially in [14], CSs appear as learning systems of doubtful efficiency. They were used very frequently after their description by Holland [4, 5], but today they are less popular because of the problems and difficulties they present. When Classifier Systems are applied to problems of some importance, a series of difficulties appears that may even lead one to consider using some other system instead.
One of the principal problems of CSs is related to their application to dynamic environments. Although the article by Brooker [1] describes these systems as prepared to operate in changing environments (concretely, he developed a controller system for a predator that moves and hunts in a world), the literature does not report good results for these cases. These poor results arise because, if the classifier is allowed a longer decision time, which is necessary to produce an elaborate solution to the problem, the individual (the predator in [1] or the “animat” in [15] and [16]) remains inside the world and keeps moving in it; by the time the system provides the solution, it is in most cases no longer valid. Furthermore, the evaluation of that decision is not valid either, since the appropriateness of the output cannot be known because of the time lag between input and output.

0-7803-4778-1/98 $10.00 © 1998 IEEE

The problem of making the system respond quickly should not be approached only with techniques that attempt to increase the speed of the process; it can also be approached from a different perspective: injecting environmental data and obtaining intermediate decisions in the course of the global decision [11, 12]. Rule chaining in the traditional CS makes the system blind to the environment, because it cannot handle new sensory inputs during the decision process. In a dynamic environment, the system ought to read its sensors at each decision step (reaction); this is the main feature of reactive systems. For instance, in a navigation problem in a dynamic environment (where the obstacles are moving around), a robot ought not to be blind at any time, so each movement has to be the result of applying the decision process to the latest sensory input.
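The lag argument above can be illustrated with a toy simulation. This is only a sketch under invented assumptions, not the authors' experimental setup: the 1-D corridor, the start positions, the `safe_gap` threshold and the function name `first_collision` are all hypothetical. A controller that commits to several chained moves from one sensor reading is compared with one that reads the sensor before every move.

```python
def first_collision(commit_steps, horizon=12, safe_gap=2):
    """Return the step index of the first collision, or None if there is none.

    Toy 1-D corridor (all values hypothetical): the robot starts at cell 0,
    an obstacle starts at cell 10 and moves one cell toward the robot per step.
    The controller reads the gap only when its current plan is exhausted and
    then commits to the same move for `commit_steps` steps.
    commit_steps=1 is the reactive case: one sensor reading per movement.
    """
    robot, obstacle = 0, 10
    plan = []
    for t in range(horizon):
        if not plan:
            gap = obstacle - robot              # fresh sensor reading
            move = 1 if gap > safe_gap else -1  # advance if clear, else back off
            plan = [move] * commit_steps        # decision is frozen for the whole plan
        robot += plan.pop(0)
        obstacle -= 1                           # the world keeps changing meanwhile
        if obstacle <= robot:
            return t
    return None


print(first_collision(commit_steps=1))  # reactive controller: None (no collision)
print(first_collision(commit_steps=6))  # blind for 6 steps: 4 (collides at step 4)
```

In this toy world the committed plan is computed from a reading that is already stale when most of its moves are executed, which is the situation the paragraph describes for chained decisions.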
In this work a new CS is proposed, the Reactive CS (RCS), which modifies the general process in order to allow reactions without losing the possibility of rule chaining. The new process integrates the environmental input with the internal state left by the previous input. From an input, the RCS directly gives an action and, at the same time, modifies the internal state. When the next input arrives, the message is fused with the previous internal state to allow either a new reaction or an action that chains with the previous action.

This new RCS will be used to learn a fundamental requirement for autonomous mobile robots: navigation. This task takes the robot from place to place safely and without damage. In the proposed learning process, the only prior information concerns the number of inputs (robot sensors), the range of the sensors, and the number of outputs (number of robot motors) and their description. The robot controller starts without information about the right associations between sensor inputs and motor velocities. From this situation, the robot is able to learn through experience to reach the highest degree of adaptation to the sensor information.

The results obtained demonstrate the capability of generating not only new, better rules but also the mechanisms for chaining new and existing rules. Another important aspect verified in this work is the possibility of continuous learning and adaptation to new situations, which makes it possible to solve the problem even if there are mobile objects, more than one goal, and dynamic goals that can appear, disappear or move while the robot is navigating.

In this paper we present the results of research aimed at learning reactive behaviors in an autonomous robot. In Section 2, we outline the general theory of classifier and reactive systems. Section 3 is related to the RCS and the goals of the work.
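The RCS decision cycle described in this introduction (fuse each new input with the internal state, emit an action at once, and update the internal state) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ternary `0/1/#` conditions, the bit-string encoding, strength-based winner selection, the rule set `RULES` and the names `Rule`, `ReactiveCS` and `step` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    condition: str    # pattern over sensor bits + state bits; '#' matches anything
    action: str       # hypothetical motor command label
    next_state: str   # internal state written when the rule fires
    strength: float   # hypothetical strength used to pick among matching rules


def matches(condition, message):
    """True if every condition symbol is '#' or equals the message bit."""
    return len(condition) == len(message) and all(
        c in ("#", m) for c, m in zip(condition, message)
    )


class ReactiveCS:
    def __init__(self, rules, state_bits=2, default_action="stop"):
        self.rules = rules
        self.state = "0" * state_bits
        self.default_action = default_action

    def step(self, sensor_message):
        """One reaction: a fresh sensor reading goes in, one action comes out."""
        fused = sensor_message + self.state      # fuse input with previous state
        candidates = [r for r in self.rules if matches(r.condition, fused)]
        if not candidates:
            return self.default_action           # no rule fired; state unchanged
        winner = max(candidates, key=lambda r: r.strength)
        self.state = winner.next_state           # lets the next step chain on this one
        return winner.action


# Two sensor bits (obstacle left, obstacle right) followed by two state bits.
RULES = [
    Rule("00##", "forward", "00", 1.0),       # path clear: pure reaction
    Rule("1###", "turn_right", "01", 1.0),    # obstacle on the left: turn, remember it
    Rule("01##", "turn_left", "00", 1.0),     # obstacle on the right
    Rule("0001", "turn_right", "00", 2.0),    # chained: finish the turn just started
]

rcs = ReactiveCS(RULES)
print([rcs.step(m) for m in ["00", "10", "00", "00"]])
# ['forward', 'turn_right', 'turn_right', 'forward']
```

The third step shows the chaining effect: the sensors report a clear path, but the internal state left by the previous step selects the stronger rule that completes the turn, while the sensors are still read at every step.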