Continuous Valued Q-learning for Vision-Guided Behavior Acquisition

Yasutake Takahashi, Masanori Takeda, and Minoru Asada
Dept. of Adaptive Machine Systems, Graduate School of Engineering, Osaka University
Suita, Osaka 565-0871, Japan
yasutake,takeda,asada@er.ams.eng.osaka-u.ac.jp

Abstract

Q-learning, one of the most widely used reinforcement learning methods, normally needs well-defined quantized state and action spaces to converge. This makes it difficult to apply to real robot tasks, both because the learned behavior performs poorly and because state space construction becomes a new problem in itself.

This paper proposes a continuous valued Q-learning method for real robot applications, which calculates contribution values to estimate a continuous action value in order to make motion smooth and effective. With roughly quantized states and actions, the proposed method achieved better performance of the desired behavior than the conventional Q-learning method. To show the validity of the method, we applied it to a vision-guided mobile robot whose task is to chase a ball. Although the task was simple, the performance was quite impressive. Further improvements are discussed.

1 Introduction

Reinforcement learning has been receiving increased attention as a method that requires little or no a priori knowledge and offers a high capability for reactive and adaptive behavior through interaction with the environment [1]. Asada et al. have presented a series of works on soccer robot agents which chase a ball and shoot it into the goal or pass it to another agent. In their reinforcement learning methods, the state and action spaces are quantized by the designer [2, 3] or constructed through the learning process [4, 5, 6] in order to make Q-learning, one of the most widely used reinforcement learning methods [7], applicable. That is, well-defined, quantized state and action spaces are needed to apply Q-learning to real robot tasks.
This causes two kinds of problems:

1. The robot's behavior is not smooth but jerky, because the action commands are quantized (e.g., forward and left turn).
2. State space construction satisfying the Markovian assumption becomes a new problem in itself, as noted in [4, 5, 6].

In this paper, we propose a continuous valued Q-learning method for real robot applications. There have been several related works. Boyan and Moore reported that the combination of dynamic programming and parameterized function approximation showed poor performance even in benign cases [13]. Saito and Fukuda [11] and Sutton [12] proposed using a sparse-coarse-coded function approximator (CMAC) for Q-value estimation. However, CMAC has its own quantization problem and generally needs a large amount of training data, which means long learning times. The proposed method, in contrast, interpolates continuous values between roughly quantized states and actions, which realizes smooth motion with much smaller computational resources.

To show the validity of the method, we applied it to a vision-guided mobile robot whose task is to chase a ball. Although the task was simple, the performance was quite impressive.

The rest of this article is structured as follows: first, Q-learning is briefly described, and then our method is explained. The method is applied to the soccer robot domain of RoboCup [8], where a learning robot attempts to approach a ball. Finally, the real robot learning results are shown and a discussion is given.

2 An Overview of Q-Learning

Before getting into the details of our method, we briefly review the basics of Q-learning, one of the most widely used reinforcement learning algorithms. Q-learning is a form of model-free reinforcement learning based on stochastic dynamic programming.
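To make the review concrete, the standard tabular Q-learning update can be sketched as follows. This is a minimal illustration, not the authors' robot implementation: the state labels, the action set, and the parameter values (learning rate ALPHA, discount factor GAMMA, exploration rate EPSILON) are all hypothetical placeholders.

```python
import random

# Illustrative parameters and action set (not from the paper).
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = ["forward", "left_turn", "right_turn"]

q_table = {}  # maps (state, action) -> estimated action value


def q(state, action):
    # Unvisited state-action pairs default to a value of 0.0.
    return q_table.get((state, action), 0.0)


def select_action(state):
    # Epsilon-greedy exploration: act randomly with probability EPSILON,
    # otherwise take the action with the highest estimated Q-value.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(state, a))


def update(state, action, reward, next_state):
    # Q-learning update rule:
    #   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    best_next = max(q(next_state, a) for a in ACTIONS)
    q_table[(state, action)] = q(state, action) + ALPHA * (
        reward + GAMMA * best_next - q(state, action)
    )
```

Note that both the states and the actions here are discrete symbols; this discreteness is exactly what causes the jerky behavior and state-space-construction problems discussed above, and what the continuous valued extension proposed in this paper addresses.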