Reasonable Performance in Less Learning Time by Real Robot Based on Incremental State Space Segmentation

Yasutake Takahashi, Minoru Asada and Koh Hosoda
Dept. of Mech. Eng. for Computer-Controlled Machinery
Osaka University, 2-1, Yamadaoka, Suita, Osaka 565, Japan
yasutake@robotics.ccm.eng.osaka-u.ac.jp

Abstract

Reinforcement learning has recently been receiving increased attention as a method for robot learning with little or no a priori knowledge and a higher capability of reactive and adaptive behaviors. However, there are two major problems in applying it to real robot tasks: how to construct the state space, and how to reduce the learning time. This paper presents a method by which a robot learns purposive behavior in less learning time by incrementally segmenting its sensor space based on its own experiences. The incremental segmentation is performed by constructing local models in the state space; it is based on function approximation of the sensor outputs to reduce the learning time, and on the reinforcement signal to obtain purposive behavior. The method is applied to a soccer robot which tries to shoot a ball into a goal. Experiments with computer simulations and a real robot are shown. As a result, our real robot has learned a shooting behavior within less than one hour of training by incrementally segmenting the state space.

1 Introduction

Reinforcement learning has recently been receiving increased attention as a method for robot learning with little or no a priori knowledge and a higher capability of reactive and adaptive behaviors [1]. However, there are two major problems in applying it to real robot tasks.

1. Selection of the sensor information to describe the state of the robot and its environment. If one uses all the sensor outputs, the amount of data the robot has to deal with will exceed the robot's capability (memory and processing power).

2.
Even though the sensor information is well selected for the given task, the segmentation problem remains. The state space designed by the programmer is not guaranteed to be optimal for the robot to perform the task. Coarse segmentation causes the so-called "perceptual aliasing problem" [2], by which the robot cannot discriminate between states that are important for accomplishing the task at hand. On the other hand, fine segmentation that avoids the perceptual aliasing problem produces too many states to generalize the experiences. Since the learning time increases exponentially with the number of states, the robot needs an enormous amount of learning time.

For the former problem, Whitehead and Ballard [2] proposed a method for selecting sensor information so as to avoid perceptual aliasing. Tan [3] proposed a method of sensor selection that reduces the sensing cost. Chapman and Kaelbling [4] proposed an algorithm based on recursive splitting of the state space, driven by statistical measures of the differences in reinforcements received. However, these methods deal with a discrete state space and therefore cannot be applied directly to a continuous one.

For the latter problem, roughly speaking, there are two approaches for continuous state spaces: learning the value function with a function approximation method, or with segmentation of the continuous state space. Boyan et al. [5] reported that the combination of dynamic programming and function approximation shows poor performance even in benign cases, and proposed the Grow-Support algorithm for function approximation. However, it requires an environmental model and can cope only with deterministic worlds. Sutton [6] used CMAC [7][8] as a function approximation method, but CMAC has its own quantization (segmentation) problem. Saito and Fukuda [9] also used CMAC to estimate the Q values.
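The segmentation trade-off above can be made concrete with a minimal sketch (not the authors' method; the uniform binning scheme, bin counts, and sensor readings are illustrative assumptions): a coarse grid maps two situations that require different actions to the same state, while a fine grid separates them at the cost of a table that grows exponentially with the number of sensor dimensions.

```python
def discretize(value, n_bins, low=0.0, high=1.0):
    """Map a continuous sensor reading in [low, high] to a bin index."""
    idx = int((value - low) / (high - low) * n_bins)
    return min(max(idx, 0), n_bins - 1)  # clamp to a valid bin

# Two hypothetical ball-distance readings that call for different actions.
near, far = 0.30, 0.45

# Coarse segmentation (2 bins): both readings fall into the same state,
# so the learner cannot tell them apart -- perceptual aliasing.
print(discretize(near, 2), discretize(far, 2))      # same bin index twice

# Fine segmentation (100 bins): the readings land in different states...
print(discretize(near, 100), discretize(far, 100))  # two distinct indices

# ...but the table the learner must fill grows as n_bins ** n_dims,
# which is why fine grids make the learning time explode.
for n_bins in (2, 10, 100):
    print(n_bins, n_bins ** 3)  # state count for 3 sensor dimensions
```

With three sensor dimensions, moving from 2 to 100 bins per axis inflates the state count from 8 to 1,000,000, which is the exponential growth in learning time the text refers to.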
However, their sensor space was huge and required enormous learning time, so they reduced the search space by using an initial controller. As a method for state space segmentation, Kröse and Dam [10]