Autonomous Reconstruction of State Space for Learning of Robot Behavior

Takehisa Yairi, Koichi Hori and Shinichi Nakasuka
Research Center for Advanced Science and Technology, University of Tokyo
4-6-1 Komaba, Meguro-ku, Tokyo, JAPAN 153-8904
{yairi,hori,nakasuka}@ai.rcast.u-tokyo.ac.jp

Abstract

When an autonomous robot is to learn its behavior, whether an appropriate state space is available is one of the most critical issues for the flexibility and efficiency of the learning process. The problem is that it is usually very difficult to prepare such an ideal state space manually beforehand. In this paper, we propose a new state space “reconstruction” method. With this method, behavior-based robots can autonomously rebuild their state spaces after they accumulate behavior experience using their initial state spaces. This reconstruction approach is more advantageous than conventional state space construction methods or incremental state partitioning methods in that it achieves both efficiency in the learning process and optimality of the resultant behavior performance.

1 Introduction

In recent years, the capability of autonomous behavior learning has been recognized as an essential element for realizing highly intelligent robots. Indeed, a variety of machine learning methods have been applied to a number of intelligent robot systems so far. One of the most important and general themes in robot learning research is “what kind of knowledge should be given manually beforehand, and what should be learned by the robots themselves?”, because training data for learning is usually very expensive in a real environment. If a large part of a robot's behavior knowledge is given as built-in knowledge and only a few parameters are left to be learned by the robot, the benefit of learning itself (i.e., optimality of the acquired behavior, adaptability to the environment, etc.) will be quite limited, though the learning cost will be low.
On the other hand, if no a priori knowledge is given and the whole of the behavior knowledge is to be learned by the robot itself, an unrealistically large amount of training data will be required even for simple tasks, although this may achieve a better result in the long run than the previous case.

A typical and significant example of this trade-off in behavior learning can be found in the issue of state space definition. As a behavior learning method for reaction-based robots, reinforcement learning methods such as Q-learning have been widely used. Their purpose is to obtain favorable mappings between discrete sets of states and actions (or state space and action space) based on the experience of reward acquisition. One significant problem with this approach is that most conventional behavior learning systems assume that the state spaces are pre-defined manually. This approach has a certain limitation, because manual definition of a proper state space becomes drastically more difficult as the task of the robot becomes more complicated and the number of sensors increases.

For this reason, several researchers have recently proposed methods of autonomous state space construction or sensor input generalization. Though these approaches have partly overcome the drawbacks of manually predefined state spaces, a new problem has arisen: these methods generally require a tremendous amount of training data or behavior experience, because they attempt to construct state spaces from scratch. In other words, they do not use any a priori knowledge about the sensor space.

Against this background, this paper proposes a new state space “reconstruction” method for behavior-based robots. With this method, a robot can rebuild its state space based on the similarity of behavior outcomes, rather than constructing it completely from scratch, after it collects a certain amount of behavior experience, effectively using the manually predefined state space.
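To make the role of the state space concrete, the following is a minimal sketch of tabular Q-learning as described above: a mapping from a discrete state space to a discrete action set, learned from reward experience. The corridor task, state encoding, and all parameter values here are our own illustrative assumptions, not taken from the paper.

```python
import random

random.seed(0)

# Illustrative task: a 1-D corridor of 5 discrete states; reward on reaching
# the rightmost state. This discretization plays the role of the manually
# pre-defined state space discussed in the text.
N_STATES = 5
ACTIONS = [-1, +1]          # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

# The Q-table is the state-action mapping that learning fills in.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic corridor dynamics: reward 1.0 on reaching the goal."""
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection over the current Q estimates
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
        s2, r = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = r + GAMMA * max(Q[(s2, a_)] for a_ in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Greedy policy extracted from the learned table: prefers moving right
# in every non-goal state.
policy = {s: max(ACTIONS, key=lambda a_: Q[(s, a_)]) for s in range(N_STATES - 1)}
print(policy)
```

Note that the quality of the learned policy hinges entirely on how the states are discretized; if two situations requiring different actions were lumped into one table entry, no amount of reward experience could separate them, which is precisely the state space definition problem the paper addresses.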
As a result, an optimal state space for each individual task, sensor configuration, and environment can be obtained.