Multiple Model-based Reinforcement Learning

Kenji Doya (∗, 1, 2, 3, 5), Kazuyuki Samejima (1, 3), Ken-ichi Katagiri (4, 5), and Mitsuo Kawato (1, 3, 5)

November 2, 2001

1 Human Information Science Laboratories, ATR International
  2-2-2 Hikaridai, Seika, Soraku, Kyoto 619-0288, Japan
  Phone: +81-774-95-1251 Fax: +81-774-95-1259 Email: doya@atr.co.jp
2 CREST, Japan Science and Technology Corporation
3 Kawato Dynamic Brain Project, ERATO, Japan Science and Technology Corporation
4 ATR Human Information Processing Research Laboratories
5 Nara Institute of Science and Technology

Abstract

We propose a modular reinforcement learning architecture for non-linear, non-stationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The