Learning of Action Patterns and Reactive Behavior Plans via a Novel Two-Layered Ethology-Based Action Selection Mechanism

Il Hong Suh, Sanghoon Lee
Graduate School of Information and Communications
Hanyang University, Seoul, Korea
ihsuh@hanyang.ac.kr, shlee@incorl.hanyang.ac.kr

Woo Young Kwon and Young-Jo Cho
Electronics and Telecommunications Research Institute
Daejeon, Korea
{wykwon,youngjo}@etri.re.kr

Abstract— The two most important abilities for a robot to survive in a given environment are selecting and learning the most appropriate actions for a given situation; historically, these have also been among the biggest problems in robotics. To help solve this problem, we propose a two-layered action selection mechanism (ASM) consisting of an action pattern layer and a reactive behavior plan layer. In the reactive behavior plan layer, a task is selected by comparing behavior motivation values that, in an animal, correspond to external stimuli as well as internal states due to hormones. After a task is selected, its corresponding reactive behavior plan is executed as a set of sequential dynamic behavior motivations (DBMs), each of which is associated with an action pattern. In the action pattern layer, each action pattern can be functionally decomposed into primitive motor actions. Shortest Path-based Q-Learning (SPQL) is incorporated into both layers. In the reactive behavior plan layer, the relationships between perceptions and action patterns that satisfy a given motivation are learned, together with the relative priorities of these relationships. In the action pattern layer, the relations between sensory states and primitive motor actions are learned. To establish the validity of the proposed ASM, experiments with a real robot we designed are illustrated together with simulations.

Index Terms— Action Selection Mechanism, Reinforcement Learning, Ethology

I. INTRODUCTION

An autonomous robot must select the most appropriate action based on both its internal state and its external environment. This is why a robot requires sensors to understand a given situation, as well as effectors to interact with its environment to accomplish a given task or mission. Moreover, the robot must also include an action selection mechanism, or behavior coordination mechanism, to decide on the proper sequence of actions [1], [2]. Brooks proposed a new architecture, called 'behavior-based architecture', composed of independently operating competence modules, each with its own sensors and effectors [3]. When two or more competence modules work simultaneously, modules with high priority inhibit modules with low priority. In the subsumption architecture, the priorities of these competence modules are fixed and pre-designed, which makes it difficult to design robots that complete missions in complex environments. To make this early behavior-based AI architecture more useful for a robot in a real environment, several researchers have proposed ethologically inspired Action Selection Mechanisms (ASMs) [1], [4], [5]. These models showed better results in that they more closely imitate real-life behavior. However, they were mostly evaluated with simulated robots in simulated environments, not with real robots in real environments. Many ethology-based models focus primarily on the cause or motivation of a particular behavior, because the actual action of an animal is too complex to analyze and imitate. Thus, in those models, the abstraction level of primitive behaviors is artificially high. However, designing primitive behaviors with such high abstraction levels in real robots is extremely difficult. Furthermore, in both the behavior-based AI approach and the ethology-based approach, planning sequential behaviors is quite difficult.
In the behavior-based approach, it is very hard to see how such plans could be expressed as we commonly understand them, and in the ethology-based model studies, planning sequential behaviors was not even considered. Bryson formalized planned sequences of behaviors as "reactive plans," an expression often used in behavior-based approaches [6]. A reactive plan is a more complicated plan structure used for circumstances in which the exact ordering of steps cannot be predetermined; it consists of three elements: priorities, preconditions, and actions. Most of the approaches presented above were tested in simple environments, where the world was assumed to be completely understood. However, it is hard to give a robot a complete understanding of a real environment. Moreover, if the environment changes, formerly "complete" knowledge of the environment may become entirely invalid. Therefore, a robot must be able to adapt to its environment; to do this, a computational architecture is required to process and store information about the environment. Reinforcement learning is a learning technique widely used in autonomous robots to select the most appropriate action in a particular situation [7], [8]. However, such reinforcement learning techniques may present some difficulties when applied in reality. For instance, rewards may not be given immediately after a behavior has been completed. Delayed rewards make learning very time-consuming, even to the point of impracticality. To