Fuzzy Rule-Based Neuro-Dynamic Programming for Mobile Robot Skill Acquisition on the Basis of a Nested Multi-Agent Architecture

John N. Karigiannis, Theodoros I. Rekatsinas and Costas S. Tzafestas

Abstract— Biologically inspired architectures that mimic the organizational structure of living organisms, and frameworks that improve the design of intelligent robots in general, attract significant attention from the research community. Self-organization problems, intrinsic behaviors, and effective learning and skill-transfer processes in the context of robotic systems have been investigated extensively. Our work presents a new framework for a developmental skill-learning process by introducing a hierarchical nested multi-agent architecture. A neuro-dynamic learning mechanism employing function approximators in a fuzzified state space is utilized, leading to a collaborative control scheme among the distributed agents operating in a continuous space. This scheme enables the multi-agent system to learn, over a period of time, how to perform sequences of continuous actions cooperatively without any prior task model. The agents comprising the system gain experience in the task they collaboratively perform by continuously exploring and exploiting their state-to-action mapping space. For the specific problem setting, the proposed theoretical framework is applied to two simulated e-Puck robots performing a collaborative box-pushing task. This task involves active cooperation between the robots in order to jointly push an object on a plane to a specified goal location. We should note that 1) no contact points are specified for the two e-Pucks, and 2) the shape of the object is irrelevant. The actuated wheels of the mobile robots are considered as independent agents that must build up cooperative skills over time in order for the robot to demonstrate intelligent behavior.
Our goal in this experimental study is to evaluate both the proposed hierarchical multi-agent architecture and the methodological control framework. Such a hierarchical multi-agent approach is envisioned to be highly scalable for the control of complex biologically inspired robot locomotion systems.

Keywords: Developmental Robotics, Multi-Agent Architectures, Neuro-Dynamic Learning

John N. Karigiannis, Ph.D. Candidate at School of Electrical & Computer Engineering, Division of Signals, Control & Robotics, National Technical University of Athens, Zographou, Athens, Greece, john@fhw.gr
Theodoros I. Rekatsinas, Ph.D. Candidate at School of Computer Science, University of Maryland, College Park, MD 20742, USA, thodrek@umd.edu
Costas S. Tzafestas, Assistant Professor at School of Electrical & Computer Engineering, Division of Signals, Control & Robotics, National Technical University of Athens, Zographou Campus, Athens, Greece, ktzaf@softlab.ntua.gr

I. INTRODUCTION

Finding new methods for designing and controlling robotic systems, inspired by biological mechanisms, processes, and principles in general, is attracting significant attention from the research community. The reason we are fervent supporters of this endeavor is that robotic systems designed according to these principles will be able to evolve skills and, in general, demonstrate learning abilities without requiring a detailed task model description for their proper operation. Hence, the new scientific field situated at the intersection of robotics and the developmental sciences (i.e., cognitive psychology, neuroscience), named Developmental Robotics, tries to address these problems. The goal of developmental robotics can be defined as: a) employing robots to instantiate and investigate models originating from developmental science, and b) seeking to design better robotic systems by applying insights gained from studies of ontogenetic development.
Furthermore, developmental robotics motivates the use of robots as a novel research tool to model and study the development of cognition and action. Ontogenetic development has many facets. For instance, it can be defined as a self-organizing, incremental process, but it can also be seen as comprising self-exploratory activities and, on many occasions, cooperative activities. Thus, in order to better understand all these different facets of developmental learning, several research groups have directed their work toward cognitive multi-agent robotic systems. A complete survey can be found in [27]. Having said that, we should note that understanding human cooperative behavior has been a major concern in multi-agent robotic systems, and has been addressed by work on mobile robots [17], robotic hands, and multiple manipulators [18], [19], [20]. In [23], manipulation protocols were developed for a team of mobile robots that collaborate in order to push large boxes. In [22], an algorithmic structure coordinates the reorientation of objects in a plane by independent robot-agents. In [21], a study is presented in which distributed cooperation strategies are required by a group of behavior-based mobile robots handling an object. The common approach in all these works relies on the assumptions that the motion of the object under pushing/manipulation is quasi-static, and that all the agents involved have predefined behavior models that they combine by employing a certain architecture (such as the subsumption architecture [24]). Human behavior also demonstrates evolutionary characteristics and self-organizing abilities. These unique attributes of human behavior have been extensively studied in the process of designing intelligent robots that need to operate and collaborate autonomously and adapt to their environment.
In this context, the application and use of bio-inspired techniques, such as reinforcement learning using function approximators, evolutionary computation, and fuzzy systems, constitutes an emerging research topic. More specifically, Neuro-Dynamic Programming [2], commonly known as Reinforcement Learning (RL) [1], [2], [3], is an active area of machine learning research that is also receiving attention

978-1-4244-9317-3/10/$26.00 © 2010 IEEE
Proceedings of the 2010 IEEE International Conference on Robotics and Biomimetics, December 14-18, 2010, Tianjin, China
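To make the general idea of RL over a fuzzified continuous state space concrete, the following is a minimal illustrative sketch, not the paper's actual controller: a scalar state is fuzzified by triangular membership functions, one Q-value vector is kept per fuzzy set, and the temporal-difference update is distributed across the sets in proportion to their membership degrees. All class and function names here (`FuzzyQLearner`, `triangular_memberships`) are hypothetical, and the 1-D state and parameter values are assumptions for illustration.

```python
import numpy as np

def triangular_memberships(x, centers):
    """Membership degrees of scalar x in triangular fuzzy sets
    centered at `centers` (assumed uniformly spaced, fully overlapping)."""
    width = centers[1] - centers[0]
    mu = np.maximum(0.0, 1.0 - np.abs(x - centers) / width)
    return mu / mu.sum()  # normalize so degrees sum to 1

class FuzzyQLearner:
    """Q-learning over a fuzzified 1-D continuous state (illustrative sketch)."""
    def __init__(self, centers, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.centers = np.asarray(centers, dtype=float)
        self.q = np.zeros((len(centers), n_actions))  # one Q-row per fuzzy set
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def q_values(self, x):
        # Action values are the membership-weighted blend of per-set Q-rows.
        mu = triangular_memberships(x, self.centers)
        return mu @ self.q, mu

    def act(self, x):
        # Epsilon-greedy exploration/exploitation over the blended values.
        if np.random.rand() < self.eps:
            return int(np.random.randint(self.q.shape[1]))
        qv, _ = self.q_values(x)
        return int(np.argmax(qv))

    def update(self, x, a, r, x_next):
        # TD error on the blended value; credit split by membership degree.
        qv, mu = self.q_values(x)
        qv_next, _ = self.q_values(x_next)
        td = r + self.gamma * qv_next.max() - qv[a]
        self.q[:, a] += self.alpha * td * mu
```

In a multi-agent setting such as the one described above, each agent (e.g. each actuated wheel) would hold its own learner of this kind, with the shared reward coupling their updates; that coupling, and the nesting of agents into a hierarchy, is where the paper's architecture goes beyond this single-agent sketch.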