Abstract— In recent years, the field of mobile robotics has advanced considerably, driven largely by the extensive application of mobile robots to autonomous exploration. Mobile robots are widely used in space, in underwater exploration, in the monitoring of underground coal mines, and in the inspection of chemical, toxic, and nuclear plants. When these environments are unknown or unpredictable, however, conventional (classical) robotics may not serve the purpose; in such cases robot learning is the best option. Learning from past experience is one way to deploy robots in real time in completely unknown environments. Reinforcement learning, which relies on constant system-environment interaction, is one of the best-suited learning methods for robots, and both single-agent and multi-agent formulations are available. The current research work describes multi-agent reinforcement learning built on the concept of behaviour-based robotics for autonomous exploration by mobile robots. The concept has been tested in both indoor and outdoor environments using real robots.

I. INTRODUCTION

Recently the field of robotics, especially mobile robotics, has been identified as one of the most important areas of research due to its huge potential for autonomous exploration in hazardous, toxic, or otherwise unapproachable domains. These exploration domains extend from underwater exploration to factory automation, polar to planetary exploration, and landmine detection to unknown-environment mapping. For such explorations, a mobile robot under classical control can be used if and only if the programmer or user has prior knowledge of the environment; it is practically impossible to develop a mobile robot for exploration without knowing the environment beforehand. For such cases, the concept of learning from past experiences may provide a better strategy for exploration.
The system learns constantly from its interactions with the environment and modifies its exploration strategy accordingly. The most suitable learning method in this direction is reinforcement learning, especially Q-learning, which uses delayed rewards [1]. The current research work proposes a new approach to autonomous exploration using multiagent Q-learning together with behaviour-based robotics. This paper is organised as follows: after this introduction, related works and a few insights are described; then the proposed methodology and the experimental results and discussions follow, ending with a conclusion.

Manuscript received August 08, 2011. This work is partly supported by CSIR, India through the Eleventh Five Year Plan (2007–12), under the Supra Institutional Project head (SIP 24). D. N. Ray is with the Surface Robotics Laboratory, Central Mechanical Engineering Research Institute (CSIR), Durgapur – 713209, India (phone: 0091-343-6452039; fax: 0091-343-2546745; e-mail: dnray@cmeri.res.in). A. Mandal has been a project assistant at the Surface Robotics Lab, CMERI, Durgapur for the last one year (e-mail: amit.dgp12@gmail.com). S. Majumder was with the University of Sydney and is now with the Surface Robotics Laboratory, CMERI, Durgapur (e-mail: sjm@cmeri.res.in). S. Mukhopadhyay is with the Department of Mechanical Engineering, National Institute of Technology, Durgapur – 713209, India (e-mail: msumitnit@yahoo.co.in).

From a detailed literature survey it can be concluded that three types of works are reported in the literature, for both single-agent and multiagent reinforcement/Q-learning. 1) Papers of the first type [2, 3, 4, 5] are reviews that discuss the work done so far in the field; they neither propose any theory nor describe any experiment. 2) Papers of the second type [6, 7, 8, 9, 10, 11, 12] are theoretical: the proposed methodologies/algorithms, or modifications of existing algorithms, have been established by simulation.
Furthermore, such papers can be categorized into (a) purely analytical [6, 7, 8, 11, 12] and (b) simulation-based robotics [9, 10]. 3) Papers of the third type [13, 14, 15, 4] are experimental, i.e. they discuss the use of real robots in indoor/simulated environments, although such work is very limited. The literature survey also reveals that, to date, no work has been reported on outdoor exploration by a robot using single-agent/multiagent reinforcement learning. The current work addresses this issue of autonomous outdoor exploration using multiagent Q-learning based on behaviour-based robotics.

Human-like Gradual Multi-agent Q-learning using the concept of Behavior-based Robotics for Autonomous Exploration
Dip N Ray, Member, IEEE, Amit Mandal, Somajyoti Majumder, Sumit Mukhopadhyay
978-1-4577-2138-0/11/$26.00 © 2011 IEEE
Proceedings of the 2011 IEEE International Conference on Robotics and Biomimetics, December 7-11, 2011, Phuket, Thailand

II. A FEW INSIGHTS

The current work relates mainly to behaviour-based robotics, reinforcement learning (especially Q-learning), and multiagent systems. The following paragraphs give a brief idea of these topics in a nutshell.

A. Behaviour-based Robotics

Conventional/classical robotics has a control mechanism that analyzes the inputs obtained from various sensors and then sends responses that guide the end effectors to act accordingly. But if the end effectors are directly coupled to the sensors and an intelligent agent controls the system individually, then the robot is able to take decisions by itself. This is one kind of intelligence often looked for in robots. Such behaviour is 'reactive' in nature, like the closing of the eyes due to intense light in human beings.
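A reactive behaviour of the kind just described maps a sensor reading directly to an actuator command, with no deliberative planner in between. The following is a minimal sketch of that idea; the thresholds, function names, and commands are illustrative assumptions for the sketch, not taken from the authors' system.

```python
# Reactive coupling: actuator command is a direct function of the sensor
# reading, with no world model or planning step in between.

LIGHT_THRESHOLD = 0.8  # assumed normalized intensity above which the "eye" closes

def reactive_blink(light_intensity):
    """Map a light-sensor reading directly to an eyelid command."""
    return "close_eyelid" if light_intensity > LIGHT_THRESHOLD else "keep_open"

def reactive_avoid(range_reading_m):
    """Map a range-sensor reading directly to (left, right) wheel speeds."""
    if range_reading_m < 0.3:   # obstacle close: turn in place
        return (-0.2, 0.2)
    return (0.5, 0.5)           # clear path: drive straight
```

Because each behaviour is a stateless function of the current percept, the response is immediate, which is precisely what makes reactive control attractive for unknown environments.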
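The Q-learning mentioned above learns a state-action value table from delayed rewards through the standard update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. The following is a minimal single-agent sketch on a toy one-dimensional corridor, where the only reward arrives at the goal and is propagated back by the discount factor; the environment, constants, and function names are assumptions for illustration, not the multi-agent exploration system proposed in this paper.

```python
import random

# Minimal tabular Q-learning on a toy 1-D corridor of 6 states.
# The only reward is earned on entering the rightmost state, so earlier
# actions are credited solely through the discounted backup (delayed reward).
ALPHA = 0.5        # learning rate
GAMMA = 0.9        # discount factor that propagates the delayed reward
EPSILON = 0.1      # exploration probability
N_STATES = 6
ACTIONS = (-1, +1)  # step left / step right

def step(state, action):
    """Environment transition: reward only on entering the goal state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < EPSILON:            # explore
                a = rng.choice(ACTIONS)
            else:                                 # exploit, random tie-break
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (target - q[(s, a)])   # Q-learning update
            s = s2
    return q

if __name__ == "__main__":
    q = train()
    # Greedy policy after training: +1 (move right) in every non-goal state.
    print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)])
```

After training, the greedy policy moves right in every state even though only the final transition is rewarded, which illustrates how Q-learning handles delayed rewards.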