Abstract— In the last few years, the field of mobile robotics has advanced considerably, driven largely by the extensive application of mobile robots to autonomous exploration. Mobile robots are popularly used in space, underwater exploration, underground coal-mine monitoring, inspection of chemical/toxic/nuclear factories, etc. But if these environments are unknown or unpredictable, conventional/classical robotics may not serve the purpose; in such cases robot learning is the best option. Learning from past experiences is one way to apply robots in real time to completely unknown environments. Reinforcement learning is one of the best learning methods for robots, based on constant system-environment interaction. Both single-agent and multi-agent concepts are available for the implementation of learning. The current research work describes a multi-agent reinforcement learning approach using the concept of behaviour-based robotics for autonomous exploration by mobile robots. The concept has been tested in both indoor and outdoor environments using real-time robots.
I. INTRODUCTION
Recently the field of robotics, especially mobile robotics, has been identified as one of the most important areas of research due to its huge potential for autonomous exploration in hazardous, toxic, or otherwise unapproachable domains. These exploration domains extend from underwater exploration to factory automation, from polar to planetary exploration, and from landmine detection to unknown-environment mapping. For such explorations, however, a mobile robot with classical control is usable only if the programmer or user has prior knowledge of the environment; without knowing the environment beforehand, a purely pre-programmed exploration robot cannot be developed. For such cases, the concept of learning from past experiences may provide a better strategy: the system learns constantly from its interactions with the environment and modifies its exploration strategy accordingly. The most suitable learning in this
direction is reinforcement learning, especially Q-learning, which uses delayed rewards [1]. The current research work proposes a new approach to autonomous exploration using multi-agent Q-learning based on behaviour-based robotics. This paper is organised as follows: after this introduction, related works and a few insights are described; the proposed methodology and the experimental results and discussions follow, and a conclusion closes the paper.

Manuscript received August 08, 2011. This work is partly supported by CSIR, India through the Eleventh Five Year Plan (2007 – 12), under the Supra Institutional Project head (SIP 24).
D. N. Ray is with the Surface Robotics Laboratory, Central Mechanical Engineering Research Institute (CSIR), Durgapur – 713209, India (phone: 0091-343-6452039; fax: 0091-343-2546745; e-mail: dnray@cmeri.res.in).
A. Mandal has been a project assistant at the Surface Robotics Laboratory, CMERI, Durgapur for the last one year (e-mail: amit.dgp12@gmail.com).
S. Majumder was with the University of Sydney and is now with the Surface Robotics Laboratory, CMERI, Durgapur (e-mail: sjm@cmeri.res.in).
S. Mukhopadhyay is with the Department of Mechanical Engineering, National Institute of Technology, Durgapur – 713209, India (e-mail: msumitnit@yahoo.co.in).
From the detailed literature survey it can be concluded that three types of work are reported in the literature, for both single-agent and multi-agent reinforcement/Q-learning.
1) Papers of the first type [2, 3, 4, 5] are basically reviews that discuss the work done so far in the field. They do not propose any theory or describe any experiment.
2) Papers of the second type [6, 7, 8, 9, 10, 11, 12] are theoretical: the proposed methodologies/algorithms, or modifications of existing algorithms, have been established by simulation. Furthermore, such papers can be categorized into (a) purely analytical [6, 7, 8, 11, 12] and (b) simulation-based robotics [9, 10].
3) Papers of the third type [13, 14, 15, 4] are experimental, i.e. they discuss the use of real robots in indoor/simulated environments, although such work is very limited.
The literature survey also reveals that, to date, no work has been reported on outdoor exploration with real robots using single-agent or multi-agent reinforcement learning. The current work addresses this issue of autonomous outdoor exploration using multi-agent Q-learning based on behaviour-based robotics.
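As background for the approach outlined above, the tabular Q-learning update with delayed rewards [1] can be sketched as follows. This is a minimal illustration, not the paper's implementation; the state and action names, learning rate, and discount factor are assumptions chosen only for the example.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict keyed by (state, action); missing entries default to 0."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q

# Hypothetical single step: an 'explore' action that earned reward 1.0
Q = {}
q_update(Q, "s0", "explore", 1.0, "s1", ["explore", "avoid"])
```

In a multi-agent, behaviour-based setting, one such table could in principle be maintained per behaviour or agent, with an arbitration mechanism selecting among the learned action values.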
II. A FEW INSIGHTS
The current work relates mainly to behaviour-based robotics, reinforcement learning (especially Q-learning), and multi-agent systems. The following paragraphs provide a brief idea of these topics in a nutshell.
A. Behaviour-based Robotics
Existing conventional/classical robotics has a control mechanism that guides the end effectors: it analyzes the inputs obtained from various sensors and then sends responses to those end effectors. But if the end effectors are coupled directly to the sensors, and an intelligent agent controls each such coupling individually, the system is able to take decisions by itself. This is one kind of intelligence often looked for in robots. Such behaviour is 'reactive' in nature, like the closing of the eyes due to intense light in human beings.
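The reactive, sensor-to-actuator coupling described above can be sketched as a direct reflex mapping. The sensor name, threshold, and motor commands here are hypothetical, chosen only to illustrate a reactive behaviour with no deliberative layer in between:

```python
def reactive_avoid(front_range_m, threshold_m=0.5):
    """Reflex-like behaviour: a range reading maps straight to a motor
    command, the way intense light reflexively closes a human eye."""
    if front_range_m < threshold_m:
        return "turn"      # obstacle too close: turn away immediately
    return "forward"       # path clear: keep moving
```

A behaviour-based robot typically runs several such simple behaviours concurrently, with an arbitration scheme deciding which one drives the actuators at any instant.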
Human-like Gradual Multi-agent Q-learning using the concept of Behavior-based Robotics for Autonomous Exploration
Dip N Ray, Member, IEEE, Amit Mandal, Somajyoti Majumder, Sumit Mukhopadhyay
Proceedings of the 2011 IEEE International Conference on Robotics and Biomimetics, December 7-11, 2011, Phuket, Thailand