U. Visser et al. (Eds.): RoboCup 2007, LNAI 5001, pp. 409–416, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Implementing Parametric Reinforcement Learning in
Robocup Rescue Simulation
Omid Aghazadeh¹, Maziar Ahmad Sharbafi¹,², and Abolfazl Toroghi Haghighat¹
¹ Mechatronic Research Lab, Azad University of Qazvin, Qazvin, Iran
² Electrical and Computer Engineering Department, University of Tehran, Tehran, Iran
aghazadeh@mrl.ir, m.sharbafi@ece.ut.ac.ir
Abstract. Decision making in complex, dynamic, multi-agent environments
such as Rescue Simulation is a challenging problem in Artificial Intelligence.
Uncertainty, noisy input data, and stochastic behavior, which are common
difficulties of real-time environments, make decision making even more
complicated. Our approach to handling the dynamics and variety of conditions
in such situations is reinforcement learning. Classic
reinforcement learning methods usually work with state and action value
functions and temporal difference (TD) updates. Function approximation is an
alternative to storing state and action value functions directly. Many
reinforcement learning methods for continuous action and state spaces, such as
TD, LSTD, and iLSTD, combine function approximation with TD updates.
A new approach to online reinforcement learning in continuous action or
state spaces is presented in this paper; it does not rely on TD updates. We
call it Parametric Reinforcement Learning. The method is applied to the
decision-making process of the Police Force agent in Robocup Rescue
Simulation, and the results of this application are reported in this paper.
Our simulation results show that the method increases the speed of learning
and is simple to use. It also has very low memory usage and very low
computational cost.
Keywords: Reinforcement Learning, Multi Agent Coordination, Decision
Making.
1 Introduction
The Rescue Simulation environment, a branch of the RoboCup competitions,
models a city after an earthquake. Its main purpose is to provide emergency
decision support through the integration of disaster information,
prediction, planning, and human interfaces. In such a multi-agent system,
coordination between heterogeneous agents is the main problem.
Reinforcement Learning (RL) is one of the most powerful strategies for dynamic,
time-varying environments. Its basic property, adapting to changes according
to the outcomes of actions, is exactly what such situations require. RL-based
techniques behave adaptively: they use interactions with the system to
optimize the policy that generates the decisions.
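For orientation, the classic scheme mentioned above (a value function held by a function approximator and improved with temporal difference updates) can be sketched as follows. This is not the Parametric Reinforcement Learning method of this paper; it is a minimal illustration of semi-gradient TD(0) with a linear approximator on a toy five-state random walk. The function names, the one-hot features, the step size, and the toy environment are all assumptions made for the example.

```python
import random

def features(state, n_states):
    """One-hot feature vector (an illustrative choice of feature map)."""
    phi = [0.0] * n_states
    phi[state] = 1.0
    return phi

def value(weights, phi):
    """Linear value estimate: V(s) = w . phi(s)."""
    return sum(w * x for w, x in zip(weights, phi))

def td0_update(weights, phi, reward, phi_next, alpha, gamma, terminal):
    """One semi-gradient TD(0) step: w <- w + alpha * delta * phi."""
    bootstrap = 0.0 if terminal else gamma * value(weights, phi_next)
    delta = reward + bootstrap - value(weights, phi)  # TD error
    return [w + alpha * delta * x for w, x in zip(weights, phi)]

def run_random_walk(episodes=5000, n_states=5, alpha=0.01, gamma=1.0, seed=0):
    """Learn state values by interaction: reward 1 only for exiting on the right."""
    rng = random.Random(seed)
    weights = [0.0] * n_states
    for _ in range(episodes):
        state = n_states // 2  # every episode starts in the middle state
        while True:
            next_state = state + rng.choice((-1, 1))
            terminal = next_state < 0 or next_state >= n_states
            reward = 1.0 if next_state >= n_states else 0.0
            phi = features(state, n_states)
            # phi_next is ignored by the update when the step is terminal
            phi_next = phi if terminal else features(next_state, n_states)
            weights = td0_update(weights, phi, reward, phi_next,
                                 alpha, gamma, terminal)
            if terminal:
                break
            state = next_state
    return weights

if __name__ == "__main__":
    print(run_random_walk())
```

The learned weights should rise from the leftmost to the rightmost state, since states nearer the right exit are more likely to collect the reward. Every interaction step triggers one TD update; this is the part that the method presented in this paper does without.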