U. Visser et al. (Eds.): RoboCup 2007, LNAI 5001, pp. 409–416, 2008.
© Springer-Verlag Berlin Heidelberg 2008

Implementing Parametric Reinforcement Learning in Robocup Rescue Simulation

Omid Aghazadeh (1), Maziar Ahmad Sharbafi (1, 2), and Abolfazl Toroghi Haghighat (1)

(1) Mechatronic Research Lab, Azad University of Qazvin, Qazvin, Iran
(2) Electrical and Computer Engineering Department, University of Tehran, Tehran, Iran
aghazadeh@mrl.ir, m.sharbafi@ece.ut.ac.ir

Abstract. Decision making in complex, multi-agent, dynamic environments such as rescue simulation is a challenging problem in artificial intelligence. Uncertainty, noisy input data, and stochastic behavior, which are common difficulties in real-time environments, make decision making in such environments even more complicated. To overcome the bottleneck caused by the dynamics and the variety of conditions in such situations, we use reinforcement learning. Classic reinforcement learning methods usually work with state and action value functions and temporal difference (TD) updates. Function approximation is an alternative to storing state and action value functions directly. Many reinforcement learning methods for continuous state and action spaces, such as TD, LSTD, and iLSTD, combine function approximation with TD updates. This paper presents a new approach to online reinforcement learning in continuous state or action spaces that does not use TD updates. We call it Parametric Reinforcement Learning. We apply this method to the decision-making process of the Police Force agents in the RoboCup Rescue Simulation and present the results of this application. Our simulation results show that the method speeds up learning and is simple to use. It also has very low memory usage and low computational cost.

Keywords: Reinforcement Learning, Multi Agent Coordination, Decision Making.
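For reference, the kind of update that the abstract contrasts against, TD learning combined with function approximation, can be sketched as semi-gradient TD(0) with a linear value function V(s) = w · φ(s). This is a standard textbook method, not the Parametric Reinforcement Learning method proposed in this paper. The chain task, the one-hot features, and the values of α and γ below are illustrative assumptions.

```python
import numpy as np


def td0_update(w, phi, reward, phi_next, alpha, gamma, terminal=False):
    """One semi-gradient TD(0) step for a linear value function V(s) = w . phi(s)."""
    v = w @ phi
    # Bootstrap from the current estimate of the next state's value;
    # a terminal next state is defined to have value 0.
    v_next = 0.0 if terminal else w @ phi_next
    td_error = reward + gamma * v_next - v
    # The gradient of w . phi with respect to w is phi, so the step is along phi.
    return w + alpha * td_error * phi


def evaluate_chain(n_states=5, episodes=500, alpha=0.1, gamma=1.0):
    """Hypothetical demo: chain 0 -> 1 -> ... -> n-1 -> end, reward 1 per step.

    One-hot features make the linear approximator equivalent to a table,
    so the learned weights should approach the true values n - s.
    """
    features = np.eye(n_states)
    w = np.zeros(n_states)
    for _ in range(episodes):
        for s in range(n_states):
            terminal = s == n_states - 1
            phi_next = np.zeros(n_states) if terminal else features[s + 1]
            w = td0_update(w, features[s], 1.0, phi_next, alpha, gamma, terminal)
    return w


if __name__ == "__main__":
    # Should be close to [5, 4, 3, 2, 1].
    print(np.round(evaluate_chain(), 3))
```

With richer (non-one-hot) features, the same update generalizes across states, which is what makes it usable in continuous state spaces; LSTD and iLSTD reach the same TD fixed point with different computational trade-offs.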
1 Introduction

The rescue simulation environment, a disaster space and a branch of the RoboCup competitions, models a city after an earthquake. Its main purpose is to support emergency decision making by integrating disaster information, prediction, planning, and a human interface. In such a multi-agent system, coordination among heterogeneous agents is the main problem. Reinforcement learning (RL) is one of the most powerful approaches in dynamic, time-varying environments. Its basic property, adapting to changes according to the results of actions, is exactly what these situations require. RL-based techniques with adaptive behavior use their interactions with the system to optimize the policy that generates decisions.
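The adaptive behavior described above, adjusting decisions according to the outcomes of actions, can be illustrated with a minimal sketch outside the rescue domain: a hypothetical two-action task whose better action changes halfway through, with action values tracked by a constant step size and actions chosen ε-greedily. The task, reward values, α, and ε are assumptions chosen for illustration; this is generic RL, not the method proposed in this paper.

```python
import numpy as np


def run_bandit(steps=1000, switch_at=500, alpha=0.1, epsilon=0.1, seed=0):
    """Hypothetical two-action task whose best action flips at `switch_at`.

    Action values are tracked with a constant step size, so recent rewards
    outweigh old ones and the estimates can follow the change.
    Returns the value estimates just before the switch and at the end.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(2)
    q_before_switch = None
    for t in range(steps):
        if t == switch_at:
            q_before_switch = q.copy()
        # Epsilon-greedy: mostly exploit the current estimates, sometimes explore.
        if rng.random() < epsilon:
            a = int(rng.integers(2))
        else:
            a = int(np.argmax(q))
        best = 0 if t < switch_at else 1
        reward = 1.0 if a == best else 0.0
        # Constant-step-size update: Q <- Q + alpha * (r - Q).
        q[a] += alpha * (reward - q[a])
    return q_before_switch, q


if __name__ == "__main__":
    before, after = run_bandit()
    print("before switch:", np.round(before, 2), "end:", np.round(after, 2))
```

With a sample-average step size (1/n) instead of a constant α, early rewards would keep their full weight and the estimates would respond to the change more slowly; a constant step size is the usual choice for time-varying problems.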