International Journal of Computer Applications (0975 – 8887) Volume 38 – No. 4, January 2012

Accelerated Method Based on Reinforcement Learning and Case-Based Reasoning in Multi-agent Systems

Sara Esfandiari, Department of Computer Engineering, Islamic Azad University, Qazvin Branch, Qazvin, Iran.
Behrooz Masoumi, Department of Computer Engineering, Islamic Azad University, Qazvin Branch, Qazvin, Iran.
Abdolkarim Niazi, Department of Manufacturing and Industrial Engineering, Faculty of Mechanical Engineering, Universiti Teknologi Malaysia.
Mohammad Reza Meybodi, Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran.

ABSTRACT
In this paper, a new algorithm based on case-based reasoning (CBR) and reinforcement learning (RL) is proposed to increase the convergence rate of reinforcement learning algorithms in multi-agent systems. In the proposed method, we investigate how to improve action selection in RL algorithms: a new combined model, using a case-based reasoning system together with a new optimized action-selection function, is proposed, which speeds up algorithms based on Q-learning. The proposed algorithm has been used to solve cooperative Markov games, one of the Markov-based models of multi-agent systems. Experimental results show that the proposed algorithm outperforms existing algorithms in both the speed and the accuracy of reaching the optimal policy.

General Terms
Multi-agent Learning, Machine Learning.

Keywords
Reinforcement Learning, Case-Based Reasoning, Multi-agent Systems, Cooperative Markov Games, Machine Learning.

1. INTRODUCTION
Case-Based Reasoning (CBR) is a knowledge-based problem-solving technique based on reusing previous experience; it originated in cognitive-science research [1]. The method assumes that similar problems have similar solutions.
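The core CBR assumption, that similar problems admit similar solutions, can be sketched as a simple nearest-case retrieval. This is a minimal generic illustration, not the paper's implementation; the feature-vector case representation and the Euclidean similarity measure are assumptions made here for concreteness.

```python
# Minimal case-based reasoning sketch: retrieve the stored case most
# similar to a new problem and reuse its solution.
# The feature-vector representation and Euclidean similarity are
# illustrative assumptions, not the paper's own design.

def similarity(a, b):
    """Negative Euclidean distance between feature vectors (higher = more similar)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def retrieve(case_base, problem):
    """Return the solution of the stored case most similar to `problem`."""
    best_case = max(case_base, key=lambda case: similarity(case["problem"], problem))
    return best_case["solution"]

case_base = [
    {"problem": (0.0, 0.0), "solution": "action_A"},
    {"problem": (5.0, 5.0), "solution": "action_B"},
]

print(retrieve(case_base, (0.5, 0.2)))  # nearest case is (0, 0) -> action_A
```

In a full CBR cycle this retrieval step would be followed by reuse, revision, and retention of the adapted solution.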
Therefore, new problems may be solvable by reusing the solutions to similar previous problems.

A multi-agent system (MAS) comprises a collection of autonomous, intelligent agents that interact with each other in an environment to optimize a performance measure [2]. Multi-agent systems are applied in a wide variety of domains, including robotic teams, distributed control, resource management, collaborative decision support systems, and data mining, and are useful in the modeling, analysis, and design of systems where control is distributed among several autonomous decision makers. In multi-agent system research, two main perspectives are found in the literature: the cooperative and the non-cooperative perspective. In cooperative MASs, the agents pursue a common goal and can be built to expect benevolent intentions from the other agents. In contrast, in a non-cooperative MAS the agents' goals are not aligned, and each agent tries only to maximize its own profit. In multi-agent systems, the need for learning and adaptation arises because an agent's environment is dynamic and only empirically observable: the reward functions and the state transitions are unknown. Since the agents face a shortage, or a complete lack, of information about the environment, reinforcement learning algorithms are particularly important in this setting [22]. Hence, reinforcement learning methods may be applied in a MAS to find an optimal policy in Markov games (MGs). In addition, agents in a multi-agent system face the problem of incomplete information with respect to action choice. If agents receive information about their own choice of action as well as those of the others, we have joint action learning [3], [4].
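Since the RL methods discussed here build on Q-learning, a sketch of the standard tabular Q-learning update may help. This is the generic single-agent update, not the paper's combined CBR+RL algorithm; the state/action names and parameter values are assumptions.

```python
# Standard tabular Q-learning update (generic sketch, not the paper's
# combined CBR+RL method).
from collections import defaultdict

alpha, gamma = 0.1, 0.9          # learning rate and discount factor (assumed values)
Q = defaultdict(float)           # Q[(state, action)] -> value estimate, defaults to 0.0

def update(state, action, reward, next_state, actions):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

actions = ["left", "right"]
update("s0", "right", 1.0, "s1", actions)
print(round(Q[("s0", "right")], 3))  # 0.1: the first update moves the estimate toward the reward
```

With an unknown reward function and unknown transitions, such sample-based updates are what lets the agent learn from interaction alone, which is exactly why RL fits the MAS setting described above.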
Joint action learners are able to maintain models of the strategies of the other agents, and explicitly take the effects of joint actions into account. In contrast, independent agents know only their own actions, which is often the more realistic assumption, since distributed multi-agent applications are typically subject to limitations such as partial observability, communication costs, and stochasticity. Several models based on Markov models have been proposed in the literature for multi-agent systems (MASs). One of these is the Markov game (MG); Markov games are extensions of the Markov Decision Process (MDP) to multiple agents. In an MG, actions are the result of the joint action selection of all agents, while rewards and state transitions depend on these joint actions. In a fully cooperative MG, called a multi-agent MDP (MMDP), all agents share the same reward function and should learn to agree on the same optimal policy [5]. There are several methods for finding an optimal policy in MMDPs. In [6], an algorithm is proposed for learning cooperative MMDPs, but it is only suitable for deterministic environments. In [7], another view of Markov games is taken, i.e., the game is seen as a sequence of normal-form games; an algorithm called Nash-Q is proposed, which under restrictive conditions converges to a Nash equilibrium policy. In [8], MMDPs are approximated as a sequence of intermediate
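The defining feature of an MMDP, one shared reward indexed by the joint action of all agents, can be sketched in a few lines. This is a hypothetical two-agent stateless coordination game used purely for illustration; the payoffs and learning rate are assumptions, not taken from the paper.

```python
# Joint-action Q-values for a two-agent fully cooperative game (MMDP sketch):
# both agents share one reward, and values are indexed by the joint action.
from itertools import product

agent_actions = ["a", "b"]                      # each agent's individual action set
joint_actions = list(product(agent_actions, repeat=2))

# Shared reward: agents are rewarded only for coordinating (assumed payoffs).
reward = {("a", "a"): 1.0, ("a", "b"): 0.0, ("b", "a"): 0.0, ("b", "b"): 1.0}

Q = {ja: 0.0 for ja in joint_actions}           # one shared Q-table over joint actions
alpha = 0.5
for ja in joint_actions:                        # one sweep of updates (stateless game)
    Q[ja] += alpha * (reward[ja] - Q[ja])

best = max(Q, key=Q.get)
print(best)  # a coordinated joint action, ('a', 'a') here (first of the tied maxima)
```

Note that the table grows exponentially with the number of agents, which is one reason joint action learning is expensive compared with independent learning.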