Aggregate Reinforcement Learning for multi-agent territory division: The Hide-and-Seek game

Mohamed K. Gunady a,*, Walid Gomaa a,1, Ikuo Takeuchi a,b
a Egypt–Japan University of Science and Technology, New Borg El-Arab, Alexandria, Egypt
b Faculty of Science and Engineering, Waseda University, Tokyo, Japan

Article history: Received 25 May 2013; received in revised form 21 March 2014; accepted 15 May 2014

Keywords: Multi-agent systems; Reinforcement Learning; Hierarchical learning; Q-learning; State aggregation; Hide-and-Seek

Abstract: In many applications in robotics, such as disaster rescue, mine detection, robotic surveillance, and warehouse systems, it is crucial to build multi-agent systems (MAS) in which agents cooperate to complete a sequence of tasks. For better performance in such systems, e.g. to minimize duplicate work, agents need to agree on how to divide and plan that sequence of tasks among themselves. This paper targets the problem of territory division in the children's game of Hide-and-Seek as a test-bed for our proposed approach. The problem is solved in a hierarchical learning scheme using Reinforcement Learning (RL). Based on Q-learning, our learning model is presented in detail: the definition of composite states, actions, and a reward function that handle learning with multiple agents. In addition, a revised version of the standard Q-learning updating rule is proposed to cope with multiple seekers. The model is examined on a set of different maps, on which it converges to the optimal solutions. After a complexity analysis of the algorithm, we enhance it using state aggregation (SA) to alleviate the state-space explosion. Two levels of aggregation are devised: topological aggregation and hiding aggregation. After elaborating how the learning model is modified to handle the aggregation technique, the enhanced model is examined in several experiments. Results indicate promising performance, with a higher convergence rate and up to 10× space reduction.
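For reference, the standard tabular Q-learning update that the paper's revised multi-seeker rule builds on can be sketched as follows. All state/action encodings and parameter values here are illustrative placeholders, not the paper's actual model:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    alpha (learning rate) and gamma (discount factor) are illustrative."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Toy example: one update applied to an initially all-zero table.
Q = defaultdict(float)
q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=["left", "right"])
# Since all successor values are zero, Q[(0, "right")] becomes alpha * r = 0.1
```

The paper's contribution lies in how this rule is modified when several seekers learn simultaneously; the sketch above shows only the single-agent baseline being revised.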
© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Most real-life applications of multi-agent systems and machine learning consist of highly complicated tasks. One learning paradigm for handling such complexity is to learn simpler tasks first and then increase the task complexity in a hierarchical manner until reaching the desired complexity level, similar to the learning paradigm followed by children. Various practical planning problems can be reduced to Hide-and-Seek games, in which children learn the basic concepts of path planning, map building, navigation, and team cooperation. Robotics applications in various domains, such as disaster rescue, criminal pursuit, and mine detection/demining, are typical examples. One important aspect when dealing with cooperative seekers (in Hide-and-Seek games) is how to divide the search environment to achieve optimal seeking performance. This can be viewed as a territory division problem. The game of Hide-and-Seek is a simple test-bed for such applications. We adopt this simple test-bed to tackle the task of territory division that frequently appears in multi-robot systems, such as security and rescue robots. This class of problems involves multiple agents searching for a target that resides in an unknown place, given a likelihood distribution over its possible locations. Consider a simple form of the Hide-and-Seek game in which there are multiple seekers and a single hider. Before the game starts, the hider chooses a hiding place based on a pre-determined hiding probability distribution over the environment. Afterwards, it is the task of the seekers to search for the hider and find his hiding place in the minimum amount of time. A game terminates as soon as one seeker finds the hider by reaching his location. Seekers should not just scan the whole environment for the hider; otherwise they suffer from poor seeking performance. Instead, they should consider only the possible hiding places.
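As a toy illustration of why the hiding distribution matters, the expected time for a single seeker to find the hider under a fixed visiting order can be computed as below. The cell names, probabilities, and unit travel times are hypothetical, chosen only to make the comparison concrete:

```python
def expected_seek_time(order, hide_prob, travel_time=1.0):
    """Expected time to find the hider when cells are visited in `order`.
    hide_prob[c] is the probability the hider chose cell c; cells with
    zero probability need not be visited. Assumes (for illustration) a
    constant travel_time between consecutive cells."""
    t, expected = 0.0, 0.0
    for cell in order:
        t += travel_time                       # time to reach this cell
        expected += hide_prob.get(cell, 0.0) * t
    return expected

# Three possible hiding cells with a skewed hiding distribution.
p = {"A": 0.6, "B": 0.3, "C": 0.1}
good = expected_seek_time(["A", "B", "C"], p)  # visit likeliest cell first
bad = expected_seek_time(["C", "B", "A"], p)   # visit likeliest cell last
# good = 0.6*1 + 0.3*2 + 0.1*3 = 1.5, bad = 0.1*1 + 0.3*2 + 0.6*3 = 2.5
```

Visiting likelier cells earlier lowers the expected finding time, which is exactly the quantity an optimized scan should minimize; with multiple cooperating seekers, dividing the territory lets each seeker run such a scan over its own region.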
A possible hiding place is a location in the environment with a non-zero probability of being chosen by the hider as a hiding place. Given the hiding probability distribution, the seekers should perform an optimized scan that minimizes the expected time of finding the hider. In a game with multiple seekers, a good seeking strategy is to cooperate by dividing the seekers' territories in order to increase seeking efficiency and thus decrease the game duration.

Engineering Applications of Artificial Intelligence 34 (2014) 122–136
http://dx.doi.org/10.1016/j.engappai.2014.05.012

* Corresponding author.
E-mail addresses: mgunady@cs.umd.edu (M.K. Gunady), walid.gomaa@ejust.edu.eg (W. Gomaa), nue@nue.org (I. Takeuchi).
1 Currently on leave from the Faculty of Engineering, Alexandria University, Egypt.