Abstract—This paper addresses the problem of steering a swarm of autonomous agents out of an unknown maze to a goal located at an unknown location. This is particularly challenging in situations where no direct communication between the agents is possible and all information exchange between agents has to occur indirectly, through information "deposited" in the environment. To address this task, a greedy, collaborative reinforcement learning method using only local information exchange is introduced in this paper to balance exploitation and exploration in the unknown maze and to optimize the ability of the swarm to exit the maze. The learning and routing algorithm given here provides a mechanism for storing the data needed to represent the collaborative utility function, based on the experiences of previous agents visiting a node, that results in routing decisions that improve over time. Two theorems establish the theoretical soundness of the proposed learning method and illustrate the importance of the stored information in improving routing decisions. Simulation examples show that the introduced simple rules for learning from past experience significantly improve performance over random search and over search based on Ant Colony Optimization, a metaheuristic algorithm.

I. INTRODUCTION

This paper presents a randomized, distributed approach to steer a swarm of agents out of any type of unknown maze to a goal located at an unknown location, using only locally stored information and no direct communication between the agents. This is an important problem not only for groups of autonomous robots but also for minimum-overhead distributed routing and graph search in a wide range of applications. The approach presented here employs a collaborative reinforcement learning (RL) framework and is based on formal results underlining the soundness of the approach.
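The mechanism described above — agents depositing experience at the nodes they visit, which later agents exploit greedily while retaining some exploration — can be illustrated with a minimal sketch. Note this is an illustrative assumption, not the paper's exact algorithm: the function name, the visit-averaged one-step value backup, the assumed discount factor of 0.95, and the ε-greedy action choice are all placeholders standing in for the formal update rule developed later.

```python
import random

def swarm_escape(graph, start, goal, n_agents=20, max_steps=500,
                 epsilon=0.2, seed=0):
    """Agents traverse the graph one at a time with no direct
    communication; each node stores a utility estimate that agents
    read and update locally (stigmergic information exchange)."""
    rng = random.Random(seed)
    value = {v: 0.0 for v in graph}   # utility deposited at each node
    visits = {v: 0 for v in graph}    # visit counts, for averaging
    exits = 0
    for _ in range(n_agents):
        node = start
        for _ in range(max_steps):
            if node == goal:
                value[node] = 1.0     # mark the discovered exit for later agents
                exits += 1
                break
            visits[node] += 1
            nbrs = graph[node]
            if rng.random() < epsilon:                    # explore
                nxt = rng.choice(nbrs)
            else:                                         # exploit stored utility
                nxt = max(nbrs, key=lambda v: value[v])
            # visit-averaged backup toward the discounted neighbor value
            value[node] += (value[nxt] * 0.95 - value[node]) / visits[node]
            node = nxt
    return exits, value
```

Because the utility estimates persist across agents, early agents effectively random-walk while later agents follow an increasingly reliable gradient toward the exit — the qualitative behavior the theorems in this paper formalize.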
M. Aurangzeb and F. L. Lewis are with the University of Texas at Arlington Research Institute (UTARI), 7300 Jack Newell Blvd. S., Fort Worth, TX 76118 USA (phone/fax: +817-272-5938; e-mail: {aurangze, lewis}@uta.edu). M. Huber is with the Department of Computer Science and Engineering, The University of Texas at Arlington, TX (e-mail: huber@cse.uta.edu). *This work was supported by National Science Foundation grant ECCS-1128050, Army Research Office grant W91NF-05-1-0314, Air Force Office of Scientific Research grant FA9550-09-1-0278, and China NNSF grant 61120106011.

The problem of a robot learning to escape a maze is not new to the machine learning research community; it was originally posed many decades ago by H. Abelson and A. A. diSessa in [1]. Since then, there has been a great deal of research on robots learning to navigate in and escape from mazes. In [5], an architecture for autonomous mobile agents is proposed that maps a two-dimensional environment and provides safe paths to unexplored regions. In [6], algorithms are proposed for two heterogeneous robots searching for each other in an unknown environment. In [7], an ultrasonic sensor localization system for autonomous mobile robot navigation in an indoor semi-structured environment is presented. To navigate mazes efficiently, various approaches for automated maze search have been implemented and several testing environments have been proposed [2], [8]. In [3], a knowledge-guided route search based on obstacle-adaptive spatial cells is proposed. Similarly, a neural-network-based approach is used in [4] for a robot to solve a maze while avoiding concave obstacles. Beyond these applications in robotics and route planning, maze exploration is also used as a standard test benchmark for artificial intelligence and machine learning techniques [9]. Along those lines, there are also studies showing that antibodies in an immune system use a mechanism of learning from their surroundings to efficiently fight antigens.
This has led scientists to use machine learning for the development of artificial immune systems [10], [11], [12], which, in turn, have been tested on mobile robots in mazes [10].

While most robot learning work on maze navigation deals with single robots or small groups of robots, swarm intelligence (SI) is a class of decentralized algorithms based on the cooperative behavior of a large number of agents pursuing a common goal. These algorithms are built on simple rules inspired by biological systems in nature. A significant number of SI algorithms have been proposed in the literature [13], [14]. These include Ant Colony Optimization (ACO) [15], [44], Artificial Bee Colony (ABC) [16], Artificial Immune Systems (AIS) [17], Charged System Search (CSS) [18], Cuckoo Search (CS) [19], [20], the Firefly Algorithm (FA) [21], [22], the Gravitational Search Algorithm (GSA) [23], the Intelligent Water Drops algorithm (IWD) [24], Particle Swarm Optimization (PSO) [25], Multi-Swarm Optimization (MSO) [26], River Formation Dynamics (RFD) [27], Self-Propelled Particles (SPP) [28], and Stochastic Diffusion Search (SDS) [29], [30]. These algorithms can be applied to flocking behavior in discrete environments [32] and to solving mazes; e.g., in [31], ACO is deployed in unknown mazes. However, many of these algorithms have centralized elements and rely on empirical metaheuristics. Unlike most of them, this paper presents an SI approach with a rigorous mathematical basis that efficiently addresses the problem of steering a swarm of agents through an unknown maze to an unknown goal using only local information exchange.

The most fundamental maze exploration algorithm is random search; in many cases this is the only available exploration method. Other maze exploration algorithms are generally deterministic in nature. The wall-follower method [34], [35], which roughly corresponds to a depth-first search strategy, works well in 2D perfect mazes [33], [34], [35], i.e.,
mazes that do not contain loops and thus form a tree when represented as a graph.

Efficient, Swarm-Based Path Finding in Unknown Graphs Using Reinforcement Learning*
M. Aurangzeb, F. L. Lewis, and M. Huber
2013 10th IEEE International Conference on Control and Automation (ICCA), Hangzhou, China, June 12-14, 2013