International Journal of Computing and Network Technology ISSN 2210-1519
Int. J. Com. Net. Tech. 3, No. 3 (Sept. 2015)
E-mail address: malemran@buc.edu.om
http://journals.uob.edu.bh

Speeding Up the Learning in A Robot Simulator

Mostafa Al-Emran 1
1 Al Buraimi University College, Al Buraimi, Oman

Received: 10 May 2015, Revised: 25 July 2015, Accepted: 10 August 2015, Published: 1 September 2015

Abstract: Q-learning is one of the well-known Reinforcement Learning algorithms and has been widely used in various problems. The main contribution of this work is speeding up the learning of a single agent (e.g. a robot). In this work, the traditional Q-learning algorithm is optimized by using the Repeated Update Q-learning (RUQL) algorithm (the recent state of the art) in a robot simulator. The robot simulator should learn how to move from one state to another in order to reach the end of the screen as fast as possible. An experiment has been conducted to test the effectiveness of the RUQL algorithm against the traditional Q-learning algorithm by running both algorithms with the same parameter values over several trials. The experimental results reveal that the RUQL algorithm outperforms the traditional Q-learning algorithm in all trials.

Keywords: Robot, Simulator, Q-Learning.

1. INTRODUCTION

Q-learning has proved its effectiveness as one of the Reinforcement Learning algorithms used in a wide range of problems. Recently, researchers have tried to optimize the performance of traditional Q-learning through approaches such as the Q-learning Influence Map (QIM) [1], Transfer Learning (TL) [2], Frequency Adjusted Q-learning (FAQL) [3], and Repeated Update Q-learning (RUQL) [4]. In this paper, RUQL is used to optimize the performance of traditional Q-learning in a robot simulator. The robot simulator has been programmed in Java.
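Since the simulator is implemented in Java, the standard tabular Q-learning update that the paper builds on can be sketched as follows. This is a minimal illustration, not the paper's actual code: the class and variable names (QLearningSketch, q, alpha, gamma) are hypothetical, and the update rule is the conventional one given in Section 2.

```java
/** Minimal tabular Q-learning sketch (hypothetical names; not the paper's simulator code). */
public class QLearningSketch {
    final double[][] q;   // Q-table: q[state][action], initialized to zeros
    final double alpha;   // learning rate
    final double gamma;   // discount factor

    QLearningSketch(int nStates, int nActions, double alpha, double gamma) {
        this.q = new double[nStates][nActions];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    /** One update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
    void update(int s, int a, double reward, int sNext) {
        double maxNext = Double.NEGATIVE_INFINITY;
        for (double v : q[sNext]) maxNext = Math.max(maxNext, v);
        q[s][a] += alpha * (reward + gamma * maxNext - q[s][a]);
    }

    public static void main(String[] args) {
        // Toy example: 3 states, 2 actions; the agent takes action 1 in state 0,
        // receives reward 1.0, and lands in state 2.
        QLearningSketch ql = new QLearningSketch(3, 2, 0.5, 0.9);
        ql.update(0, 1, 1.0, 2);
        System.out.println(ql.q[0][1]);
    }
}
```

RUQL, in contrast, repeats this update in proportion to how rarely the action is selected, which is what the experiments in this paper compare against.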
The robot should learn how to move from one state to another in order to reach the end of the screen as fast as possible. An experiment has been conducted to compare the RUQL algorithm with the traditional Q-learning algorithm by running both algorithms with the same parameter values over several trials. The rest of the paper is organized as follows: Section 2 provides a background on Q-learning; Section 3 reviews related research in the field; Section 4 describes the methodology and techniques used; Section 5 presents the evaluation and analysis; Section 6 discusses the results; Section 7 concludes and outlines future work.

2. Q-LEARNING

Q-learning is one of the Reinforcement Learning algorithms [1], [4], [6] that has been widely used in various domains such as simple toys, face recognition, and games [5]. The Q-learning update equation is described as follows:

Q(s,a) ← Q(s,a) + α (r + γ max_a' Q(s',a') − Q(s,a))

Q-learning attempts to find an optimal action policy by computing the function Q(s,a), where s represents a state from the set of possible states S and a represents an action from the set of possible actions A. Among the parameters, α represents the learning rate, γ represents the discount factor, and r represents the immediate reward received after taking action a in state s.

3. RELATED WORK

Several research papers have investigated different optimization techniques to enhance the performance of traditional Q-learning. Cho [1] noted that the larger the number of states in the environment the agent interacts with, the more time Q-learning needs to learn them. A Q-learning technique using an influence map (QIM) was therefore proposed to reduce the amount of time required for learning. Celiberto [2] utilized Transfer Learning (TL) among agents, which permits the use of cases as heuristics to speed up regular Q-learning. Q-learning shows artifacts in non-stationary environments (e.g. the probability of playing the best action might be reduced if the Q-values diverge considerably from the true values; this can occur in the initial phase as well as when the environment changes). Kaisers [3] resolved the