Knowledge Abstraction in Reinforcement Learning and Its Application on Robotics

Differentiation Report

Zhihui Luo
School of Electronics, Electrical Engineering and Computer Science
Queen's University Belfast
zluo02@qub.ac.uk

1. Introduction

Learning is the ability of an agent to improve its behaviour through diligent study of its own experience [1]. As the agent observes and interacts with the environment, it gains experience about the world. The agent's learning happens in the process of analysing that experience and the actions generated by its decision making.

Based on the type of feedback available to the learner, modern machine learning is usually classified into three types: supervised learning [2], unsupervised learning [3] and reinforcement learning [4]. A supervised agent learns from a set of instructed inputs and outputs: when a piece of information is input, the agent learns from its specified output. Unsupervised learning involves learning from the input alone, when no specified output is provided. Reinforcement learning (RL) is the most general of the three. It is learning what to do in different situations so as to maximize a numerical reward. The learner is not given the correct action; instead, it discovers the most valuable actions by trial and error. In an environment with delayed feedback, reinforcement learning proves to be a feasible way to train an agent to perform at a high standard.

To learn a task, an agent needs to build a state space that represents the environment's situation. However, in many real-world applications the state space is very large, which hampers the application of reinforcement learning. The success of RL on complex problems therefore depends heavily on proper generalization methods that reduce the dimension of the state space.
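The trial-and-error, reward-driven learning described above can be illustrated with a minimal tabular Q-learning sketch. This example is not from the report: the corridor environment, its states and rewards, and the hyperparameters are hypothetical choices made purely for illustration.

```python
import random

# Hypothetical 1-D corridor: states 0..4, reward +1 only on reaching state 4.
# Actions: 0 = move left, 1 = move right. The episode ends at the goal.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative learning parameters

def step(state, action):
    """Environment dynamics: move left/right within bounds; reward 1 at the goal."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular action-value estimates
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy trial and error: explore occasionally (and on ties),
            # otherwise exploit the current value estimates.
            if rng.random() < EPSILON or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = step(s, a)
            # One-step Q-learning update toward the bootstrapped target.
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
```

After training, the greedy policy moves right in every non-goal state, showing how value estimates learned only from delayed reward at the goal propagate back through the state space.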
Recently, therefore, researchers have shown growing interest in methods that develop high-level knowledge within reinforcement learning to reduce the dimensions of the state space and hence accelerate the learning process. Sutton et al. introduced the options framework, which uses macro-actions for high-level learning [5, 6]. Konidaris and Barto introduced agent spaces to build portable knowledge on top of the options framework [7]. Another approach is to create a more informative reward structure by shaping the rewards [8, 9]. Other researchers focus on hierarchical reinforcement learning frameworks. The MAXQ algorithm, developed by Dietterich [10], decomposes the value function by dividing a core MDP into a set of subtasks. Parr and Russell present another hierarchical method, the Hierarchy of Abstract Machines (HAM) [11]. These methods show promising results. However, most of these studies require a human to define the structures and indicators that improve the agent's performance. Our research will address the problem of automatically learning high-level abstractions in reinforcement learning, and will develop methods that improve the learning process.

2. General Research Objectives

My research objective is to develop new methods that improve the learning ability of an intelligent agent, and to implement these methods on mobile robots. The general goals of this project are as follows:

(1) To research and develop new algorithms for abstract reinforcement learning: to propose new learning methods based on the theory of reinforcement learning, or to improve existing reinforcement learning methods.