Transfer Learning as Representation Selection

Trung Nguyen-Thanh nttrung@comp.nus.edu.sg
Tomi Silander silander@comp.nus.edu.sg
Tze-Yun Leong leongty@comp.nus.edu.sg
School of Computing, National University of Singapore

Preliminary work submitted to an International Conference on Machine Learning (ICML) 2012 workshop.

Abstract

An appropriate representation of the environment is often key to efficient problem solving. Consequently, it may be helpful for an agent to use different representations in different environments. In this paper, we study selecting and adapting multiple abstractions or representations of environments in reinforcement learning. We address the challenges of transfer learning in heterogeneous environments with varying tasks. We present a system that, through a sequence of tasks, learns a set of world representations to be used in future tasks. We demonstrate the jumpstart effect and faster convergence to near-optimal performance of our system. We also discuss several important variants of our system and highlight the assumptions under which these variants should improve the current system.

1. Introduction

In reinforcement learning (RL), an agent learns how to make sequential decisions by observing the environment. The agent behaves according to a reward-optimizing policy, which suggests an action to be taken in a given state. The agent's learned knowledge, however, is specific to a task in an environment. A small change in the task or its environment may render the agent's accumulated knowledge useless; costly relearning from scratch is often needed.

Transfer learning techniques proposed to address this shortcoming often assume that the agent uses the same state representation for all tasks. This assumption may not work well in real-life applications. For example, many environmental cues that help an agent navigate through a forest are simply missing when the agent tries to navigate at sea.
To efficiently accomplish similar but varying tasks in different environments, the agent has to learn to focus attention on the crucial features of each environment.

In this paper we study a setting where the agent encounters many environments with different state spaces, and thus different goal states. The distribution of state features may also differ between environments. To achieve good performance quickly, the agent tries to select a different simple representation for each environment. The agent, however, often does not know beforehand how effective or useful the knowledge transfer will be. Moreover, it may only have time to learn a simple, approximate model that can be used in a new task.

We propose a system that tries to transfer old knowledge, but at the same time evaluates new options to see if they work better. The transferable knowledge is expressed as a library of state abstractions that implement different foci of attention. In different domains, different state abstractions may perform well; new combinations of features may be needed in some domains. A main contribution of this paper is to introduce multi-abstraction transfer, or multiple ways to see the world, which we call views. The aim is to learn to select a proper view for a new task.

The rest of the paper is organized as follows. We will next introduce our system, and then discuss the related work. We will then demonstrate the capabilities of our method via a set of experiments before we conclude with some discussion and ideas for future work.

2. Method

In reinforcement learning, a task environment is typically modeled as a Markov decision process (MDP). An MDP is defined by a tuple (S, A, T, R), where S is a set of states; A is a set of actions; T : S × A × S → [0, 1] is a transition function indicating the probability of a