Transfer Learning as Representation Selection

Trung Nguyen-Thanh nttrung@comp.nus.edu.sg
Tomi Silander silander@comp.nus.edu.sg
Tze-Yun Leong leongty@comp.nus.edu.sg
School of Computing, National University of Singapore

Preliminary work submitted to an International Conference on Machine Learning (ICML) 2012 workshop.

Abstract

An appropriate representation of the environment is often key to efficient problem solving. Consequently, it may be helpful for an agent to use different representations in different environments. In this paper, we study selecting and adapting multiple abstractions or representations of environments in reinforcement learning. We address the challenges of transfer learning in heterogeneous environments with varying tasks. We present a system that, through a sequence of tasks, learns a set of world representations to be used in future tasks. We demonstrate the jumpstart effect and faster convergence to near-optimal performance of our system. We also discuss several important variants of our system and highlight the assumptions under which these variants should improve the current system.

1. Introduction

In reinforcement learning (RL), an agent learns how to make sequential decisions by observing the environment. The agent behaves according to a reward-optimizing policy, which suggests an action to be taken in a given state. The agent's learned knowledge, however, is specific to a task in an environment. A small change in the task or its environment may render the agent's accumulated knowledge useless; costly relearning from scratch is often needed.

Transfer learning techniques proposed to address this shortcoming often assume that the agent uses the same state representation for all tasks. This assumption may not work well in real-life applications. For example, many environmental cues that help an agent navigate through a forest are simply missing when the agent tries to navigate at sea.
To efficiently accomplish similar but varying tasks in different environments, the agent has to learn to focus attention on the crucial features of each environment.

In this paper we study a setting where the agent encounters many environments with different state spaces, and thus different goal states. The distribution of state features may also differ between environments. To achieve good performance quickly, the agent tries to select a different simple representation for each environment. The agent, however, often does not know beforehand how effective or useful the knowledge transfer will be. Moreover, it may only have time to learn a simple, approximate model that can be used in a new task.

We propose a system that tries to transfer old knowledge, but at the same time evaluates new options to see if they work better. The transferable knowledge is expressed as a library of state abstractions that implement different foci of attention. In different domains, different state abstractions may perform well; new combinations of features may be needed in some domains. A main contribution of this paper is to introduce multi-abstraction transfer, or multiple ways to see the world, which we call views. The aim is to learn to select a proper view for a new task.

The rest of the paper is organized as follows. We will next introduce our system, and then discuss the related work. We will then demonstrate the capabilities of our method via a set of experiments before we conclude with some discussion and ideas for future work.

2. Method

In reinforcement learning, a task environment is typically modeled as a Markov decision process (MDP). An MDP is defined by a tuple (S, A, T, R), where S is a set of states; A is a set of actions; T : S × A × S → [0, 1] is a transition function indicating the probability of a