Journal of Intelligent Manufacturing
https://doi.org/10.1007/s10845-020-01629-3

Acquiring reusable skills in intrinsically motivated reinforcement learning

Marzieh Davoodabadi Farahani · Nasser Mozayani

Received: 2 December 2019 / Accepted: 12 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

This paper proposes a novel incremental model for acquiring skills and using them in Intrinsically Motivated Reinforcement Learning (IMRL). In this model, the learning process is divided into two phases. In the first phase, the agent explores the environment and acquires task-independent skills using different intrinsic motivation mechanisms. We present two intrinsic motivation factors for acquiring skills: detecting states that can lead to other states (being a cause) and detecting states that help the agent transition to a different region (discounted relative novelty). In the second phase, the agent evaluates the acquired skills to find suitable ones for accomplishing a specific task. Despite the importance of assessing task-independent skills for performing a task, the idea of evaluating and pruning skills has not been considered in the IMRL literature. In this article, two methods are presented for evaluating previously learned skills based on the value function of the assigned task. Such a two-phase learning model, together with the skill evaluation capability, helps the agent acquire task-independent skills that can be transferred to other similar tasks. Experimental results in four domains show that the proposed method significantly increases learning speed.

Keywords Hierarchical reinforcement learning · Skill · Option · Intrinsic motivation · Skill evaluation

Introduction

Reinforcement learning (RL) is a field of machine learning that deals with the problem of how agents should take suitable actions in an environment to maximize cumulative reward.
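As a concrete (illustrative) instance of "taking actions to maximize cumulative reward", the sketch below runs tabular Q-learning on a toy 5-state chain. All names and parameters here are assumptions chosen for the example, not taken from the paper.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning on a toy 5-state chain; the agent learns to
# reach the rightmost state, which yields reward 1. Everything here
# (states, rewards, hyperparameters) is illustrative only.

random.seed(0)
N_STATES, N_ACTIONS = 5, 2
GOAL = N_STATES - 1
alpha, gamma = 0.1, 0.9
Q = defaultdict(float)

def step(state, action):
    """Deterministic chain: action 1 moves right, action 0 moves left."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0)

for _ in range(2000):
    s = 0
    for _ in range(20):
        a = random.randrange(N_ACTIONS)  # uniform exploration (off-policy)
        s2, r = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_b Q(s', b)
        best_next = max(Q[(s2, b)] for b in range(N_ACTIONS))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
        if s == GOAL:
            break

def greedy(s):
    """Greedy policy with respect to the learned action values."""
    return max(range(N_ACTIONS), key=lambda b: Q[(s, b)])
```

After training, the greedy policy moves right from the start state, since discounted future reward propagates back along the chain.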
It has been applied successfully to different problems such as robotics, finance, and manufacturing (Li 2019). For example, in Chen et al. (2015), RL is applied to schedule a multiple-load carrier that delivers parts to line-side buffers of a general assembly line, and in Aissani et al. (2012), a reactive multi-agent model based on RL is proposed for adaptive scheduling in multi-site companies.

Many classical RL methods cannot handle high-dimensional problems in a reasonable time. Hierarchical reinforcement learning algorithms tackle this problem by applying skills to divide a task into a set of subtasks. Although much work has been done on skill acquisition in RL, extracting skills that are independent of any task is still an open problem. Moreover, despite the benefits reported for using skills in the literature, no methodology has been provided for evaluating the usefulness of each acquired skill in RL problems.

In recent years, intrinsic motivation has drawn a lot of interest in the reinforcement learning community. This concept comes from psychology and describes the spontaneous exploratory behaviors observed in humans, especially in infants (Berlyne 1960). Psychologists discriminate between extrinsically and intrinsically motivated behaviors. The former means doing an activity in order to achieve some externally supplied reward, such as a prize, and the latter means doing something because of its inherent satisfaction (Barto et al. 2004).

Research on using intrinsic motivation in the RL framework is relatively new and still not fully structured (Mirolli and Baldassarre 2013a). Intrinsic motivation has been used in reinforcement learning for acquiring general and reusable skills (Barto et al. 2004), selecting between skills (Stout and Barto 2010; Merrick 2012), guiding exploration in large spaces (Oudeyer et al.
2007), learning the model of

* Nasser Mozayani, mozayani@iust.ac.ir
Marzieh Davoodabadi Farahani, davoodabadi@comp.iust.ac.ir
1 Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran
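One common way the literature operationalizes intrinsic motivation is an additive novelty bonus on top of the extrinsic reward. The sketch below uses a count-based bonus purely as an illustration; it is not the paper's mechanism (the paper proposes the "being a cause" and "discounted relative novelty" factors), and `beta` and all names are assumptions for the example.

```python
import math
from collections import defaultdict

# Illustrative count-based intrinsic reward: the bonus for visiting a
# state decays with how often that state has already been visited, so
# novel states are rewarded more than familiar ones.

visit_counts = defaultdict(int)
beta = 0.5  # illustrative weight on the intrinsic term

def shaped_reward(state, extrinsic_reward):
    """Return the extrinsic reward plus a bonus that decays with familiarity."""
    visit_counts[state] += 1
    return extrinsic_reward + beta / math.sqrt(visit_counts[state])

r1 = shaped_reward("s0", 0.0)  # first visit: full bonus of 0.5
r2 = shaped_reward("s0", 0.0)  # repeat visit: smaller bonus
```

The shaped reward can then drive any standard RL update, encouraging the agent to explore states it has rarely seen even when the extrinsic reward is zero.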