Journal of Intelligent Manufacturing
https://doi.org/10.1007/s10845-020-01629-3

Acquiring reusable skills in intrinsically motivated reinforcement learning

Marzieh Davoodabadi Farahani · Nasser Mozayani

Received: 2 December 2019 / Accepted: 12 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

This paper proposes a novel incremental model for acquiring skills and using them in Intrinsically Motivated Reinforcement Learning (IMRL). In this model, the learning process is divided into two phases. In the first phase, the agent explores the environment and acquires task-independent skills using different intrinsic motivation mechanisms. We present two intrinsic motivation factors for acquiring skills: detecting states that can lead to other states (being a cause) and detecting states that help the agent transition to a different region (discounted relative novelty). In the second phase, the agent evaluates the acquired skills to find suitable ones for accomplishing a specific task. Despite the importance of assessing task-independent skills for performing a task, the idea of evaluating and pruning skills has not been considered in the IMRL literature. In this article, two methods are presented for evaluating previously learned skills based on the value function of the assigned task. Such a two-phase learning model, together with the skill evaluation capability, helps the agent acquire task-independent skills that can be transferred to other similar tasks. Experimental results in four domains show that the proposed method significantly increases learning speed.

Keywords Hierarchical reinforcement learning · Skill · Option · Intrinsic motivation · Skill evaluation

Introduction

Reinforcement learning (RL) is a field of machine learning that deals with the problem of how agents should take suitable actions in an environment to maximize cumulative reward.
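As a concrete (illustrative) instance of "taking actions to maximize cumulative reward", the sketch below runs tabular Q-learning on a toy 5-state chain. All names and parameters here are assumptions chosen for the example, not taken from the paper.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning on a toy 5-state chain; the agent learns to
# reach the rightmost state, which yields reward 1. Everything here
# (states, rewards, hyperparameters) is illustrative only.

random.seed(0)
N_STATES, N_ACTIONS = 5, 2
GOAL = N_STATES - 1
alpha, gamma = 0.1, 0.9
Q = defaultdict(float)

def step(state, action):
    """Deterministic chain: action 1 moves right, action 0 moves left."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0)

for _ in range(2000):
    s = 0
    for _ in range(20):
        a = random.randrange(N_ACTIONS)  # uniform exploration (off-policy)
        s2, r = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_b Q(s', b)
        best_next = max(Q[(s2, b)] for b in range(N_ACTIONS))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
        if s == GOAL:
            break

def greedy(s):
    """Greedy policy with respect to the learned action values."""
    return max(range(N_ACTIONS), key=lambda b: Q[(s, b)])
```

After training, the greedy policy moves right from the start state, since discounted future reward propagates back along the chain.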
It has been applied successfully to different problems such as robotics, finance, and manufacturing (Li 2019). For example, in Chen et al. (2015), RL is applied to schedule a multiple-load carrier that delivers parts to line-side buffers of a general assembly line, and in Aissani et al. (2012), a reactive multi-agent model based on RL is proposed for adaptive scheduling in multi-site companies.

Many classical RL methods cannot handle high-dimensional problems in a reasonable time. Hierarchical reinforcement learning algorithms tackle this problem by applying skills to divide a task into a set of subtasks. Although much work has been done on skill acquisition in RL, extracting skills that are independent of any task is still an open problem. Moreover, despite the benefits reported for using skills in the literature, no methodology has been provided for evaluating the usefulness of each acquired skill in RL problems.

In recent years, intrinsic motivation has drawn a lot of interest in the reinforcement learning community. This concept comes from psychology and describes the spontaneous exploratory behaviors observed in humans, especially in infants (Berlyne 1960). Psychologists discriminate between extrinsically and intrinsically motivated behaviors. The former means doing an activity in order to achieve some externally supplied reward, such as a prize, and the latter means doing something because of its inherent satisfaction (Barto et al. 2004).

Research on using intrinsic motivation in the RL framework is relatively new and still not fully structured (Mirolli and Baldassarre 2013a). Intrinsic motivation has been used in reinforcement learning for acquiring general and reusable skills (Barto et al. 2004), selecting between skills (Stout and Barto 2010; Merrick 2012), guiding exploration in large spaces (Oudeyer et al.
2007), learning the model of

* Nasser Mozayani, mozayani@iust.ac.ir
Marzieh Davoodabadi Farahani, davoodabadi@comp.iust.ac.ir
1 Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran
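One common way the literature operationalizes intrinsic motivation is an additive novelty bonus on top of the extrinsic reward. The sketch below uses a count-based bonus purely as an illustration; it is not the paper's mechanism (the paper proposes the "being a cause" and "discounted relative novelty" factors), and `beta` and all names are assumptions for the example.

```python
import math
from collections import defaultdict

# Illustrative count-based intrinsic reward: the bonus for visiting a
# state decays with how often that state has already been visited, so
# novel states are rewarded more than familiar ones.

visit_counts = defaultdict(int)
beta = 0.5  # illustrative weight on the intrinsic term

def shaped_reward(state, extrinsic_reward):
    """Return the extrinsic reward plus a bonus that decays with familiarity."""
    visit_counts[state] += 1
    return extrinsic_reward + beta / math.sqrt(visit_counts[state])

r1 = shaped_reward("s0", 0.0)  # first visit: full bonus of 0.5
r2 = shaped_reward("s0", 0.0)  # repeat visit: smaller bonus
```

The shaped reward can then drive any standard RL update, encouraging the agent to explore states it has rarely seen even when the extrinsic reward is zero.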