Journal of Intelligent Manufacturing
https://doi.org/10.1007/s10845-020-01629-3
Acquiring reusable skills in intrinsically motivated reinforcement learning
Marzieh Davoodabadi Farahani¹ · Nasser Mozayani¹
Received: 2 December 2019 / Accepted: 12 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
This paper proposes a novel incremental model for acquiring skills and using them in Intrinsically Motivated Reinforcement Learning (IMRL). In this model, the learning process is divided into two phases. In the first phase, the agent explores the environment and acquires task-independent skills using different intrinsic motivation mechanisms. We present two intrinsic motivation factors for acquiring skills: detecting states that can lead to other states (being a cause), and detecting states that help the agent transition to a different region (discounted relative novelty). In the second phase, the agent evaluates the acquired skills to find ones suitable for accomplishing a specific task. Despite the importance of assessing task-independent skills when performing a task, the idea of evaluating skills and pruning them has not been considered in the IMRL literature. In this article, two methods are presented for evaluating previously learned skills based on the value function of the assigned task. Such a two-phase learning model, together with the skill evaluation capability, helps the agent acquire task-independent skills that can be transferred to other similar tasks. Experimental results in four domains show that the proposed method significantly increases learning speed.
Keywords Hierarchical reinforcement learning · Skill · Option · Intrinsic motivation · Skill evaluation
Introduction
Reinforcement learning (RL) is a field of machine learning that deals with how agents should take suitable actions in an environment to maximize cumulative reward. It has been applied successfully to problems in robotics, finance, and manufacturing (Li 2019). For example, Chen et al. (2015) applied RL to scheduling a multiple-load carrier that delivers parts to line-side buffers of a general assembly line, and Aissani et al. (2012) proposed a reactive multi-agent model, based on RL, for adaptive scheduling in multi-site companies.
Many classical RL methods cannot handle high-dimensional problems in a reasonable time. Hierarchical reinforcement learning algorithms tackle this problem by applying skills to divide a task into a set of subtasks. Although much work has been done on skill acquisition in RL, extracting skills that are independent of any task is still an open problem. Moreover, despite the benefits reported for using skills in the literature, no methodology has been provided for evaluating the usefulness of each acquired skill in RL problems.
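To make the notion of a skill concrete: skills in hierarchical RL are commonly formalized as *options* (temporally extended actions with an initiation set, an internal policy, and a termination condition), learned at the SMDP level. The sketch below is a generic illustration of that standard formulation, not this paper's specific algorithm; the option names, the integer state encoding, and the learning-rate values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Option:
    """A skill in the options framework: where it can start,
    how it acts, and when it stops (all names here are illustrative)."""
    name: str
    initiation: Callable[[int], bool]    # states where the option may be invoked
    policy: Callable[[int], int]         # maps state -> primitive action
    termination: Callable[[int], bool]   # stopping predicate

def smdp_q_update(Q: Dict[Tuple[int, str], float],
                  s: int, o: Option, cum_reward: float, k: int, s_next: int,
                  options: List[Option],
                  alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One SMDP Q-learning update for an option that ran k primitive steps
    from s to s_next and accumulated (already discounted) reward cum_reward."""
    candidates = [Q.get((s_next, o2.name), 0.0)
                  for o2 in options if o2.initiation(s_next)]
    best_next = max(candidates) if candidates else 0.0
    target = cum_reward + (gamma ** k) * best_next
    key = (s, o.name)
    Q[key] = Q.get(key, 0.0) + alpha * (target - Q.get(key, 0.0))

# Usage: two toy options; the update moves Q(0, "to_door") toward the target.
to_door = Option("to_door", lambda s: True, lambda s: 0, lambda s: s == 5)
stay = Option("stay", lambda s: True, lambda s: 1, lambda s: True)
Q: Dict[Tuple[int, str], float] = {}
smdp_q_update(Q, s=0, o=to_door, cum_reward=1.0, k=3, s_next=5,
              options=[to_door, stay])
```

Because both next-state option values start at zero, the first update yields Q(0, "to_door") = 0.1 * 1.0 = 0.1; the γ^k factor shows why longer-running skills discount future value more heavily.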
In recent years, intrinsic motivation has drawn considerable interest in the reinforcement learning community. The concept comes from psychology and describes the spontaneous exploratory behaviors observed in humans, especially in infants (Berlyne 1960). Psychologists distinguish between extrinsically and intrinsically motivated behaviors: the former means doing an activity in order to obtain some externally supplied reward, such as a prize, while the latter means doing something for its inherent satisfaction (Barto et al. 2004).
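In RL terms, this distinction is often operationalized by augmenting the external reward with an internally generated bonus. The sketch below shows one common generic choice, a count-based novelty bonus that decays as a state is revisited; it is an illustration of the general idea, not the specific intrinsic motivation factors proposed in this paper, and the weight β = 0.5 is an arbitrary assumption.

```python
import math
from collections import defaultdict

class NoveltyBonus:
    """Adds a count-based intrinsic reward to the extrinsic one:
    r_total = r_ext + beta / sqrt(N(s)), where N(s) is the visit count."""
    def __init__(self, beta: float = 0.5):
        self.beta = beta
        self.visits = defaultdict(int)  # state -> visit count

    def reward(self, state, extrinsic: float) -> float:
        self.visits[state] += 1
        intrinsic = 1.0 / math.sqrt(self.visits[state])  # novelty decays with visits
        return extrinsic + self.beta * intrinsic

# Usage: the bonus for the same state shrinks on repeated visits,
# which pushes exploration toward states the agent has not seen.
bonus = NoveltyBonus(beta=0.5)
first = bonus.reward("s0", extrinsic=0.0)   # 0.5 * 1/sqrt(1) = 0.5
second = bonus.reward("s0", extrinsic=0.0)  # 0.5 * 1/sqrt(2), smaller
```

An intrinsically motivated agent maximizing this combined signal behaves exploratorily even when the extrinsic reward is zero, which is the property the skill-acquisition phase described above relies on.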
Research on using intrinsic motivation in the RL framework is relatively new and still not fully structured (Mirolli and Baldassarre 2013a). Intrinsic motivation has been used in reinforcement learning for acquiring general and reusable skills (Barto et al. 2004), selecting between skills (Stout and Barto 2010; Merrick 2012), guiding exploration in large spaces (Oudeyer et al. 2007), learning the model of
* Nasser Mozayani
mozayani@iust.ac.ir

Marzieh Davoodabadi Farahani
davoodabadi@comp.iust.ac.ir

¹ Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran