Reinforcement distribution in fuzzy Q-learning
Andrea Bonarini, Alessandro Lazaric, Francesco Montrone, Marcello Restelli∗
Department of Electronics and Information, Politecnico di Milano, Milan, Italy
Available online 6 December 2008
Abstract
Q-learning is one of the most popular reinforcement learning methods; it allows an agent to learn the relationship between
interval-valued state and action spaces through direct interaction with the environment. Fuzzy Q-learning extends this
algorithm to evolve fuzzy inference systems (FIS) that operate over continuous state and action spaces. In a FIS, the
interaction among fuzzy rules plays a primary role in achieving good performance and robustness. Learning a system in
which this interaction is present poses problems for the learning mechanism, since the same rule may receive incoherent
reinforcements depending on the other rules it interacts with. In this paper, we introduce different strategies to distribute
reinforcement so as to reduce this undesired effect and to stabilize the obtained reinforcement. In particular, we present
two strategies: the first rewards the actions chosen by each rule during the cooperation phase; the second rewards the
rules proposing actions closest to those actually executed, rather than the rules that contributed to generating them.
© 2008 Elsevier B.V. All rights reserved.
Keywords: Fuzzy systems; Fuzzy Q-learning; Reinforcement learning; Reinforcement distribution
1. Introduction
Fuzzy Q-learning is an approach to learning a set of fuzzy rules by reinforcement. It extends the popular
Q-learning [1] algorithm, widely used to learn tabular relationships between states, described by a finite number of
values for each variable, and discrete actions. Learning fuzzy rules makes it possible to face problems where inputs
are real-valued variables, matched by fuzzy sets, and where actions are real-valued as well. Fuzzy sets play the
role of the ordinal values used in Q-learning, thus enabling an analogous learning approach while overcoming
the limitations of the interval-based approximation that Q-learning needs in order to face the same type of problems.
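For reference, we recall the standard tabular Q-learning update [1] in its textbook form; the notation here is generic and not necessarily that adopted in the rest of the paper:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],$$
where $\alpha \in (0, 1]$ is the learning rate, $\gamma \in [0, 1)$ is the discount factor, and $r_{t+1}$ is the reinforcement received after executing action $a_t$ in state $s_t$.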
The partial overlapping among neighboring fuzzy sets covering the range of each variable, although suitable for improving
robustness, smoothness, and many other desirable characteristics of fuzzy inference systems (FIS), makes it hard to
evaluate the contribution of each single rule: a rule is activated together with different rules at different times, and
may obtain different reinforcements from this collaboration [2]. This may result in an incoherent reinforcement
assignment that makes convergence more difficult. Reinforcement distribution thus becomes a relevant issue in the
definition of fuzzy Q-learning.
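To make the source of this incoherence concrete, consider a generic weighted combination of rule actions (a common Takagi–Sugeno-style inference scheme, used here only as an illustrative sketch, not necessarily the exact inference adopted later in the paper). If rules $R_1, \ldots, R_n$ propose actions $a_1, \ldots, a_n$ with activation degrees $\mu_1(s), \ldots, \mu_n(s)$ in state $s$, the executed action is
$$a(s) = \frac{\sum_{i=1}^{n} \mu_i(s)\, a_i}{\sum_{i=1}^{n} \mu_i(s)}.$$
Since $a(s)$ generally differs from every individual $a_i$, the reinforcement observed after executing $a(s)$ reflects the joint effect of all the active rules, so the same rule may receive different reinforcements depending on which rules it happens to fire with.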
In this paper, we propose a new reinforcement distribution method for fuzzy Q-learning that performs better than
the traditional one when the domain is described by a large number of fuzzy sets, and that combines favorably
with the traditional one when equally optimal actions are available.
∗ Corresponding author.
E-mail addresses: bonarini@elet.polimi.it (A. Bonarini), lazaric@elet.polimi.it (A. Lazaric), montrone@elet.polimi.it (F. Montrone),
restelli@elet.polimi.it (M. Restelli).