Intuitive Action Set Formation in Learning Classifier
Systems with Memory Registers
L. Simões, M.C. Schut and E. Haasdijk¹
Abstract. An important design goal in Learning Classifier Systems (LCS) is to equally reinforce those classifiers which would cause the same level of reward to be supplied by the environment. In this paper, we propose a new
method for action set formation in LCS. When applied to a Zeroth
Level Classifier System with Memory registers (ZCSM), our method
allows the distribution of rewards among classifiers which result in
the same memory state, rather than those encoding the same memory
update action.
1 INTRODUCTION
This paper introduces a new method for action set formation (asf)
in Learning Classifier Systems, and tests it in partially observable
environments requiring memory. The operation of asf is responsi-
ble for choosing the classifiers that will receive the reward supplied
by the environment, for some performed action. When new classifiers are generated, the system has no way of knowing how good they are. Their strengths depend on the actions they take in the contexts in which they trigger, and on the other classifiers in the population with which they interact. Classifiers added to the population are assigned an initial strength value; then, through repeated usage, the strength update component gradually converges towards a better estimate of their qualities. But since the system has to perform while it is still building its rule base, it is forced to act despite its uncertainty about the environment, selecting from an ever-changing population of insufficiently tested classifiers. The method
introduced here, iasf, eliminates some of the noise to which the qual-
ity estimation component is subjected, with the goal of improving
system performance.
2 BACKGROUND
In the mid-1990s, Wilson [7] proposed ZCS as a simplification of
Holland’s original LCS [3]. Most importantly, he left out the mes-
sage list which acted as memory in the original system. Thus, Wil-
son’s models had no way of remembering previously encountered
states and could not perform optimally in partially observable environments where an agent can find itself in a state that is indistinguishable from another, even though the best action to undertake is not necessarily the same in both. Wilson proposed [7] a solution to this problem in the form of memory registers to extend the classifiers. Cliff & Ross [2] follow this suggestion and implement
ZCSM, extending ZCS with a memory mechanism. In their exper-
iments they observed that ZCSM can efficiently exploit memory in
partially observable environments.
¹ Department of Computer Science, Faculty of Sciences, VU University, Amsterdam, The Netherlands, email: {lfms, mc.schut, e.haasdijk}@few.vu.nl
Stone & Bull extensively compared ZCS to the more popular XCS
in noisy, continuous-valued environments [6] and found that what
makes XCS so good in deterministic environments (namely, its at-
tempt to build a complete, maximally accurate and maximally gen-
eral map of the payoff landscape) becomes a disadvantage as the level
of noise in the environment increases. ZCS’s partial map, focusing on high-rewarding niches in the payoff landscape, then becomes an advantage. This suggests ZCS as an adaptive control mechanism in
multi-step, partially observable, stochastic real-world problems.
3 INTUITIVE ACTION SET FORMATION
ZCS works on a population P of rules which together represent a solution to the problem with which the system is faced. As it interacts
with the environment, the system is triggered on reception of a sen-
sory input. A match set M is then formed with all the rules in the
population matching that input. From this set, a classifier is chosen
by proportionate selection based on its strength, and its action is ex-
ecuted. With memory added as described in [2], rules prescribe an
external action as well as a modification of the memory bits.
It can be argued that the core of ZCS lies in the next stage, reinforcement, as it is responsible for incrementally learning the quality of the
rules in the population, which will in turn determine the system’s
behaviour.
The action set A includes those rules in M that advocated the same
action as the chosen classifier. The rules in this action set share in the
reward that results from the selected action (with the rationale that
choosing any of those rules would have had the same effect). Rules
in M that advocate a different action are penalised.
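Traditional action set formation and reward sharing might look like the sketch below. The β (learning rate) and τ (tax) parameters and their values are illustrative, and the update deliberately simplifies the full ZCS bucket-brigade scheme.

```python
from dataclasses import dataclass

# Illustrative rule representation; field names are assumptions, not ZCS canon.
@dataclass
class Rule:
    action: str
    strength: float = 1.0

def form_action_set(match_set, chosen):
    """Traditional asf: A contains the rules in M whose action string is
    bitwise identical to the chosen classifier's action."""
    return [r for r in match_set if r.action == chosen.action]

def reinforce(match_set, action_set, reward, beta=0.2, tau=0.1):
    """Rules in A share (a beta-fraction of) the reward; rules in M \\ A
    are taxed, i.e. penalised."""
    share = beta * reward / len(action_set)
    in_a = {id(r) for r in action_set}
    for r in match_set:
        if id(r) in in_a:
            r.strength += share
        else:
            r.strength *= (1.0 - tau)
```

With two rules advocating action "01" and one advocating "10", a reward of 1.0 raises each strength in A by β · 1.0 / 2 while the dissenting rule's strength shrinks by the tax factor.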
Traditionally, A consists of those rules in M that match on a bit-
wise comparison with the action-part of the chosen classifier. Now,
consider ZCSM, where operators on the memory state are added
to the action part of the rules. Suppose, then, a situation where the
memory state was 01, and remains the same after execution of some
chosen classifier c, which advocated² [0#]. Traditional action set
formation would then have A include only those classifiers from M
advocating this same memory operation (“set the first memory reg-
ister to 0”) as well as the same external action as the chosen clas-
sifier. However, all of the internal actions {##,#1,01} would result
in exactly the same internal state. Not only would the system not re-
ward any classifier in M having one of those internal actions (and
the same external action) as the chosen classifier, it would actually
penalise them. This seems to conflict with ZCS’s goal of equally rewarding those classifiers which would cause the same level of reward to be supplied by the environment.
² Disregarding the external output for simplicity.
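The grouping underlying iasf can be sketched by comparing resulting memory states rather than memory-op strings. The '#'-as-no-op semantics follows the example above; the field and function names are assumptions of this illustration.

```python
from dataclasses import dataclass

# Illustrative ZCSM rule: an external action plus a memory-update string,
# where '#' leaves a register unchanged and '0'/'1' overwrite it.
@dataclass
class MRule:
    action: str      # external action
    memory_op: str   # e.g. "0#"
    strength: float = 1.0

def apply_memory_op(memory, op):
    """Memory state resulting from executing a memory-update action."""
    return ''.join(m if o == '#' else o for m, o in zip(memory, op))

def form_action_set_iasf(match_set, chosen, memory):
    """iasf: include every rule advocating the chosen external action whose
    memory op yields the same resulting memory state, even when the op
    strings differ bitwise."""
    target = apply_memory_op(memory, chosen.memory_op)
    return [r for r in match_set
            if r.action == chosen.action
            and apply_memory_op(memory, r.memory_op) == target]
```

For memory state "01" and a chosen classifier advocating [0#], the ops ##, #1 and 01 all land in the action set because each also yields "01", whereas 1# (yielding "11") is excluded.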
ECAI 2008, M. Ghallab et al. (Eds.), IOS Press, 2008. © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-761