Article
DOI: 10.1111/exsy.12115
Genetic programming for the minimum time swing up and
balance control acrobot problem
Dimitris C. Dracopoulos* and Barry D. Nichols
Department of Computer Science, University of Westminster, 115 New Cavendish Street, London, W1W 6UW, United
Kingdom
E-mail: d.dracopoulos@westminster.ac.uk; b.nichols@westminster.ac.uk
Abstract: This work describes how genetic programming is applied to evolving controllers for the minimum time swing up and inverted
balance tasks of the continuous state and action, limited torque acrobot. The best swing-up controller is able to swing the acrobot up to a
position very close to the inverted ‘handstand’ position in a very short time, shorter than that of Coulom (2004), who applied the same
constraints on the applied torque values, and only slightly longer than that of the approach by Lai et al. (2009), where far larger torque
values were allowed. The best balance controller is able to balance the acrobot in the inverted position when starting from the balance
position for the length of time used in the fitness function in all runs; furthermore, 47 out of 50 of the runs evolve controllers able to maintain
the balance position for an extended period, an improvement on the balance controllers generated by Dracopoulos and Nichols (2012),
of which this paper is an extension. The most successful balance controller is also able to balance the acrobot when starting from a small
offset from the balance position for this extended period.
Keywords: artificial intelligence, control systems, genetic programming, computational intelligence
1. Introduction
Genetic programming (GP) has been shown to perform well in
some difficult control problems, such as the well-known cart-
pole problem (Koza, 1992b), truck backer upper control
(Koza, 1992a) and helicopter flight control (Dracopoulos &
Effraimidis, 2012), even those where traditional control
methods have failed (Dracopoulos, 2011). However, there are
surprisingly few applications of GP to difficult control
problems. The swing up and balance of the acrobot are
considered to be such problems. The discrete, half-swing-up
variant of the acrobot problem (a simpler version of
the problem than the one considered here) was one of the six
problems included in the 2009 Reinforcement Learning
Competition (RLC, 2009).
In the literature, there are only two previous GP
approaches to the acrobot problem that we are aware of
(Fukushima & Uezato, 2009; Doucette & Heywood, 2011).
Neither attempts the balance task; Doucette and Heywood
(2011) only attempt to solve the simplified half-swing-up,
while Fukushima and Uezato (2009) apply GP to solve a
three-section variant of the acrobot problem. Thus, this is
the first application of GP to solve the standard acrobot
swing up and balance tasks as proposed by Spong (1994).
Through GP, two computer programs are evolved
successfully here: one that is able to swing up the acrobot
in a time comparable with that achieved by other methods
and another that can successfully balance the acrobot when
starting from the inverted position.
The performance on the balance task has been
improved since Dracopoulos and Nichols (2012), of which
this paper is an expanded version. Performance has been
improved in both the number of runs that generate
successful controllers and the generalization capability of
the resulting controllers: 94% of the runs produced
controllers that can successfully balance from the
inverted position for twice as long as the time used to
calculate the fitness. Furthermore, a larger proportion of
runs generated controllers that can balance for longer
from one of the offset positions. This was achieved by
narrowing the search space by reducing the function set
and by simplifying the fitness function to include only
what we require the controller to achieve, that is, to
maintain the acrobot in the inverted balance position for
the maximum time. Thus, the fitness function in this
experiment is the time from the start of the run until
the acrobot leaves the balance region, rather than the
more complicated fitness function used by Dracopoulos
and Nichols (2012), which also included the distance
from the inverted position.
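The simplified balance fitness described above can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the function name, the state representation, the time step dt, the evaluation horizon max_time and the three callables (controller, step and in_region) are all assumptions introduced for illustration, and the acrobot dynamics and the definition of the balance region are deliberately left to the caller.

```python
from typing import Callable, Tuple

# Hypothetical state layout: a tuple of floats (e.g. joint angles and velocities).
State = Tuple[float, ...]


def balance_time_fitness(
    controller: Callable[[State], float],          # candidate program: state -> torque
    step: Callable[[State, float, float], State],  # assumed simulator: (state, torque, dt) -> next state
    in_region: Callable[[State], bool],            # assumed test for "inside the balance region"
    initial_state: State,
    dt: float = 0.01,        # illustrative time step, not taken from the paper
    max_time: float = 20.0,  # illustrative evaluation horizon, not taken from the paper
) -> float:
    """Return the simulated time until the state first leaves the balance region.

    The result is capped at max_time, so a controller that never leaves the
    region within the horizon receives the maximum fitness.
    """
    max_steps = int(round(max_time / dt))
    state = initial_state
    steps = 0
    # Count whole simulation steps taken while the state stays in the region.
    while steps < max_steps and in_region(state):
        torque = controller(state)
        state = step(state, torque, dt)
        steps += 1
    return steps * dt
```

Under these assumptions, a larger return value means a longer balance, so the quantity would be maximized; a run that starts outside the balance region scores zero.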
The rest of this paper is organized as follows. In Section 2,
the details of the acrobot control problem are presented;
then we describe the GP approach taken here and how it
was applied to the swing up and balance problems in
Section 3; this is followed by the results of the two
experiments (Section 4); and finally, in Section 5, we present
our conclusions and directions for future research.
© 2015 Wiley Publishing Ltd Expert Systems, xxxx 2015, Vol. 00, No. 00