Article DOI: 10.1111/exsy.12115

Genetic programming for the minimum time swing up and balance control acrobot problem

Dimitris C. Dracopoulos* and Barry D. Nichols

Department of Computer Science, University of Westminster, 115 New Cavendish Street, London, W1W 6UW, United Kingdom
E-mail: d.dracopoulos@westminster.ac.uk; b.nichols@westminster.ac.uk

Abstract: This work describes how genetic programming is applied to evolving controllers for the minimum time swing up and inverted balance tasks of the continuous state and action, limited torque acrobot. The best swing-up controller swings the acrobot up to a position very close to the inverted handstand position in a very short time: shorter than that of Coulom (2004), who applied the same constraints on the torque values, and only slightly longer than that of Lai et al. (2009), where far larger torque values were allowed. The best balance controller is able to balance the acrobot in the inverted position, when starting from the balance position, for the length of time used in the fitness function in all runs. Furthermore, 47 out of 50 runs evolve controllers able to maintain the balance position for an extended period, an improvement on the balance controllers generated by Dracopoulos and Nichols (2012), of which this paper is an extended version. The most successful balance controller is also able to balance the acrobot for this extended period when starting from a small offset from the balance position.

Keywords: artificial intelligence, control systems, genetic programming, computational intelligence

1. Introduction

Genetic programming (GP) has been shown to perform well in some difficult control problems, such as the well-known cart-pole problem (Koza, 1992b), truck backer upper control (Koza, 1992a) and helicopter flight control (Dracopoulos & Effraimidis, 2012), even those where traditional control methods have failed (Dracopoulos, 2011).
However, there are surprisingly few applications of GP to difficult control problems. The swing up and balance of the acrobot are considered to be such problems. The discrete, half-swing-up variant of the acrobot problem (a simpler version of the problem than the one considered here) was one of the six problems included in the 2009 Reinforcement Learning Competition (RLC, 2009). In the literature, we are aware of only two previous GP approaches to the acrobot problem (Fukushima & Uezato, 2009; Doucette & Heywood, 2011). Neither attempts the balance task: Doucette and Heywood (2011) only attempt to solve the simplified half-swing-up, while Fukushima and Uezato (2009) apply GP to a three-section variant of the acrobot problem. Thus, this is the first application of GP to the standard acrobot swing up and balance tasks as proposed by Spong (1994).

Through GP, two computer programmes are evolved successfully here: one that is able to swing up the acrobot in a time comparable with that achieved by other methods, and another that can successfully balance the acrobot when starting from the inverted position. The performance on the balance task has been improved since Dracopoulos and Nichols (2012), of which this paper is an expanded version. Performance has been improved in both the number of runs that generate successful controllers and the generalization capability of the resulting controllers: 94% of the runs produced controllers that can successfully balance from the inverted position for twice as long as the time used to calculate the fitness. Furthermore, a larger proportion of runs generated controllers that can balance for longer from one of the offset positions.
This was achieved by narrowing the search space through a reduced function set, and also by simplifying the fitness function to include only what we require the controller to achieve, that is, to maintain the acrobot in the inverted balance position for the maximum time. Thus, the fitness function in this experiment is the time from the start of the run until the acrobot leaves the balance region, rather than the more complicated fitness function used by Dracopoulos and Nichols (2012), which also included the distance from the inverted position.

The rest of this paper is organized as follows. In Section 2, the details of the acrobot control problem are presented. Section 3 describes the GP approach taken here and how it was applied to the swing up and balance problems. This is followed by the results of the two experiments (Section 4). Finally, in Section 5, we present our conclusions and directions for future research.

© 2015 Wiley Publishing Ltd Expert Systems, xxxx 2015, Vol. 00, No. 00
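As an illustration of the simplified balance fitness described in the Introduction, the following Python sketch measures the time from the start of a run until the system leaves the balance region. It is a minimal sketch under stated assumptions, not the implementation used in the experiments: the names `balance_time_fitness`, `step`, `in_balance_region`, `dt` and `max_time` are hypothetical, and the acrobot dynamics and the definition of the balance region are passed in as functions rather than reproduced here.

```python
from typing import Callable, Sequence

State = Sequence[float]


def balance_time_fitness(
    controller: Callable[[State], float],
    step: Callable[[State, float, float], State],
    in_balance_region: Callable[[State], bool],
    initial_state: State,
    dt: float,
    max_time: float,
) -> float:
    """Return the simulated time for which the state stays in the balance region.

    All arguments are illustrative assumptions:
      controller        -- maps the current state to a torque (the evolved program)
      step              -- advances the system by one time step of length dt
      in_balance_region -- True while the state counts as "balanced"
      max_time          -- length of the fitness evaluation; caps the returned value
    """
    max_steps = int(round(max_time / dt))
    state = initial_state
    steps = 0
    # Run until the state leaves the balance region or the time limit is reached.
    while steps < max_steps and in_balance_region(state):
        torque = controller(state)
        state = step(state, torque, dt)
        steps += 1
    # Fitness is elapsed time only; distance from the inverted position is not used.
    return steps * dt
```

A larger return value means a longer balance, so a GP system would maximise this quantity (or, equivalently, minimise `max_time` minus it). Passing the dynamics in as `step` keeps the sketch independent of any particular acrobot model or integration scheme.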