Simulation of sequential data: An enhanced reinforcement learning approach

Marlies Vanhulsel, Davy Janssens, Geert Wets *, Koen Vanhoof
Hasselt University – Campus Diepenbeek, Transportation Research Institute, Wetenschapspark 5 bus 6, B-3590 Diepenbeek, Belgium
* Corresponding author. Tel.: +32 (0) 11 26 91 58; fax: +32 (0) 11 26 91 99. E-mail address: geert.wets@uhasselt.be (G. Wets).

Keywords: Reinforcement learning; Regression tree; Function approximation; Activity-based travel demand modelling

Abstract

The present study aims at contributing to the current state-of-the-art of activity-based travel demand modelling by presenting a framework to simulate sequential data. To this end, the suitability of a reinforcement learning approach to reproduce sequential data is explored. Additionally, as traditional reinforcement learning techniques are not capable of learning efficiently in large state and action spaces with respect to memory and computational time requirements on the one hand, and of generalizing based on infrequent visits of all state-action pairs on the other hand, the reinforcement learning technique as used in most applications is enhanced by means of regression tree function approximation. Three reinforcement learning algorithms are implemented to validate their applicability: traditional Q-learning and Q-learning with bucket-brigade updating are tested against the improved reinforcement learning approach with a CART function approximator. These methods are applied to data of 26 diary days. The results are promising and show that the proposed techniques offer great opportunities for simulating sequential data. Moreover, the reinforcement learning approach improved by introducing a regression tree function approximator learns a better solution much faster than the two traditional Q-learning approaches.

1. Introduction

Models are the result of the human urge to organize facts and behaviour. Moreover, while outlining policies, governments and policy makers wish to be supported by models in order to estimate the impact of their decisions on society as a whole. Travel demand models comprise a major example of such decision-supporting models, as they can be applied to evaluate the influence of both transport- and non-transport-related policies on mobility as well as to assess the impact of mobility on non-transport-related issues, such as air quality (Fried, Havens, & Thall, 1977; Shiftan & Surhbier, 2002; Shiftan et al., 2003; Stead & Banister, 2001). To serve this purpose, activity-based transportation models have entered the area of transportation modelling during the past decade. These models are founded on four basic concepts. First, activity-based transportation models assume that travel is derived from the demand for activities in space and time, which are executed in an attempt to meet individual goals and needs (Chapin, 1974). Next, humans face a number of temporal-spatial constraints that restrict the individual's action space (Hägerstrand, 1970). Furthermore, activity and travel decisions cannot be disconnected from the household context in which the individual operates (Jones, Dix, Clarke, & Heggie, 1983). Last but not least, activity and travel decisions are affected to a large extent by past and anticipated future events (Bowman, 1995). Such activity-based transportation models offer the opportunity of predicting travel demand more accurately, as they provide more profound insight into individual activity-travel behaviour (Algers, Eliasson, & Mattsson, 2005; Kitamura, 1996).
Before being able to account for policy changes, a model needs to be formulated, the core of which consists of simulating daily activity-travel schedules (Arentze & Timmermans, 2004). To this end, the current research aims to explore the use of artificial intelligence techniques to extract information from sequential data. More particularly, this study focuses on the application of reinforcement learning within this area of research, as it is founded on the rather straightforward principles of human learning by trial-and-error interactions in a dynamic environment (Kaelbling, Littman, & Moore, 1996).

Consequently, the contributions of the present study to the current state-of-the-art are twofold. The first added value includes the application of Q-learning, a well-known reinforcement learning technique, to simulate sequential data. However, when implementing this technique, both memory and computational time requirements increase rapidly with the dimensionality and granularity of the state-action space (Sutton & Barto, 1998; Vanhulsel, Janssens, Wets, & Vanhoof, 2008). Therefore, an improvement of this traditional approach by means of regression tree function approximation is introduced.
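To make the preceding scalability point concrete, the minimal Python sketch below shows a plain tabular Q-learning update. The learning parameters, the toy action set and the helper names (choose_action, q_update) are illustrative assumptions for a generic discrete task and do not correspond to the activity-travel implementation reported in this paper.

```python
# Illustrative sketch only: a minimal tabular Q-learning update for a toy task
# with discrete states and actions (hypothetical setting, not the paper's).
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 0.9    # discount factor (assumed value)
EPSILON = 0.1  # exploration rate (assumed value)
ACTIONS = range(4)  # toy discrete action set

# The Q-table stores one value per (state, action) pair; its size grows with the
# product of the cardinalities of all state dimensions and of the action set,
# which is exactly the memory/computation problem noted above.
Q = defaultdict(float)

def choose_action(state):
    """Epsilon-greedy selection over the discrete action set."""
    if random.random() < EPSILON:
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One-step backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Toy usage: states are tuples of discrete features; the reward is a placeholder.
s, a = (0, 0), 2
q_update(s, a, reward=1.0, next_state=(0, 1))
```

Replacing the explicit Q-table above with a regression tree fitted on observed (state, action, value) examples, as in the CART-based enhancement described in the abstract, is what allows generalizing across state-action pairs that are visited only infrequently.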