IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 3, NO. 3, JULY 2016

Traffic Signal Timing via Deep Reinforcement Learning

Li Li, Senior Member, IEEE, Yisheng Lv, Fei-Yue Wang, Fellow, IEEE

Abstract—In this paper, we propose a set of algorithms to design signal timing plans via deep reinforcement learning. The core idea of this approach is to set up a deep neural network (DNN) to learn the Q-function of reinforcement learning from sampled traffic state/control inputs and the corresponding traffic system performance outputs. Based on the obtained DNN, we can find appropriate signal timing policies by implicitly modeling the control actions and the changes of system states. We explain the possible benefits and implementation tricks of this new approach. The relationships between this new approach and some existing approaches are also carefully discussed.

Index Terms—Traffic control, reinforcement learning, deep learning, deep reinforcement learning.

I. INTRODUCTION

TRAFFIC control remains a hard problem for researchers and engineers, due to a number of difficulties. The two major ones are the modeling difficulty and the optimization difficulty.

First, transportation systems are usually distributed, hybrid, and complex [1-5]. How to accurately and also conveniently describe the dynamics of transportation systems remains not fully solved. As pointed out in [5] and [6], most recent control systems aim to predict future states of transportation systems and make appropriate signal plans in advance. This requirement highlights both the importance and the difficulty of modeling transportation systems.

There are mainly two kinds of approaches to address this difficulty [5]. One kind is the flow model based approaches, which formulate analytical models to describe the dynamics of macroscopic traffic flow measured at different locations.
For example, cell transmission models (CTM) and their variations have been frequently considered in the literature due to their simplicity and effectiveness [7]. However, when traffic scenarios are complex, the modeling costs and errors need to be carefully considered.

Manuscript received April 23, 2016; accepted June 1, 2016. This work was supported by the National Natural Science Foundation of China (61533019, 71232006, 61233001). Recommended by Associate Editor Mengchu Zhou.
Citation: Li Li, Yisheng Lv, Fei-Yue Wang. Traffic signal timing via deep reinforcement learning. IEEE/CAA Journal of Automatica Sinica, 2016, 3(3): 247-254.
Li Li is with the Department of Automation, Tsinghua University, Beijing 100084, China, and also with the Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Nanjing 210096, China (e-mail: li-li@tsinghua.edu.cn).
Yisheng Lv is with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: yisheng.lv@ia.ac.cn).
Fei-Yue Wang is with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: feiyue.wang@ia.ac.cn).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

The other kind is the simulation based approaches, which estimate/predict future traffic flow states using either artificial intelligence learning or simulations [8-10]. Artificial intelligence models learn and reproduce macroscopic traffic flow dynamics from recorded traffic flow measurements. In contrast, simulations describe and reproduce the actions of individual microscopic traffic participants, which in turn provides the flexibility to better describe macroscopic traffic flow dynamics. However, both artificial intelligence learning and simulation are time-consuming.
The tuning of the control performance also becomes hard, since no theoretical analysis tool can be straightforwardly applied to these approaches.

Second, once traffic flow descriptions are established, how to determine the best signal plans becomes another problem. For flow model based approaches, we can use mathematical programming methods to optimize the given objective functions (usually in terms of delay or queue length) subject to the explicitly formulated constraints derived from the analytical models [7, 11-13]. In contrast, for artificial intelligence learning and simulation based approaches, we reason backward from the learned relationships between control actions and their effects on traffic flows. Try-and-test methods are then used to seek a (sub)optimal signal plan, based on the predicted or simulated effects of candidate control actions. In the literature, heuristic optimization algorithms such as genetic algorithms (GA) [14] were often applied to accelerate this search. However, the convergence speed of such algorithms is still questionable in many cases.

In this paper, we focus on the reinforcement learning approach to traffic signal timing problems [15-18]. The reinforcement learning approach implicitly models the dynamics of complex systems by learning the control actions and the resulting changes of traffic flow. Meanwhile, it seeks the (sub)optimal signal plan from the learned input-output pairs. The major difficulty of reinforcement learning for traffic signal timing lies in the complexity of signal timing design, which grows exponentially with the number of the considered traffic flow states and control actions.

Recently, a new method has been proposed to simultaneously solve the modeling and optimization problems of complex systems by using the so-called deep Q network [19]. Deep Q networks combine two popular tools: reinforcement learning [20] and deep learning [21-23].
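To make the idea concrete, the following is a minimal sketch of such a Q-function approximator for a single intersection. The state encoding (queue lengths on four approaches), the action set (two green phases), the network size, and the learning rate are all hypothetical choices for illustration, not the design used in this paper; the update rule is the standard one-step Q-learning target r + γ max_a' Q(s', a'), fitted by gradient descent on the squared temporal-difference error.

```python
import numpy as np

# Illustrative deep Q-network sketch for signal timing (all names and
# sizes are hypothetical, chosen only to make the idea concrete).
rng = np.random.default_rng(0)

N_FEATURES = 4   # state: queue length on each of 4 approaches (assumed)
N_ACTIONS = 2    # action: give green to phase 0 (N-S) or phase 1 (E-W)
GAMMA = 0.9      # discount factor for future rewards
LR = 0.01        # gradient-descent step size

# A small two-layer network mapping a traffic state to one Q-value per action.
W1 = rng.normal(0, 0.1, (N_FEATURES, 16))
b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, (16, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def q_values(state):
    """Forward pass: Q(state, a) for every action a, plus hidden activations."""
    h = np.maximum(0.0, state @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2, h

def td_update(state, action, reward, next_state):
    """One Q-learning step: move Q(s, a) toward r + GAMMA * max_a' Q(s', a')."""
    global W1, b1, W2, b2
    q, h = q_values(state)
    q_next, _ = q_values(next_state)
    target = reward + GAMMA * np.max(q_next)
    err = q[action] - target               # temporal-difference error
    # Backpropagate through the chosen action's output only.
    onehot = np.eye(N_ACTIONS)[action]
    W2 -= LR * np.outer(h, onehot) * err
    b2 -= LR * onehot * err
    dh = W2[:, action] * err * (h > 0)     # gradient through the ReLU
    W1 -= LR * np.outer(state, dh)
    b1 -= LR * dh
    return err

# Toy transition: the reward is the negative total queue length, so the
# learner prefers actions that empty the queues.
state = np.array([3.0, 1.0, 4.0, 2.0])
next_state = np.array([2.0, 1.0, 3.0, 2.0])
td_update(state, action=0, reward=-next_state.sum(), next_state=next_state)
```

Compared with a lookup table over all discretized queue-length combinations, the network shares parameters across states, which is exactly what makes the exponentially large state space of signal timing tractable for reinforcement learning.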
Herein, deep learning uses multiple layers of artificial neural networks to learn the implicit maximum discounted future reward obtained when we perform a specific action in a specific state. In this paper, we examine the feasibility and effectiveness of this deep reinforcement learning method in building traffic