Reinforcement Learning for True Adaptive Traffic Signal Control
Baher Abdulhai¹; Rob Pringle²; and Grigoris J. Karakoulas³
Abstract: The ability to exert real-time, adaptive control of transportation processes is the core of many intelligent transportation
systems decision support tools. Reinforcement learning, an artificial intelligence approach undergoing development in the machine-
learning community, offers key advantages in this regard. The ability of a control agent to learn relationships between control actions and
their effect on the environment while pursuing a goal is a distinct improvement over prespecified models of the environment. Such
models are a prerequisite of conventional control methods, and their accuracy limits the performance of control agents. This paper contains
an introduction to Q-learning, a simple yet powerful reinforcement learning algorithm, and presents a case study involving application to
traffic signal control. Encouraging results of the application to an isolated traffic signal, particularly under variable traffic conditions, are
presented. A broader research effort is outlined, including extension to linear and networked signal systems and integration with dynamic
route guidance. The research objective involves optimal control of heavily congested traffic across a two-dimensional road network—a
challenging task for conventional traffic signal control methodologies.
DOI: 10.1061/(ASCE)0733-947X(2003)129:3(278)
CE Database subject headings: Traffic signal controllers; Intelligent transportation systems; Traffic control; Traffic management;
Adaptive systems.
Introduction
The ability to exert real-time, adaptive control over a transporta-
tion process is potentially useful for a variety of intelligent trans-
portation systems services, including control of a system of traffic
signals, control of the dispatching of paratransit vehicles, and
control of the changeable message displays or other cues in a
dynamic route guidance system, to name a few. In each case, the
controlling actions should respond to actual environmental
conditions—vehicular demand in the case of a signal system, the
demand for multiple paratransit trip origins and destinations, or
the road network topology and traffic conditions in the case of
dynamic route guidance. Even more valuable is the ability to
control in accordance with an optimal strategy defined in terms of
one or more performance objectives. For example, one might
wish to have a signal control strategy that minimizes delay, a
paratransit dispatching system that minimizes wait time and ve-
hicle kilometers traveled, or a dynamic route guidance system
that minimizes travel time.
A key limitation of conventional control systems is a require-
ment for one or more prespecified models of the environment.
The purpose of these might be to convert sensory inputs into a
useful picture of current or impending conditions or provide an
assessment of the probable impacts of alternative control actions
in a given situation. Such models require domain expertise to
construct. Furthermore, they must often be sufficiently general to
cover a variety of conditions, as it is usually impractical to pro-
vide separate models to address each potential situation. For ex-
ample, some state-of-the-art traffic signal control systems rely on
a platoon-dispersion model to predict the arrival pattern of ve-
hicles at a downstream signal based on departures from an up-
stream signal. A generalized model designed to represent all road
links cannot possibly reflect the impacts of the different combi-
nations of side streets and driveways generating and absorbing
traffic between the upstream and downstream signals.
What if a controlling agent could directly learn the various
relationships inherent in its world from its experience with differ-
ent situations in that world? Not only would the need for model
prespecification be obviated or at least minimized, but such an
agent could effectively tailor its control actions to specific situa-
tions based on its past experience with the same or similar situa-
tions. The machine-learning research community, related to the
artificial intelligence community, provides us with a variety of
methods that might be adapted to transportation control problems.
One of these, particularly useful due to its conceptual simplicity,
yet impressive in its potential, is reinforcement learning (see Sutton
and Barto 1998 or Kaelbling et al. 1996 for comprehensive
overviews, or Bertsekas and Tsitsiklis 1996 for a more rigorous
treatment).
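To make the idea concrete, the core of one widely used reinforcement learning method, tabular Q-learning (introduced in the case study below), can be sketched in a few lines. The toy environment, parameter values, and function names here are illustrative assumptions, not the paper's implementation: a two-state world in which one action in state 0 earns a reward, and the agent must discover this purely from experience.

```python
import random

# Illustrative tabular Q-learning sketch (assumed toy example, not the
# authors' signal-control implementation). Toy MDP: two states, two
# actions. Taking action 1 in state 0 yields reward 1 and stays in
# state 0; every other state-action pair yields 0 and toggles the state.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

def step(state, action):
    """Hypothetical environment dynamics for the toy MDP."""
    if state == 0 and action == 1:
        return 0, 1.0           # remain in state 0, earn reward 1
    return 1 - state, 0.0       # toggle state, no reward

random.seed(0)
state = 0
for _ in range(5000):
    # Epsilon-greedy action selection: mostly exploit, occasionally explore.
    if random.random() < EPSILON:
        action = random.choice((0, 1))
    else:
        action = max((0, 1), key=lambda a: Q[(state, a)])
    next_state, reward = step(state, action)
    # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
    best_next = max(Q[(next_state, a)] for a in (0, 1))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = next_state
```

After training, the greedy policy in state 0 prefers the rewarding action, even though the agent was never given a model of the environment; relationships between actions and outcomes were learned entirely from interaction, which is the property the preceding paragraph highlights.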
This paper provides a brief introduction to the concept of re-
inforcement learning. As a case study, reinforcement learning is
applied to the case of an isolated traffic signal with encouraging
results. This is the first stage in a research program to develop a
signal system control methodology based on reinforcement learning,
¹Assistant Professor and Director, Intelligent Transportation Systems
Centre, Dept. of Civil Engineering, Univ. of Toronto, Toronto, ON,
Canada M5S 1A4. E-mail: baher@ecf.utoronto.ca
²PhD Candidate, Intelligent Transportation Systems Centre, Dept. of
Civil Engineering, Univ. of Toronto, Toronto, ON, Canada M5S 1A4.
E-mail: rob.pringle@utoronto.ca
³Dept. of Computer Science, Univ. of Toronto, Pratt Building
LP283E, 6 King's College, Toronto, ON, Canada M5S 1A4. E-mail:
grigoris@cs.toronto.edu
Note. Discussion open until October 1, 2003. Separate discussions
must be submitted for individual papers. To extend the closing date by
one month, a written request must be filed with the ASCE Managing
Editor. The manuscript for this paper was submitted for review and
possible publication on October 30, 2001; approved on May 21, 2002.
This paper is part of the Journal of Transportation Engineering, Vol. 129,
No. 3, May 1, 2003. ©ASCE, ISSN 0733-947X/2003/3-278–285/$18.00.
278 / JOURNAL OF TRANSPORTATION ENGINEERING © ASCE / MAY/JUNE 2003