Traffic Flow Forecasting: Overcoming Memoryless Property in
Nearest Neighbor Non-Parametric Regression
Taehyung Kim, Hyoungsoo Kim and David J. Lovell
Abstract— Short term traffic flow forecasting has played a
key role in proactive and dynamic traffic control systems. A
variety of methods and techniques have been developed to
forecast traffic flow. Current nearest neighbor non-parametric
traffic flow forecasting models treat the dynamic evolution of
traffic flows at a given state as a memoryless process; i.e., the
current state of traffic flow entirely determines the future state
of traffic flow, with no dependence on the past sequences of
traffic flow patterns that produced the current state (in existing
nearest neighbor non-parametric models, the state includes only
instantaneous conditions, not historic ones). Of course, traffic
flow is not completely random in nature. There should be some
patterns in which the past traffic flow repeats itself. In this
paper, we have proposed a pattern recognition technique, which
enables us to consider the past sequences of traffic flow patterns
to predict the future state. It was found that the pattern
recognition model is capable of predicting the future state of
traffic flow reasonably well compared with the k-nearest
neighbor non-parametric regression model. We hope that this
paper is a good platform for the development of more effective
nearest neighbor non-parametric regression models.
I. INTRODUCTION
The capability to forecast traffic volume has been
identified as a critical need for a proactive and dynamic
traffic control system. Cheslow et al. [1] concluded in an
early report on intelligent transportation systems (ITS)
architecture that the ability to make and continuously update
predictions of traffic flows and link times for several minutes
into the future using real-time data is a major requirement for
providing dynamic traffic control.
A variety of methods and techniques have been developed
to forecast traffic flow and these have been continuously
refined up to the present. Linear regression is perhaps the
best-known method, but other techniques such as
non-linear regression, time-series analysis, neural networks,
and Kalman filtering are commonly used in forecasting traffic
flow. Each method has strengths and weaknesses, and each
might be said to be designed to handle a specific class of
problems. However, during the modeling process,
assumptions about the data are made, which may or may not
be appropriate, thus affecting forecasting performance. For
example, “parametric algorithms assume that the data to be
modeled takes on a structure that can be described by a
known mathematical expression with a few free parameters”
[Kennedy et al., 2] and “If the assumptions are flagrantly
violated, any inferences derived from the regression are
suspect” [Mendenhall et al., 3].
These types of conclusions are frequently used to motivate
the use of non-parametric regression, which is a data-driven
heuristic forecasting technique, for forecasting traffic flow or
travel time using large traffic flow data sets. Non-parametric
regression does not require any prior knowledge about the
process being modeled, only sufficiently large quantities of
data representing the underlying system. It relies on past data
to describe the relationship between input and output states
rather than imposing a (possibly incorrect) model upon the data. Hence
it is useful in situations where a well-defined theory does not
exist but large amounts of data are readily available.
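As an illustration of this data-driven idea, the following is a minimal sketch of a k-nearest neighbor forecast, not the formulation used in this paper or in the studies cited here. It assumes each stored state is a short numeric vector, for example (flow, occupancy), paired with the flow that was observed in the following interval. The function name `knn_forecast`, the Euclidean distance, the unweighted average, and the sample values are all hypothetical choices made for the example.

```python
import math


def knn_forecast(past_states, past_next_flows, current_state, k=3):
    """Forecast the next flow as the mean outcome of the k most similar past states.

    past_states      -- list of state vectors observed in the past (assumed layout)
    past_next_flows  -- flow observed one interval after each stored state
    current_state    -- the state vector for which a forecast is wanted
    """
    if len(past_states) != len(past_next_flows):
        raise ValueError("each stored state needs exactly one following flow")
    if not 1 <= k <= len(past_states):
        raise ValueError("k must be between 1 and the number of stored states")

    # Euclidean distance from the current state to every stored state.
    # Features are used unscaled here; a real application would likely normalize them.
    distances = [math.dist(state, current_state) for state in past_states]

    # Indices of the k closest stored states (sorted() is stable, so ties keep stored order).
    nearest = sorted(range(len(past_states)), key=distances.__getitem__)[:k]

    # Unweighted average of what happened next after those neighbors.
    return sum(past_next_flows[i] for i in nearest) / k


# Hypothetical data: each state is (flow in veh/h, occupancy in percent).
past_states = [[1200, 10], [1250, 11], [1800, 25], [1190, 9], [1750, 24]]
past_next_flows = [1230, 1270, 1700, 1210, 1680]

forecast = knn_forecast(past_states, past_next_flows, [1210, 10], k=3)
```

No functional form relating state to future flow is specified anywhere in the sketch; the stored observations themselves play the role of the model, which is why the approach depends on having a large and representative database.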
Davis and Nihan [4] used a k-nearest neighbor (k-NN)
formulation of non-parametric regression to estimate
short-term freeway traffic flows. They focused on estimating
the transitions from the uncongested traffic regime to the
congested regime. An empirical study using actual freeway
data was conducted to test the k-NN approach and to compare
it to simple univariate linear time-series forecasts. The k-NN
method performed comparably to, but not better than, the
linear time-series forecasts. Smith et al. [5] also used a
nearest neighbor non-parametric regression to develop a
traffic flow forecasting model for two sites on Northern
Virginia’s Capital Beltway. They showed that the
non-parametric regression model significantly outperformed
other models such as historical average, time-series, and
neural network models. Subsequently, Smith et al. [6] and Smith and
Oswald [7] used nearest-neighbor techniques to forecast
traffic flow based on real-time traffic data. Clark [8] recently
examined relationships between flow, occupancy, and speed
in order to generate short-term predictions of traffic flow. He
employed a k-nearest-neighbor regression and relied on
high-quality loop detector data from England. You and Kim
[9] also used this technique to forecast travel time using
traffic flow data on highways in Korea.
Manuscript received March 1, 2005.
Taehyung Kim is a graduate student in the Department of Civil and
Environmental Engineering at the University of Maryland, College Park, MD
20742 USA (corresponding author; phone: 301-405-3636; fax:
301-405-2585; e-mail: thkim@wam.umd.edu).
Hyoungsoo Kim is a graduate student in the Department of Civil and
Environmental Engineering at the University of Maryland, College Park, MD
20742 USA (e-mail: hsookim@umd.edu).
David J. Lovell is an associate professor in the Department of Civil and
Environmental Engineering and Institute for Systems Research at the
University of Maryland, College Park, MD 20742 USA (e-mail:
lovell@eng.umd.edu).
It is noteworthy that previous nearest neighbor
non-parametric traffic flow forecasting models treat the
dynamic evolution of traffic flows at a given state as a
memoryless process; i.e., the current state of traffic flow
Proceedings of the 8th International
IEEE Conference on Intelligent Transportation Systems
Vienna, Austria, September 13-16, 2005
TC7.2
0-7803-9215-9/05/$20.00 ©2005 IEEE. 965