Traffic Flow Forecasting: Overcoming Memoryless Property in Nearest Neighbor Non-Parametric Regression

Taehyung Kim, Hyoungsoo Kim and David J. Lovell

Abstract— Short term traffic flow forecasting has played a key role in proactive and dynamic traffic control systems. A variety of methods and techniques have been developed to forecast traffic flow. Current nearest neighbor non-parametric traffic flow forecasting models treat the dynamic evolution of traffic flows at a given state as a memoryless process; i.e., the current state of traffic flow entirely determines the future state of traffic flow, with no dependence on the past sequences of traffic flow patterns that produced the current state (in existing nearest neighbor non-parametric models, the state includes only instantaneous conditions, not historic ones). Of course, traffic flow is not completely random in nature. There should be some patterns in which the past traffic flow repeats itself. In this paper, we have proposed a pattern recognition technique, which enables us to consider the past sequences of traffic flow patterns to predict the future state. It was found that the pattern recognition model is capable of predicting the future state of traffic flow reasonably well compared with the k-nearest neighbor non-parametric regression model. We hope that this paper is a good platform for the development of more effective nearest neighbor non-parametric regression models.

I. INTRODUCTION

The capability to forecast traffic volume has been identified as a critical need for a proactive and dynamic traffic control system. Cheslow et al. [1] concluded in an early report on intelligent transportation systems (ITS) architecture that the ability to make and continuously update predictions of traffic flows and link times for several minutes into the future using real-time data is a major requirement for providing dynamic traffic control.
A variety of methods and techniques have been developed to forecast traffic flow, and these have been continuously refined up to the present. Linear regression is perhaps the best-known method, but other techniques such as non-linear regression, time-series analysis, neural networks, and Kalman filtering are also commonly used in forecasting traffic flow. Each method has strengths and weaknesses, and each might be said to be designed to handle a specific class of problems. However, during the modeling process, assumptions about the data are made, which may or may not be appropriate, thus affecting forecasting performance. For example, “parametric algorithms assume that the data to be modeled takes on a structure that can be described by a known mathematical expression with a few free parameters” [Kennedy et al., 2] and “If the assumptions are flagrantly violated, any inferences derived from the regression are suspect” [Mendenhall et al., 3]. These types of conclusions are frequently used to motivate the use of non-parametric regression, which is a data-driven heuristic forecasting technique, for forecasting traffic flow or travel time using large traffic flow data sets. Non-parametric regression does not require any prior knowledge about the process being modeled, only sufficiently large quantities of data representing the underlying system. It relies on past data to describe the relationship between input and output states rather than imposing a (possibly incorrect) model upon the data. Hence it is useful in situations where a well-defined theory does not exist but large amounts of data are readily available.

Davis and Nihan [4] used a k-nearest neighbor (k-NN) formulation of non-parametric regression to estimate short-term freeway traffic flows. They focused on estimating the transitions from the uncongested traffic regime to the congested regime.
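The k-NN formulation of non-parametric regression mentioned above can be illustrated with a minimal sketch. It is not the implementation used in any of the cited studies. It assumes a state consisting only of the current flow, an absolute-difference distance, and an unweighted average of the neighbors' next-interval flows; the function name `knn_forecast` and the sample flow values are hypothetical.

```python
from typing import Sequence


def knn_forecast(history: Sequence[float], current: float, k: int = 3) -> float:
    """Forecast the next-interval flow from the k most similar past flows.

    Illustrative assumptions: the state is the instantaneous flow only,
    distance is the absolute difference, and neighbors are weighted equally.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Pair each historical flow with the flow observed one interval later.
    pairs = [(history[i], history[i + 1]) for i in range(len(history) - 1)]
    if len(pairs) < k:
        raise ValueError("not enough history for the requested k")
    # Rank past states by closeness to the current flow and keep the k nearest.
    nearest = sorted(pairs, key=lambda p: abs(p[0] - current))[:k]
    # Average the flows that followed those neighbors.
    return sum(nxt for _, nxt in nearest) / k


# Hypothetical flow counts (vehicles per interval), for illustration only.
flows = [100, 120, 150, 130, 110, 125, 155, 135]
print(knn_forecast(flows, current=122, k=2))
```

Because the state here holds only the current flow, two past intervals with the same flow are treated as identical neighbors regardless of whether traffic was building or dissipating beforehand.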
An empirical study using actual freeway data was conducted to test the k-NN approach and to compare it to simple univariate linear time-series forecasts. The k-NN method performed comparably to, but not better than, the linear time-series forecasts. Smith et al. [5] also used a nearest neighbor non-parametric regression to develop a traffic flow forecasting model for two sites on Northern Virginia’s Capital Beltway. They showed that the non-parametric regression model significantly outperformed other models such as historical average, time-series, and neural network. Subsequently, Smith et al. [6] and Smith and Oswald [7] used nearest-neighbor techniques to forecast traffic flow based on real-time traffic data. Clark [8] recently examined relationships between flow, occupancy, and speed in order to generate short-term predictions of traffic flow. He employed a k-nearest-neighbor regression and relied on high-quality loop detector data from England. You and Kim [9] also used this technique to forecast travel time using traffic flow data on highways in Korea.

Manuscript received March 1, 2005.
Taehyung Kim is a graduate student in the Department of Civil and Environmental Engineering at the University of Maryland, College Park, MD 20742 USA (corresponding author to provide phone: 301-405-3636; fax: 301-405-2585; e-mail: thkim@wam.umd.edu).
Hyoungsoo Kim is a graduate student in the Department of Civil and Environmental Engineering at the University of Maryland, College Park, MD 20742 USA (e-mail: hsookim@umd.edu).
David J. Lovell is an associate professor in the Department of Civil and Environmental Engineering and Institute for Systems Research at the University of Maryland, College Park, MD 20742 USA (e-mail: lovell@eng.umd.edu).

It is noteworthy that previous nearest neighbor non-parametric traffic flow forecasting models treat the dynamic evolution of traffic flows at a given state as a memoryless process; i.e., the current state of traffic flow
Proceedings of the 8th International IEEE Conference on Intelligent Transportation Systems, Vienna, Austria, September 13-16, 2005, TC7.2. 0-7803-9215-9/05/$20.00 ©2005 IEEE.