Statistical Modelling and Analysis of Sparse Bus Probe Data in Urban Areas

Andrei Iu. Bejan, Richard J. Gibbens, David Evans, Alastair R. Beresford, Jean Bacon
Computer Laboratory, University of Cambridge
15 JJ Thomson Avenue, Cambridge, UK CB3 0FD
firstname.lastname@cl.cam.ac.uk

Adrian Friday
Computing Department, Lancaster University
InfoLab21, South Drive, Lancaster University, Lancaster, UK LA1 4WA
adrian@comp.lancs.ac.uk

Abstract: Congestion in urban areas causes financial loss to business and increased use of energy compared with free-flowing traffic. Providing citizens with accurate information on traffic conditions can encourage journeys at times of low congestion and uptake of public transport. Installing the measurement infrastructure in a city to provide this information is expensive and potentially invades privacy. Increasingly, public transport vehicles are equipped with sensors to provide real-time arrival time estimates, but these data are sparse. Our work shows how these data can be used to estimate journey times experienced by road users generally. In this paper we (i) describe what a typical data set from a fleet of over 100 buses looks like; (ii) describe an algorithm to extract bus journeys and estimate their duration along a single route; (iii) show how to visualise journey times and the influence of contextual factors; and (iv) validate our approach for recovering speed information from the sparse movement data.

I. Introduction

Congestion on roads, especially in urban areas, has a large negative social and economic impact on the community and the environment; for example, the cost of congestion to the UK economy was estimated at £12 billion in 2004 [1] and the cost to the US economy in 2007 was estimated at $78 billion [2]. Congestion can be reduced by increasing the capacity of the road network, by encouraging drivers to travel on different routes or at different times of the day, or by encouraging the use of public transport.
Travellers may often be unaware of alternative means of getting from A to B, regardless of their regular mode of transport, since travelling becomes an automatic and habitual process; fortunately, the provision of better information about likely costs and travel times encourages travellers to explore alternative times and modes of transport [3]. Consequently, reducing congestion, either through increased capacity or through the provision of better information for travellers, requires good knowledge of the performance of the road network.

The traditional approach to measuring vehicle movement and congestion is to use static vehicle sensors. These can be inductive loops in the road itself [4] or video cameras which detect the presence of vehicles at fixed locations; some newer networks of cameras can measure point-to-point travel times between pairs of cameras using automatic number plate recognition. Unfortunately, these approaches require the installation and on-going maintenance of expensive equipment in a harsh outdoor environment.

This research is supported by UK EPSRC grant EP/C547632/1.

An alternative approach is to use probe data from a sensor, such as a GPS device, attached to a vehicle or person. Probe data consist of a sequence of coordinates recorded over time and contain much more information than is typically available from fixed sensors [5]. Many public transport fleets are now augmented with automated vehicle location (AVL) systems which use GPS to collect probe data [6]. Shalaby and Farhan used AVL data from buses to predict bus arrival and departure times [7]; Uno et al. provided techniques to study variability of travel times and estimated travel-time distributions [8]. Krumm and Horvitz have shown how to predict destination from historic GPS traces [9], and Froehlich and Krumm extended this to predict the route a driver will follow [10]. Liao et al.
estimated a person’s location and mode of transport together with the individual’s goals and trip segments [11]. More recently, Google’s “Maps for Mobile” combined location data crowdsourced from mobile phones with traditional sensor infrastructure to overlay road maps with congestion information on arterial routes [12].

One problem with AVL data is that they are sparse (samples are typically recorded once every 20 or 30 seconds), and therefore techniques developed for probe data having high update rates are not directly applicable. In this paper we analyse sparse probe data collected from a fleet of over 100 buses and we combine these data with descriptions of bus stop locations and the road network as a whole to build a rich picture of traffic patterns and congestion. Specifically, we (i) describe an algorithm to extract bus journeys and estimate their duration along a single route; (ii) show how quantile regression can be used to visualise contextual factors that affect journey times; (iii) show how to recover speed information from sparse probe data using monotonic splines; and (iv) validate this recovery by comparing it to high update-rate probe data.

2010 13th International IEEE Annual Conference on Intelligent Transportation Systems, Madeira Island, Portugal, September 19-22, 2010. TC2.1. 978-1-4244-7658-9/10/$26.00 ©2010 IEEE
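As an illustration of the idea of recovering speed from sparse samples with a monotonic spline, the following is a minimal sketch and not the construction developed in this paper. It assumes that each probe sample has already been reduced to a pair (time in seconds, cumulative distance along the route in metres), and it swaps in a standard monotone cubic Hermite interpolant with harmonic-mean tangents (Fritsch-Butland style). The function names `monotone_tangents` and `speed_at`, and the sample values, are hypothetical.

```python
from bisect import bisect_right


def monotone_tangents(t, y):
    """Secant slopes and monotonicity-preserving tangents at the knots.

    Assumes t is strictly increasing, y is non-decreasing, len(t) >= 2.
    """
    n = len(t)
    d = [(y[i + 1] - y[i]) / (t[i + 1] - t[i]) for i in range(n - 1)]
    # One-sided secants at the two end knots.
    m = [d[0]] + [0.0] * (n - 2) + [d[-1]]
    for i in range(1, n - 1):
        # Harmonic mean of neighbouring secants; zero if the bus
        # was stationary on either side of the knot.
        if d[i - 1] > 0 and d[i] > 0:
            m[i] = 2 * d[i - 1] * d[i] / (d[i - 1] + d[i])
    return d, m


def speed_at(t, y, query):
    """Derivative of the monotone cubic interpolant at time `query` (m/s)."""
    d, m = monotone_tangents(t, y)
    # Index of the interval [t[i], t[i+1]] containing the query time.
    i = min(max(bisect_right(t, query) - 1, 0), len(t) - 2)
    s = (query - t[i]) / (t[i + 1] - t[i])
    return (6 * s * (1 - s) * d[i]
            + (3 * s * s - 4 * s + 1) * m[i]
            + (3 * s * s - 2 * s) * m[i + 1])


# Hypothetical samples 20 s apart; the bus is stationary between 20 s and 40 s.
times = [0.0, 20.0, 40.0, 60.0]
dist = [0.0, 100.0, 100.0, 250.0]
for q in (10.0, 30.0, 50.0):
    print(q, speed_at(times, dist, q))
```

Because the interpolated distance never decreases, the recovered speed is never negative, which an unconstrained cubic spline through the same samples would not guarantee.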