Statistical Modelling and Analysis of
Sparse Bus Probe Data in Urban Areas
Andrei Iu. Bejan, Richard J. Gibbens, David Evans,
Alastair R. Beresford, Jean Bacon
Computer Laboratory, University of Cambridge
15 JJ Thomson Avenue
Cambridge, UK CB3 0FD
firstname.lastname@cl.cam.ac.uk
Adrian Friday
Computing Department, Lancaster University
InfoLab21, South Drive
Lancaster University
Lancaster, UK LA1 4WA
adrian@comp.lancs.ac.uk
Abstract— Congestion in urban areas causes financial loss
to business and increased use of energy compared with free-
flowing traffic. Providing citizens with accurate information
on traffic conditions can encourage journeys at times of
low congestion and uptake of public transport. Installing the
measurement infrastructure in a city to provide this information
is expensive and potentially invades privacy. Increasingly, public
transport vehicles are equipped with sensors to provide real-
time arrival time estimates, but these data are sparse. Our work
shows how these data can be used to estimate journey times
experienced by road users generally. In this paper we (i) describe
what a typical data set from a fleet of over 100 buses looks
like; (ii) describe an algorithm to extract bus journeys and
estimate their duration along a single route; (iii) show how to
visualise journey times and the influence of contextual factors;
(iv) validate our approach for recovering speed information
from the sparse movement data.
I. Introduction
Congestion on roads, especially in urban areas, has a large
negative social and economic impact on the community and
the environment; for example, the cost of congestion to the
UK economy was estimated at £12 billion in 2004 [1] and
the cost to the US economy in 2007 was estimated at $78
billion [2]. Congestion can be reduced by increasing the
capacity of the road network, or by encouraging drivers to
travel on different routes, to travel at different times of the
day, or to use public transport. Travellers may often be unaware of
alternative means of getting from A to B, regardless of
their regular mode of transport, since travelling becomes
an automatic and habitual process; fortunately the provision
of better information about likely costs and travel times
encourages travellers to explore alternative times and modes
of transport [3]. Consequently, reducing congestion, either
through increased capacity or through the provision of better
information for travellers, requires good knowledge of the
performance of the road network.
The traditional approach to measuring vehicle movement
and congestion is to use static vehicle sensors. These can
be inductive loops in the road itself [4] or video cameras
which detect the presence of vehicles at fixed locations; some
newer networks of cameras can measure point-to-point travel
times between pairs of cameras using automatic number plate
recognition. Unfortunately, these approaches require the
installation and on-going maintenance of expensive equipment
in a harsh outdoor environment.
This research is supported by UK EPSRC grant EP/C547632/1.
An alternative approach is to use probe data from a sensor,
such as a GPS device, attached to a vehicle or person. Probe
data consist of a sequence of coordinates recorded over
time and contain much more information than is typically
available from fixed sensors [5]. Many public transport fleets
are now augmented with automated vehicle location (AVL)
systems which use GPS to collect probe data [6].
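As an illustration of what such probe data look like in practice, the following minimal sketch represents each sample as a timestamped coordinate and accumulates great-circle distance along the sequence. The record layout, the field names and the use of the haversine formula are assumptions made for this example; they are not taken from the AVL system described in the paper.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass(frozen=True)
class ProbeSample:
    """One position fix; field names are illustrative only."""
    t: float    # seconds since the start of observation
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_m(a, b):
    """Great-circle distance in metres between two samples."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def cumulative_distance(samples):
    """Distance travelled up to each sample, joining fixes by great-circle arcs."""
    total = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        total.append(total[-1] + haversine_m(prev, cur))
    return total
```

Joining consecutive fixes by great-circle arcs underestimates the distance travelled whenever the vehicle turns between samples, so a real analysis would more plausibly measure distance along the road network.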
Shalaby and Farhan used AVL data from buses to predict
bus arrival and departure times [7]; Uno et al. provided
techniques to study variability of travel times and estimated
travel-time distributions [8]. Krumm and Horvitz
have shown how to predict a driver's destination from historic GPS
traces [9], and Froehlich and Krumm extended this to predict
the route a driver will follow [10]. Liao et al. estimated a
person’s location and mode of transport together with the
individual’s goals and trip segments [11]. More recently,
Google’s “Maps for Mobile” combined location data crowd-sourced
from mobile phones with traditional sensor infrastructure
to overlay road maps with congestion information
on arterial routes [12].
One problem with AVL data is that they are sparse—
samples are typically recorded once every 20 or 30
seconds—and therefore techniques developed for probe data
having high update rates are not directly applicable. In this
paper we analyse sparse probe data collected from a fleet of
over 100 buses and we combine these data with descriptions
of bus stop locations and the road network as a whole to build
a rich picture of traffic patterns and congestion. Specifically,
we (i) describe an algorithm to extract bus journeys and
estimate their duration along a single route; (ii) show how
quantile regression can be used to visualise contextual factors
that affect journey times; (iii) show how to recover speed
information from sparse probe data using monotonic splines;
and (iv) validate this recovery by comparing it to high
update-rate probe data.
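To make the idea behind contribution (iii) concrete, the sketch below interpolates cumulative distance against time with a monotone cubic Hermite curve and reads speed off as its derivative. This section does not specify the paper's spline construction, so the sketch substitutes the weighted-harmonic-mean tangent rule used by PCHIP-style interpolants, with simple one-sided tangents at the two ends. The function names and the sample values are hypothetical.

```python
from bisect import bisect_right


def monotone_tangents(t, d):
    """Tangents at each sample that keep the cubic Hermite interpolant monotone.

    t: strictly increasing sample times; d: non-decreasing cumulative distance.
    Requires at least two samples.
    """
    n = len(t)
    h = [t[i + 1] - t[i] for i in range(n - 1)]
    delta = [(d[i + 1] - d[i]) / h[i] for i in range(n - 1)]  # secant speeds
    m = [0.0] * n
    m[0], m[-1] = delta[0], delta[-1]  # one-sided tangents at the ends
    for i in range(1, n - 1):
        # If either neighbouring interval is flat, the tangent stays at zero.
        if delta[i - 1] * delta[i] > 0:
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            m[i] = (w1 + w2) / (w1 / delta[i - 1] + w2 / delta[i])
    return m


def speed_at(t, d, m, x):
    """Derivative of the Hermite interpolant at time x, i.e. estimated speed."""
    i = min(max(bisect_right(t, x) - 1, 0), len(t) - 2)
    h = t[i + 1] - t[i]
    s = (x - t[i]) / h
    delta = (d[i + 1] - d[i]) / h
    return (6 * s * (1 - s) * delta
            + (3 * s * s - 4 * s + 1) * m[i]
            + (3 * s * s - 2 * s) * m[i + 1])


# Hypothetical samples every 20 s; the bus is stationary between 20 s and 40 s.
times = [0.0, 20.0, 40.0, 60.0]
dist = [0.0, 100.0, 100.0, 400.0]
tangents = monotone_tangents(times, dist)
speeds = [speed_at(times, dist, tangents, x) for x in range(0, 61, 5)]
```

Because the interpolated distance never decreases, the recovered speed is never negative, which an unconstrained cubic spline through the same samples would not guarantee.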
2010 13th International IEEE
Annual Conference on Intelligent Transportation Systems
Madeira Island, Portugal, September 19-22, 2010