This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
Estimating Inefficiency in Bus Trip Choices from a
User Perspective With Schedule, Positioning,
and Ticketing Data
Tarciso Braz, Matheus Maciel, Demetrio Gomes Mestre, Nazareno Andrade, Carlos Eduardo Pires,
Andreza Raquel Queiroz, and Veruska Borges Santos
Abstract—The availability of historical data on the Global
Positioning System (GPS) trajectories of vehicles and on passenger
boarding for the public bus fleets of large municipalities has
given researchers and practitioners the opportunity to explore
new challenges in the analysis of public transportation
systems. This paper performs one such analysis as a case study,
examining the margin for improvement that passengers in a
Brazilian city of 1.8 million people have when choosing their
daily bus trips. In doing so, we document a number of not readily
apparent challenges that must be overcome to make public
transportation big data useful to policymakers, transportation
system operators, and citizens. Solutions are devised for each of
these challenges and demonstrated in the analysis of the same
city.
Index Terms— Public transportation, transit usage
performance improvement, map-matching, origin-destination
estimation.
I. INTRODUCTION
INTELLIGENT transportation systems, and in particular
Traveler Information Systems, have the potential to opti-
mize transit trips according to user preferences and restrictions.
Indeed, a number of systems have been proposed and are used
daily by millions of transport users worldwide, such as Google
Maps¹ and Moovit.²
Nevertheless, although there has been a constant push to
improve algorithms that predict trip time or comfort, there
has been comparatively little effort to estimate the current
margin for improvement that such algorithms can attain at
scale and in naturalistic settings. It is possible that present
systems are already close to a performance ceiling given the
actual choices available in a city. In other words, it is possible
that transit users typically already choose the optimal trips in
their routine. If this is the case, there may be more efficient uses
of research and development effort than trying to improve the
effectiveness of Traveler Information Systems. If the contrary
is true, it would be useful for operators and the community to
have user-centric information (e.g., [1]) in order to understand
how much inefficiency there is in the trips passengers take as
part of their daily travel behavior, across different types of
routes, times, and users.
Manuscript received December 14, 2017; revised March 30, 2018; accepted
May 15, 2018. This work was supported by EUBra-BIGSEA, a Research
and Innovation Action, funded in part by the European Commission through
the Cooperation Programme, Horizon 2020, under Grant 690116, and in
part by the Ministério de Ciência, Tecnologia e Inovação, RNP/Brazil, under
Grant GA-0000000650/04. The Associate Editor for this paper was R. Nair.
(Corresponding author: Tarciso Braz.)
The authors are with the Systems and Computing Department, Universidade
Federal de Campina Grande, Campina Grande 58429-900, Brazil (e-mail:
tarcisocomp@gmail.com; teu.araujo@gmail.com; demetriogm@gmail.com;
nazareno@computacao.ufcg.edu.br; cesp@computacao.ufcg.edu.br;
andreza.queiroz@ccc.ufcg.edu.br; veruska.santos@ccc.ufcg.edu.br).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TITS.2018.2846036
¹ https://www.google.com/maps
² https://www.moovitapp.com/
In this context, the present work helps fill two gaps
in the literature. First, it performs a citywide analysis of the
efficiency of the choices made by bus users in the transit system
of Curitiba, a Brazilian city of 1.8 million people. Efficiency is
measured as how close the choices made by transit users are to
the optimal choice available for their trip with respect to trip
duration. This analysis leverages historical data from the whole
bus system, integrated with ticketing and schedule data.
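To make the notion of efficiency concrete, the following is a minimal sketch, not the measure defined later in the paper. It assumes that inefficiency can be expressed as the extra duration of the chosen trip relative to the fastest alternative available for the same journey; the function name `trip_inefficiency` and its parameters are hypothetical.

```python
def trip_inefficiency(chosen_duration_min, alternative_durations_min):
    """Relative extra time of the chosen trip versus the fastest option.

    Assumption (illustrative only): durations are in minutes, and the
    chosen trip itself counts as one of the available options.
    Returns 0.0 when the chosen trip is already the fastest.
    """
    # The fastest option is the minimum over the chosen trip and its alternatives.
    best = min([chosen_duration_min, *alternative_durations_min])
    if best <= 0:
        raise ValueError("trip durations must be positive")
    return (chosen_duration_min - best) / best


# A 30-minute trip when a 25-minute alternative existed: 20% extra time.
print(trip_inefficiency(30, [25, 40]))
```

Under this assumption, a value close to zero across most trips would suggest that users already choose near-optimal trips with respect to duration.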
The second contribution of this work is to document and
address the difficulties of integrating and leveraging
historical transport data to perform one such analysis. Despite
recent advances in the availability of transport data and in the
formats for sharing it between transportation companies and
the government or citizens, the formats and inconsistencies
presently prevalent in historical transport data pose a number
of challenges for (i) examining a transport system at trip
level using vehicle location data, (ii) estimating boarding
position when automatic fare collection data is available, and
(iii) inferring user trip destination from the combination of
vehicle location and ticketing data. This work documents and
discusses these challenges as observed in data from multiple
cities, puts forward open solutions to these challenges, and
evaluates such solutions.
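As an illustration of challenge (ii), the sketch below estimates a boarding position by pairing a ticket validation with the GPS fix of the same vehicle that is closest in time. This is a simplified assumption for exposition, not the solution detailed in later sections; the function name, the tuple layout of `gps_fixes`, and the two-minute `max_gap` threshold are all hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def estimate_boarding_position(gps_fixes, ticket_time,
                               max_gap=timedelta(minutes=2)):
    """Return (lat, lon) of the GPS fix nearest in time to a ticket validation.

    gps_fixes: list of (timestamp, lat, lon) tuples for ONE vehicle,
               sorted by timestamp (assumed layout).
    Returns None when there are no fixes or the nearest fix is farther
    than max_gap from the validation time.
    """
    if not gps_fixes:
        return None
    times = [t for t, _, _ in gps_fixes]
    i = bisect_left(times, ticket_time)
    # Only the fixes just before and just after the validation can be nearest.
    candidates = gps_fixes[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda fix: abs(fix[0] - ticket_time))
    if abs(nearest[0] - ticket_time) > max_gap:
        return None
    return nearest[1], nearest[2]


# Made-up coordinates and timestamps, for illustration only.
fixes = [
    (datetime(2017, 5, 1, 8, 0, 0), -25.4300, -49.2700),
    (datetime(2017, 5, 1, 8, 1, 0), -25.4310, -49.2710),
    (datetime(2017, 5, 1, 8, 2, 0), -25.4320, -49.2720),
]
print(estimate_boarding_position(fixes, datetime(2017, 5, 1, 8, 1, 20)))
```

The `max_gap` guard reflects the fact that historical data can contain gaps in vehicle positions; without it, a validation could be matched to a fix recorded long before or after boarding.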
The remainder of this paper is organized as follows.
Section II describes the three data sources used in this work
and the challenges that usually arise when integrating them.
Section III details the Curitiba bus system and
the data collected for this work. The solutions
to the data integration challenges are detailed in
Sections IV, V, and VI. Our experiment to quantify inefficiency
in user trip choice and its results are presented in Section VII.
Section VIII reviews previous work. Finally, conclusions and
future directions are discussed in Section IX.
1524-9050 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.