Towards Building a Bus Travel Time Prediction Model for Metro Manila

Felan Carlo C. Garcia¹, Alvin E. Retamar²
Solutions and Services Engineering Division, Advanced Science and Technology Institute, ASTI Bldg., CP Garcia Ave., University of the Philippines Technology Park, Quezon City, Philippines
felan@asti.dost.gov.ph¹, ning@asti.dost.gov.ph²

Abstract — The land transportation sector is one of the key sectors of the Philippine economy, particularly in Metro Manila. With the rapid urbanization of the Philippines, the urban transport infrastructure is expected to come under increasing pressure, posing a major risk of urban transport degradation that results in longer travel times and in economic and productivity losses. In light of this, the Land Transportation Franchising and Regulatory Board (LTFRB), together with DOST-ASTI, has initiated a project to implement a bus management system for Public Utility Vehicles using real-time GPS location data. This study establishes a travel time prediction model for buses along a specific route. Travel time was estimated using Extremely Randomized Trees, a supervised machine learning algorithm. The resulting predictions achieved a coefficient of determination (R²) score indicative of good predictive performance for travel time prediction.

Keywords—Bus Management; Intelligent Transport System; Internet of Things; Machine Learning

I. INTRODUCTION

The land transportation sector is one of the key sectors of the Philippine economy, particularly in Metro Manila, where public transport accounts for 69% of total trips, with buses and jeepneys providing the bulk of transportation services. With the rapid urbanization of the Philippines, the urban transport infrastructure is expected to come under increasing pressure, posing a major risk of urban transport degradation that results in longer travel times and in economic and productivity losses [1].
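The abstract names Extremely Randomized Trees (Extra-Trees) as the regressor and the coefficient of determination as the score, R² = 1 − SS_res / SS_tot. Here SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the observed travel times from their mean. The following dependency-free Python sketch shows both. It is not the authors' implementation. It follows the general Extra-Trees recipe: at each node, draw one uniformly random cut-point per candidate feature, keep the cut that most reduces squared error, grow every tree on the full training set without bootstrapping, and average the trees. The class name SimpleExtraTrees, its parameters, and the departure-hour and day-of-week features are hypothetical. The training data are synthetic.

```python
import random
from statistics import mean


def _sse(ys):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


class SimpleExtraTrees:
    """Hypothetical minimal Extremely Randomized Trees regressor (illustration only)."""

    def __init__(self, n_trees=50, k_features=None, min_samples_split=5, seed=0):
        self.n_trees = n_trees                      # number of trees to average
        self.k_features = k_features                # features tried per node (None = all)
        self.min_samples_split = min_samples_split  # nodes smaller than this become leaves
        self.rng = random.Random(seed)
        self.trees = []

    def _build(self, X, y):
        # Leaf if the node is too small or its target is already constant.
        if len(y) < self.min_samples_split or len(set(y)) == 1:
            return mean(y)
        # Only features that still vary inside this node can be split on.
        candidates = [f for f in range(len(X[0]))
                      if min(r[f] for r in X) < max(r[f] for r in X)]
        if not candidates:
            return mean(y)
        k = min(self.k_features or len(candidates), len(candidates))
        best = None
        for f in self.rng.sample(candidates, k):
            col = [r[f] for r in X]
            # The "extremely randomized" step: one uniform random cut-point per
            # feature instead of a search for the optimal threshold.
            cut = self.rng.uniform(min(col), max(col))
            left = [i for i, v in enumerate(col) if v < cut]
            right = [i for i, v in enumerate(col) if v >= cut]
            if not left or not right:
                continue
            # Lowest child SSE = largest variance reduction (parent SSE is fixed).
            score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, cut, left, right)
        if best is None:
            return mean(y)
        _, f, cut, left, right = best
        return (f, cut,
                self._build([X[i] for i in left], [y[i] for i in left]),
                self._build([X[i] for i in right], [y[i] for i in right]))

    def fit(self, X, y):
        # Each tree is grown on the full training set (no bootstrap sampling).
        self.trees = [self._build(X, y) for _ in range(self.n_trees)]
        return self

    @staticmethod
    def _predict_one(node, x):
        while isinstance(node, tuple):  # internal node: (feature, cut, left, right)
            f, cut, left, right = node
            node = left if x[f] < cut else right
        return node

    def predict(self, X):
        # The ensemble prediction is the average of the individual tree outputs.
        return [mean(self._predict_one(t, x) for t in self.trees) for x in X]


if __name__ == "__main__":
    rng = random.Random(42)
    X, y = [], []
    for _ in range(300):
        # Synthetic trips; the features are assumed: [departure hour, day of week].
        hour, dow = rng.uniform(5, 22), rng.randrange(7)
        peak = 7 <= hour <= 9 or 17 <= hour <= 19
        X.append([hour, dow])
        y.append(40 + 25 * peak + (5 if dow < 5 else 0) + rng.gauss(0, 3))
    model = SimpleExtraTrees(n_trees=30, seed=1).fit(X[:240], y[:240])
    print(f"held-out R^2: {r2_score(y[240:], model.predict(X[240:])):.3f}")
```

In practice a library implementation such as scikit-learn's ExtraTreesRegressor would normally be used. The from-scratch version only makes the randomized split rule explicit.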
In a study released by the Japan International Cooperation Agency (JICA), a two-pronged approach combining Intelligent Transport Systems (ITS) and traffic infrastructure investments is proposed to sustain the country's increasing transport requirements [2]. Intelligent Transport Systems allow the existing transportation infrastructure to be managed better and used closer to its full capacity, sustaining business development and investment growth and encouraging economic activity. In Metro Manila alone, around 16,000 urban and provincial Public Utility Buses (PUBs) operate, providing both provincial and local transportation services. In light of this, the Land Transportation Franchising and Regulatory Board (LTFRB), together with the Department of Science and Technology – Advanced Science and Technology Institute (ASTI), commenced a project in 2014 to implement a bus management system for PUBs using real-time GPS data [3]. The project aims to develop an infrastructure that not only monitors buses in terms of location, speed, and route lane, but also uses the collected data for transport management research and intelligent transport system applications. The system was pilot tested in 2015 and proved capable of providing real-time bus tracking and daily reports of events such as speeding and out-of-lane travel. One challenge in establishing a prediction model for Metro Manila public utility transport, especially for travel time, is that bus operations are not scheduled. Instead, trips depend on factors such as traffic flow, time of day, vehicle availability, and the number of passengers. In this paper, we discuss our initial work on using historical data from multiple buses to build a machine learning-based model for bus travel time prediction using Extremely Randomized Trees.

II. RELATED LITERATURE

A. Related Works

Various kinds of data are used to estimate travel times on road networks, one of which is floating car data (FCD). Floating car data is defined as the position of a vehicle as it traverses the roads throughout the day. One advantage of FCD is that its position fixes are generally accurate, since they are obtained through GPS. However, the method also has disadvantages. For instance, a high density of data is required to obtain meaningful travel time predictions for a given road network. To address this, the study in [4] applied machine learning algorithms to construct more accurate travel time predictions for a given road network, and also explored the use of supervised learning for traffic inference on the links of a road network. Another study, by Lam et al. [5], likewise used machine learning techniques, presenting an approach for real-time estimation of the destination and travel time of taxis. The approach leveraged patterns observed in a dataset of roughly 1.7 million taxi journeys to predict the final destination and travel time of ongoing taxi trips.

3805 978-1-5090-2597-8/16/$31.00 ©2016 IEEE
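Because floating car data are timestamped position fixes, a travel time for a road segment can be read directly off a vehicle's trace. The Python sketch below is illustrative only and does not reproduce the method of [4] or [5]. Several details are assumptions: the fix format (Unix time in seconds, latitude, longitude), the 50 m matching radius, the haversine great-circle distance, the coordinates, and the function names. The sketch takes the last fix within the radius of the segment's start as the departure time, which excludes dwell time at the start. It takes the first later fix within the radius of the segment's end as the arrival time.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed spherical model)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = p2 - p1, radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def segment_travel_time(fixes, origin, destination, radius_m=50.0):
    """Hypothetical helper: seconds from leaving `origin` to reaching `destination`.

    `fixes` is a time-ordered list of (unix_time_s, lat, lon) tuples;
    `origin` and `destination` are (lat, lon) pairs.
    Returns None if the trace never passes both points in order.
    """
    t_depart = None
    for t, lat, lon in fixes:
        if haversine_m(lat, lon, *origin) <= radius_m:
            t_depart = t  # keep the latest fix at the origin (excludes dwell time)
        elif t_depart is not None and haversine_m(lat, lon, *destination) <= radius_m:
            return t - t_depart  # first fix at the destination after departure
    return None


if __name__ == "__main__":
    # Illustrative trace heading north; coordinates are made up for the example.
    trace = [(0, 14.6500, 121.0300), (30, 14.6501, 121.0300),
             (90, 14.6550, 121.0300), (150, 14.6599, 121.0300)]
    print(segment_travel_time(trace, (14.6500, 121.0300), (14.6600, 121.0300)))
```

A real pipeline would also need map matching and handling of GPS gaps and noise, which this sketch ignores.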