The path inference filter: model-based low-latency map matching of probe vehicle data
Supplementary Materials
Timothy Hunter, Pieter Abbeel, and Alexandre Bayen
University of California at Berkeley

Abstract: The following article is an expanded version of the article submitted for review at the Workshop on the Algorithmic Foundations of Robotics. We consider the problem of reconstructing vehicle trajectories from sparse sequences of GPS points, for which the sampling interval is between 10 seconds and 2 minutes. We introduce a new class of algorithms, collectively called the path inference filter (PIF), that map GPS data in real time, for a variety of trade-offs and scenarios, and with a high throughput. Numerous prior approaches to map matching can be shown to be special cases of the path inference filter presented in this article. We present an efficient procedure for automatically training the filter on new data, with or without ground truth observations. The framework is evaluated on a large San Francisco taxi dataset and is shown to improve upon the current state of the art. The filter also provides insight into the driving patterns of drivers. The path inference filter has been deployed at an industrial scale inside the Mobile Millennium traffic information system, and is used to map fleet data in San Francisco, Sacramento, Stockholm and Porto.

I. INTRODUCTION

Amongst the modern man-made plagues, traffic congestion is a universally recognized challenge [11]. Building reliable and cost-effective traffic monitoring systems is a prerequisite to addressing this phenomenon. Historically, the estimation of traffic congestion has been limited to highways, and has relied mostly on a static, dedicated sensing infrastructure such as loop detectors or cameras [42]. The estimation problem is more challenging in the case of the secondary road network, also called the arterial network, due to the cost of deploying a wide network of sensors in large metropolitan areas.
The most promising source of data is the GPS receiver in personal smartphones and commercial fleet vehicles. According to some studies [33], devices with a data connection and a GPS will represent 80% of the cellphone market by 2015. GPS observations in cities are noisy [10], and are usually provided at low sampling rates (on the order of one minute) [9]. A common problem when dealing with GPS traces is correctly mapping these observations to the road network and reconstructing the trajectory of the vehicle. We present a new class of algorithms, called the path inference filter, that solves this problem in a principled and efficient way. Specific instantiations of this algorithm have been deployed as part of the Mobile Millennium system, which is a traffic estimation and prediction system developed at the University of California [2]. Mobile Millennium infers real-time traffic conditions using GPS measurements from drivers running cell phone applications, taxicabs, and other mobile and static data sources. This system was initially deployed in the San Francisco Bay area and later expanded to other locations such as Sacramento, Stockholm, and Porto.

Figure 1. An example of a dataset available to Mobile Millennium and processed by the path inference filter: taxicabs in San Francisco from the Cabspotting program [9]. Large circles in red show the position of the taxis at a given time and small dots (in black) show past positions (during the last five hours) of the fleet. The position of each vehicle is observed every minute.

GPS receivers have enjoyed widespread use in transportation and they are rapidly becoming a commodity. They offer unique capabilities for tracking fleets of vehicles (for companies), and routing and navigation (for individuals).
These receivers are usually attached to a car or a truck, also called a probe vehicle, and they relay information to a base station using the data channels of cellphone networks (3G, 4G). A typical datum provided by a probe vehicle includes an identifier of the vehicle, a (noisy) position and a timestamp¹. Figure 1 graphically presents a subset of probe data collected by Mobile Millennium.

¹ The experiments in this article use GPS observations only. However, nothing prevents the application of the algorithms presented in this article to other types of localized data.

In addition to these geolocalization