Transportation Research Record: Journal of the Transportation Research Board, No. 2315, Transportation Research Board of the National Academies, Washington, D.C., 2012, pp. 173–181.
DOI: 10.3141/2315-18
Purdue University, 465 Northwestern Avenue, West Lafayette, IN 47907-2035. Corresponding author: J. V. Krogmeier, jvk@purdue.edu.

Probe data are emerging as an important source for characterizing transportation systems. Travel time distributions have traditionally been characterized by the mean and standard deviation. These statistics work well for uncongested freeway systems, whose travel time distributions are approximately normal. When congested conditions or interrupted-flow facilities are encountered, the travel time distributions become more complex. Recently, additional travel time reliability indexes have been developed to quantify these characteristics of travel time distributions. This study develops mathematical techniques for determining the sample size required to estimate the underlying travel time distributions, which can then be used to assess changes in those distributions associated with changes in traffic signal controller offsets. The example provided shows that gross changes in offsets require approximately 7 probe vehicle samples per study interval, whereas subtle changes in offsets require approximately 80. Although these guidelines were developed for evaluating offset changes, the mathematical framework can be applied to evaluate the impact of other parameters, such as split times and cycle lengths. Further research applying these techniques to a broader cross section of traffic conditions is warranted to assess their transferability to oversaturated conditions and freeways.

Travel time data have many stakeholders, including the motoring public, the media, and government agencies.
Government agencies rely on travel time for developing systemwide metrics (1, 2) and conducting before-and-after studies (3). Several new sources of travel time estimates have emerged in the past decade, including consumer-tracking electronics (3–5), toll tags (6), cell phones (7, 8), automatic license plate readers (4), electromagnetic signature matching (9–12), and data from Global Positioning System probes (13). Similar data are expected to become more prevalent in the next decade with deployment of the FHWA Connected Vehicle Research initiative (previously referred to as the Vehicle to Infrastructure Integration Program). In particular, it remains an open question what penetration level of connected vehicle technology is needed before new traffic engineering techniques can be realized.

The emerging sources of probe data mentioned above allow better characterization of travel time distributions. With sufficient numbers of travel time samples, new measures of the quality of progression through a corridor can be realized. For travel time through a single intersection, for example, travel time distributions can be used to estimate the percentage of arrivals on green and to indicate the proper action to improve travel time. This study focuses on developing statistical evaluation techniques that can improve before-and-after studies using emerging probe data sources.

BACKGROUND

The accuracy of an estimated travel time distribution depends on the quality of the travel time estimates and on the number of estimates. This study focuses on the number of estimates because the quality of probe vehicle travel time data is becoming quite good. Further information regarding the quality of probe data can be found elsewhere (9, 10).

The number of probe vehicle travel time estimates required to characterize travel time has been studied since the 1970s (14).
The current work uses the central limit theorem and, in Equation 2, assumes a normal distribution for the estimate of the mean travel time, which is also the recommendation of the Travel Time Data Collection Handbook (15). Since R̄ in Equation 1 is a function of n, the required value of n is usually found recursively. Quiroga and Bullock recommend using Equation 3, which uses a Student's t-distribution instead of a normal distribution to characterize the sample mean (16). For a given confidence level, Equation 3 yields larger sample sizes than Equation 2.

$$\bar{R} = \frac{1}{n-1}\sum_{i=2}^{n}\left|v_i - v_{i-1}\right| \qquad (1)$$

$$n = \left(\frac{Z_{\alpha}\,\bar{R}}{d_2\,\varepsilon}\right)^{2} \qquad (2)$$

$$n = \left(\frac{t_{\alpha}\,\bar{R}}{d_2\,\varepsilon}\right)^{2} \qquad (3)$$

where

v = vector of travel times (v_i is the ith travel time),
R̄ = mean moving range of consecutive travel times; divided by d_2, it approximates a robust estimate of the standard deviation,
d_2 = constant that converts the mean moving range into a standard deviation estimate,
ε = error tolerance,
n = sample size,
Z_α = value from the inverse standard normal distribution,
t_α = value from the inverse Student's t-distribution, and
α = significance level (the confidence level is 1 − α).

Probe Data Sampling Guidelines for Characterizing Arterial Travel Time
Joseph M. Ernst, Christopher M. Day, James V. Krogmeier, and Darcy M. Bullock
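As a rough illustration of how Equations 1 to 3 fit together, the Python sketch below computes the mean moving range from a list of travel times and then the sample sizes from Equations 2 and 3. It rests on assumptions not stated in the text: d_2 is taken as 1.128 (the usual control-chart constant for moving ranges of two observations), both quantiles are taken two-sided at 1 − α/2, Student's t uses n − 1 degrees of freedom, and Equation 3 is solved by searching upward for the smallest n that satisfies it, which may differ from the recursion the authors used. The helper names (mean_moving_range, n_normal, t_cdf, t_inv, n_student) are hypothetical, and the t quantile is computed by numerical integration so that only the standard library is needed.

```python
import math
from statistics import NormalDist

# Assumption: d2 for moving ranges of two consecutive observations.
D2 = 1.128


def mean_moving_range(v):
    """Equation 1: mean absolute difference between consecutive travel times."""
    if len(v) < 2:
        raise ValueError("need at least two travel times")
    return sum(abs(v[i] - v[i - 1]) for i in range(1, len(v))) / (len(v) - 1)


def n_normal(r_bar, eps, alpha=0.05):
    """Equation 2, assuming a two-sided interval (normal quantile at 1 - alpha/2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * r_bar / (D2 * eps)) ** 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: integrate the t density over [0, x] with Simpson's rule."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + s * h / 3


def t_inv(p, df):
    """Quantile of Student's t for 0.5 < p < 1, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def n_student(r_bar, eps, alpha=0.05):
    """Equation 3: smallest n with n >= (t * R_bar / (d2 * eps))**2, t using n - 1 df.

    Because t depends on n, search upward from the Equation 2 result.
    """
    ratio = r_bar / (D2 * eps)
    n = max(n_normal(r_bar, eps, alpha), 2)
    while (t_inv(1 - alpha / 2, n - 1) * ratio) ** 2 > n:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical travel times (s) for one study interval; not data from the paper.
    times = [62.0, 75.0, 58.0, 90.0, 66.0, 71.0]
    eps = 5.0  # hypothetical error tolerance (s)
    r_bar = mean_moving_range(times)
    print(f"R_bar = {r_bar:.1f} s")
    print("Equation 2 sample size:", n_normal(r_bar, eps))
    print("Equation 3 sample size:", n_student(r_bar, eps))
```

The upward search is a deliberate choice: because t shrinks as n grows, repeatedly substituting the latest n back into Equation 3 can oscillate between two neighboring values for small samples, whereas the smallest n satisfying the inequality is always well defined.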