Scalable and Accurate Identification of AS-Level Forwarding Paths

Z. Morley Mao (University of Michigan, zmao@eecs.umich.edu)
David Johnson, Jennifer Rexford, Jia Wang (AT&T Labs–Research, {dsj,jrex,jiawang}@research.att.com)
Randy Katz (UC Berkeley, randy@cs.berkeley.edu)

Abstract— Traceroute is used heavily by network operators and researchers to identify the IP forwarding path from a source to a destination. In practice, knowing the Autonomous System (AS) associated with each hop in the path is also quite valuable. In previous work we showed that the IP-to-AS mapping extracted from BGP routing tables is not sufficient for determining the AS-level forwarding paths [1]. By comparing BGP and traceroute AS paths from multiple vantage points, [1] proposed heuristics that identify the root causes of the mismatches and fix the inaccurate IP-to-AS mappings. These heuristics, though effective, are labor-intensive and mostly ad hoc. This paper proposes a systematic way to construct accurate IP-to-AS mappings using dynamic programming and iterative improvement. Our algorithm reduces the initial mismatch ratio of 15% between BGP and traceroute AS paths to 5% while changing only 2.9% of the assignments in the initial IP-to-AS mappings. This is in contrast to the results of [1], where 10% of the assignments were modified and the mismatch ratio was only reduced to 9%. We show that our algorithm is robust and can yield near-optimal results even when the initial mapping is corrupted or when the number of probing sources or destinations is reduced. Our work is a key step towards building a scalable and accurate AS-level traceroute tool.

I. INTRODUCTION

Traceroute is widely used to detect routing problems, characterize end-to-end paths, and discover the Internet topology. Traceroute sends a series of TTL-limited probes toward a target destination, and reports the interfaces on the forwarding path and the round-trip time for each hop.
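To make the probing mechanism concrete, the following is a small Python simulation, not a real network tool. It assumes a known, static forwarding path given as a hypothetical list of (interface, one-way link delay in ms) pairs. Each router decrements the TTL; the router where the TTL reaches zero reports its own interface (in real traceroute, via an ICMP Time Exceeded message), and the destination answers the probe directly. The addresses come from documentation ranges and are illustrative only; real traceroute also sends several probes per TTL and prints `*` for hops that do not answer, which this sketch ignores.

```python
from typing import NamedTuple


class Hop(NamedTuple):
    ttl: int
    interface: str
    rtt_ms: float


def send_probe(path: list[tuple[str, float]], ttl: int) -> tuple[str, float, bool]:
    """Simulate one TTL-limited probe along a hypothetical static `path`.

    Each element of `path` is (interface, one-way delay in ms to reach it);
    the last element is the destination. Returns
    (responding interface, round-trip time in ms, reached destination).
    """
    elapsed = 0.0
    for position, (iface, delay_ms) in enumerate(path, start=1):
        elapsed += delay_ms
        if position == len(path):
            # The destination answers the probe itself instead of forwarding it.
            return iface, 2 * elapsed, True
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl == 0:
            # TTL expired here: this router reports itself (ICMP Time Exceeded).
            return iface, 2 * elapsed, False
    raise ValueError("path must contain at least the destination")


def traceroute(path: list[tuple[str, float]], max_ttl: int = 30) -> list[Hop]:
    """Probe with TTL = 1, 2, ... until the destination answers or max_ttl is hit."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        iface, rtt_ms, reached = send_probe(path, ttl)
        hops.append(Hop(ttl, iface, rtt_ms))
        if reached:
            break
    return hops


example_path = [("192.0.2.1", 1.0), ("198.51.100.1", 2.5), ("203.0.113.10", 4.0)]
for hop in traceroute(example_path):
    print(hop.ttl, hop.interface, f"{hop.rtt_ms:.1f} ms")
```

Each successive TTL value exposes exactly one more router, which is why traceroute reveals the path one hop at a time but says nothing by itself about which AS owns each interface.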
In Figure 1, the first column shows the output of a traceroute to CNN’s web site. This output is invaluable to network operators and researchers. For example, network operators use traceroute to identify forwarding loops, blackholes, routing changes, unexpected paths through the Internet, and end-to-end latency. Upon detecting a routing or performance anomaly, operators need to identify the Autonomous System (AS) responsible for the problem. The second and third columns of Figure 1 show the AS information for the routers along the forwarding path. Inaccurate information about the ASes along the path delays identifying and correcting the problem. In addition, research studies based on AS paths or graphs derived from traceroute depend on having an effective way to map the traceroute data to an AS-level forwarding path.

Fig. 1. Example traceroute output with AS information to www.cnn.com.

This work was conducted while Morley was doing her internship at AT&T Labs–Research.

However, determining the AS-level forwarding path is an inherently difficult problem, due to the operational realities of today’s Internet. Conventional approaches have many limitations. First, the AS path advertised via BGP (Border Gateway Protocol) could be used as an estimate of the AS-level forwarding path. However, the AS path traversed by BGP update messages may differ from the forwarding path due to route aggregation and routing anomalies such as deflections. Network operators want to know when these kinds of differences occur in practice.
Second, each IP-level hop in the traceroute path could be mapped to an AS number by using an Internet routing registry (e.g., “NANOG traceroute” [2] and prtraceroute [3]). However, the registries are often out of date or incomplete. A third alternative is to use the origin AS (the AS that initially announced the prefix) extracted from BGP routing tables. Though this information is more accurate and complete, the approach also has limitations, such as prefixes announced by multiple origin ASes (MOAS [4]), route aggregation, and unannounced address blocks. For instance, for the fifth hop of the traceroute in Figure 1, both the whois address registry and the BGP table return AS25 as the owner AS. However, we will show later that this hop is an exchange point that actually belongs to AS11423.

Based on extensive measurements, previous work [1] discovered that a large fraction (around 15%) of the traceroute paths did not match the corresponding BGP paths. That study found that most discrepancies between the BGP and traceroute AS paths stemmed from inaccuracies in the IP-to-AS mapping applied to the traceroute data. It proposed heuristics to identify the root causes of the mismatches and fix inaccurate IP-to-AS mappings, based on a comparison of a large collection of BGP and traceroute paths from multiple vantage points.

0-7803-8356-7/04/$20.00 (C) 2004 IEEE IEEE INFOCOM 2004
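The origin-AS approach and the BGP-versus-traceroute comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the mapping algorithm of [1] or of this paper. Each traceroute hop is mapped to the origin AS of the longest matching prefix in a hypothetical list of (prefix, origin AS) pairs standing in for a BGP table; hops with no covering prefix (unannounced address blocks) are skipped; consecutive hops in the same AS are collapsed into one AS-level hop; and the result is compared with a BGP AS path to give a mismatch ratio over many path pairs. The helper names, prefixes, and AS numbers (taken from documentation ranges) are illustrative. The sketch assumes one origin per prefix (so it ignores MOAS) and assumes each BGP path already begins with the vantage point's own AS, so it is directly comparable with the traceroute path.

```python
import ipaddress
from itertools import groupby
from typing import Optional, Sequence

# Hypothetical BGP-derived table of (prefix, origin AS). A real table holds
# far more entries and would use a radix trie instead of a linear scan.
Rib = list[tuple[str, int]]


def origin_as(ip: str, rib: Rib) -> Optional[int]:
    """Origin AS of the longest prefix covering `ip`, or None if unannounced."""
    addr = ipaddress.ip_address(ip)
    best_asn, best_len = None, -1
    for prefix, asn in rib:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_asn, best_len = asn, net.prefixlen
    return best_asn


def collapse(asns: Sequence[Optional[int]]) -> list[int]:
    """Drop unmapped hops, then merge consecutive repeats of the same AS."""
    known = [asn for asn in asns if asn is not None]
    return [asn for asn, _ in groupby(known)]


def traceroute_as_path(hops: list[str], rib: Rib) -> list[int]:
    """Convert the IP hops reported by traceroute into an AS-level path."""
    return collapse([origin_as(ip, rib) for ip in hops])


def mismatch_ratio(pairs: list[tuple[list[int], list[int]]]) -> float:
    """Fraction of (traceroute AS path, BGP AS path) pairs that differ.

    BGP paths are collapsed too, so AS-path prepending is not a mismatch.
    """
    if not pairs:
        return 0.0
    mismatches = sum(1 for trace, bgp in pairs if collapse(trace) != collapse(bgp))
    return mismatches / len(pairs)


rib: Rib = [
    ("192.0.2.0/24", 64500),
    ("192.0.2.128/25", 64501),  # more specific prefix wins for .128-.255
    ("198.51.100.0/24", 64502),
]
trace = ["192.0.2.1", "192.0.2.200", "198.51.100.7", "203.0.113.9"]
as_path = traceroute_as_path(trace, rib)  # 203.0.113.9 is unannounced, skipped
print(as_path)
print(mismatch_ratio([(as_path, [64500, 64501, 64502]), (as_path, [64500, 64502])]))
```

The sketch also shows why the fifth hop in Figure 1 is hard to get right: with an assumed covering entry `("128.32.0.0/16", 25)`, `origin_as("128.32.0.66", ...)` returns 25, because a prefix lookup has no way to see that the interface at the exchange point belongs to AS11423. Skipping unmapped hops is a simplifying choice; it can hide a transit AS whose addresses are unannounced.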