Network Positioning from the Edge
An empirical study of the effectiveness of network positioning in P2P systems

David R. Choffnes, Mario A. Sánchez and Fabián E. Bustamante
EECS, Northwestern University
{drchoffnes,msanchez,fabianb}@eecs.northwestern.edu

Abstract—Network positioning systems provide an important service to large-scale P2P systems, potentially enabling clients to achieve higher performance, reduce cross-ISP traffic and improve the robustness of the system to failures. Because traces representative of this environment are generally unavailable, and there is no platform suited for experimentation at the appropriate scale, network positioning systems have been commonly implemented and evaluated in simulation and on research testbeds. The performance of network positioning remains an open question for large deployments at the edges of the network.

This paper evaluates how four key classes of network positioning systems fare when deployed at scale and measured in P2P systems where they are used. Using 2 billion network measurements gathered from more than 43,000 IP addresses probing over 8 million other IPs worldwide, we show that network positioning exhibits noticeably worse performance than previously reported in studies conducted on research testbeds. To explain this result, we identify several key properties of this environment that call into question fundamental assumptions driving network positioning research.

I. INTRODUCTION

Network positioning systems have been proposed as a scalable way to determine the relative location of hosts in the network, measured in terms of latency or available bandwidth [6]. Network positioning information has been used in a growing number of large-scale P2P systems [1], [3], [8], [9] that run on hosts located at the edges of the network (e.g., on desktops or appliances behind NAT boxes on residential links).
Because traces representative of this environment are generally unavailable, and there is no platform suited for experimentation at the appropriate scale, the corresponding performance of network positioning remains an open question.

This paper evaluates how four key classes of network positioning systems fare when deployed and measured at the scale of real, popular P2P systems. For this study, we gathered a large, representative dataset based on information reported by hosts participating in the Vuze BitTorrent system [22] through an extension to this client, currently installed by hundreds of thousands of peers.

The Vuze BitTorrent client provides operational deployments of Vivaldi [5], Vivaldi version 2 (Pyxida) [11] and CRP [19], in addition to a rich interface for accessing peers' positioning information. We sample Vivaldi network coordinates and CRP network positions, and perform network measurements to evaluate their accuracy. We additionally use the latency measurements between hosts to understand Meridian [24] and GNP [13] performance in this environment. Finally, we collect traceroute measurements between BitTorrent peers for diagnosing network positioning performance.

This paper makes the following contributions. First, we find that the accuracy of the network coordinate systems is significantly worse when used at the edge of the network than when evaluated from the perspective of a research testbed. Second, we show that this inaccuracy leads to significant loss in performance in the case of low-latency distributed hash tables (DHTs), which use network coordinates to guide neighbor selection. Third, we explore the root causes of errors in network positioning in the P2P environment at an Internet scale, based on latency and topology measurements.

To facilitate new research in network positioning, we will make our anonymized dataset publicly available.
This data consists of approximately 2 billion latency samples, 30 million traceroute measurements and hundreds of millions of network positions gathered during a two-week period.

The remainder of the paper is organized as follows. In the next section, we describe the four classes of network positioning approaches that we evaluate in this study. Sec. III provides details on our dataset and how we use it to evaluate positioning performance. We analyze the accuracy of network positioning and its impact on performance in Sec. IV, then explore the sources of positioning errors in Sec. V.

II. BACKGROUND

There is a rich body of work that addresses the design and implementation of network positioning systems [5], [12], [13], [19], [20]. In this section, we describe the four classes of network positioning systems that we cover in this study.

Landmark-based systems estimate network distances to participating hosts by embedding their network locations in a multi-dimensional Euclidean space based on the hosts' distances to a set of landmarks. The Global Network Positioning (GNP) system [13] provides an efficient implementation of this approach. Landmark-free systems, in contrast, fully decentralize the computation of network locations encoded in a low-dimensional coordinate space [5], [18]. Among these systems, the Vivaldi network positioning system [5] is the most widely deployed.

Despite the success of these systems, recent studies have called into question the usefulness of network coordinates [25]. For example, Wong et al. [24] note that embedding errors from network coordinates always lead to suboptimal peer selection and instead propose Meridian, a structured approach to direct measurement.
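
As a rough illustration of how a landmark-free system refines its coordinates, and of where embedding error enters, the following Python sketch shows one simplified spring-relaxation update in the style of Vivaldi [5]. It is not the Vuze implementation. The function names, the plain Euclidean space without height vectors, and the tuning constants cc and ce are assumptions chosen for illustration only.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relative_error(predicted, measured):
    """Relative error of a predicted latency against a measured one."""
    return abs(predicted - measured) / measured


def vivaldi_update(xi, ei, xj, ej, rtt, cc=0.25, ce=0.25):
    """One simplified Vivaldi-style step for node i after probing node j.

    xi, xj: coordinates of the local and remote node (lists of floats)
    ei, ej: their current error estimates
    rtt:    measured round-trip time to node j (must be > 0)
    cc, ce: illustrative tuning constants (assumed values)
    Returns the new coordinate and new error estimate for node i.
    """
    dist = distance(xi, xj)

    # Weight the sample by how confident we are relative to the remote node.
    w = ei / (ei + ej) if (ei + ej) > 0 else 0.5

    # Blend the error of this sample into the running error estimate.
    sample_err = relative_error(dist, rtt)
    new_ei = sample_err * ce * w + ei * (1 - ce * w)

    # Unit vector pointing from j towards i; pick a random direction
    # when both nodes sit on the same point.
    if dist > 0:
        unit = [(a - b) / dist for a, b in zip(xi, xj)]
    else:
        raw = [random.uniform(-1.0, 1.0) for _ in xi]
        norm = math.sqrt(sum(r * r for r in raw)) or 1.0
        unit = [r / norm for r in raw]

    # Move along the spring: away from j if the coordinates underestimate
    # the latency, towards j if they overestimate it.
    step = cc * w * (rtt - dist)
    new_xi = [a + step * u for a, u in zip(xi, unit)]
    return new_xi, new_ei


# Example: the coordinates predict 10 ms but the probe measured 20 ms,
# so node i is pushed away from node j.
new_x, new_e = vivaldi_update([0.0, 0.0], 1.0, [10.0, 0.0], 1.0, rtt=20.0)
print(new_x, new_e)
```

In this example the predicted distance grows from 10 to 11.25 after one step, so the residual between predicted and measured latency shrinks but does not vanish. That residual is the embedding error whose effect at the network edge is the subject of this study.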