Exploiting Internet Delay Space Properties for Selecting Distinct Network Locations

Bo Zhang and T. S. Eugene Ng
Department of Computer Science
Rice University

Abstract—Recent studies have discovered that the Internet delay space has many interesting properties, such as triangle inequality violations (TIV), clustering structures, and constrained growth. Understanding these properties has so far benefited the design of network models and network-performance-aware systems. In this paper, we consider an interesting, previously unexplored connection between Internet delay space properties and network locations. We show that this connection can be exploited to select nodes from distinct network locations for applications such as replica placement in overlay networks, even when an adversary is trying to misguide the selection process.

I. INTRODUCTION

Recent studies [32], [14], [29], [17] have identified many interesting properties of the Internet delay space¹, such as triangle inequality violations (TIV), clustering structures, and constrained growth. With the increased understanding of Internet delay space properties, researchers have started applying them to solve practical problems. For example, [32] proposes a network delay model that takes the delay space properties into account, [29] improves the performance of two neighbor selection systems by making them TIV-aware, and [18] proposes a routing overlay that exploits TIV to select the best peerings. In this paper, we show that the Internet delay space properties can also be leveraged to select nodes from distinct network locations, even when an adversary is trying to misguide the selection process. This benefits applications that need nodes from distinct network locations.

¹ In this paper, “delay” means round-trip delay.

This research was sponsored by the NSF under CAREER Award CNS-0448546 and grant CNS-0721990. Views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of NSF or the U.S. government.

A. The Need For Distinct Network Locations

In a decentralized distributed system such as a peer-to-peer (P2P) system, a node that needs to request service from a set of other nodes often prefers nodes from distinct network locations, because greater network location diversity generally increases the system’s availability and makes the system more resilient to locality-specific network failures. The ability to select multiple nodes from distinct network locations is therefore an important primitive for many systems.

Some specific examples where network location diversity is preferred are as follows:

Overlay routing: In an overlay network (e.g., [26], [28], [24], [19], [4]), each node needs to use a number of other nodes as its overlay routing neighbors. Depending on the specific overlay network, the number of routing neighbors needed can range from tens to hundreds. Choosing neighbors in distinct network locations increases route diversity and improves the overlay’s robustness against network failures.

Proactive object replication: Beehive [23] exploits proactive object replication in structured overlays to reduce overlay lookup hops and latency. When an object is replicated on a set of overlay nodes, it is better for those nodes to be in diverse network locations, both for better performance (e.g., minimizing the average query latency) and for better reliability (e.g., one replica becoming disconnected will not affect the whole system).

Detecting Sybil identities from the same network location: P2P systems use logical identities to distinguish peers, so they are particularly vulnerable to Sybil attacks [10], in which a malicious node assumes multiple identities, called Sybil identities. Such Sybil attacks have already been observed in the real world in the Maze P2P file sharing system [15], [31]. The implementers of the Maze system instrumented the Maze client so that they could obtain and examine the complete user logs of the entire system. By analyzing these logs, they found that most colluding Sybil identities use nearby machines in the same network location. They hypothesize that creating Sybil identities in the same location makes it easier to control them and to leverage network proximity to maximize the throughput and the gain from collusion. Researchers [27] have also demonstrated that it is surprisingly easy to launch Sybil attacks from a single network location in the widely used eMule system [1]. In their experiments, they created up to 64K distinct Sybil identities (i.e., 64K KAD IDs) on one physical machine, and were then able to spy on the whole system’s traffic and launch DDoS attacks on any content. The ability to select identities from distinct network locations would reduce a system’s susceptibility to Sybil attacks.

B. Challenges of Selecting Distinct Network Locations

The first strawman solution for selecting nodes or identities (we use these two words interchangeably) from distinct network locations is to select nodes with distinct IP addresses (e.g., [11]). However, a malicious node may steal multiple IP addresses from the local network. Worse, it may hijack a large number of IP addresses from diverse network