An Optimised Geographically-Aware Overlay Network

Hanh Le, Doan Hoang and Andrew Simmonds
Advanced Research in Networking Laboratory, University of Technology, Sydney
{hanhlh, dhoang, simmonds}@it.uts.edu.au

Abstract: The mismatch between current Peer-to-Peer (P2P) overlay structures and the underlying network topology causes high end-to-end latency and inefficient use of network resources. This paper presents a self-organising overlay network that closely reflects the underlying network topology by building on a node-locating scheme called Geographical Longest Prefix Matching (Geo-LPM) [1]. Geo-LPM combines IP prefixes with a network metric measurement to cluster nodes efficiently. We optimise Geo-LPM to adapt to different geographical locations so that nodes in the same cluster often belong to the same physical network. We also propose implementing Geo-LPM in a distributed fashion. As a result, the overlays use the underlying network resources more efficiently and reduce end-to-end delay. The system is self-organising, distributed, and decentralised, with low overhead.

Keywords: Peer-to-Peer, Overlay Network, Underlying Network Awareness

I. INTRODUCTION

Peer-to-peer (P2P) overlay networks have emerged as highly attractive distributed systems that make use of untapped resources, such as storage, idle processor cycles, and services available at ordinary Internet-connected devices, thereby enabling many useful P2P applications such as distributed file systems [2-5], application-layer multicast [6-9], and event notification services [10, 11]. However, in their initial designs, P2P overlay networks are normally independent of the underlying network topology (i.e., the Internet). Peers communicate with each other regardless of their position or the distance to the other peer [12, 13]. This results in high end-to-end delay for P2P applications and poor utilisation of the underlying network resources.
This mismatch can also increase Internet access costs unnecessarily and make the system unscalable [14, 15].

Some systems [16, 17] build overlay networks that exploit locality through proximity neighbour selection (PNS) or proximity route selection (PRS) [18] to reduce the end-to-end latency. However, PRS and PNS cannot completely solve the mismatch between the overlay and underlying network topologies, because routing decisions are still based on the logical ID relationship.

Efforts have been made to construct overlay networks that are more aware of the underlying network infrastructure [19-21] using a small number of well-known nodes, called “landmarks”. The landmarks are used to partition the underlying network into areas. The location of an ordinary node is determined by the latencies measured from the node to the set of landmarks. Ratnasamy et al. [19] proposed a binning scheme based on host-landmark distances. Nodes partition themselves into bins such that nodes falling within a given bin are relatively close to one another in terms of network latency and further away from nodes not in their bin. This technique is simple, distributed, and generates low overhead. However, in large-scale systems hotspots can occur at the landmarks, and the system depends on the availability of the landmarks.

An efficient Geographical Longest Prefix Matching scheme [1], called Geo-LPM, has been proposed to locate nodes into clusters. Geo-LPM combines IP prefixes and network proximity (a latency threshold T) to cluster nodes that are close to each other. Geo-LPM clusters nodes precisely, in the sense that nodes in the same cluster are often in the same physical network, and it does so with low overhead. Geo-LPM is decentralised and self-organising, and does not require external information sources or landmark setup. The scheme is efficient; however, it is sensitive to the setting of the threshold T.
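The way Geo-LPM combines an IP prefix match with the latency threshold T might be sketched as follows. This is only an illustrative sketch under stated assumptions, not the algorithm of [1]: the names `common_prefix_len` and `assign_cluster`, the `measure_latency_ms` callback, and the rule of testing only the single longest-prefix candidate against T are all hypothetical choices made for illustration.

```python
import ipaddress
from typing import Callable, Iterable, Optional


def common_prefix_len(a: str, b: str) -> int:
    """Return the number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    # The highest set bit of the XOR marks the first differing bit.
    return 32 - diff.bit_length()


def assign_cluster(
    node_ip: str,
    cluster_heads: Iterable[str],
    measure_latency_ms: Callable[[str, str], float],
    threshold_ms: float,
) -> Optional[str]:
    """Pick a cluster for node_ip, or return None to start a new cluster.

    Assumption (not taken from [1]): only the cluster head with the
    longest shared IP prefix is probed, and the node joins it when the
    measured latency does not exceed the threshold T (threshold_ms).
    """
    heads = list(cluster_heads)
    if not heads:
        return None  # no clusters exist yet; the node founds one
    # Step 1: longest prefix match over the known cluster heads.
    best = max(heads, key=lambda h: common_prefix_len(node_ip, h))
    # Step 2: confirm proximity with a latency measurement against T.
    if measure_latency_ms(node_ip, best) <= threshold_ms:
        return best
    return None  # prefix matched, but the head is too far away
```

In this sketch a small T yields many small clusters and a large T merges nodes that share a prefix but sit in different physical networks, which illustrates why the scheme is sensitive to how T is set.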
In this paper, we present a scheme for constructing topologically-aware, self-organising overlay networks based on the basic Geo-LPM [1], with three new contributions. First, we optimise the setting of the network proximity threshold so that Geo-LPM becomes location-adaptive. The threshold is adjusted according to geographical location to maximise the effectiveness of the overlay scheme. Second, we propose an optimised Geo-LPM routing structure that avoids overloading particular nodes (e.g. the root of the IP prefix tree). As a result, the overlays are fully distributed and decentralised. Third, we evaluate the overlay performance in terms of the relative delay penalty (RDP) and the number of overlay hops that span different networks when routing a message via the overlay.

The rest of the paper is organised as follows. Section 2 briefly presents the idea of Geo-LPM. Section 3 describes our proposed overlay network with the threshold optimisation. Section 4 presents the performance evaluation. Section 5 covers related work, and Section 6 concludes.

II. BACKGROUND

The Geographical Longest Prefix Matching scheme (Geo-LPM) [1] clusters nodes that are close to each other in terms of network membership and proximity. In Geo-LPM, each cluster has a node that acts as the routing node for the cluster, termed the “o-router”. Any node can become an o-router; normally it is the first node, the one that establishes the cluster. After other nodes join the cluster, it is preferable to select a node that remains online for long periods and has a