Abstract—It has been recognised that the performance of peer-to-peer multimedia systems is tied to the capabilities of the underlying network, because large items of content must frequently be transferred between end points. One major factor in network performance is the distance over which packets are transmitted: nearby hosts offer higher bandwidth and lower latency, whilst distant hosts suffer from lower bandwidth, greater packet loss and higher latency. This paper outlines a peer-to-peer content distribution network (CDN) that groups nodes into topologically aware local clusters, forming an unstructured network in which more distant hosts are separated by a larger number of overlay hops. This overlay forms part of a larger content distribution infrastructure which utilizes the local awareness of the clustering to provide content to other nodes efficiently.

Index Terms—Clustering, Locality, Overlays, Peer-to-Peer

I. INTRODUCTION

Peer-to-peer systems represent an important technology in the world today, constantly growing in both their uses and capabilities. However, one major limitation of many existing networks is their lack of locality awareness. Without it, a node cannot distinguish a nearby peer from one at the opposite end of the globe, which prevents it from making optimal choices about its communications. Many mechanisms [1][3][7] have attempted to rectify this problem. However, they have seen little deployment and are often limited in their accuracy because their algorithms are based on RTT measurements.

This paper outlines a topology aware clustering mechanism in which an overlay is constructed that forms nodes into topologically close groups. These clusters are based on topological information gained from the network, ideally grouping together nodes that share a physical network.
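To make the idea of grouping nodes that share a physical network more concrete, the following is a minimal sketch and not the mechanism proposed in this paper. It assumes that each node already holds the ordered list of router addresses seen on an outbound traceroute, and it uses a deliberately simple heuristic: nodes whose first hop is the same router are placed in the same cluster. The function name `cluster_by_first_hop` and the input format are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cluster_by_first_hop(paths):
    """Group node ids by the first router on their outbound traceroute path.

    `paths` is a hypothetical input format: a dict mapping a node id to the
    ordered list of router addresses observed by a traceroute from that node.
    A node with an empty path cannot be placed, so it forms a singleton
    cluster keyed by its own id.
    """
    clusters = defaultdict(list)
    # Iterate in sorted order so that the result is deterministic.
    for node, hops in sorted(paths.items()):
        key = hops[0] if hops else node
        clusters[key].append(node)
    return dict(clusters)


if __name__ == "__main__":
    # Illustrative data only: nodes "a" and "b" share a first-hop router.
    example = {
        "a": ["10.0.0.1", "192.0.2.1"],
        "b": ["10.0.0.1", "192.0.2.1"],
        "c": ["10.0.1.1", "192.0.2.1"],
    }
    print(cluster_by_first_hop(example))
```

A first-hop match is only a coarse proxy for sharing a physical network; a fuller approach would compare longer portions of the paths, but the sketch shows why topological information can separate hosts that RTT measurements alone might not.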
Such an overlay could have a number of uses; one example would be to layer a Gnutella [4] network over it in order to bring locality awareness to its unstructured search.

The overlay construction is a two-stage process. The first stage approximates a node’s position in the overlay using a fully distributed global coordinate system such as Vivaldi [2]. The second stage uses local traceroute operations to map out the local topology, providing the node with the information necessary to locate its closest neighbours. Once this procedure has been completed, an adaptive joining mechanism separates the clusters into local groupings of nodes, forcing more distant members out of a cluster in favour of closer ones.

The construction of this overlay then forms the nucleus for more sophisticated content distribution services such as replication, caching and content location. The locality overlay therefore provides a first-layer platform on which more sophisticated technologies can be developed without needing explicit locality awareness of their own.

The rest of the paper is structured as follows. Section II outlines the context of the work, describing the application for which the system is designed. Section III describes the design, detailing how the overlay is structured and built. Section IV evaluates the approach against a set of criteria. Section V presents related work, outlining other technologies in the field. Finally, Section VI concludes with an assessment of the accuracy and efficiency of the system, giving an insight into its limitations and its possible advantages over alternative approaches, before outlining the future work that remains to be undertaken.

II. APPLICATION CONTEXT

It has been recognised that the rigidity of many existing content distribution networks (CDNs) severely limits their optimality.
The most notable of these problems is the physical location of content: accessing distant content results in greater delay, packet loss and overhead, which adversely affects distribution for services such as video streaming. To address these issues the Gorwen architecture is proposed, and the locality mechanism described in this paper has been designed for it. The architecture provides an end-to-end search, caching and delivery service built around the concept of encapsulating delivery paradigms in pluggable components. These components can be added or removed dynamically to better reflect the requirements of the CDN. During their life cycle these components interact with the Gorwen architecture to assist it in making decisions on matters such as replication and caching. Each delivery component encapsulates an individual delivery mechanism and is passed on to a peer when a content transfer is requested. This allows intelligent delivery choices to

A Topology Aware Clustering Mechanism
Tyson, G and Mauthe, A, Computing Department, Lancaster University (g.tyson@comp.lancs.ac.uk, andreas@comp.lancs.ac.uk)
ISBN: 1-9025-6016-7 © 2007 PGNet