© 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The paper is accepted for publication in IEEE/ACM Transactions on Networking, doi: 10.1109/TNET.2011.2175246.

Caching for BitTorrent-like P2P Systems: A Simple Fluid Model and its Implications

Frank Lehrieder∗, György Dán‡, Tobias Hoßfeld∗, Simon Oechsner∗, Vlad Singeorzan∗

∗ University of Würzburg, Institute of Computer Science, Würzburg, Germany
‡ KTH Royal Institute of Technology, School of Electrical Engineering, ACCESS Linnaeus Centre, Stockholm, Sweden

Abstract—Peer-to-peer file-sharing systems are responsible for a significant share of the traffic between Internet service providers (ISPs) in the Internet. In order to decrease their peer-to-peer related transit traffic costs, many ISPs have deployed caches for peer-to-peer traffic in recent years. We consider how the different types of peer-to-peer caches – caches already available on the market and caches expected to become available in the future – can possibly affect the amount of inter-ISP traffic. We develop a fluid model that captures the effects of the caches on the system dynamics of peer-to-peer networks, and show that caches can have adverse effects on the system dynamics depending on the system parameters. We combine the fluid model with a simple model of inter-ISP traffic and show that the impact of caches cannot be accurately assessed without considering the effects of the caches on the system dynamics. We identify scenarios when caching actually leads to increased transit traffic.
Motivated by our findings, we propose a proximity-aware peer selection mechanism that avoids the increase of the transit traffic and improves the cache efficiency. We support the analytical results by extensive simulations and experiments with real BitTorrent clients.

Keywords: Peer-to-peer, caching, fluid model

This work was partially funded by the EU FP7 Network of Excellence Euro-NF through the Specific Joint Research Project “ISPeer” and by the Federal Ministry of Education and Research of the Federal Republic of Germany (Förderkennzeichen 01 BK 0800, GLab). The authors alone are responsible for the content of the paper.

I. INTRODUCTION

Peer-to-peer (P2P) file-sharing systems are one of the major sources of Internet traffic. They generate an estimated 40% to 70% of the total traffic, depending on geographic location [1], and are expected to remain a significant source of traffic in the future [2]. For users, P2P file-sharing systems provide access to a large variety of content, and for content providers they provide a means to distribute data to a large population of users without the need for big investments in server and network resources. The costs of the content distribution are shared among the end users and their Internet service providers (ISPs).

The protocols of the most popular P2P file-sharing systems were not designed to be aware of the network topology, and consequently P2P applications generate a large amount of inter-ISP traffic. Increased inter-ISP traffic is a potential source of revenue for ISPs at the top of the ISP hierarchy (called tier-1 ISPs). Their main concern is to keep the traffic to their peering tier-1 ISPs balanced. Nevertheless, for ISPs in the lower levels of the ISP hierarchy (tier-2 and tier-3 ISPs), which are usually charged by their transit traffic providers, transit traffic is a source of costs, and hence is something to be kept low.
The research community has been trying to address the issue of inter-ISP traffic caused by proximity-unaware protocols in two ways: first, by introducing proximity-awareness in the most popular file-sharing protocols and by trying to understand its effects on the application performance [3], [4]; second, by proposing localization services for P2P protocols that would make proximity-aware protocols more efficient from the ISPs’ point of view [5], [6]. While these approaches could yield a significant decrease of the inter-ISP traffic, there is no evidence yet of the widespread use of proximity-awareness in deployed systems.

ISPs have been addressing the issue of increased transit traffic by deploying commercially available caches for P2P traffic [7], [8]. P2P caches decrease the transit traffic by storing popular content locally in the ISP so that it does not have to be downloaded from remote peers [9]. The caches provided by the different vendors, e.g., PeerApp’s UltraBand and OverSi’s OverCache P2P, follow fundamentally different design principles, yet all of them promise substantial savings in terms of inter-ISP traffic.

The question we address in this paper is how one can assess the efficiency of P2P caches that follow different design principles in terms of decreasing the inter-ISP traffic, without actually deploying them. In order to answer this question we develop a fluid model of the system dynamics of BitTorrent-like file-sharing systems that incorporates the effects of P2P caches. We consider the case of a single and of multiple classes of peers, and provide a closed-form solution for the equilibrium system state as a function of the cache capacities installed at the different ISPs. We show that under certain conditions a system with two classes of peers is sufficient to model multiple classes of peers.
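To make the notion of a fluid model with cache capacity concrete, the following Python sketch may help. It is not the model developed in this paper. It assumes the classic single-class fluid model in the style of Qiu and Srikant and, as a simplifying assumption, treats the cache as extra upload bandwidth kappa. All symbol names (lam, mu, c, gamma, eta, theta, kappa) and the parameter values in the usage note are hypothetical.

```python
def simulate(lam, mu, c, gamma, eta=1.0, theta=0.0, kappa=0.0,
             dt=0.01, t_end=2000.0):
    """Euler integration of an assumed single-class fluid model.

    x: number of leechers, y: number of seeds.
    lam: leecher arrival rate, theta: leecher abort rate,
    c: download rate per leecher, mu: upload rate per peer,
    eta: sharing effectiveness of leechers, gamma: seed departure rate,
    kappa: upload rate contributed by a cache (illustrative assumption).
    """
    x = y = 0.0
    for _ in range(int(t_end / dt)):
        # Service rate is limited by total download or total upload capacity.
        served = min(c * x, mu * (eta * x + y) + kappa)
        dx = lam - theta * x - served
        dy = served - gamma * y
        x = max(0.0, x + dt * dx)
        y = max(0.0, y + dt * dy)
    return x, y


def equilibrium(lam, mu, c, gamma, eta=1.0, kappa=0.0):
    """Closed-form steady state of the sketch above for theta = 0."""
    y = lam / gamma                      # seeds: completions balance departures
    x_down = lam / c                     # download-constrained leecher count
    x_up = (lam - kappa - mu * y) / (mu * eta)  # upload-constrained count
    return max(x_down, x_up), y
```

For example, with the hypothetical values lam=1.0, mu=0.02, c=0.1, gamma=0.05, calling equilibrium with kappa=0.0 and then kappa=0.3 shows how added cache capacity lowers the steady-state number of leechers in this simplified setting, and simulate can be used to check that the dynamics settle at the same point.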
We develop a simple model of inter-ISP traffic, and use the model to illustrate that one cannot accurately assess the impact of caches on the amount of inter-ISP traffic without considering the effects of the caches on the peer dynamics. We also show that, contrary to intuition, caches can under certain conditions increase the amount of outgoing transit traffic of an ISP. To avoid this phenomenon, we propose a proximity-aware peer selection scheme and evaluate its impact on the cache efficiency. We validate the analytical results via extensive simulations and provide experimental results with real BitTorrent clients to