Tree Network Coding for Peer-to-Peer Networks

Arne Vater, Christian Schindelhauer ∗, Christian Ortolf
Department of Computer Science, University of Freiburg
Georges-Köhler-Allee 51, Freiburg im Breisgau, Germany
{vater, schindel, ortolf}@informatik.uni-freiburg.de

ABSTRACT

Partitioning is the dominant technique for transmitting large files in peer-to-peer networks. A peer can redistribute each part immediately after downloading it. BitTorrent combines this approach with incentives for uploads and has thereby become the most successful peer-to-peer network. However, BitTorrent fails if files are unpopular and are distributed by irregularly participating peers. It is known that Network Coding always provides the optimal data distribution, referred to as optimal performance. Yet, for encoding or decoding a single code block the whole file must be read, and users are not willing to read $O(n^2)$ data blocks from hard disk for sending $n$ message blocks. We call this the disk read/write complexity of an encoding.

It is an open question whether fast network coding schemes exist. In this paper we present a solution for simple communication patterns. Here, in a round model each peer can send a limited number of messages to other peers. We define the depth of this directed acyclic communication graph as the maximum path length (not counting the rounds). In our online model each peer knows the bandwidth of its communication links for the current round, but neither the existence nor the weight of links in future rounds.

In this paper we analyze BitTorrent, Network Coding, Tree Coding, and Tree Network Coding. We show that the average encoding and decoding complexity of Tree Coding is bounded by $O(kn \log^2 n)$ disk read/write operations, where $k$ is the number of trees and $n$ the number of data blocks.
Tree Coding has perfect performance in communication networks of depth two with a disk read/write complexity of $O(pnt \log^3 n)$, where $p$ is the number of peers, $t$ is the number of rounds, and $n$ is the number of data blocks. For arbitrary networks Tree Coding performs optimally using $2(\delta + 1)^{t-1} p \log_2 n$ trees, which results in a read/write complexity of $O((\delta + 1)^{t-1} n \log^3 n)$ for $t$ rounds and in-degree $\delta$.

∗ Partly supported by DFG research fund Schi 372/5-1.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SPAA'10, June 13–15, 2010, Thira, Santorini, Greece.
Copyright 2010 ACM 978-1-4503-0079-7/10/06 ...$10.00.

Categories and Subject Descriptors
C.2.4 [Computer-Communication Networks]: Distributed Systems — Distributed applications; E.4 [Data]: Coding and Information Theory — Nonsecret encoding schemes; F.2.2 [Theory of Computation]: Analysis of Algorithms and Problem Complexity — Nonnumerical Algorithms and Problems

General Terms
Algorithms, Performance

Keywords
Peer-to-Peer Networks, BitTorrent, Network Coding

1. INTRODUCTION

The exchange of data without centralized infrastructure is the main motivation for the widespread use of peer-to-peer networks. From a user's perspective, the fast distribution of large files is the killer argument for choosing peer-to-peer network software. For such a task the IP Multicast protocol seems to be the best solution: it allows routers to duplicate packets on their paths to their destinations, thus relieving the bottleneck at the server [17, 6]. However, IP Multicast suffers from the absence of reliable delivery and the lack of support by most Internet service providers.
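The disk read cost of plain Network Coding noted in the abstract can be made concrete with a minimal sketch. It assumes random linear combinations over GF(2), which is an illustrative choice and not necessarily the field or coefficient distribution used in this paper; the names `combine` and `encode_file` are hypothetical. Each code block is the XOR of roughly half of the $n$ data blocks, so producing $n$ code blocks touches $\Theta(n^2)$ blocks on disk.

```python
import random


def combine(blocks, coeffs):
    """XOR the blocks selected by coeffs (a linear combination over GF(2)).

    Returns the payload and the number of blocks that had to be read.
    Assumption: all blocks have the same length.
    """
    payload = bytearray(len(blocks[0]))
    reads = 0
    for block, c in zip(blocks, coeffs):
        if c:
            reads += 1  # every selected block is one simulated disk read
            for j, byte in enumerate(block):
                payload[j] ^= byte
    return bytes(payload), reads


def encode_file(blocks, rng):
    """Produce n random code blocks for n data blocks and count block reads."""
    n = len(blocks)
    total_reads = 0
    coded = []
    for _ in range(n):
        # Hypothetical coefficient choice: independent fair coin per block.
        coeffs = [rng.randint(0, 1) for _ in range(n)]
        payload, reads = combine(blocks, coeffs)
        total_reads += reads
        coded.append((coeffs, payload))
    return coded, total_reads


if __name__ == "__main__":
    n = 64
    data = [bytes([i]) * 4 for i in range(n)]
    _, total = encode_file(data, random.Random(0))
    print(f"n = {n}, block reads = {total}, n^2 = {n * n}")
```

With fair-coin coefficients the expected number of reads is $n^2/2$; a larger field with mostly nonzero coefficients would push this toward $n^2$.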
Peer-to-Peer Networks. Peer-to-peer networks started in 1999 with Napster and Gnutella, which swiftly became very successful although they were not very elaborate. In the following years, researchers focused on finding robust network structures and efficient lookup services, such as CAN [13], Chord [16], Pastry [15], and Tapestry [8]. Later on, efficient multicast extensions were proposed for some of these networks, e.g. Bayeux [19], CAN-Multicast [14], and Scribe [3], filling the gap left by the unsupported multicast in the Internet network layer.

In a multicast tree the leaf position is the most favorable one, since leaves do not upload any data to others. Usually, the upload is the crucial bottleneck in peer-to-peer networks, since asymmetric connections designed for client-server networks provide larger download than upload capacities (not to mention the legal distinction between uploaders and downloaders). A solution was presented with SplitStream [2], where files are partitioned into small blocks, such that a peer can start redistributing blocks immediately