P2P File Sharing Analysis for a Better Performance

Martha-Rocio Ceballos
Doctoral student
Dept. of Telematics
Polytechnic University of Catalonia
+34 93 401 59 94
ceballos@entel.upc.es

Juan-Luis Gorricho
Associate professor
Dept. of Telematics
Polytechnic University of Catalonia
+34 93 401 68 30
juanluis@entel.upc.es

ABSTRACT
The so-called second-generation P2P file-sharing applications undoubtedly perform better than the first implementations. The most remarkable difference is the division of files into smaller pieces, where a peer receiving any piece automatically becomes a new source for other peers. But a new question arises: how should all the pieces provided by a seed peer be distributed so as to minimize the global, and presumably the individual, download times? In this paper we summarize part of the work we have developed so far to answer this general question. In particular, we analyze how far the present second-generation P2P file-sharing applications remain from an ideal solution with the theoretical best performance, that is, one where all peers are interconnected with each other and all peers behave altruistically, always uploading their contents at any chance. Successive modifications of the ideal solution will lead us to more realistic scenarios. We estimate the performance in each case and finally present the current studies we are carrying out to improve the overall capacity.

Categories and Subject Descriptors
H.4.0 [Information Systems Applications]: General

General Terms
Algorithms, Experimentation, Measurement, Performance.

Keywords
File sharing, video streaming, network measurements, peer-to-peer applications, service capacity, performance evaluation.

1. INTRODUCTION
P2P file-sharing applications have become very popular since the introduction of Napster a few years ago, which allowed users to share MP3-formatted music files.
Independently of legal issues, the first P2P file-sharing applications, such as Napster, Gnutella and KaZaA, were intended to satisfy the most relevant P2P properties: scalability, reliability and great efficiency in information delivery. Nevertheless, the free-riding phenomenon became a widespread practice, with peers downloading from other peers while contributing no uploads of their own; in the end, in spite of the proclaimed P2P features, all these file-sharing applications degenerated into the traditional client/server model, with only a few altruistic peers acting as file servers and all the others as file requesters. To avoid this undesirable situation, recent P2P file-sharing applications such as BitTorrent [1] and eMule [2] have defined a new scenario in which all peers are forced to upload part of their received data if they want to download the complete file. The requested file is divided into chunks, so any peer receiving a chunk may be forced to upload it to other peers. These new P2P file-sharing applications propose different algorithms to incentivize peer collaboration, determining how all peers establish and temporarily renew their connections with other peers interested in the same file. A practical principle suggests that I will upload to you if you also upload to me: a tit-for-tat assumption [3]. In a more general sense, putting aside the free-riding phenomenon, transmitting any information in smaller pieces always increases the system capacity whenever more than one node is involved as a re-transmitter or intended receiver. This is because we do not wait for a complete file transmission to a particular peer before that peer begins retransmitting to the next one. This feature is magnified if we deliver all chunks from the source peer, also called the seed peer, to the greatest variety of peers; in this way we promote an increasing number of parallel transmissions among all peers.
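The capacity gain from chunking can be illustrated with a deliberately simplified model (our own sketch, not an analysis from this paper): peers relay a file along a chain at unit upload rate. Without chunking, each peer must receive the whole file before forwarding it, so delays accumulate linearly; with chunking, a peer forwards chunk k while receiving chunk k+1, so the chain is pipelined. The function names and parameters below are illustrative assumptions.

```python
def relay_time_whole(file_size, n_peers, rate=1.0):
    # Store-and-forward chain: each peer waits for the complete file
    # before retransmitting, so the last peer finishes after n full
    # file-transfer times.
    return n_peers * file_size / rate

def relay_time_chunked(file_size, n_peers, n_chunks, rate=1.0):
    # Pipelined chain: the last peer finishes once the first chunk has
    # crossed the n-peer chain and the remaining chunks have arrived
    # back-to-back behind it.
    chunk_time = file_size / n_chunks / rate
    return (n_chunks + n_peers - 1) * chunk_time

# Example: a 100 MB file relayed through a chain of 10 peers at 1 MB/s.
whole = relay_time_whole(100, 10)          # 10 * 100 = 1000 s
chunked = relay_time_chunked(100, 10, 50)  # (50 + 9) * 2 = 118 s
print(whole, chunked)
```

As the number of chunks grows, the pipelined time approaches a single file-transfer time regardless of the chain length, which is the intuition behind the capacity increase described above.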
P2P networks are classified as structured or unstructured depending on the methodology used to organize the information search. In a structured network the resource location information is stored in a predefined way with the aid of hash tables, while in unstructured networks resource location registration becomes an ad hoc process, leading to a subsequent search procedure usually based on a flooding or random-walk mechanism; in this case, to increase the search success rate, unstructured networks are usually organized into logically interconnected peers and super-peers, storing the resource location information only in the super-peers. Independently of the P2P network type, we can distinguish two phases: the resource search and the resource access or download, depending on the particular service. For a file-sharing application the most relevant issue is the file download, not its location, due to the critical time delay we can experience under high demand from the file requesters or a limited upload capacity from the file provider. In this respect it is crucial to achieve a reliable and scalable file delivery mechanism; this way we could even extend the file sharing application to implement

Copyright is held by the author/owner(s). ICSE'06, May 20-28, 2006, Shanghai, China. ACM 1-59593-085-X/06/0005.
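The two search styles contrasted above can be sketched in a few lines of Python (a toy illustration under our own assumptions; the function names, graph, and key are hypothetical, not part of any real P2P protocol): a structured lookup hashes the key straight to a responsible node, while an unstructured search must walk the overlay graph until it happens upon a holder.

```python
import hashlib
import random

# Structured overlay (DHT-style): hashing the resource key selects a
# responsible node directly, so no network-wide search is needed.
def dht_lookup(key, nodes):
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

# Unstructured overlay: no placement rule exists, so a query walks the
# overlay graph at random until it meets a node holding the key.
def random_walk_search(start, key, neighbors, holders, max_hops=100):
    node = start
    for hops in range(max_hops + 1):
        if node in holders.get(key, set()):
            return node, hops
        node = random.choice(neighbors[node])
    return None, max_hops

nodes = ["peer0", "peer1", "peer2", "peer3"]
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
holders = {"song.mp3": {"c"}}

# The structured lookup is deterministic and O(1) here; the random walk
# takes a variable number of hops and may fail within the hop budget.
print(dht_lookup("song.mp3", nodes))
print(random_walk_search("a", "song.mp3", neighbors, holders))
```

The contrast makes the trade-off concrete: the structured lookup always resolves in bounded time but requires maintaining the hash-based placement, whereas the unstructured walk needs no placement rule but gives only probabilistic success, which is why super-peer indexing is used to raise the hit rate.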