Toward a Global File Popularity Estimation in Unstructured P2P Networks

Manel Seddiki, Mahfoud Benchaiba
University of Sciences and Technology Houari Boumediene, Computer Science Department, LSI laboratory, Algiers, Algeria
e-mail: sed.manel@gmail.com, benchaiba@lsi-usthb.dz

Abstract: In unstructured P2P networks, replicating the most popular files is one of the mechanisms that improve file lookup performance, such as lookup delay and success rate. However, measuring global file popularity is a challenging task, because the estimation must consider the requests of all peers for a file, whereas in unstructured P2P networks like Gnutella a peer has no global view of the network. Some research has been done to measure this parameter. Nevertheless, the resulting estimation is still far from reality, because the peer that calculates file popularity does not consider the popularity estimations of the other peers. In this paper, we try to define a way to calculate a global file popularity based on the local estimation of the peer and the estimations made by the other peers participating in the network. Our first simulation results support our theoretical formulas and show that our measurement is closer to the real one. More details and further simulation tests will be provided in our future contributions.

Keywords: unstructured P2P networks; global file popularity; file lookup; request packets; replication.

I. INTRODUCTION

Peer-to-peer (P2P) networks emerged as an alternative to client/server systems and have been developed over the Internet in recent years. The basic idea of P2P is to link users so that they can exchange information without any intermediate server. A P2P network is thus a distributed system of interconnected peers, which act as both clients and servers. The P2P paradigm was first used for file-sharing applications such as Napster [1] and Gnutella [3], which allow users to look up, share and download files.
Napster uses a server that indexes all the information about peers and their files. If a peer wants to look up a file, it sends a request to the server, which connects it directly with the peers storing this file. The server facilitates the lookup procedure and reduces lookup latency, but it is also the weak point of the system: if it breaks down, the whole system stops.

Gnutella came after Napster and abandoned the centralized design. Gnutella works on an unstructured P2P network architecture, where there is no server and each peer must discover by itself the other peers participating in the network and their shared content. A peer wishing to look up shared content, such as a file, broadcasts its request to all its neighbors, which do the same with their neighbors until the file is found or the Time To Live (TTL) expires. This technique is known as flooding [3]. However, the main drawback of flooding is its high overhead, which causes a scalability issue. Many alternatives to flooding have been proposed to make file lookup more efficient, such as using probabilities based on previous lookup results ([4] and [5]), using a progressively increasing TTL, called Expanding Ring [6], or using the random walk technique [7].

Another way to improve file lookup performance in unstructured P2P networks is replication, as presented in [8], [9], [10], and [11], which consists in replicating the most popular files on other peers to ensure their availability, increase the lookup success rate and decrease lookup hops and delay. The performance of these replication strategies depends on the precision of the popularity parameter: the closer the popularity estimation is to reality, the better the replication strategy performs. Consequently, from our point of view, file popularity measurement is crucial in such replication strategies for deciding which files have to be replicated.
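To illustrate why the precision of the popularity estimate matters for replication, the following minimal sketch compares the file a peer would choose to replicate from its own request log with the file it would choose from the requests of all peers. The peer names, file names, request logs and the `top_k` helper are hypothetical and are not taken from any of the cited strategies; the global count assumes an idealised view that no peer has in a real unstructured network.

```python
from collections import Counter


def top_k(counts, k):
    """Return the k most requested file names (ties broken by name)."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical request logs: the requests each peer has observed.
requests_seen_by_peer = {
    "peer_a": ["f1", "f1", "f1", "f2"],
    "peer_b": ["f3", "f3", "f2", "f3"],
    "peer_c": ["f3", "f2", "f3", "f2"],
}

# Local view: peer_a counts only the requests it has seen itself.
local_counts = Counter(requests_seen_by_peer["peer_a"])

# Idealised global view (an assumption): the requests of all peers together.
global_counts = Counter()
for log in requests_seen_by_peer.values():
    global_counts.update(log)

# The single file each view would select for replication.
local_choice = top_k(local_counts, 1)
global_choice = top_k(global_counts, 1)
```

With these illustrative logs, peer_a would replicate f1 from its local view, while f3 is the most requested file overall, so a replication decision based only on the local estimate would miss the globally most popular file.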
However, most of these strategies do not focus on this measurement and only briefly define a file popularity calculation based on the local estimation of the peer. This may be because, in unstructured P2P architectures, the peer is blind and has no global view of the network, which makes global popularity estimation a challenging task.

In this paper, we focus entirely on this issue. We define the notion of file popularity and four evident criteria that a file popularity estimation must respect. We then propose a way to calculate file popularity that respects those criteria. This calculation is based both on the local estimation of the peer and on the estimations made by the other peers participating in the network. Considering the estimations of the other peers yields a global-like estimation of the popularity, which is closer to reality than the local estimation.

This paper is organized as follows. In Section II, we introduce some relevant works that calculate file popularity in a variety of contexts, such as content replication strategies and file lookup enhancement. In Section III, we describe our approach in detail. We first describe the P2P network architecture and environment that we consider, then we describe our file cache structures and define the notion of popularity from our point of view. After that, we explain our file popularity measurement and, finally, discuss some points. In Section IV, we introduce the simulation environment, describe the different simulation tests and compare our estimated popularity with the real popularity. At the end of this paper, we give a brief summary of its content and the next contributions to finalize our

Copyright (c) IARIA, 2013. ISBN: 978-1-61208-305-6. ICSNC 2013: The Eighth International Conference on Systems and Networks Communications