Toward a Global File Popularity Estimation in Unstructured P2P Networks

Manel Seddiki*, Mahfoud Benchaiba‡
*, ‡ University of Sciences and Technology Houari Boumediene, Computer Science Department, LSI laboratory, Algiers, Algeria
* e-mail: sed.manel@gmail.com
‡ e-mail: benchaiba@lsi-usthb.dz

Abstract—In unstructured P2P networks, replicating the most popular files is one of the mechanisms that improve file lookup performance, such as lookup delay and success rate. However, measuring global file popularity is a challenging task because the estimate must account for the requests of all peers for the file, whereas in unstructured P2P networks such as Gnutella a peer has no global view of the network. Some research has been done to measure this parameter. Nevertheless, the resulting estimates are still far from reality because the peer that calculates file popularity does not consider the popularity estimates of the other peers. In this paper, we try to define a way to calculate a global file popularity based on the local estimate of the peer and the estimates made by the other peers participating in the network. Our first simulation results support our theoretical formulas and show that our measurement is closer to the real popularity. More details will be provided and simulation tests will be added in our future contributions.

Keywords—Unstructured P2P networks, global file popularity, file lookup, request packets, replication.

I. INTRODUCTION

Peer-to-peer (P2P) networks emerged as an alternative to client/server systems and have been developed over the Internet in recent years. The basic idea of P2P is to link users so that they can exchange information without any intermediate server. A P2P network is thus a distributed system of interconnected peers, each of which is both a client and a server. The P2P paradigm was first used for file-sharing applications such as Napster [1] and Gnutella [3], which allow users to look up, share and download files.
Napster uses a server that indexes all the information about peers and their files. If a peer wants to look up a file, it sends a request to the server, which connects it directly with the peers storing this file. The server simplifies the lookup procedure and reduces lookup latency, but it is also the weak point of the system: if it breaks down, the whole system stops.

Gnutella came after Napster and removed this centralization. Gnutella works on an unstructured P2P network architecture, where there is no server and each peer must discover by itself the other peers participating in the network and their shared content. A peer wishing to look up shared content, such as a file, broadcasts its request to all its neighbors, which do the same with their own neighbors until the file is found or the Time To Live (TTL) expires. This technique is known as flooding [3]. However, the main drawback of flooding is its high overhead, which causes a scalability issue.

Many alternatives to flooding have been proposed to make file lookup more efficient, such as forwarding with probabilities based on previous lookup results ([4] and [5]), using a progressive TTL, called Expanding Ring, as in [6], or using the Random Walk technique, as in [7].

Another way to improve file lookup performance in unstructured P2P networks is replication, as presented in [8], [9], [10], and [11]. It consists in replicating the most popular files on other peers to ensure their availability, increase the lookup success rate, and decrease the number of lookup hops and the lookup delay. The performance of these replication strategies depends on the precision of the popularity parameter: the closer the popularity estimate is to reality, the better the replication strategy performs. Consequently, from our point of view, file popularity measurement is crucial in such replication strategies for deciding which files have to be replicated.
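The TTL-limited flooding described above can be illustrated with a minimal sketch. It is not the Gnutella implementation: the function name `flood_lookup`, the dictionaries `neighbors` and `files`, and the example topology are all hypothetical, and the search is simplified to stop at the first peer holding the file, whereas in a real network the other branches of the flood keep propagating until their TTL expires.

```python
from collections import deque

def flood_lookup(neighbors, files, source, wanted, ttl):
    """Simplified TTL-limited flooding (illustrative assumption, not Gnutella code).

    neighbors: dict mapping a peer to the list of its neighbor peers
    files:     dict mapping a peer to the set of files it stores
    Returns (holder, hops, messages); holder and hops are None on failure.
    """
    visited = {source}              # peers that have already handled the request
    frontier = deque([(source, 0)]) # (peer, hops travelled so far)
    messages = 0                    # request packets sent, i.e. the overhead
    while frontier:
        peer, hops = frontier.popleft()
        if wanted in files.get(peer, ()):
            return peer, hops, messages
        if hops == ttl:
            continue                # TTL expired: this peer does not forward
        for nb in neighbors.get(peer, ()):
            messages += 1           # one packet per link, duplicates included
            if nb not in visited:   # a peer drops requests it has already seen
                visited.add(nb)
                frontier.append((nb, hops + 1))
    return None, None, messages

# Hypothetical 5-peer topology; only peer "E" stores the file.
neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
             "D": ["B", "C", "E"], "E": ["D"]}
files = {"E": {"song.mp3"}}

found = flood_lookup(neighbors, files, "A", "song.mp3", ttl=3)   # ("E", 3, 9)
missed = flood_lookup(neighbors, files, "A", "song.mp3", ttl=2)  # (None, None, 6)
```

In this toy network, a TTL of 3 reaches the file at the cost of 9 request packets, while a TTL of 2 fails after 6 packets. Even at this small scale, the count of duplicate packets hints at the overhead that motivates the alternatives to flooding.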
However, most of these strategies do not focus on this measurement and only briefly define a file popularity calculation based on the local estimate of the peer. This may be because, in unstructured P2P architectures, the peer is blind and has no global view of the network, which makes global popularity estimation a challenging task. In this paper, we focus entirely on this issue. We try to define the notion of file popularity and four evident criteria that a file popularity estimate must respect. We then propose a way to calculate file popularity that respects those criteria. This calculation is based both on the local estimate of the peer and on the estimates made by the other peers participating in the network. Considering the estimates of the other peers yields a global-like estimate of the popularity, which is closer to reality than the local estimate alone.

This paper is organized as follows. In Section II, we introduce some related research that calculates file popularity in a variety of contexts, such as content replication strategies and file lookup enhancement. In Section III, we describe our approach in detail. We first describe the P2P network architecture and environment that we consider; then we describe our file cache structures and define the notion of popularity from our point of view. After that, we explain our file popularity measurement and, finally, we discuss some points. In Section IV, we introduce the simulation environment, describe the different simulation tests and compare our estimated popularity with the real popularity. At the end of the paper, we give a brief summary of its content and of the next contributions to finalize our

Copyright (c) IARIA, 2013. ISBN: 978-1-61208-305-6
ICSNC 2013 : The Eighth International Conference on Systems and Networks Communications