Tabu-Search Optimization of Data Distribution in P2P Networks

M. Anis uddin Nasir, Antonio L. Severien, and Emmanouil Dimogerontakis
Faculty of Informatics, UPC Barcelona

Abstract- P2P systems have become a reliable and efficient solution for distributing content across the Internet. They account for more than 30% of Internet traffic. P2P systems are built on top of the application layer, without awareness of low-level network parameters. Random selection of peers in P2P systems generates a lot of redundant traffic and results in network congestion. Therefore, there is a need to optimize the distribution of content in P2P systems. In our work we evaluate an offline model for data distribution over a P2P system. We implemented the model in CPLEX to obtain the optimal solution and compared its results and performance with those of a Tabu Search heuristic algorithm. Results show that the heuristic algorithm can compute a good approximation of the optimal solution in less time than CPLEX.

Index terms- peer-to-peer, tabu search, heuristic algorithm, data distribution, optimization problems

I INTRODUCTION

Peer-to-peer (P2P) systems have become very popular in recent years, especially for file sharing, but also for many other applications such as multimedia streaming or VoIP. In fact, in the past these kinds of applications accounted for between 50% and 85% of overall Internet traffic [1]. This percentage has since declined to 30%, but it remains a very important portion of total Internet traffic. P2P systems usually implement an abstract overlay network at the application layer, and all participating computers (i.e., nodes) can act as clients and servers at the same time, thus eliminating the classical centralized client-server paradigm. In pure P2P systems all nodes have the same role and capabilities; however, some applications may use supernodes (hybrid P2P) or a central server for indexing functions (centralized P2P).

P2P systems can also be categorized by architecture into two main groups: structured and unstructured. Structured P2P systems use a globally consistent protocol to ensure that every node in the network can efficiently route a search to a peer that has the desired file. These kinds of P2P systems often use Distributed Hash Tables (DHTs) to index the shared files. An example of a structured P2P system is Chord [2]. Unstructured P2P networks, on the other hand, establish the links between nodes in an arbitrary way. This makes it easier for new nodes to join the network, but forces peers to broadcast a query to the whole network in order to find a desired file, with no guarantee of success. An example of an unstructured P2P system is Gnutella.

Given the significant share of existing traffic that is P2P, optimization techniques should be considered to make data dissemination more efficient. To minimize the effect on system performance, we choose to work on offline data flow optimization. Although performance-aware, offline modeling of flows can yield inaccurate models, since it has to deal with the dynamic and stochastic nature of P2P systems. As a result, we have to build complex models that fit specific cases and cannot provide a generic model. The complexity of these models usually prevents the use of linear programming solvers and forces the adoption of heuristics.

In this paper we provide a CPLEX implementation for the formulated problem and compare its performance with that of the corresponding Tabu Search heuristic. The rest of this paper is organized as follows. In the next section (§II) we present the Integer Linear Program (ILP) as formulated in [3]. In Section III we describe the Tabu Search algorithm that was used as a heuristic approach.
In Section IV we present our experimental results, covering a CPLEX-based implementation and a proprietary implementation of the Tabu Search algorithm for the problem. Finally, in Section V we discuss our conclusions and possible topics for future research.

II PROBLEM FORMULATION

Approach and assumptions

In this section we present the ILP as described by Skowron and Walkowiak in [3]. The authors claim that their model is generic enough to represent different P2P architectures and implementations: decentralized or centralized, structured or unstructured, and so on. It is also worth mentioning that the ILP given in [3] is a simple version that covers basic dissemination in a P2P system. This approach was adopted because the goal is to keep the ILP solution as simple and generic as possible, in order to compare it in terms of efficiency with a heuristic solution. Modelling of dissemination over P2P is studied in more depth in other works of Walkowiak, such as [4, 5].

We now present some assumptions made by the authors to facilitate the modelling:

To face the stochastic nature of P2P systems, the time scale is divided into time slots of equal length; each time slot corresponds to an iteration.