International Journal of Computer Applications (0975 – 8887)
Volume 62, No. 12, January 2013

Tree-based Indexing for DHT-based P2P Systems

Yi Yi Mar
University of Computer Studies, Yangon, Myanmar

Aung Htein Maw
University of Computer Studies, Yangon, Myanmar

Khine Moe Nwe
University of Computer Studies, Yangon, Myanmar

ABSTRACT
Nowadays, DHT-based P2P technology serves as the basis of many widespread applications because of its scalability, robustness, and load balance. Many applications, including file sharing, communication, and live video streaming, operate in a large distributed network environment. For efficient and effective search in large data repositories, complex query processing becomes a major issue for DHTs. Towards the goal of supporting complex queries in DHT-based P2P systems, this paper focuses on using a k-dimensional tree (kd-tree) to build a tree-based index. The proposed index is built without modifying the structure of the overlay network. Load balancing among peers is also considered in the use of the kd-tree. Therefore, the performance of the kd-tree is studied to show how it affects the proposed index over a P2P network. The PlanetSim simulator is used to implement the proposed index and to evaluate its performance using various metrics.

Keywords
Indexing over DHT, DHT-based indexing system, Query processing over DHT, Indexing in structured P2P systems

1. INTRODUCTION
A P2P system is a distributed system that facilitates the direct exchange of information and services between individual peers rather than relying on a centralized server. P2P forms the basis of many distributed computer systems, permitting each peer node to act as both a client and a server, consuming services from other available peers while providing its own services to the rest of the network. P2P offers many advantages, including scalability, high resource availability, robustness, and no need for a centralized authority.
Peer-to-peer (P2P) technology is an increasingly popular vehicle for highly fault-tolerant, lightweight, and low-cost distributed computing environments. Cloud computing [1], digital music exchange [2], digital libraries [3], communication (e.g., Skype [4]), live video streaming (e.g., Zattoo [5]), secure data management systems [6], and bioinformatics databases [7] are just a few of the venues where this technology is being used today.

Since the late 1990s, P2P systems have been used to organize distributed systems so that no global authority is required to run them. As networks grew larger, Gnutella [8], one of the early P2P systems, became inefficient in query processing because it flooded messages through the network. As a result, starting in 2001, distributed hash tables (DHTs) such as Chord [9], CAN [10], Pastry [11], Tapestry [12], and Kademlia [13] were published. DHTs generated great interest in peer-to-peer systems because they allow large networks to operate autonomously over ordinary Internet connections. A DHT is a distributed data structure for storing key-value pairs. It allows data to be located quickly and supports exact match lookup when a key is given. For example, a DHT-based P2P system can use the exact match query interface, with the file name as the key, to publish and look up a file.

With the increase in the number of computers connected to the Internet and the emergence of a range of mobile computational devices equipped with mobile IP technology, the Internet is converging to a more dynamic, huge, and extremely heterogeneous network. P2P data management technology is being used in this large distributed environment for information dissemination and file sharing. As a consequence, complex query processing may be a challenge for DHT-based P2P systems. To support complex query processing over DHTs, a multi-dimensional index needs to be built.
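The exact match interface of a DHT described above can be illustrated with a minimal sketch. It is not the implementation of Chord or of any other cited system: the class name `ToyDHT`, the peer names, and the rule "the first peer clockwise from the hashed key stores the pair" are assumptions chosen only for illustration, and all peers live in a single process rather than on a network.

```python
import hashlib
from bisect import bisect_left


def ring_id(text: str) -> int:
    """Map a string (peer name or data key) to a position on the identifier ring."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


class ToyDHT:
    """Hypothetical single-process stand-in for a ring-based DHT."""

    def __init__(self, peer_names):
        # Peers are ordered by their hashed identifier, as on an identifier ring.
        self.peers = sorted((ring_id(name), name) for name in peer_names)
        self.ids = [peer_id for peer_id, _ in self.peers]
        self.store = {name: {} for name in peer_names}

    def responsible_peer(self, key: str) -> str:
        # First peer whose identifier is >= the key's identifier, wrapping around.
        index = bisect_left(self.ids, ring_id(key)) % len(self.ids)
        return self.peers[index][1]

    def put(self, key: str, value: str) -> None:
        """Publish a key-value pair on the peer responsible for the key."""
        self.store[self.responsible_peer(key)][key] = value

    def get(self, key: str):
        """Exact match lookup: returns the value, or None if the key is unknown."""
        return self.store[self.responsible_peer(key)].get(key)


dht = ToyDHT(["peer-a", "peer-b", "peer-c", "peer-d"])
dht.put("song.mp3", "held by 10.0.0.7")   # file name used as the key
print(dht.get("song.mp3"))
print(dht.get("unknown.txt"))
```

The lookup succeeds only when the exact key string is supplied, which is why this interface alone does not answer range or multi-dimensional queries.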
In this paper, a k-dimensional tree (kd-tree) is used to build an indexing scheme over structured P2P systems. The proposed tree-based index must satisfy two requirements. The first is to keep resource sharing among peers balanced, and the second is to support complex queries (multi-dimensional and/or range queries) in DHT-based P2P systems.

The rest of the paper is organized as follows. Section 2 describes the existing indexing approaches over DHT-based P2P systems. Section 3 describes the architecture of the proposed indexing scheme. Section 4 gives an overview of the kd-tree. Section 5 presents the proposed tree-based index, the steps required to build it, and the process flow of a complex query that uses it. Section 6 evaluates the performance of the proposed index. Section 7 summarizes the proposed system.

2. RELATED WORK
Complex queries, such as range queries or multi-dimensional queries, are needed in many distributed applications, including content distribution, locality-aware services, and resource discovery services such as file sharing applications. More and more applications require P2P systems to support complex queries over multi-dimensional data. For example, a P2P auction network [14] for real estate frequently needs to answer queries such as ‘select five available buildings closest to the airport’.

In any DHT-based P2P system, resources or data are distributed among peers by using the keys of the data. In a real DHT system, the key of a data item may be a keyword or one attribute of the data value. For example, in a file sharing application, the file name or a keyword of the file is defined as the key of the file. These keys are one-dimensional. Therefore, DHTs are very efficient for keyword queries or exact match queries.
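A small sketch can make the limitation of one-dimensional hashed keys concrete. The function names and the sample data below are hypothetical, and a plain dictionary stands in for the DHT lookup. The sketch shows that hashing does not keep neighbouring values next to each other, so a range query built only on exact match lookups has to probe every candidate value in the range.

```python
import hashlib


def dht_key(value: str) -> int:
    """Hash an attribute value to a DHT identifier (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


def hashing_preserves_order(values) -> bool:
    """True only if the hashed identifiers appear in the same order as the values."""
    hashed = [dht_key(str(v)) for v in values]
    return hashed == sorted(hashed)


def range_query_via_exact_match(lookup, low: int, high: int):
    """Answer low <= x <= high using nothing but exact match lookups.

    `lookup` is any callable that takes a key and returns a value or None.
    Returns the matches and the number of lookups that were issued.
    """
    matches, lookups = [], 0
    for candidate in range(low, high + 1):
        lookups += 1                      # one DHT lookup per candidate value
        value = lookup(str(candidate))
        if value is not None:
            matches.append((candidate, value))
    return matches, lookups


# A dictionary standing in for the DHT: price -> listing (made-up sample data).
listings = {"105": "flat A", "110": "flat B", "300": "house C"}
matches, lookups = range_query_via_exact_match(listings.get, 100, 120)
print(matches, lookups)
print(hashing_preserves_order(range(100, 121)))
```

Two stored items fall inside the range, yet 21 lookups are issued, one per possible value. This cost is what motivates an index layered over the DHT.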
To process complex queries over a DHT, the keys of the data must be multi-dimensional. There are many approaches in which data structures are combined with a DHT to support complex queries [15]. Prefix Hash Tree (PHT) [16] is a distributed data structure that enables more sophisticated queries over a DHT. To process one-dimensional queries over a DHT efficiently, it implements a trie-based distributed data structure. In range query processing, PHT