Future Generation Computer Systems 25 (2009) 77–88
www.elsevier.com/locate/fgcs

An efficient peer-to-peer indexing tree structure for multidimensional data

Rong Zhang a, Weining Qian b, Aoying Zhou b,∗, Minqi Zhou a

a Department of Computer Science and Engineering, Fudan University, China
b Software Engineering Institute, East China Normal University, China

∗ Corresponding author.
E-mail addresses: rongzh@fudan.edu.cn (R. Zhang), wnqian@sei.ecnu.edu.cn (W. Qian), ayzhou@sei.ecnu.edu.cn (A. Zhou), zhouminqi@fudan.edu.cn (M. Zhou).

Received 31 August 2007; received in revised form 7 February 2008; accepted 16 February 2008
Available online 10 March 2008

Abstract

As one of the most important technologies for implementing large-scale distributed systems, peer-to-peer (P2P) computing has attracted much attention in both the research and industrial communities for its advantages, such as high availability, high performance, and high flexibility with respect to network dynamics. However, multidimensional data indexing remains a major challenge for P2P computing: the existing index structures are complicated, which makes search and network maintenance inefficient and greatly limits both the scalability of applications and the dimensionality of the data that can be indexed. We propose SDI (Swift tree structure for multidimensional Data Indexing), a swift index scheme with a simple tree structure for multidimensional data indexing in large-scale distributed systems. While keeping query cost at O(log N) routing hops, SDI has extremely low maintenance costs, which is proved through theoretical analysis. Furthermore, SDI overcomes the root-bottleneck problem found in most other tree-based distributed indexing systems. An extensive empirical study verifies the superiority of SDI in both query and maintenance performance.

© 2008 Elsevier B.V. All rights reserved.

Keywords: Multidimensional data; Peer-to-peer (P2P); Point query; Range query; Distributed networks

1. Introduction

Improvements in broadband network connectivity and in the storage capability of computers have resulted in increasing demands on data management in large-scale distributed systems. Because of its massive scale, distributed data can easily overwhelm the storage and processing capability of any single node. Peer-to-peer (P2P) systems open an exciting possibility for such data management tasks, owing to the advantages they provide, such as high availability, high performance achieved from large-scale parallel processing, and high flexibility with respect to network dynamics.

Although current P2P systems have achieved great success in file sharing and file management with the help of mature technologies such as keyword-based search and one-dimensional data indexing, extending P2P technologies to applications with more complicated data management tasks is nontrivial. There are several difficulties in implementing multidimensional data indexing and in supporting multidimensional complex/similarity queries, which can be either k-nearest-neighbor (KNN) queries or range queries. More formally, a multidimensional space is a pair M = (D, d), where D is the domain of objects and d is the distance function. Let Φ ⊆ D be the subset of D indexed by an index structure.

Definition 1 (Range Query). R(q, r) (q ∈ D) retrieves all elements of Φ that lie within distance r of q: S = {p ∈ Φ | d(q, p) ≤ r}.

Definition 2 (K-Nearest-Neighbor (KNN) Query). KNN(q, k) (q ∈ D, k > 0) retrieves a data set KS ⊆ Φ such that |KS| = k and ∀x ∈ KS, ∀y ∈ Φ \ KS: d(q, x) ≤ d(q, y).

Both kinds of query return a set of data objects that are the most relevant to the search criteria according to some semantic distance function.
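To make Definitions 1 and 2 concrete, the following minimal sketch evaluates both queries by a linear scan over a small in-memory set Φ. It illustrates the query semantics only and is not the distributed SDI procedure. The names `range_query`, `knn_query`, and `euclidean` are hypothetical, and the Euclidean metric is an assumed choice for d, since the definitions allow any distance function.

```python
import heapq
import math
from typing import Callable, Iterable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Assumed distance function d; any other metric could be passed instead."""
    return math.dist(a, b)


def range_query(phi: Iterable[Point], q: Point, r: float,
                d: Distance = euclidean) -> List[Point]:
    """R(q, r): return every p in phi with d(q, p) <= r."""
    return [p for p in phi if d(q, p) <= r]


def knn_query(phi: Iterable[Point], q: Point, k: int,
              d: Distance = euclidean) -> List[Point]:
    """KNN(q, k): return the k elements of phi closest to q.

    Ties are broken arbitrarily. If phi holds fewer than k elements,
    all of them are returned, so |KS| = k holds only when |phi| >= k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return heapq.nsmallest(k, phi, key=lambda p: d(q, p))


# Illustrative data: four points in a two-dimensional space.
phi = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (6.0, 8.0)]
q = (0.0, 0.0)
in_range = range_query(phi, q, r=5.5)   # points within distance 5.5 of q
nearest = knn_query(phi, q, k=2)        # the two points closest to q
```

A linear scan costs time proportional to |Φ| per query and needs all data in one place, which is exactly what a distributed index is meant to avoid; the sketch only fixes what a correct answer looks like.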
0167-739X/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2008.02.010

Almost every existing overlay network protocol underlying structured P2P systems employs a one-dimensional identifier (or ID, for short) space. The only exception, CAN [22], uses a low-dimensional