Q. Li, G. Wang, and L. Feng (Eds.): WAIM 2004, LNCS 3129, pp. 208–217, 2004. © Springer-Verlag Berlin Heidelberg 2004

DHT Based Searching Improved by Sliding Window

Shen Huang, Gui-Rong Xue, Xing Zhu, Yan-Feng Ge, and Yong Yu

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Huashan Avenue 1954, 200030 Shanghai, P.R.China
{Huangshen,grxue,ZhuXing,gyf}@sjtu.edu.cn yyu@cs.sjtu.edu.cn

Abstract. Efficient full-text searching is a major challenge in Peer-to-Peer (P2P) systems. Recently, the Distributed Hash Table (DHT) has become one of the reliable communication schemes for P2P. Some research efforts perform keyword searching and result intersection on a DHT substrate, so two or more search requests must be issued for a multi-keyword query. This article proposes a Sliding Window improved Multi-keyword Searching method (SWMS) to index and search full text for short queries on a DHT. The main assumptions behind SWMS are: (1) the query overhead of standard inverted list intersection is prohibitive in a distributed P2P system; (2) most of the documents relevant to a multi-keyword query have those keywords appearing near each other. The experimental results demonstrate that our method guarantees the search quality while reducing the cost of communication.

1 Introduction

Peer-to-peer (P2P) systems have become popular in recent years. In such systems, a DHT [5, 16, 19, 24] efficiently performs object location and application routing in a potentially very large overlay network. Normally, the object to be placed or located has an ID (its key), and the node where the object is placed also has an ID taken from the same space as the object’s key. When a node receives a query for a key for which it is not responsible, it routes the query to the neighbor node that makes the most “progress” towards resolving the query. Each node maintains a routing table consisting of a small subset of the nodes in the system, which ensures a relatively short routing path.
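The routing behaviour described above can be illustrated with a small sketch. It assumes a Chord-style identifier ring, which is only one of several possible DHT designs and is not necessarily the substrate used in this paper. The names `Ring`, `route`, `in_interval`, the 8-bit identifier space, and the node IDs are all hypothetical and chosen for illustration.

```python
import bisect

M = 8               # bits in the identifier space (assumed for illustration)
SPACE = 2 ** M      # node IDs and keys both live in [0, SPACE)


def in_interval(x, a, b):
    """True if x lies in the ring interval (a, b], walking clockwise from a."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past zero; a == b covers the ring


class Ring:
    """Toy overlay: a key is owned by the first node ID at or after it."""

    def __init__(self, node_ids):
        self.ids = sorted(node_ids)
        # Predecessor of each node on the ring.
        self.pred = {n: self.ids[i - 1] for i, n in enumerate(self.ids)}
        # Small routing table per node: owners of n + 2^i for i = 0..M-1.
        self.table = {
            n: [self.successor((n + 2 ** i) % SPACE) for i in range(M)]
            for n in self.ids
        }

    def successor(self, key):
        i = bisect.bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]

    def route(self, start, key):
        """Return the list of nodes visited while resolving `key`."""
        node, path = start, [start]
        # Stop once the current node is responsible for the key.
        while not in_interval(key, self.pred[node], node):
            nxt = self.table[node][0]  # immediate successor as fallback
            if not in_interval(key, node, nxt):
                # Forward to the table entry that makes the most progress
                # towards the key without passing it.
                for f in reversed(self.table[node]):
                    if in_interval(f, node, key):
                        nxt = f
                        break
            node = nxt
            path.append(node)
        return path


ring = Ring([1, 18, 40, 77, 130, 200, 250])
print(ring.route(1, 150))   # [1, 130, 200]
```

In this sketch each node stores at most M table entries, yet a query started anywhere reaches the responsible node in a few hops, which is the property the paragraph attributes to DHT routing tables.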
One main problem for P2P is how to retrieve global information on the distributed infrastructure. Napster [15] and Gnutella [26] provide good strategies for title-based file retrieval. However, full-text indexing in such an environment has not yet been solved satisfactorily. Several studies [3, 17, 25] performed keyword searching on a DHT substrate. All of them store a (term, index) pair on the corresponding node for each term appearing in each document. This indexing method is a kind of global index as defined in [2]. With a global index, a multi-keyword search generates several requests and intersects the result sets. Ribeiro-Neto [4] showed that such an index works well only in a tightly coupled network with high-bandwidth connections. However, such connections are generally unavailable in today’s P2P networks.
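To make the cost of a global index concrete, the following is a minimal sketch rather than the scheme of any cited system. It assumes that each term is mapped to a node by hashing the term modulo the number of nodes (a stand-in for real DHT routing) and that the stored "index" is simply a document ID. The class `GlobalIndex` and its counters `requests` and `postings_sent` are hypothetical names used only here.

```python
import hashlib
from collections import defaultdict


class GlobalIndex:
    """Toy global index: every term's posting list lives on one node."""

    def __init__(self, num_nodes):
        # One inverted index (term -> set of doc IDs) per simulated node.
        self.nodes = [defaultdict(set) for _ in range(num_nodes)]
        self.requests = 0        # lookups sent over the network
        self.postings_sent = 0   # posting entries shipped back to the querier

    def _home(self, term):
        # Assumption: hash modulo node count stands in for DHT key routing.
        digest = hashlib.sha1(term.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % len(self.nodes)

    def publish(self, doc_id, text):
        # Store one (term, doc_id) pair per distinct term of the document.
        for term in set(text.lower().split()):
            self.nodes[self._home(term)][term].add(doc_id)

    def lookup(self, term):
        # One request per keyword; the whole posting list is returned.
        self.requests += 1
        postings = self.nodes[self._home(term)].get(term, set())
        self.postings_sent += len(postings)
        return set(postings)

    def search(self, query):
        # Multi-keyword search: fetch every list, then intersect them.
        lists = [self.lookup(t) for t in query.lower().split()]
        return set.intersection(*lists) if lists else set()


index = GlobalIndex(num_nodes=4)
index.publish("d1", "sliding window search on dht")
index.publish("d2", "window manager")
index.publish("d3", "sliding window protocol")
print(sorted(index.search("sliding window")))   # ['d1', 'd3']
print(index.requests, index.postings_sent)      # 2 5
```

In this toy run a two-keyword query costs two requests and ships five posting entries to produce two results. With realistic posting lists the transferred volume grows with the frequency of each term, which is the overhead the paragraph refers to.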