Q. Li, G. Wang, and L. Feng (Eds.): WAIM 2004, LNCS 3129, pp. 208–217, 2004.
© Springer-Verlag Berlin Heidelberg 2004
DHT Based Searching Improved by Sliding Window
Shen Huang, Gui-Rong Xue, Xing Zhu, Yan-Feng Ge, and Yong Yu
Department of Computer Science and Engineering, Shanghai Jiao Tong University,
Huashan Avenue 1954, 200030 Shanghai, P.R.China
{Huangshen,grxue,ZhuXing,gyf}@sjtu.edu.cn
yyu@cs.sjtu.edu.cn
Abstract. Efficient full-text searching is a major challenge in Peer-to-Peer (P2P)
systems. Recently, the Distributed Hash Table (DHT) has become one of the reliable
communication schemes for P2P. Some research efforts perform keyword
searching and result intersection on a DHT substrate. With such schemes, two or
more search requests must be issued for a multi-keyword query. This article
proposes a Sliding Window improved Multi-keyword Searching method (SWMS) to index
and search full text for short queries on a DHT. The main assumptions behind SWMS
are: (1) the query overhead of standard inverted list intersection is prohibitive
in a distributed P2P system; (2) most of the documents relevant to a
multi-keyword query have those keywords appearing near each other. The
experimental results demonstrate that our method preserves search quality while
reducing the cost of communication.
1 Introduction
Peer-to-peer (P2P) systems have become popular in recent years. In such systems, a
DHT [5, 16, 19, 24] efficiently performs object location and application routing in a
potentially very large overlay network. Normally, the object to be placed or located
has an ID, and the node where the object is placed also has an ID taken from the
same space as the object’s key. When a node receives a query for a key for which it is
not responsible, it routes the query to the neighbor node that makes the most
“progress” towards resolving the query. Each node maintains a routing table
consisting of a small subset of the nodes in the system, which ensures a relatively
short routing path.
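The routing behaviour described above can be illustrated with a minimal sketch. It is not the protocol of any of the cited DHTs: the 16-bit circular ID space, the table layout (two ring neighbors plus power-of-two "fingers"), and the names `circ_dist`, `build_tables`, and `route` are all assumptions made for illustration. Each node forwards a query to the entry in its own table that lies closest to the key, and stops when no entry is closer than itself.

```python
import random

ID_BITS = 16                 # assumed size of the shared ID space
ID_SPACE = 1 << ID_BITS


def circ_dist(a: int, b: int) -> int:
    """Shortest distance between two IDs on the circular ID space."""
    d = (a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def build_tables(nodes):
    """Give every node a small routing table (hypothetical layout).

    Each table holds the node's two ring neighbors plus, for every power
    of two, the node closest to (own ID + 2^j). The global view is used
    only here, to set up the tables; routing itself is purely local.
    """
    ring = sorted(nodes)
    tables = {}
    for i, node in enumerate(ring):
        table = {ring[i - 1], ring[(i + 1) % len(ring)]}
        for j in range(ID_BITS):
            target = (node + (1 << j)) % ID_SPACE
            table.add(min(ring, key=lambda n: circ_dist(n, target)))
        table.discard(node)
        tables[node] = table
    return tables


def route(tables, start: int, key: int):
    """Greedily forward a query for `key`; return the list of visited nodes."""
    current, path = start, [start]
    while True:
        # Pick the known neighbor that makes the most progress towards the key.
        best = min(tables[current], key=lambda n: circ_dist(n, key),
                   default=current)
        if circ_dist(best, key) >= circ_dist(current, key):
            return path      # no neighbor is closer: current node is responsible
        current = best
        path.append(current)


if __name__ == "__main__":
    nodes = random.Random(7).sample(range(ID_SPACE), 32)
    tables = build_tables(nodes)
    path = route(tables, nodes[0], key=12345)
    print("hops:", len(path) - 1, "responsible node:", path[-1])
```

Because every table contains both ring neighbors, the greedy walk in this sketch always ends at a node with the smallest circular distance to the key, while the finger entries keep the number of hops small.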
One main problem for P2P is how to retrieve global information on the distributed
infrastructure. Napster [15] and Gnutella [26] provide good strategies for title-based
file retrieval. However, full-text indexing in such an environment has not been solved
well enough. Several studies [3, 17, 25] performed keyword searching on a DHT
substrate. All of them store (term, index) pairs for each term appearing in
each document on the corresponding node. Such an indexing method is a kind of global
index as defined in [2]. With a global index, a multi-keyword search generates
several requests and intersects the result sets. Ribeiro-Neto [4] showed that such an
index works well only in a tightly coupled network with high-bandwidth connections.
However, such connections are generally unavailable in today’s P2P networks.
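To make the cost of a global index concrete, the following sketch simulates one in a single process. It is an illustration under stated assumptions, not the implementation used in [3, 17, 25]: the class `GlobalIndex`, the hash-modulo placement in `node_for` (a stand-in for real DHT routing), and the two counters are hypothetical. Each term's posting list lives on one node, so a query with k distinct keywords issues k lookups and ships every posting list back before intersecting them.

```python
import hashlib
from collections import defaultdict


class GlobalIndex:
    """Toy global inverted index: each term is stored on exactly one node."""

    def __init__(self, num_nodes: int):
        self.num_nodes = num_nodes
        # One dictionary per simulated node: term -> set of document IDs.
        self.nodes = [defaultdict(set) for _ in range(num_nodes)]
        self.requests = 0        # lookups sent to remote nodes
        self.postings_sent = 0   # posting entries shipped back to the querier

    def node_for(self, term: str) -> int:
        """Map a term to a node (assumption: hash modulo node count)."""
        digest = hashlib.sha1(term.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.num_nodes

    def publish(self, doc_id: str, text: str) -> None:
        """Store a (term, doc_id) pair on the node responsible for each term."""
        for term in set(text.lower().split()):
            self.nodes[self.node_for(term)][term].add(doc_id)

    def lookup(self, term: str) -> set:
        """One request to the node responsible for `term`."""
        self.requests += 1
        postings = self.nodes[self.node_for(term)].get(term, set())
        self.postings_sent += len(postings)
        return set(postings)

    def search(self, query: str) -> set:
        """Multi-keyword search: one lookup per distinct keyword, then intersect."""
        terms = list(dict.fromkeys(query.lower().split()))
        if not terms:
            return set()
        result = self.lookup(terms[0])
        for term in terms[1:]:
            result &= self.lookup(term)
        return result


if __name__ == "__main__":
    index = GlobalIndex(num_nodes=8)
    index.publish("d1", "distributed hash table routing")
    index.publish("d2", "hash table lookup")
    index.publish("d3", "distributed search")
    print(sorted(index.search("hash table")))
    print("requests:", index.requests, "postings sent:", index.postings_sent)
```

In this toy setting the whole posting list of each keyword is transferred even when the final intersection is small, which is the kind of overhead that becomes costly when the nodes are connected by low-bandwidth links.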