A Topic-Specific Web Robot Model Based on Restless Bandits

Tadhg O’Meara and Ahmed Patel
University College Dublin

Search Technologies
IEEE Internet Computing, March/April 2001, p. 27. 1089-7801/01/$10.00 ©2001 IEEE. http://computer.org/internet/

Constructing and maintaining topic-specific Web indexes is modeled by a restless-bandits generalization and resolved by a reinforcement-learning algorithm.

Web search engine design is primarily concerned with two distinct processes: ranking and indexing [1]. Ranking returns a list of the most relevant documents in response to a given query. Efficient ranking requires indexing, in which search engines construct and maintain a database, or index, of available documents.

Document acquisition can follow either a push or pull model. In the push model, publishers submit documents to a search engine for indexing. In the pull model, search engines acquire documents. Web robots, also known as Web crawlers or spiders, acquire documents from Web servers by following hyperlinks. Robots require little or no cooperation from document publishers, and give search engines control over what is indexed.

Today, most robots attempt to build an index of all documents on the Web, or of a representative sample. In the future, however, the use of topic-specific Web robots, which automatically build and maintain indexes of topically related Web pages, will increase significantly.

In this article, we outline the potential role of topic-specific robots in distributed search engine design, and we model the complex problem of automatically constructing and maintaining topic-specific Web indexes. Experimental results establish the viability of a topic-specific Web robot design based on the restless bandit model. The results indicate that our proposed algorithm is a good foundation on which to build a complete solution.

A Distributed Search Architecture

Search engine design that can scale with Web growth is a long-standing research goal. Today’s predominant engines (such as AltaVista, Fast, Google, and Inktomi) employ a centralized search architecture. Each provides a ranking service for all queries in the search services market. The ranking, indexing, and database components of these engines can be distributed across many computers. Efficient distribution is achieved by enabling the ranking and indexing processes to access and con-