Agent Space Architecture for Search Engines

Ben Choi & Rohit Dhawan
Computer Science, College of Science and Engineering
Louisiana Tech University, LA 71272, USA
pro@BenChoi.org

This research was supported in part by a grant from the Center for Entrepreneurship and Information Technology (CEnIT), Louisiana Tech University.

Abstract

The future of computing is moving from individual processing units to communities of self-organizing agents. In this paper we propose a new agent- and network-based architecture for parallel and distributed computing called Agent Space Architecture. Our architecture builds upon the notions of agent and Object Space and utilizes multicast networks. The building blocks of our proposed architecture are an active processing unit called an agent, a shared place for communication called a space, and a communication medium called a multicast network. One unique feature of our architecture is that we extend the concept of Object Space to become an Active Space. Our Active Space functions as a rendezvous, a repository, a cache, a responder, a notifier, and a manager of its own resources. The organization of our architecture is as general as a network topology. Any number of agents, spaces, or networks can be added to achieve high performance. It is as scalable as Ethernet, and adding agents or spaces is as easy as plug and play. High availability and fault tolerance are achieved through multiple agents, spaces, and networks. All these features are particularly beneficial for challenging applications such as search engines, which we use as a test case to implement and test our proposed architecture.

1 Introduction

Parallel and distributed computing has great potential for exploiting the vast computational power of millions of personal computers all over the world.
Although there are successful cases of using large numbers of PCs, the current difficulty in scaling is due largely to tight coupling with the underlying management software and with parallel programming interfaces such as the Message Passing Interface (MPI). For instance, the Google architecture utilizes over 15,000 PCs and continues to add more to keep up with the explosive growth in the number of Web pages [2-4]. It also utilizes separate fault-tolerance software [4] and MPI, which makes the management, administration, and configuration of such a large server farm a major issue. The search engine architecture of Inktomi Corporation [1,7] serves portals such as Yahoo, HotBot, Microsoft MSN, Geocities, and NTT “goo” Tokyo. It is a cluster-based architecture utilizing RAID arrays [6] and Myrinet [14]. AltaVista, Lycos, and Excite make use of large SMP supercomputers [1]; fault tolerance is achieved through multiple replicated SMPs, which results in limited scalability and costly replication. A search engine is a challenging application for parallel and distributed processing, and we use it as a test case to implement and test our proposed architecture.

In this paper we propose a parallel and distributed computing architecture that is highly modular and requires minimal human intervention. Our architecture builds upon the notions of agent and Object Space and utilizes multicast networks. The building blocks of our proposed architecture are an active processing unit called an agent, a shared place for communication called a space, and a communication medium called a multicast network. Multiple building blocks are organized to form a parallel and distributed computing architecture.

2 Related Research

Our proposed architecture is built upon an extended notion of Object Space [5].
An Object Space is a shared medium that simply acts as a rendezvous where agents meet either to serve or to be served, without knowledge of each other's identity, location, or specialization. Other variations of Object Space are JavaSpace [5], IBM’s TSpaces [10], TONIC [11], JINI [15], and TupleSpace [12, 16]. Several architectures based on the notion of Object Space have been proposed. One proposed architecture [8] utilizes an Object Space as a repository of various roles, where agents adapt to changing demands placed on the system by dynamically requesting their behavior from the space. A framework for cluster computing using JavaSpace [5], an Object Space for Java, is described in [9].

Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT’04) 0-7695-2101-0/04 $ 20.00 IEEE
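
The rendezvous role of an Object Space described in Section 2 can be illustrated with a minimal in-memory sketch. This is only an illustration under stated assumptions: the class `ObjectSpace`, its `write`/`read`/`take` methods, the dictionary-based entries, and the `search_agent` worker are all hypothetical names chosen for this example. It is not the JavaSpace, TSpaces, or JINI API, and it is not the Active Space proposed in this paper.

```python
import threading


class ObjectSpace:
    """Toy in-memory space (hypothetical): agents exchange entries
    without knowing each other's identity or location."""

    def __init__(self):
        self._entries = []
        self._cond = threading.Condition()

    def write(self, entry):
        """Place a copy of an entry (a dict) in the space and wake waiting agents."""
        with self._cond:
            self._entries.append(dict(entry))
            self._cond.notify_all()

    @staticmethod
    def _matches(entry, template):
        # A template matches when every field it names has the same value in the entry.
        return all(entry.get(key) == value for key, value in template.items())

    def _find(self, template):
        for entry in self._entries:
            if self._matches(entry, template):
                return entry
        return None

    def read(self, template, timeout=None):
        """Return a copy of the first matching entry without removing it, or None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(template) is not None, timeout):
                return None
            return dict(self._find(template))

    def take(self, template, timeout=None):
        """Remove and return the first matching entry, or None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(template) is not None, timeout):
                return None
            entry = self._find(template)
            self._entries.remove(entry)
            return entry


def search_agent(space):
    """Hypothetical worker agent: serves one 'query' entry and writes back a 'result'."""
    request = space.take({"kind": "query"}, timeout=5.0)
    if request is not None:
        # Placeholder "search": upper-case the terms instead of consulting an index.
        space.write({"kind": "result", "id": request["id"],
                     "hits": [request["terms"].upper()]})


space = ObjectSpace()
worker = threading.Thread(target=search_agent, args=(space,))
worker.start()

# The client only knows the space, not which agent (if any) will serve the request.
space.write({"kind": "query", "id": 1, "terms": "agent space"})
result = space.take({"kind": "result", "id": 1}, timeout=5.0)
worker.join()
print(result)
```

The client and the worker never reference each other; both address only the space, which is the decoupling property the paragraph above attributes to Object Space. Adding more workers would only require starting more threads that call `search_agent` on the same space.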