International Journal of Scientific & Engineering Research, Volume 6, Issue 9, September-2015, ISSN 2229-5518, IJSER © 2015, http://www.ijser.org

An Arabic Web Search Engine Using Grid Computing and Artificial Intelligence Techniques

Mohammed Mahmoud Ibrahim Sakre

Abstract—This research is the result of cumulative work over many years and demonstrates a model for the heavy computational components of a World Wide Web (WWW) search engine. The architecture is based on grid computing. The crawling load is distributed over a set of computers so that more pages can be retrieved in less time. The proposed indexer architecture likewise distributes the indexing load over a set of computers and supports dynamic indexing to cope with frequent changes in web content. Thus, the proposed crawler and indexer architectures together support the freshness of web pages: the dynamic indexer is responsible for identifying outdated pages and sending them back to the crawler to be revisited and updated. The search module is implemented with Arabic morphological analysis/generation and a synonym dictionary, which are combined to produce an intelligent Arabic Internet search module. The use of these linguistic tools is shown, experimentally, to have positive effects on both precision and recall, with average precision exceeding 0.92. The design is implemented for Arabic, but it suits any other language with language-specific modifications.

Index Terms—Grid Computing, Internet Search Engine, Crawling, Indexing, Artificial Intelligence, Natural Language Processing.

1 INTRODUCTION

Recently, the World Wide Web (WWW) has become one of the main sources of information for a large number of people. WWW search engines act as mediators between online information and people.
WWW search engines require computers with high computational resources to crawl web pages, and they require huge data storage to hold the billions of pages collected from the WWW after parsing and indexing. The proposed solution to this problem is grid computing. The term grid computing emerged in the mid-1990s and refers to a proposed distributed computing infrastructure [1], [2]. The typical design of any search engine consists of three stages, in which a web crawler creates a collection of pages that is then indexed and searched. This model, in which operations execute in strict order, first crawling and then indexing as pre-processing phases, followed by searching as a run-time phase, is shown in Figure 1 [3], [4]. Crawling starts with a set of seed URLs, fetches their pages, and parses them to extract the new URLs that appear in those pages. Each extracted URL is either a newly discovered URL, which should be visited next [4], or an already known URL, in which case the weight of its page is increased; this weight affects the page rank during the searching stage. The indexing stage operates on the pages collected during the crawling stage: it parses the pages and generates the inverted index, as described in previous research by the author and others [5]. The searching stage answers users' queries based on the non-stop words among the query terms. Freshness of the web pages is an important factor affecting the efficiency of a search engine, and there are different techniques for keeping web pages up to date [6]. Page ranking is the process of estimating the quality of the set of results retrieved by a search engine and presented to the user. Search engines have put considerable effort into ranking web objects so as to retrieve the correct, desired information from the databases of the WWW. Both freshness and page ranking are considered in the model proposed in this research.
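The crawling and indexing stages described above can be sketched in a few lines. The following Python sketch is illustrative only, not the paper's implementation: `fetch`, `parse_links`, and `tokenize` are hypothetical callables standing in for the real page fetcher, HTML parser, and Arabic tokenizer. It shows the frontier loop (new URLs are queued for a visit, already known URLs get their weight incremented for later page ranking) and the construction of the inverted index from the collected pages.

```python
from collections import defaultdict, deque

def crawl(seed_urls, fetch, parse_links, max_pages=100):
    """Crawl sketch: fetch pages starting from the seed URLs, extract links,
    and raise a weight counter each time a known URL is seen again."""
    frontier = deque(seed_urls)   # URLs waiting to be visited
    weight = defaultdict(int)     # in-link count; feeds the page ranking
    pages = {}                    # url -> page text
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        if url in pages:
            continue
        pages[url] = fetch(url)   # hypothetical fetch function
        for link in parse_links(pages[url]):
            if link in pages or link in frontier:
                weight[link] += 1         # old URL: increase its page weight
            else:
                frontier.append(link)     # newly discovered URL: visit next
    return pages, weight

def build_inverted_index(pages, tokenize):
    """Indexing sketch: map each term to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index
```

In the distributed design proposed here, the frontier and the page collection would be partitioned over the grid nodes rather than held in a single process as in this sketch.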
The searcher uses the indexed database to find the web pages that contain an answer to the user's query. The search results are ordered according to their relevance to the query, using the page-ranking parameters calculated during the execution of the crawler and the indexer, and are then presented to the user. The search modules of different search engines differ in how they work: some use the query words exactly as keyed in by the users; others give the user the ability to use Boolean operators; more advanced search engines perform lexical and/or morphological analysis on the keywords, like the one presented in this research for Arabic. A number of research groups have been working in the field of distributed computing. These groups have created middleware, libraries, and tools that allow geographically distributed resources to be used cooperatively as a single powerful platform for the execution of parallel and distributed applications. This approach to computing has been known by several names, such as metacomputing, scalable computing, global computing, Internet computing, and lately grid computing [1], [2], [7]. Alchemi is an open-source software toolkit developed at the University of Melbourne that provides middleware for creating an enterprise grid computing environment. Alchemi consists of two main components: a manager and an executor. More than one computer runs the executor program, while only one computer runs the manager.
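The query-side processing, in which non-stop query terms are expanded with morphological variants and synonyms before being matched against the inverted index, can be sketched as follows. This is a minimal illustration under stated assumptions: `stem` and `synonyms` are hypothetical stand-ins for the paper's Arabic morphological analyser/generator and synonym dictionary, and the score here is a simple matched-term count, not the full page-ranking parameters.

```python
from collections import defaultdict

def expand_query(terms, stem, synonyms, stop_words):
    """Expand each non-stop query term with its stem and known synonyms."""
    expanded = set()
    for term in terms:
        if term in stop_words:
            continue                       # ignore stop words entirely
        expanded.add(term)
        expanded.add(stem(term))           # morphological variant
        expanded.update(synonyms.get(term, []))
    return expanded

def search(query_terms, index, stem, synonyms, stop_words=frozenset()):
    """Rank pages by how many expanded query terms each page contains."""
    scores = defaultdict(int)
    for term in expand_query(query_terms, stem, synonyms, stop_words):
        for url in index.get(term, ()):
            scores[url] += 1
    # Highest score first: a stand-in for the full ranking computation.
    return sorted(scores, key=scores.get, reverse=True)
```

Expanding the query in this way is what lets a search for one surface form retrieve pages that contain a different inflection or a synonym of the same word, which is the mechanism behind the recall improvement reported in the abstract.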