Abstract: Stemming is defined as the conflation of all variations of a specific word to a single form called the root or stem. Stemming plays a vital role in natural language processing and understanding. As with other languages, there is a need for an effective stemming algorithm for Arabic words. Arabic is a language with rich and complex morphological word structures and rules. An Arabic stemming algorithm based on morphological rules has been developed, and to enhance its effectiveness, a dictionary of root words is used to determine the correct stems. The Arabic stemming algorithm developed by Al-Omari is studied, and a new algorithm is proposed to enhance its performance. The improvements obtained relate to the order in which the dictionary is looked up and the order in which the morphological rules are applied.

Keywords: Stemming, indexing, information retrieval, natural language processing.

I. INTRODUCTION

One of the main modules of a document retrieval system is the text processing and indexing of the input documents, which produces a representation of the documents in the form of indexes. These indexes serve as surrogates for the documents and facilitate the retrieval of documents relevant to a given query. Selecting the representation, or index terms, is a major operation in information retrieval systems. Word stemming is a technique normally applied in the indexing process because it reduces the number of index terms and has also been shown to improve the relevancy of the retrieved documents. The stemming process performs a morphological analysis of words, based on the language used, to obtain the stems of the words. These stems represent the documents and function as indexes to them for efficient and effective retrieval. Stemming is defined as the conflation of all variations of a specific word to a single form called the root or stem.
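As a rough illustration of how conflation shrinks the set of index terms, the following minimal sketch builds an inverted index with and without stemming. The suffix list, the `toy_stem` function and the sample documents are hypothetical and chosen only for illustration; the example uses English words and is not the Arabic algorithm discussed in this paper.

```python
from collections import defaultdict

# Hypothetical suffix list, for illustration only (not taken from the paper).
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs, stem=lambda w: w):
    """Map each (optionally stemmed) term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[stem(token)].add(doc_id)
    return dict(index)


# Hypothetical sample documents.
docs = {
    "d1": "connect connected connecting",
    "d2": "connects connection",
}

full = build_index(docs)               # one index term per distinct word
stemmed = build_index(docs, toy_stem)  # variants conflated to a shared stem

print(len(full), len(stemmed))
```

In this sketch the stemmed index has fewer terms, and a query for the stem `connect` reaches both documents. A word such as `connection` stays unconflated because the toy suffix list does not cover it, which hints at why the choice of rules matters.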
Stemming algorithms for some languages have been published and applied in building information retrieval systems. For English there is the well-known Porter's algorithm [1], for French there is Savoy's algorithm [2], and for the Malay language there is the algorithm of Fatimah et al. [3]. Stemming techniques play a vital part in the development of a good document retrieval system. According to van Rijsbergen [4], the stemming process reduces the size of the document representations by 20-50% compared with full-word representations. Furthermore, the relevancy of the retrieved documents improves and the number of relevant documents retrieved increases. Stemming algorithms for the Arabic language are not widely available or published in journals. The algorithms reported so far are either general in nature or weak in the morphological analysis needed to arrive at the correct Arabic stems. Pioneering work on Arabic stemming has been published by researchers such as Gheith & El-Sadany [5], El-Sadany & Hashish [6], Saliba & Al-Dannan [7], Hilal [8], Al-Kharashi & Even [9], and Al-Omari [10].

II. ARABIC LANGUAGE STEMMING ALGORITHM

In the previous section, we mentioned stemmers for English, Malay and French. The approaches adopted by these stemmers are not fully appropriate for the development of Arabic stemmers, because the morphological structures peculiar to each language differ, as do their semantics. The main differences, as put forward by El-Sadany & Hashish [6], are as follows:

i. Arabic is one of the Semitic languages, which differ in the structure of their affixes from Indo-European languages such as English and French;
ii. Arabic depends mainly on roots and templates in the formation of words;
iii. The consonants of an Arabic root might be changed or deleted during the morphological process.

Stemmers such as Porter's are developed mainly to improve the retrieval performance of document retrieval systems. As a result, these stemmers do not make use of a dictionary to check the correctness of the resulting stems. For languages such as Malay, French and Arabic, by contrast, it is nearly impossible to develop an effective stemming algorithm that does not use such dictionaries to check stems and phrases. More precisely, if such stemmers are developed, their accuracy and performance will be low [9].

Manuscript received April 9, 2011; revised version received May 28, 2011. Tengku M. T. Sembok is with the National University of Malaysia, Bangi, Selangor, Malaysia (phone: +60123373539; fax: +60389256732; email: tmts@ftsm.ukm.my). Zainab A. Bakar is with Universiti Teknologi MARA, Shah Alam, Malaysia (email: zainabab@uitm.edu.my). Belal Abu Ata is with Bahrain University, Bahrain (email: belal@yahoo.com).

A Rule and Template Based Stemming Algorithm for Arabic Language
Tengku Mohd T. Sembok, Belal Mustafa Abu Ata and Zainab Abu Bakar
INTERNATIONAL JOURNAL OF MATHEMATICAL MODELS AND METHODS IN APPLIED SCIENCES, Issue 5, Volume 5, 2011, p. 974
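The role of the dictionary check discussed in Section II can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of Al-Omari or the one proposed in this paper: the root dictionary `ROOTS`, the affix lists `PREFIXES` and `SUFFIXES`, and the function `stem_with_dictionary` are all hypothetical. The sketch strips only simple prefixes and suffixes and accepts a candidate only if it appears in the dictionary; it does not handle templates or changes to root consonants.

```python
# Hypothetical data, for illustration only (not the rules or dictionary of the paper).
ROOTS = {"كتب", "درس"}
PREFIXES = ("وال", "ال", "ي", "")  # tried in this order; "" means no prefix
SUFFIXES = ("ون", "وا", "ة", "")   # tried in this order; "" means no suffix


def stem_with_dictionary(word, roots=ROOTS):
    """Return the first affix-stripped candidate found in the root dictionary.

    Returns None when no candidate is a known root, so an unverified stem
    is never emitted.
    """
    if word in roots:  # dictionary lookup before any rule is applied
        return word
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        for suffix in SUFFIXES:
            if not word.endswith(suffix):
                continue
            end = len(word) - len(suffix)
            candidate = word[len(prefix):end]
            if candidate in roots:  # accept only dictionary-confirmed stems
                return candidate
    return None


for w in ("الكتب", "كتبوا", "يدرسون", "قلم"):
    print(w, "->", stem_with_dictionary(w))
```

Because the first dictionary hit wins, the order of the affix lists and the point at which the dictionary is consulted both affect the result and the amount of work done. A stemmer without the `candidate in roots` test would return whatever remains after stripping, whether or not it is a valid root.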